T5 Tokenizer
Overview
This page shows how to use the T5 tokenizer built on tensorflow-text (T5TokenizerTFText).
Because it is built from tensorflow-text ops, the tokenizer works in sync with tf.data.Dataset,
which makes it useful for on-the-fly tokenization.
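A tf.data pipeline is evaluated lazily, so a tokenizer mapped over it runs as batches are pulled through the pipeline rather than over the whole corpus up front. The pure-Python sketch below mimics that idea with a generator. It does not use tf_transformers: toy_tokenize and its tiny vocabulary are hypothetical stand-ins for the real SentencePiece-based tokenizer, and tokenize_on_the_fly is an illustrative helper, not part of the library.

```python
from typing import Callable, Iterator, List

# Hypothetical toy vocabulary; the real T5 tokenizer uses a SentencePiece model.
TOY_VOCAB = {"the": 3, "a": 4, "new": 5, "sentence": 6}
UNK_ID = 2  # id used here for words missing from TOY_VOCAB


def toy_tokenize(texts: List[str]) -> List[List[int]]:
    """Hypothetical stand-in tokenizer: lowercase, split on whitespace, look up ids."""
    return [[TOY_VOCAB.get(word, UNK_ID) for word in text.lower().split()] for text in texts]


def tokenize_on_the_fly(
    texts: List[str],
    batch_size: int,
    tokenize: Callable[[List[str]], List[List[int]]],
) -> Iterator[List[List[int]]]:
    """Yield one tokenized batch at a time; a batch is tokenized only when requested."""
    for start in range(0, len(texts), batch_size):
        yield tokenize(texts[start:start + batch_size])


corpus = ["The new sentence", "A sentence", ""]
batches = tokenize_on_the_fly(corpus, batch_size=2, tokenize=toy_tokenize)
print(next(batches))  # [[3, 5, 6], [4, 6]]
print(next(batches))  # [[]]
```

With tf.data the same role is played by mapping the tokenizer over a (batched) dataset; the generator here only illustrates the lazy, per-batch evaluation.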
>>> from tf_transformers.models import T5TokenizerTFText
>>> tokenizer = T5TokenizerTFText.from_pretrained("t5-small")
>>> text = ['The following statements are true about sentences in English:',
...         '',
...         'A new sentence begins with a capital letter.']
>>> inputs = {'text': text}
>>> outputs = tokenizer(inputs) # Ragged Tensor Output
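Ragged output means each input keeps its own number of tokens, so inputs of different lengths (such as the empty string and the two sentences above) produce rows of different lengths with no padding. tf.RaggedTensor stores this as one flat list of values plus row_splits offsets. The plain-Python sketch below reproduces that layout using made-up token ids (not real T5 ids); it illustrates the data structure only, not the tokenizer's actual output.

```python
from typing import List, Tuple


def to_values_and_row_splits(rows: List[List[int]]) -> Tuple[List[int], List[int]]:
    """Flatten variable-length rows into (values, row_splits), as tf.RaggedTensor stores them."""
    values: List[int] = []
    row_splits = [0]
    for row in rows:
        values.extend(row)
        row_splits.append(len(values))  # end offset of this row in `values`
    return values, row_splits


def get_row(values: List[int], row_splits: List[int], i: int) -> List[int]:
    """Row i is the slice values[row_splits[i]:row_splits[i + 1]]."""
    return values[row_splits[i]:row_splits[i + 1]]


# Made-up token ids for three inputs of different lengths; the middle input is empty.
rows = [[8, 826, 7], [], [71, 126, 1]]
values, row_splits = to_values_and_row_splits(rows)
print(values)      # [8, 826, 7, 71, 126, 1]
print(row_splits)  # [0, 3, 3, 6]
print(get_row(values, row_splits, 1))  # []
```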
>>> # Dynamic padding: pad each batch to its longest sequence
>>> tokenizer = T5TokenizerTFText.from_pretrained("t5-small", dynamic_padding=True)
>>> text = ['The following statements are true about sentences in English:',
...         '',
...         'A new sentence begins with a capital letter.']
>>> inputs = {'text': text}
>>> outputs = tokenizer(inputs) # Dict of tf.Tensor
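Dynamic padding pads each batch only as far as its own longest sequence, so the padded width can change from batch to batch. The sketch below shows the idea in plain Python, assuming T5's padding id of 0. The output keys input_ids and input_mask, and the helper pad_to_longest, are hypothetical illustrations; this page does not list the keys that T5TokenizerTFText actually returns.

```python
from typing import Dict, List

PAD_ID = 0  # T5's SentencePiece vocabulary reserves id 0 for padding


def pad_to_longest(batch: List[List[int]], pad_id: int = PAD_ID) -> Dict[str, List[List[int]]]:
    """Dynamic padding: pad every row to the length of the longest row in this batch."""
    longest = max((len(row) for row in batch), default=0)
    input_ids = [row + [pad_id] * (longest - len(row)) for row in batch]
    # Mask marks real tokens with 1 and padding positions with 0.
    input_mask = [[1] * len(row) + [0] * (longest - len(row)) for row in batch]
    return {"input_ids": input_ids, "input_mask": input_mask}


# Made-up token ids; the padded width follows each batch's longest row.
print(pad_to_longest([[8, 826, 7], [], [71, 126, 1]])["input_ids"])
# [[8, 826, 7], [0, 0, 0], [71, 126, 1]]
print(pad_to_longest([[5], [6, 7]])["input_ids"])
# [[5, 0], [6, 7]]
```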
>>> # Static padding: pad every sequence to a fixed length
>>> tokenizer = T5TokenizerTFText.from_pretrained("t5-small", pack_model_inputs=True)
>>> text = ['The following statements are true about sentences in English:',
...         '',
...         'A new sentence begins with a capital letter.']
>>> inputs = {'text': text}
>>> outputs = tokenizer(inputs) # Dict of tf.Tensor
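Static padding instead truncates or pads every sequence to the same fixed length, so every batch has an identical shape. The sketch below illustrates this in plain Python, again assuming T5's padding id of 0. The parameter name max_length, the helper pad_to_fixed_length, and the output keys are hypothetical; this page does not show how T5TokenizerTFText sets its fixed length.

```python
from typing import Dict, List

PAD_ID = 0  # T5's SentencePiece vocabulary reserves id 0 for padding


def pad_to_fixed_length(
    batch: List[List[int]], max_length: int, pad_id: int = PAD_ID
) -> Dict[str, List[List[int]]]:
    """Static padding: truncate or pad every row to exactly max_length tokens."""
    input_ids: List[List[int]] = []
    input_mask: List[List[int]] = []
    for row in batch:
        kept = row[:max_length]  # truncate rows that are too long
        n_pad = max_length - len(kept)
        input_ids.append(kept + [pad_id] * n_pad)
        input_mask.append([1] * len(kept) + [0] * n_pad)
    return {"input_ids": input_ids, "input_mask": input_mask}


# Made-up token ids; every batch gets the same width regardless of content.
print(pad_to_fixed_length([[8, 826, 7], [], [71, 126, 1]], max_length=4)["input_ids"])
# [[8, 826, 7, 0], [0, 0, 0, 0], [71, 126, 1, 0]]
print(pad_to_fixed_length([[5], [6, 7]], max_length=4)["input_ids"])
# [[5, 0, 0, 0], [6, 7, 0, 0]]
```

The trade-off: static padding spends computation on padding in short batches, but a constant input shape can avoid re-tracing a tf.function for every new batch shape.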
>>> # To add special tokens
>>> tokenizer = T5TokenizerTFText.from_pretrained("t5-small", add_special_tokens=True)