T5 Tokenizer

Overview

This page describes how to use T5TokenizerTFText, the T5 tokenizer implemented with tensorflow-text. Because tokenization runs as TensorFlow ops, the tokenizer can be used inside a tf.data.Dataset pipeline, which makes it suitable for on-the-fly tokenization.

>>> from tf_transformers.models import T5TokenizerTFText
>>> tokenizer = T5TokenizerTFText.from_pretrained("t5-small")
>>> text = ['The following statements are true about sentences in English:',
...         '',
...         'A new sentence begins with a capital letter.']
>>> inputs = {'text': text}
>>> outputs = tokenizer(inputs) # Ragged Tensor Output
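Without padding, every sentence keeps its own length, so the batch of token ids is ragged: rows of different lengths, which is what a tf.RaggedTensor represents. The pure-Python sketch below only illustrates that shape. It uses a hypothetical whitespace tokenizer, toy_tokenize, and a made-up vocabulary rather than the real t5-small SentencePiece model, so the ids do not match what T5TokenizerTFText produces.

```python
# Hypothetical toy vocabulary: only the special-token ids mirror T5
# (<pad>=0, </s>=1, <unk>=2); real T5 uses a SentencePiece model.
TOY_VOCAB = {"<pad>": 0, "</s>": 1, "<unk>": 2}


def toy_tokenize(text, vocab):
    """Map each whitespace-separated word to an id, adding unseen words to vocab."""
    ids = []
    for word in text.split():
        if word not in vocab:
            vocab[word] = len(vocab)
        ids.append(vocab[word])
    return ids


texts = [
    "The following statements are true about sentences in English:",
    "",
    "A new sentence begins with a capital letter.",
]
vocab = dict(TOY_VOCAB)
ragged = [toy_tokenize(t, vocab) for t in texts]

# Each row keeps its own length; the empty string yields an empty row.
print([len(row) for row in ragged])  # [9, 0, 8]
```

Note that the empty input string still produces a row, just an empty one, which is why a ragged structure rather than a list of equal-length rows is needed.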

# Dynamic Padding
>>> tokenizer = T5TokenizerTFText.from_pretrained("t5-small", dynamic_padding=True)
>>> text = ['The following statements are true about sentences in English:',
...         '',
...         'A new sentence begins with a capital letter.']
>>> inputs = {'text': text}
>>> outputs = tokenizer(inputs) # Dict of tf.Tensor
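Dynamic padding pads each batch only up to the length of its own longest sequence, so different batches can have different widths. A minimal pure-Python sketch of the idea, assuming pad id 0 (the id of <pad> in the T5 vocabulary) and a mask that marks real tokens with 1. The key names input_ids and input_mask and the token ids are illustrative and may not match what the tokenizer actually returns.

```python
def pad_dynamic(batch, pad_id=0):
    """Pad every row to the length of the longest row in this batch."""
    max_len = max((len(row) for row in batch), default=0)
    input_ids = [row + [pad_id] * (max_len - len(row)) for row in batch]
    # 1 marks a real token, 0 marks padding.
    input_mask = [[1] * len(row) + [0] * (max_len - len(row)) for row in batch]
    return {"input_ids": input_ids, "input_mask": input_mask}


# Hypothetical ragged token ids for a three-sentence batch.
ragged = [[5, 6, 7], [], [8, 9, 10, 11]]
out = pad_dynamic(ragged)
print(out["input_ids"])   # [[5, 6, 7, 0], [0, 0, 0, 0], [8, 9, 10, 11]]
print(out["input_mask"])  # [[1, 1, 1, 0], [0, 0, 0, 0], [1, 1, 1, 1]]
```

The width here is 4 only because the longest row has 4 tokens; a batch of shorter sentences would produce a narrower result, which avoids wasted computation on padding.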

# Static Padding
>>> tokenizer = T5TokenizerTFText.from_pretrained("t5-small", pack_model_inputs=True)
>>> text = ['The following statements are true about sentences in English:',
...         '',
...         'A new sentence begins with a capital letter.']
>>> inputs = {'text': text}
>>> outputs = tokenizer(inputs) # Dict of tf.Tensor
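Static padding instead truncates or pads every row to one fixed length, so every batch has the same shape regardless of its content. This is convenient when a model or accelerator expects fixed input shapes. The sketch below assumes a hypothetical max_seq_length of 6 and pad id 0; how T5TokenizerTFText chooses its fixed length with pack_model_inputs=True is not covered on this page.

```python
def pad_static(batch, max_seq_length, pad_id=0):
    """Truncate or pad every row to exactly max_seq_length tokens."""
    input_ids, input_mask = [], []
    for row in batch:
        row = row[:max_seq_length]  # drop tokens beyond the fixed length
        n_pad = max_seq_length - len(row)
        input_ids.append(row + [pad_id] * n_pad)
        input_mask.append([1] * len(row) + [0] * n_pad)
    return {"input_ids": input_ids, "input_mask": input_mask}


# Hypothetical token ids; the last row is longer than max_seq_length.
ragged = [[5, 6, 7], [], [8, 9, 10, 11, 12, 13, 14, 15]]
out = pad_static(ragged, max_seq_length=6)
print(out["input_ids"])
# [[5, 6, 7, 0, 0, 0], [0, 0, 0, 0, 0, 0], [8, 9, 10, 11, 12, 13]]
```

Unlike dynamic padding, the output width is always max_seq_length, at the cost of padding short batches and truncating long sentences.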

# To Add Special Tokens
>>> tokenizer = T5TokenizerTFText.from_pretrained("t5-small", add_special_tokens=True)
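T5 marks the end of each sequence with the </s> token, which has id 1 in the standard T5 vocabulary. The pure-Python sketch below assumes that add_special_tokens=True appends this end-of-sequence id to every row. That is an assumption about the option rather than documented behaviour, so check the T5TokenizerTFText reference for details such as how it interacts with truncation.

```python
EOS_ID = 1  # id of "</s>" in the standard T5 vocabulary


def add_eos(batch, eos_id=EOS_ID):
    """Append the end-of-sequence id to each row of token ids (assumed behaviour)."""
    return [row + [eos_id] for row in batch]


# Hypothetical token ids; an empty input still receives an EOS token.
print(add_eos([[5, 6, 7], []]))  # [[5, 6, 7, 1], [1]]
```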

T5TokenizerTFText

T5TokenizerLayer