Albert TFLite¶
!pip install tf-transformers
!pip install sentencepiece
!pip install tensorflow-text
!pip install transformers
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' # Suppress TF warnings
import tensorflow as tf
print("Tensorflow version", tf.__version__)
from tf_transformers.models import AlbertModel
Tensorflow version 2.7.0
Convert a Model to TFLite¶
The most important thing to note here is that if we want to convert a model to TFLite, we have to ensure that the inputs to the model are deterministic, meaning the inputs should not be dynamic. We have to fix batch_size, sequence_length and other related input constraints, depending on the model of interest.
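For instance, here is a minimal sketch (standard Keras APIs, not part of the original notebook) contrasting a dynamic-shape input with the fully static shape that TFLite requires:

# Dynamic shape: batch and sequence dimensions are None -- not convertible as-is.
dynamic_ids = tf.keras.Input(shape=(None,), dtype=tf.int32, name='input_ids')
print(dynamic_ids.shape)  # (None, None)

# Fully static shape: every dimension is fixed -- suitable for TFLite.
static_ids = tf.keras.Input(shape=(64,), batch_size=1, dtype=tf.int32, name='input_ids')
print(static_ids.shape)  # (1, 64)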
Load Albert Model¶
Fix the inputs.
We can always check the model inputs and outputs by using model.input and model.output.
We use batch_size=1 and sequence_length=64.
model_name = 'albert-base-v2'
batch_size = 1
sequence_length = 64
model = AlbertModel.from_pretrained(model_name, batch_size=batch_size, sequence_length=sequence_length)
INFO:absl:Successful ✅✅: Model checkpoints matched and loaded from /root/.cache/huggingface/hub/tftransformers--albert-base-v2.main.999c3eeace9b4d2c3f2ad87aad4548b3b73ea3cc/ckpt-1
INFO:absl:Successful ✅: Loaded model from tftransformers/albert-base-v2
Verify Model Inputs and Outputs¶
print("Model inputs", model.input)
print("Model outputs", model.output)
Model inputs {'input_ids': <KerasTensor: shape=(1, 64) dtype=int32 (created by layer 'input_ids')>, 'input_mask': <KerasTensor: shape=(1, 64) dtype=int32 (created by layer 'input_mask')>, 'input_type_ids': <KerasTensor: shape=(1, 64) dtype=int32 (created by layer 'input_type_ids')>}
Model outputs {'cls_output': <KerasTensor: shape=(1, 768) dtype=float32 (created by layer 'tf_transformers/albert')>, 'token_embeddings': <KerasTensor: shape=(1, 64, 768) dtype=float32 (created by layer 'tf_transformers/albert')>, 'token_logits': <KerasTensor: shape=(1, 64, 30000) dtype=float32 (created by layer 'tf_transformers/albert')>, 'last_token_logits': <KerasTensor: shape=(1, 30000) dtype=float32 (created by layer 'tf_transformers/albert')>}
Save Model as Serialized Version¶
We have to save the model using model.save. We use the SavedModel format for converting it to TFLite.
model.save("{}/saved_model".format(model_name), save_format='tf')
WARNING:absl:Found untraced functions such as word_embeddings_layer_call_fn, word_embeddings_layer_call_and_return_conditional_losses, type_embeddings_layer_call_fn, type_embeddings_layer_call_and_return_conditional_losses, positional_embeddings_layer_call_fn while saving (showing 5 of 125). These functions will not be directly callable after loading.
INFO:tensorflow:Assets written to: albert-base-v2/saved_model/assets
Convert SavedModel to TFLite¶
converter = tf.lite.TFLiteConverter.from_saved_model("{}/saved_model".format(model_name)) # path to the SavedModel directory
converter.experimental_new_converter = True
tflite_model = converter.convert()

with open("{}/saved_model.tflite".format(model_name), "wb") as f:
    f.write(tflite_model)
print("TFLite conversion successful")
WARNING:absl:Buffer deduplication procedure will be skipped when flatbuffer library is not properly loaded
TFLite conversion successful
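Optionally, the converted model can be shrunk with post-training dynamic-range quantization. The snippet below is a sketch using the standard tf.lite.Optimize flag and was not part of the original run; accuracy should be re-verified against the Keras model afterwards.

# Optional: dynamic-range quantization to reduce model size.
quant_converter = tf.lite.TFLiteConverter.from_saved_model("{}/saved_model".format(model_name))
quant_converter.optimizations = [tf.lite.Optimize.DEFAULT]
quantized_model = quant_converter.convert()
with open("{}/saved_model_quant.tflite".format(model_name), "wb") as f:
    f.write(quantized_model)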
Load TFLite Model¶
# Load the TFLite model and allocate tensors.
interpreter = tf.lite.Interpreter(model_path="{}/saved_model.tflite".format(model_name))
interpreter.allocate_tensors()
# Get input and output tensors.
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
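The ordering of input_details is decided by the converter and does not necessarily match the Keras input dict, so it is worth printing the tensor details before feeding anything. A small inspection snippet (not in the original notebook):

# Inspect tensor names, shapes and dtypes to see which index maps to which input.
for detail in input_details:
    print("input :", detail['name'], detail['shape'], detail['dtype'])
for detail in output_details:
    print("output:", detail['name'], detail['shape'], detail['dtype'])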
Assert TFLite Model and Keras Model Outputs¶
After conversion, we have to assert the model outputs using the TFLite and Keras models, to ensure proper conversion.
Create examples using tf.random.uniform.
Check outputs using both models.
Note: we need a slightly higher rtol here to assert.
# Dummy Examples
input_ids = tf.random.uniform(minval=0, maxval=100, shape=(batch_size, sequence_length), dtype=tf.int32)
input_mask = tf.ones_like(input_ids)
input_type_ids = tf.zeros_like(input_ids)
# Feed inputs by index; verify against input_details, since the converter's
# ordering (here: 0 -> input_type_ids, 1 -> input_mask, 2 -> input_ids)
# need not match the Keras input dict.
interpreter.set_tensor(input_details[0]['index'], input_type_ids)
interpreter.set_tensor(input_details[1]['index'], input_mask)
interpreter.set_tensor(input_details[2]['index'], input_ids)

# Run inference
interpreter.invoke()
# Take the last output from the TFLite model
tflite_output = interpreter.get_tensor(output_details[-1]['index'])

# Keras model outputs
model_inputs = {'input_ids': input_ids, 'input_mask': input_mask, 'input_type_ids': input_type_ids}
model_outputs = model(model_inputs)
# We need a slightly higher rtol here to assert :-)
tf.debugging.assert_near(tflite_output, model_outputs['token_logits'], rtol=3.0)
print("Outputs asserted and succesful: ✅")
Outputs asserted and successful: ✅
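As an alternative to the index-based set_tensor calls above, the interpreter can also run the model through its SavedModel signature, feeding inputs by name. A sketch, assuming the converted model kept its default serving signature:

# Feed inputs by name via the serving signature (avoids index-order pitfalls).
runner = interpreter.get_signature_runner()
signature_outputs = runner(input_ids=input_ids, input_mask=input_mask, input_type_ids=input_type_ids)
print(signature_outputs.keys())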