Exporting to tflite with int8 quantization for edge deployment

I’m trying to export a YOLO11 model to tflite with int8 quantization to run on an i.MX board. However, the yolo11n_int8.tflite file produced is still in fp32, and while yolo11n_full_integer_quant.tflite is quantized, the output being in int8 means it doesn’t have the precision required to express bounding boxes and confidence values. I believe the backbone and neck of the model should be quantized to int8 and the detection head left in fp32, but I don’t see how to do this.

What is the intended way to extract predictions from quantized tflite models?

However the yolo11n_int8.tflite file produced is still in fp32

yolo11n_int8.tflite uses dynamic range quantization. It’s not in FP32. In dynamic range quantization, the weights are converted to INT8, but mainly for storage benefits. During inference, the weights are dequantized back to FP32 and inference runs at FP32 precision.
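If you want to verify this yourself, here’s a minimal sketch (the file path is an assumption based on where the export usually lands) that inspects the tensor types in the dynamic-range model:

```python
# Sketch: inspect a dynamic-range quantized TFLite file to see that the
# weight tensors are stored as INT8 while the input/output stay FP32.
# The model path below is an assumption; adjust it to your export folder.
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="yolo11n_saved_model/yolo11n_int8.tflite")
interpreter.allocate_tensors()

# Input and output are still float32 in a dynamic-range model.
print("input dtype: ", interpreter.get_input_details()[0]["dtype"])
print("output dtype:", interpreter.get_output_details()[0]["dtype"])

# Count how many tensors are stored as int8 (the quantized weights).
dtypes = [t["dtype"].__name__ for t in interpreter.get_tensor_details()]
print({d: dtypes.count(d) for d in set(dtypes)})
```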

yolo11n_full_integer_quant.tflite is quantized, however the output being in int8 means it doesn’t have the precision required to express bounding boxes and confidence values

yolo11n_full_integer_quant.tflite requires you to scale the input and output manually, using the scale and zero-point stored in the model, so the INT8 output can be converted back into FP32 boxes and confidences.
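For example, a minimal sketch of the manual scaling with the plain TFLite interpreter (the model path, input size, and the random stand-in image are assumptions; the quantization parameters come from the model itself):

```python
# Sketch: run the full-integer model with manual input quantization and
# output dequantization, using the scale/zero-point stored in the model.
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(
    model_path="yolo11n_saved_model/yolo11n_full_integer_quant.tflite"
)
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

# Preprocess as for the FP32 model: 0-1 range, NHWC layout. A random array
# stands in for a real preprocessed frame here (assumption).
x = np.random.rand(1, 640, 640, 3).astype(np.float32)

# Quantize the FP32 input into the INT8 domain the model expects.
scale, zero_point = inp["quantization"]
x_q = np.clip(np.round(x / scale + zero_point), -128, 127).astype(inp["dtype"])

interpreter.set_tensor(inp["index"], x_q)
interpreter.invoke()

# Dequantize the INT8 output back to FP32 boxes/scores.
y_q = interpreter.get_tensor(out["index"])
scale, zero_point = out["quantization"]
y = (y_q.astype(np.float32) - zero_point) * scale
print(y.shape, y.dtype)
```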

Both of these models can be loaded and run in Ultralytics, and they work fine. There’s nothing wrong with the exported models.
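For example, a quick sketch of running one of the exported files through Ultralytics, which handles the quantize/dequantize steps internally (the model path and test image are assumptions):

```python
# Sketch: let Ultralytics handle the input/output scaling for you.
# Point the path at your exported file and use your own test image.
from ultralytics import YOLO

model = YOLO("yolo11n_saved_model/yolo11n_full_integer_quant.tflite")
results = model("bus.jpg", imgsz=640)
print(results[0].boxes)
```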

If you want FP32 input and output, there’s another generated file, yolo11n_integer_quant.tflite, which has FP32 input and output.