Quantization

I am trying to run a YOLOv10 model on the NPU of an i.MX8M Plus. After quantizing/converting the model to best_full_integer_quant.tflite, it contains operations on int64-typed values, which I can see in Netron. When the model is loaded on the device, I get the errors below:

WARNING: Fallback unsupported op 48 to TfLite
ERROR: Int64 output is not supported
ERROR: Int64 input is not supported

What is the procedure to create/convert a YOLOv10 model which does not include operations on int64-typed values?

Thank you

@ABest2 What library are you using and what’s the command you’re using for export and quantization?

It should be due to the batch normalization parameters; PyTorch BatchNorm layers store a num_batches_tracked buffer as int64.

You can try this before quantizing: run the code below to strip those buffers, then export the saved model.pt it produces.
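A minimal sketch of that cleanup (substitute the path to your own best.pt):

from ultralytics import YOLO

model = YOLO("best.pt")

# Delete the int64 num_batches_tracked buffer from every module that has one
for m in model.model.model.modules():
    if hasattr(m, "num_batches_tracked"):
        del m.num_batches_tracked

# Write the cleaned model back into the checkpoint and save it
model.ckpt.update(dict(model=model.model))
if "ema" in model.ckpt:
    del model.ckpt["ema"]
model.save("model.pt")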

Hi @BurhanQ,

I am using the following script for export and quantization. I have tried various combinations of True/False for optimize and simplify, with the same result.

Thank you

////////////////////////////////////////////////////////////////////////////////////////////////
from ultralytics import YOLO

# Load the YOLOv10 model
model = YOLO("/home/sutter/Desktop/YoloV10-train/runs/detect/train2/weights/best.pt")

# Export the model to TFLite INT8 format
model.export(format="tflite", int8=True, data="/home/sutter/Desktop/YoloV10-train/export.yaml", imgsz=640, optimize=True, simplify=True, nms=False, batch=1, workspace=6.0)
//////////////////////////////////////////////////////////////////////////////////////////////////

Hi @Toxite,

I will try what you suggest and update this thread.

Thank you

Hi @Toxite,

I used the following script prior to export:

/////////////////////////////////////////////////////////////////////////////////////////////////////
from ultralytics import YOLO

# This must be run in the yoloConvEnv Conda environment using the latest version of YOLO.
model = YOLO("/home/sutter/Desktop/YoloV10-train/runs/detect/train2/weights/best.pt")

# Delete the int64 num_batches_tracked buffers from the BatchNorm layers
for m in model.model.model.modules():
    if hasattr(m, "num_batches_tracked"):
        del m.num_batches_tracked

model.ckpt.update(dict(model=model.model))
del model.ckpt["ema"]
model.save("model.pt")
/////////////////////////////////////////////////////////////////////////////

Unfortunately, the exported version of model.pt was still rejected by the NPU with the same errors I described before.

Thank you

Can you show the names of the layers with int64 operations?

Hi @Toxite,

I have attached a screenshot of Netron's Find window after searching for int64 within the fully quantized model.

Thank you

Can you check the ONNX graph too and see if INT64 exists?
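For example, with the onnx Python package (a quick sketch; it assumes the exported file is named best.onnx):

import onnx
from onnx import TensorProto

m = onnx.load("best.onnx")
m = onnx.shape_inference.infer_shapes(m)  # populate value_info for intermediate tensors

# int64 weights/constants
print([t.name for t in m.graph.initializer if t.data_type == TensorProto.INT64])

# int64 inputs, outputs, and intermediate tensors
infos = list(m.graph.input) + list(m.graph.output) + list(m.graph.value_info)
print([v.name for v in infos if v.type.tensor_type.elem_type == TensorProto.INT64])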

Hi @Toxite,

There are many int64 instances in the ONNX model as well.

Thank you

You can remove the post-processing from the model and export.

from ultralytics import YOLO
from ultralytics.nn.modules import Detect

model = YOLO("yolov10n.pt")

# Strip the int64 num_batches_tracked buffers from the BatchNorm layers
for m in model.model.model.modules():
    if hasattr(m, "num_batches_tracked"):
        del m.num_batches_tracked

model.ckpt.update(dict(model=model.model))
if "ema" in model.ckpt:
    del model.ckpt["ema"]
model.save("model.pt")

# Reload the cleaned checkpoint and export with post-processing disabled:
# the no-op postprocess returns the raw predictions, so the int64-producing
# ops never make it into the graph
model = YOLO("model.pt")
Detect.postprocess = lambda s, x, y, z: x  # s=self, x=raw predictions
model.export(format="tflite", int8=True)

However, you will have to manually apply the post-processing after inference:
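Roughly, something like this in NumPy (a sketch, not the exact library code: it assumes the dequantized raw output has shape (1, num_anchors, 4 + num_classes) with xywh boxes; check the real layout of your export in Netron):

import numpy as np

def postprocess(preds, conf_thres=0.25, max_det=300):
    # preds: dequantized raw output, assumed (1, num_anchors, 4 + nc) with
    # (cx, cy, w, h) boxes; coordinates may be normalized to [0, 1], in
    # which case scale them by the input size afterwards.
    p = preds[0]
    boxes, scores = p[:, :4], p[:, 4:]
    cls_ids = scores.argmax(axis=-1)
    confs = scores.max(axis=-1)
    keep = confs > conf_thres  # YOLOv10 is NMS-free, so no NMS step is needed
    boxes, confs, cls_ids = boxes[keep], confs[keep], cls_ids[keep]
    order = np.argsort(-confs)[:max_det]
    boxes, confs, cls_ids = boxes[order], confs[order], cls_ids[order]
    # xywh (center) -> xyxy corners
    xyxy = np.empty_like(boxes)
    xyxy[:, 0] = boxes[:, 0] - boxes[:, 2] / 2
    xyxy[:, 1] = boxes[:, 1] - boxes[:, 3] / 2
    xyxy[:, 2] = boxes[:, 0] + boxes[:, 2] / 2
    xyxy[:, 3] = boxes[:, 1] + boxes[:, 3] / 2
    return xyxy, confs, cls_ids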


Hi @Toxite,

The model seems to be running on the NPU now, as I no longer see the int64 errors. Next I need to examine the output tensor.
Your help is greatly appreciated!
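For anyone following along, the output tensor can be inspected and dequantized with the TFLite interpreter (a sketch; tflite_runtime exposes the same API on the i.MX8M Plus):

import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="best_full_integer_quant.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]
print("output shape:", out["shape"], "dtype:", out["dtype"])

# Run a dummy frame through and dequantize the int8 output back to float
interpreter.set_tensor(inp["index"], np.zeros(inp["shape"], dtype=inp["dtype"]))
interpreter.invoke()
raw = interpreter.get_tensor(out["index"])
scale, zero_point = out["quantization"]
preds = (raw.astype(np.float32) - zero_point) * scale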