Quantization

I am trying to run a YOLOv10 model on the NPU of an i.MX8M Plus. After quantizing/converting the model to best_full_integer_quant.tflite, it contains operations on int64-typed values, which I can see in Netron. When the model is loaded on the device, I get the errors below:

WARNING: Fallback unsupported op 48 to TfLite
ERROR: Int64 output is not supported
ERROR: Int64 input is not supported

What is the procedure to create/convert a YOLOv10 model which does not include operations on int64-typed values?

Thank you

@ABest2 What library are you using and what’s the command you’re using for export and quantization?

It should be due to the batch normalization parameters; PyTorch BatchNorm layers store a num_batches_tracked buffer as int64.

You can try this before quantizing: run the code below to strip those buffers, then export the saved model.pt it produces.
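A minimal sketch of that cleanup (substitute the path to your own best.pt):

from ultralytics import YOLO

model = YOLO("best.pt")

# Delete the int64 num_batches_tracked buffer from every module that has one
for m in model.model.model.modules():
    if hasattr(m, "num_batches_tracked"):
        del m.num_batches_tracked

# Write the cleaned model back into the checkpoint and save it
model.ckpt.update(dict(model=model.model))
if "ema" in model.ckpt:
    del model.ckpt["ema"]
model.save("model.pt")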

Hi @BurhanQ,

I am using the following script for export and quantization. I have tried various combinations of True/False for optimize and simplify, with the same result.

Thank you

////////////////////////////////////////////////////////////////////////////////////////////////
from ultralytics import YOLO

# Load the YOLOv10 model
model = YOLO("/home/sutter/Desktop/YoloV10-train/runs/detect/train2/weights/best.pt")

# Export the model to TFLite INT8 format
model.export(format="tflite", int8=True, data="/home/sutter/Desktop/YoloV10-train/export.yaml", imgsz=640, optimize=True, simplify=True, nms=False, batch=1, workspace=6.0)
//////////////////////////////////////////////////////////////////////////////////////////////////

Hi @Toxite,

I will try what you suggest and update this thread.

Thank you

Hi @Toxite,

I used the following script prior to export:

/////////////////////////////////////////////////////////////////////////////////////////////////////
from ultralytics import YOLO

# This must be run in the yoloConvEnv Conda environment using the latest version of YOLO.
model = YOLO("/home/sutter/Desktop/YoloV10-train/runs/detect/train2/weights/best.pt")

# Delete the int64 num_batches_tracked buffers from the BatchNorm layers
for m in model.model.model.modules():
    if hasattr(m, "num_batches_tracked"):
        del m.num_batches_tracked

model.ckpt.update(dict(model=model.model))
del model.ckpt["ema"]
model.save("model.pt")
/////////////////////////////////////////////////////////////////////////////

Unfortunately, the exported version of model.pt was still rejected by the NPU with the same errors I described before.

Thank you

Can you show the names of the layers with int64 operations?

Hi @Toxite,

I have attached a screenshot of Netron's Find window after searching for int64 within the fully quantized model.

Thank you

Can you check the ONNX graph too and see if INT64 exists?
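For example, with the onnx Python package (a quick sketch; it assumes the exported file is named best.onnx):

import onnx
from onnx import TensorProto

m = onnx.load("best.onnx")
m = onnx.shape_inference.infer_shapes(m)  # populate value_info for intermediate tensors

# int64 weights/constants
print([t.name for t in m.graph.initializer if t.data_type == TensorProto.INT64])

# int64 inputs, outputs, and intermediate tensors
infos = list(m.graph.input) + list(m.graph.output) + list(m.graph.value_info)
print([v.name for v in infos if v.type.tensor_type.elem_type == TensorProto.INT64])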

Hi @Toxite,

There are many int64 instances in the ONNX model as well.

Thank you

You can remove the post-processing from the model and export.

from ultralytics import YOLO
from ultralytics.nn.modules import Detect

model = YOLO("yolov10n.pt")

# Strip the int64 num_batches_tracked buffers from the BatchNorm layers
for m in model.model.model.modules():
    if hasattr(m, "num_batches_tracked"):
        del m.num_batches_tracked

model.ckpt.update(dict(model=model.model))
if "ema" in model.ckpt:
    del model.ckpt["ema"]
model.save("model.pt")

# Reload the cleaned checkpoint and export with post-processing disabled:
# the no-op postprocess returns the raw predictions, so the int64-producing
# ops never make it into the graph
model = YOLO("model.pt")
Detect.postprocess = lambda s, x, y, z: x  # s=self, x=raw predictions
model.export(format="tflite", int8=True)

However, you will have to manually apply the post-processing after inference:
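Roughly, something like this in NumPy (a sketch, not the exact library code: it assumes the dequantized raw output has shape (1, num_anchors, 4 + num_classes) with xywh boxes; check the real layout of your export in Netron):

import numpy as np

def postprocess(preds, conf_thres=0.25, max_det=300):
    # preds: dequantized raw output, assumed (1, num_anchors, 4 + nc) with
    # (cx, cy, w, h) boxes; coordinates may be normalized to [0, 1], in
    # which case scale them by the input size afterwards.
    p = preds[0]
    boxes, scores = p[:, :4], p[:, 4:]
    cls_ids = scores.argmax(axis=-1)
    confs = scores.max(axis=-1)
    keep = confs > conf_thres  # YOLOv10 is NMS-free, so no NMS step is needed
    boxes, confs, cls_ids = boxes[keep], confs[keep], cls_ids[keep]
    order = np.argsort(-confs)[:max_det]
    boxes, confs, cls_ids = boxes[order], confs[order], cls_ids[order]
    # xywh (center) -> xyxy corners
    xyxy = np.empty_like(boxes)
    xyxy[:, 0] = boxes[:, 0] - boxes[:, 2] / 2
    xyxy[:, 1] = boxes[:, 1] - boxes[:, 3] / 2
    xyxy[:, 2] = boxes[:, 0] + boxes[:, 2] / 2
    xyxy[:, 3] = boxes[:, 1] + boxes[:, 3] / 2
    return xyxy, confs, cls_ids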


Hi @Toxite,

The model seems to be running on the NPU now, as I no longer see the int64 errors. Next I need to examine the output tensor.
Your help is greatly appreciated!
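For anyone following along, the output tensor can be inspected and dequantized with the TFLite interpreter (a sketch; tflite_runtime exposes the same API on the i.MX8M Plus):

import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="best_full_integer_quant.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]
print("output shape:", out["shape"], "dtype:", out["dtype"])

# Run a dummy frame through and dequantize the int8 output back to float
interpreter.set_tensor(inp["index"], np.zeros(inp["shape"], dtype=inp["dtype"]))
interpreter.invoke()
raw = interpreter.get_tensor(out["index"])
scale, zero_point = out["quantization"]
preds = (raw.astype(np.float32) - zero_point) * scale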