Inference speed will depend a lot on the CPU. ONNX is one format, but there are numerous export formats available. You'll have to test to see what works best on your specific hardware. Every situation is different, so there's no way for anyone to tell you exactly which format is correct for you; it's something you'll have to experiment with. All the arguments and details for exporting are in the documentation, and you may also want to view the Integrations pages for more details.
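As a minimal sketch, assuming you're using the Ultralytics Python package, an export is a one-liner; the format string and the arguments worth tuning (imgsz, half, etc.) are covered in the Export docs:

```python
from ultralytics import YOLO

# Load the pretrained model and export it to ONNX.
# Other formats (e.g. openvino, engine) follow the same pattern;
# check the Export docs for which arguments each format supports.
model = YOLO("yolo11n.pt")
onnx_path = model.export(format="onnx")  # returns the path to the exported file
print(onnx_path)
```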
Other than testing the various export formats, one of the biggest factors to help reduce inference time is imgsz. The lower the value you can use for imgsz, the faster your inference will be. As an example, I just tested the default imgsz=640 against imgsz=320 using yolo11n.pt (no export).
yolo val model=yolo11n.pt device=cpu data=coco128.yaml imgsz=640
>>> Speed: 0.8ms preprocess, 42.2ms inference, 0.0ms loss, 1.5ms postprocess per image
yolo val model=yolo11n.pt device=cpu data=coco128.yaml imgsz=320
>>> Speed: 0.2ms preprocess, 12.3ms inference, 0.0ms loss, 0.9ms postprocess per image
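If you prefer the Python API, the same comparison can be run there. This is a rough sketch under the assumption that the returned metrics object exposes a speed attribute with the per-image millisecond timings shown above:

```python
from ultralytics import YOLO

model = YOLO("yolo11n.pt")

# Validate at two image sizes and compare the reported per-image timings.
# metrics.speed is a dict of millisecond timings (preprocess, inference, etc.).
for size in (640, 320):
    metrics = model.val(data="coco128.yaml", device="cpu", imgsz=size)
    print(size, metrics.speed)
```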
Reducing imgsz by half results in roughly a 3x speedup in inference time. It might not be exactly the same on your system due to differences in hardware and environment, and you might not be able to reduce it by half, but finding the smallest acceptable imgsz will help improve inference speeds. Additionally, when exporting a model, including nms=True can help reduce postprocessing time somewhat, so you might want to test exports with that feature enabled.
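A sketch of what that export could look like, again assuming the Ultralytics Python package; nms=True folds non-maximum suppression into the exported model so less work is left for postprocessing. How much it helps varies by format and hardware, so benchmark before and after:

```python
from ultralytics import YOLO

model = YOLO("yolo11n.pt")

# Export with NMS included in the exported model; test this against a plain
# export on your own hardware to see whether postprocess time actually drops.
model.export(format="onnx", imgsz=320, nms=True)
```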
To get more specific help, you'll need to be clear about your target speeds and hardware. As mentioned previously, it still won't be definitive, but it might help others better understand the situation. As an example, if your target is to achieve <1 ms inference, that might not be possible on a CPU, or at least not with the specific CPU in the system you're using.
Remember, it's not a good idea to ask how to do X when your goal is to accomplish Y. You should share your true intention so others have a better understanding of how to help you. The discussion is very different if your goal is “monitor an assembly line that is moving at 3 parts per second” versus “lower the inference time as much as possible.” The former provides enough context and detail for others to give specific advice, whereas the latter is open-ended and ambiguous.