My current object detection model, `best.pt`, based on the YOLO11s architecture, is too laggy on CPU, but I have to deploy it on CPU to process real-time footage. Should I try ONNX conversion, or is there a better approach to make the model lighter while keeping accuracy as high as possible?
If ONNX conversion is the way to go, is there a guide on ONNX export with quantization? I tried:

```python
from ultralytics import YOLO

model = YOLO("best.pt")
model.export(
    format="onnx",    # ONNX target
    imgsz=640,        # freeze input to 640x640 (matches my letterbox)
    half=False,       # True would give FP16, but that mainly helps on GPU
    dynamic=False,    # static 1x3x640x640 input so session.get_inputs()[0].shape yields ints
    simplify=True,    # fold constants and strip redundant nodes
    opset=17,         # modern opset for wide runtime support
    nms=False,        # I do my own NMS in Python
    batch=1,          # max batch baked in (ignored if dynamic=True)
    device="cpu",     # export on CPU (use "0" to export on GPU instead)
    project="path",   # output directory
    name="best",      # output name (default is the model name)
)
```

but this gave me a larger ONNX file with no quantization applied. Where can I find the syntax to produce a smaller, lighter (quantized) version?