Export YOLO11s at a size different from the default one

Dear All,
I am using a Hailo-8L with Frigate in Home Assistant on a Raspberry Pi 5 for video surveillance.
I already exported the YOLO11s model at a resolution of 480x480 without any fine-tuning and compiled it for the Hailo-8L with their DFC compiler.
The detection accuracy under real-world conditions is quite good, but the inference time is still quite high: I measured 18 ms. Hence, I am thinking of switching to a resolution of 320x320. Should I expect a significant decrease in accuracy? Should I fine-tune the exported model before compiling it?
Thank you in advance,
Cristiano

It would have lower accuracy. Fine-tuning is better, but you would need to fine-tune on the full MSCOCO dataset.

Dear Toxite,
first of all, thank you for the reply.
I think the full MSCOCO dataset is too large to train on with my Apple M3 Max…
Is there any estimate of how much worse the model will be at 320 compared to the default resolution, in terms of mAP50 or the like? Aren’t YOLO models scale-invariant? Would fine-tuning on a few thousand images, such as val2017, be pointless?

best Cristiano

Fine-tuning on a smaller dataset may or may not help. The model can overfit to the smaller dataset and lose generalization ability.

Scale-invariance means YOLO can detect objects of different sizes. It doesn’t mean YOLO has no bias towards the image size it was trained on.

I ran validation on MSCOCO at 320. The mAP at 320 is 37.7, which is a drop of almost 10 points, and it’s worse than YOLO11n at 640, which has an mAP of 39.5. So you can just try using YOLO11n at 640 rather than YOLO11s at 320.
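
For anyone who wants to reproduce this kind of comparison, here is a minimal sketch. It assumes the `ultralytics` package is installed and that `coco.yaml` (the dataset config shipped with Ultralytics) is used; the helper names `validate_at_sizes` and `pick_best` are made up for illustration. `metrics.box.map` is mAP50-95, which is presumably the number quoted above.

```python
def validate_at_sizes(weights="yolo11s.pt", data="coco.yaml", sizes=(640, 480, 320)):
    """Return {imgsz: mAP50-95 in percent} for one checkpoint.

    Hypothetical helper: runs Ultralytics validation once per image size.
    The first call downloads the weights and the COCO val images.
    """
    from ultralytics import YOLO  # lazy import; needs `pip install ultralytics`

    model = YOLO(weights)
    results = {}
    for imgsz in sizes:
        metrics = model.val(data=data, imgsz=imgsz)
        results[imgsz] = 100 * metrics.box.map  # box.map is mAP50-95 in [0, 1]
    return results


def pick_best(candidates):
    """Return the name of the candidate with the highest mAP."""
    return max(candidates, key=candidates.get)


# The two numbers quoted in this thread (COCO val):
quoted = {"yolo11s@320": 37.7, "yolo11n@640": 39.5}
print(pick_best(quoted))
```

Accuracy is only half of the decision: the latency of YOLO11n at 640 on the Hailo-8L still has to be measured on the device, since it is not covered by the numbers above.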

In my use case I am only interested in a few COCO classes, i.e. person, car, auto, bicycle, motorcycle, cat and dog. Can this help with the fine-tuning?

thank you very much

Yes — for Ultralytics YOLO, that can definitely help.

If you only care about person, car (auto in COCO is just car), bicycle, motorcycle, cat, and dog, then fine-tuning on only those classes at imgsz=320 is the right experiment. Just do it from the original .pt model before export/compile, not from the exported Hailo model.

The biggest improvement usually comes from images from your own cameras, or a filtered COCO train2017 subset, not val2017. A few thousand well-matched images can absolutely be useful. It likely won’t reduce latency much, but it can recover part of the accuracy lost at 320, especially for person and car. Small/far objects will still be the hardest. I’d validate on your own holdout set, similar to the advice in the model evaluation guide.
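
A rough sketch of that workflow, in case it helps. It assumes the labels are already in YOLO text format using the Ultralytics COCO class indices (0 person, 1 bicycle, 2 car, 3 motorcycle, 15 cat, 16 dog). The dataset file `coco_subset.yaml`, the epoch count, and both helper names are placeholders, not something from this thread, and ONNX is assumed to be the format handed to the Hailo compiler.

```python
# Ultralytics COCO class indices for the classes named in this thread.
# "auto" is the same COCO class as "car", so six classes remain.
KEEP = {0: "person", 1: "bicycle", 2: "car", 3: "motorcycle", 15: "cat", 16: "dog"}
# Map the original indices to a compact 0..5 range for the subset dataset.
REMAP = {old: new for new, old in enumerate(sorted(KEEP))}


def filter_label_text(text: str) -> str:
    """Keep only the rows of a YOLO label file whose class is in KEEP.

    Each row is "class cx cy w h" with normalized coordinates; the class
    index is rewritten to the compact range, the box is left untouched.
    """
    kept = []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        cls = int(parts[0])
        if cls in REMAP:
            kept.append(" ".join([str(REMAP[cls]), *parts[1:]]))
    return "\n".join(kept)


def finetune_and_export(data_yaml="coco_subset.yaml", epochs=50):
    """Hypothetical helper: fine-tune at 320 from the .pt, then export ONNX."""
    from ultralytics import YOLO  # lazy import; needs `pip install ultralytics`

    model = YOLO("yolo11s.pt")  # original weights, not the exported Hailo model
    model.train(data=data_yaml, imgsz=320, epochs=epochs)
    return model.export(format="onnx", imgsz=320)  # path of the exported file


sample = "0 0.5 0.5 0.2 0.4\n7 0.1 0.1 0.05 0.05\n16 0.3 0.3 0.1 0.1\n"
print(filter_label_text(sample))
```

In the sample, the row with class 7 (truck) is dropped and dog (16) becomes class 5. Whatever images are used for fine-tuning, the holdout set from your own cameras should stay out of training.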

thank you very much for the hints!

If I export the YOLO model at 480 instead of 320, what is the expected mAP drop? Should I expect a drop of 4-5 points, assuming a roughly linear relationship between size and accuracy?

thank you again

C.

It’s 43.7 at 480, a drop of 3.3 points.
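
Putting the numbers from this thread side by side shows that the relationship is not linear. The quick check below uses only the values quoted here; the 640 baseline of 47.0 is inferred from "43.7 at 480" plus the 3.3-point drop, it is not a new measurement.

```python
# YOLO11s mAP values quoted in this thread (COCO val).
map_at = {320: 37.7, 480: 43.7}
drop_at_480 = 3.3

# Baseline at the default 640 size, implied by the two numbers above.
map_at[640] = round(map_at[480] + drop_at_480, 1)

drop_640_to_480 = round(map_at[640] - map_at[480], 1)
drop_480_to_320 = round(map_at[480] - map_at[320], 1)

# What a straight line between the 320 and 640 results would predict at 480.
linear_guess_480 = round((map_at[320] + map_at[640]) / 2, 2)

print(f"640 -> 480: -{drop_640_to_480} points")
print(f"480 -> 320: -{drop_480_to_320} points")
print(f"linear guess at 480: {linear_guess_480}, measured: {map_at[480]}")
```

The second 160-pixel reduction costs 6.0 points against 3.3 for the first, so 480 holds up better than a straight-line estimate (42.35) would suggest.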

thank you very much!