Seeking Advice on Optimizing YOLOv5 Performance

Hey everyone,

I am diving deep into YOLOv5 for a project and could use some advice from the community. I have been experimenting with various hyperparameters and augmentations, but I am not seeing the performance improvements I hoped for.

While searching for answers, I came across resources like How can we speed up Yolov5?? · Issue #8065 · ultralytics/yolov5 · GitHub, and tried the troubleshooting steps suggested there, but the issues are still showing up.

Has anyone here found any effective strategies for optimizing YOLOv5 performance, especially in terms of balancing accuracy and speed? Any tips on fine-tuning or common pitfalls to avoid would be greatly appreciated!

Looking forward to hearing your experiences and suggestions!

Thanks in advance!

For any neural network, when it comes to improving a generic “accuracy” (used here as a general term for detection + classification performance, i.e. the mAP metric), the number one thing that will help is data. You need to ensure you train on data that is:

  1. Sufficiently large. How large? Approximately 10,000 instances per class, with images from various sources. This would be for a model that is expected to generalize across a variety of image capture devices and environmental conditions. For models that don’t need to generalize that broadly, you could probably get away with fewer instances per class, but I wouldn’t expect “good” performance on anything less than 1,000-2,000 object instances. (A quick way to count instances per class is sketched after this list.)

  2. Well annotated. What does that mean? There are a few aspects that are fundamental to a well annotated dataset:

    • All objects are labeled.
    • All objects are labeled consistently.
    • Bounding geometry (bounding boxes for standard detection models) encloses the object with the absolute minimum gap; box-lines are tight to the object’s extrema.
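
For points 1 and 2, here’s a minimal sketch of how you might audit a YOLO-format dataset: it counts object instances per class and flags malformed label lines. The label directory path and class count are assumptions to adjust for your dataset, and note that box tightness (the last bullet) still needs visual review, e.g. by plotting labels over images.

```python
from collections import Counter
from pathlib import Path

LABEL_DIR = Path("dataset/labels/train")  # hypothetical path; adjust to your layout
NUM_CLASSES = 3                           # set to your class count

counts = Counter()
problems = []

for label_file in sorted(LABEL_DIR.glob("*.txt")):
    for line_no, line in enumerate(label_file.read_text().splitlines(), start=1):
        parts = line.split()
        # YOLO txt labels: "class x_center y_center width height", normalized to [0, 1]
        if len(parts) != 5:
            problems.append(f"{label_file.name}:{line_no} expected 5 fields, got {len(parts)}")
            continue
        try:
            cls = int(parts[0])
            x, y, w, h = map(float, parts[1:])
        except ValueError:
            problems.append(f"{label_file.name}:{line_no} non-numeric field")
            continue
        if not 0 <= cls < NUM_CLASSES:
            problems.append(f"{label_file.name}:{line_no} class id {cls} out of range")
            continue
        if not all(0.0 <= v <= 1.0 for v in (x, y, w, h)) or w <= 0 or h <= 0:
            problems.append(f"{label_file.name}:{line_no} bad box ({x}, {y}, {w}, {h})")
            continue
        counts[cls] += 1

print("Instances per class:", dict(sorted(counts.items())))
for p in problems:
    print("PROBLEM:", p)
```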

These are general rules that apply in most cases, but there are always “special” circumstances. Failing prior knowledge/experience, these guidelines are a good place to start, but they are not definitive, meaning specific cases might need to be more or less strict. Ultimately, no one can give a definite answer on “how much” or “how good” a dataset needs to be to achieve a given performance result; it is the responsibility of the person training the model to establish this via experimentation.

With respect to the “fastest” performance, there are lots of “ifs” and “whens”, but it will be highly dependent on the hardware available. If running on a machine with an NVIDIA GPU, you want to use TensorRT. If you’re using an Intel CPU, OpenVINO would be a good option. When running on a Raspberry Pi, so far NCNN seems like a good bet. Again, testing on your hardware with your model is the only way to find out how fast the model can be.
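
As one concrete illustration, the ultralytics package exposes all three of those backends through a single export call. A minimal sketch (the weights filename is illustrative, and each format assumes the matching toolchain, e.g. TensorRT, is installed on the target machine):

```python
from ultralytics import YOLO

# Load trained weights (filename is illustrative; point this at your own .pt file).
model = YOLO("yolov5su.pt")

# Export to the backend that matches your deployment hardware.
model.export(format="engine")      # TensorRT, for NVIDIA GPUs
# model.export(format="openvino")  # OpenVINO, for Intel CPUs
# model.export(format="ncnn")      # NCNN, for e.g. a Raspberry Pi
```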

Chasing “better” performance should be done only when the requirements of the application/project call for it; there needs to be a reason greater than “just because” to spend the effort. Clearly defining the needs/requirements will be the most important aspect of any project: it guides all decision making and lets you measure success. In situations where you don’t know what the requirements are yet, the best way to estimate them is to do some early testing and base your criteria on the results. If it’s a totally novel application, then you can try to estimate what’s needed from the end goal of the project, but you have to be ready to be flexible.
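
For what that early testing can look like in practice, here’s a rough sketch of collecting a baseline for both accuracy and latency with the ultralytics package (the weights file and dataset YAML are placeholders for your own):

```python
import time

import numpy as np
from ultralytics import YOLO

model = YOLO("yolov5su.pt")  # illustrative weights; use your trained model

# Accuracy baseline: validate on your dataset (the YAML path is hypothetical).
metrics = model.val(data="my_dataset.yaml")
print(f"mAP50-95: {metrics.box.map:.3f}  mAP50: {metrics.box.map50:.3f}")

# Speed baseline: time repeated inference on a dummy frame.
frame = np.zeros((640, 640, 3), dtype=np.uint8)
model.predict(frame, verbose=False)  # warm-up run
n = 100
start = time.perf_counter()
for _ in range(n):
    model.predict(frame, verbose=False)
print(f"Mean latency: {(time.perf_counter() - start) / n * 1000:.1f} ms")
```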

One last note: I don’t know why you’re specifically looking at YOLOv5, but I would recommend using the ultralytics package if it doesn’t require a major change to your workflow. That is where active development happens, and it will provide better flexibility for your project in the long term. Additionally, the ultralytics package offers more models, a larger feature set, and more documentation, so it will be an overall quality-of-life improvement.
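
For reference, moving over can be as small as a pip install and a few lines; a sketch, assuming a hypothetical dataset YAML:

```python
# pip install ultralytics
from ultralytics import YOLO

# "yolov5su.pt" is one of the updated YOLOv5 ("u") checkpoints the package
# can download automatically; swap in another model name as needed.
model = YOLO("yolov5su.pt")

# Train on your own data (the YAML path is a placeholder).
model.train(data="my_dataset.yaml", epochs=100, imgsz=640)
```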