Methods to Improve YOLOv11 Model Training Performance

Hey everyone, I have a question I hope you can help with. I’m working on a crop disease spot detection project. I trained my model on the project dataset, which contains around 1,500 images. The results look okay but don’t meet the project requirements:

  • Precision (P): 0.887

  • Recall (R): 0.804

  • mAP@0.5: 0.864

  • mAP@0.5:0.95: 0.412

I want to raise precision to 0.9+ and mAP@0.5:0.95 to 0.6+. I’ve tried many approaches. Initially I trained a YOLOv5 model, but the results were not satisfactory, so I switched to YOLOv11, which is what I’m currently using. I also tried tuning hyperparameters, but the improvement was minimal.
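For reference, my training script is essentially the standard Ultralytics flow, something like this (simplified; the dataset YAML and settings here are placeholders):

```python
from ultralytics import YOLO

# Fine-tune a pretrained checkpoint on the crop disease dataset.
model = YOLO("yolo11s.pt")
model.train(data="data.yaml", epochs=100, imgsz=640, batch=16)
```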

I then consulted others and was advised to try replacing the YOLOv11 backbone with something like DenseNet. I attempted the modification but clearly failed, which is why I’m asking here.

To be honest, I’m a newcomer to this field. I only understand some technical terms and roughly how models operate. I can’t understand the structure or code at all. However, the project task has been assigned to me, so I must complete it.

I tried using ChatGPT to help me figure out the relationships. It guided me to create my own yolov11.yaml file, replace the backbone with a DenseNet structure, create a DenseNet .py file to register the backbone with the model, and then reference it from the YAML. I also made a test.py file to check whether the input and output channels aligned, because I had already failed to swap DenseNet into YOLOv5 before.
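Roughly, my test.py just pushes a dummy tensor through the DenseNet feature extractor to see what comes out the other end (simplified sketch):

```python
import torch
from torchvision.models import densenet121

# Run a dummy "image" through the DenseNet feature extractor to inspect
# the spatial size and channel count the YOLO head would have to accept.
backbone = densenet121(weights=None).features
x = torch.zeros(1, 3, 640, 640)
out = backbone(x)
print(out.shape)  # torch.Size([1, 1024, 20, 20]) -> 1024 channels at stride 32
```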

Despite all this, it still failed because the output channels wouldn’t align. I feel very confused and lost, and I couldn’t find any tutorials online that explain this. I understand that as a beginner I should be studying more theory, but given the situation, I need to find a way to complete this task.

I hope someone can help me clarify what’s really going on, what preparations I should make, what to pay attention to when replacing a network or even modules, and what knowledge I should study to tackle this.

Modifying the model isn’t going to bring any significant improvements compared to improving the dataset. Your dataset is small.

And modifying the architecture means you will have to train from scratch instead of starting from pretrained weights, which will generally give you worse accuracy.

You should improve the dataset, or use a larger model like yolo11l.pt.

2 Likes

Thank you for your reply, Toxite. This is indeed something I had considered. I first used yolo11s.pt, and after upgrading to yolo11m.pt, the performance did improve significantly. I also tried larger models, but my computer couldn’t really handle them, so I switched to other approaches.

Besides learning how to train models, I also want to try replacing the backbone network myself to better understand how different modules affect YOLO. So everything I’m doing right now isn’t just about improving the metrics — it’s also about learning how these structures work.

When I first started exploring the YOLO series models, I saw that they were highly adaptable and very modular, with many plug-and-play components. Naturally, I became curious about what would happen if I integrated other modules into YOLO11 — what kind of results could I achieve?

So may I ask you how I can replace network modules in YOLO11? And if possible, what steps should I take to get such a modified version running? :+1:

The guide for custom modules is here:
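At a high level, once a custom module is registered following that guide, you build and train from your own model YAML instead of a pretrained checkpoint. A minimal sketch, assuming a hypothetical yolo11-densenet.yaml and dataset YAML:

```python
from ultralytics import YOLO

# Build the model from a custom architecture YAML instead of a .pt checkpoint.
# A model defined this way starts from random weights, i.e. it trains from scratch.
model = YOLO("yolo11-densenet.yaml")
model.train(data="data.yaml", epochs=100, imgsz=640)
```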

You said it’s for learning, but at the same time you mentioned you’re using ChatGPT to modify Ultralytics, which is contradictory. And if you’re stuck on channel-related errors, that’s usually a sign that some core deep learning basics are still missing. It’s not something that can be fixed in a single reply; it’s very important to understand the fundamentals first.
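That said, for context on what the error means: the detection head expects specific channel counts from the backbone, and the generic way to reconcile a mismatch is an explicit projection layer such as a 1x1 convolution. A plain PyTorch sketch (not Ultralytics-specific; the channel numbers are just illustrative):

```python
import torch
import torch.nn as nn

# A 1x1 convolution remaps channel counts, e.g. DenseNet's 1024 output
# channels down to the 512 a downstream layer expects.
adapter = nn.Conv2d(1024, 512, kernel_size=1)
feat = torch.zeros(1, 1024, 20, 20)
print(adapter(feat).shape)  # torch.Size([1, 512, 20, 20])
```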

To be honest, I’m a newcomer to this field.

You can’t really jump to modifying Ultralytics if you’re new to deep learning. You should be studying the fundamentals first.

As I mentioned, any modified architecture will not be pretrained and it will be trained from scratch. Training from scratch on such a small dataset will almost always cause worse accuracy.

You can use Colab or Kaggle to train larger models.
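For example, in a Colab or Kaggle notebook the whole run fits in a few lines (the dataset YAML name here is a placeholder):

```python
# In a notebook cell, first: !pip install ultralytics
from ultralytics import YOLO

# Larger pretrained checkpoint; a free cloud GPU can usually handle it.
model = YOLO("yolo11l.pt")
model.train(data="crop_disease.yaml", epochs=100, imgsz=640)
```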

Thank you very much for your answer, it helped me a lot. Earlier, I was probably too eager to achieve my goals and ended up taking shortcuts—relying too much on ChatGPT and neglecting to think for myself. I also misunderstood the idea of modular backbones, assuming they worked like game mods where you can just drop them in and everything works, but that’s clearly not the case. I’ll start studying the fundamental concepts and work on understanding the underlying principles and structures.

1 Like

It’s always tempting to try to jump to a solution, especially since LLMs are built precisely for doing just that. Like Toxite mentioned, in all likelihood (and in nearly every case), increasing/improving your dataset will yield the best results.

One pointer to help with collecting more labeled data: use any of your trained models to pre-label new data. There are lots of ways to go about this with tools like Label Studio and FiftyOne. Once you’ve done that, check the results, fix any errors, and then train a new model. Keep iterating and you’ll quickly build up a large labeled dataset. You can also use models like SAM or SAM2 to help with labeling.
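To illustrate the first step, a sketch of pre-labeling with Ultralytics (the weights path and image folder are placeholders):

```python
from ultralytics import YOLO

# Pre-label a folder of new images with the current best model.
# save_txt writes YOLO-format .txt label files you can review and correct.
model = YOLO("runs/detect/train/weights/best.pt")
model.predict(source="new_images/", save_txt=True, save_conf=True, conf=0.25)
```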

1 Like

Thanks, I will try this later :+1:

Glad to hear it helped! When you do try it, keep notes on exactly what you change (data, model size, training settings) and how the metrics respond—that makes it much easier to learn what really matters.

If you run into a specific issue (errors, unstable training, strange metrics), feel free to come back and share your model.train command, a few sample images, and the training curves, and we can dig into it together.

1 Like