To modify YOLOv8

I’m modifying YOLOv8 for pedestrian detection by:

  1. Adding CBAM after each C2f in the backbone
  2. Replacing FPN with BiFPN
  3. Using a smaller detection head

I keep getting errors while integrating these changes. Is this possible? I need help fixing it.

Have you tested the default YOLOv8 or YOLO11 model (no modifications) with your data? The COCO trained models do a very good job at detecting the person class without the need for architecture modification. There are lots of ways that modifying the model structure can go wrong and unless there’s an established need for it, generally I would not recommend attempting it.
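A baseline check along these lines can be a quick way to see how far an unmodified model gets on your own images. This is only a sketch: `count_people` is a hypothetical helper name, `yolov8n.pt` is just one of the COCO-pretrained checkpoints, and the confidence threshold is a placeholder to adjust for your data.

```python
PERSON_CLASS_ID = 0  # "person" is class index 0 in the COCO label set


def count_people(image_path, weights="yolov8n.pt", conf=0.25):
    """Run an unmodified COCO-pretrained model and count person detections.

    Hypothetical helper for illustration; image_path is any image on disk.
    """
    # Imported lazily so this file can be loaded without ultralytics installed.
    from ultralytics import YOLO

    model = YOLO(weights)
    # classes=[PERSON_CLASS_ID] filters the output down to person detections.
    results = model.predict(
        image_path, classes=[PERSON_CLASS_ID], conf=conf, verbose=False
    )
    return len(results[0].boxes)
```

Running this over a handful of representative frames, and comparing the counts with what you see in the images, gives a rough idea of whether the default model is already good enough before any architecture work.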

Yes, I have tried. After reading some research papers on YOLO modifications, I became interested and chose this as my MTech project, but now I'm stuck.

If I don't introduce a new module, can you suggest an easier modification, or maybe a reduction in my scope, that is doable and would work well for small objects?

I know lots of people have developed an interest in modifying the YOLO architecture, and if your goal is to learn, then it is something that I encourage everyone to try! Modifying the model structure is not a simple task, however, and much of how to do it, and how to do it well, comes down to testing for your specific use case. That means it's unlikely anyone will be able to tell you "do this" to get a good result for your use; you will probably have to test lots of modifications to find what works best.
If you’d still like to try implementing the CBAM module, then I recommend reviewing the related GitHub issues, as there are many. You might be able to find someone who has outlined their solution that could also work for you to test with, but keep in mind that it only might help.

Since you’re looking to improve the performance for detection of smaller objects, here are a few things you can try:

  1. Increase the inference image size using the imgsz argument.
  2. Try using the yolov8l-p2 or yolov8x-p2 model configurations (there are no pretrained versions however).
  3. Try using tiled inference with SAHI.
  4. Review the many GitHub issues where people have asked about the same.
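To illustrate the idea behind tiled inference (item 3), here is a minimal sketch in plain Python that splits an image into overlapping crops. It is not SAHI's API; `tile_boxes` and its parameters are hypothetical names, and the tile size and overlap are placeholder values. Each crop would be passed to the detector separately, and the detections shifted back by the crop's offset and merged.

```python
def tile_boxes(width, height, tile=640, overlap_px=128):
    """Return (x0, y0, x1, y1) crops that together cover a width x height image.

    Neighbouring crops share overlap_px pixels so that an object cut by one
    crop border is fully visible in the adjacent crop.
    """
    if not 0 <= overlap_px < tile:
        raise ValueError("overlap_px must be in [0, tile)")
    step = tile - overlap_px

    def starts(size):
        # An axis shorter than one tile needs a single crop.
        if size <= tile:
            return [0]
        positions = list(range(0, size - tile, step))
        positions.append(size - tile)  # last crop sits flush with the far edge
        return positions

    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in starts(height)
        for x in starts(width)
    ]


if __name__ == "__main__":
    for box in tile_boxes(1920, 1080):
        print(box)
```

A small pedestrian covers a much larger share of a 640x640 crop than of the full frame once it is resized to the network input, which is why tiling tends to help with small objects, at the cost of running the model once per crop.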

Thanks for your suggestions.

Thanks for the update. Modifying core architectural components like the neck by replacing FPN with BiFPN or adding attention modules like CBAM after each C2f block in the backbone can indeed be quite complex and prone to integration errors.

For your MTech project focusing on improving pedestrian detection, especially for small objects, you might find it more manageable and still impactful to explore adjustments that leverage the existing strengths of YOLOv8. Consider training with larger input image sizes; increasing imgsz often significantly helps in resolving smaller details. You can find more about training parameters in our Train mode documentation. Tailoring your data augmentation strategies to better preserve or even emphasize small objects during training can also be beneficial. You could also experiment with starting from a larger pretrained YOLOv8 model, such as YOLOv8m or YOLOv8l, which might provide a richer feature set. Finally, thorough hyperparameter tuning is always a valuable step.
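As a rough sketch of what training with a larger image size and a larger pretrained model could look like in the Ultralytics Python API: `pedestrians.yaml` is a hypothetical dataset file, `train_for_small_objects` is a hypothetical helper name, and the epoch and batch values are placeholders rather than recommendations.

```python
# Placeholder settings; every value here is an assumption to adapt.
TRAIN_ARGS = {
    "data": "pedestrians.yaml",  # hypothetical dataset config file
    "imgsz": 1280,               # larger than the default 640 to keep small details
    "epochs": 100,               # placeholder
    "batch": 8,                  # placeholder; larger images need more GPU memory
}


def train_for_small_objects(weights="yolov8m.pt", **overrides):
    """Train from the given weights or model YAML using TRAIN_ARGS.

    Keyword overrides replace the matching entries in TRAIN_ARGS.
    """
    # Imported lazily so this file can be loaded without ultralytics installed.
    from ultralytics import YOLO

    model = YOLO(weights)
    return model.train(**{**TRAIN_ARGS, **overrides})
```

Passing a model configuration such as `yolov8l-p2.yaml` instead of a `.pt` file would build the P2 variant mentioned earlier in the thread, which trains from scratch since no pretrained P2 weights are available.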

These approaches can often yield good improvements for small object detection with a more contained scope of work. Good luck with your project!

Thank you, pderrenger, for your suggestions.

You’re most welcome, Mohit! I’m glad the suggestions were helpful. Best of luck with your MTech project and your work on modifying YOLOv8!