YOLOv9 introduced the PGI (Programmable Gradient Information) auxiliary branch during training, which seemed like a principled way to improve gradient flow and representation learning, especially for deeper architectures.
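For readers unfamiliar with the idea, here is a minimal sketch (pure Python, no framework) of the general training-only auxiliary-branch pattern that PGI builds on: an auxiliary head contributes an extra loss term that shapes gradients during training, and the branch is dropped at inference so it adds no runtime cost. The names (`aux_weight`, `aux_pgi`, etc.) are illustrative, not taken from the YOLOv9 code base.

```python
def training_loss(main_loss: float, aux_loss: float, aux_weight: float = 0.25) -> float:
    """Combine the main detection loss with the auxiliary-branch loss.

    The auxiliary term only influences gradients during training;
    it is never evaluated at inference time.
    """
    return main_loss + aux_weight * aux_loss


def inference_heads(heads: dict) -> dict:
    """At export/inference, strip any training-only auxiliary heads."""
    return {name: h for name, h in heads.items() if not name.startswith("aux_")}


# Training: both heads contribute to the optimized loss.
loss = training_loss(main_loss=1.8, aux_loss=0.6)  # 1.8 + 0.25 * 0.6 = 1.95

# Inference: the auxiliary branch is removed, leaving only the main head.
deployed = inference_heads({"main": "detect_head", "aux_pgi": "aux_head"})  # only "main" remains
```

The key point for the discussion below is that the auxiliary branch is free at inference but not at training time, which is where the overhead/complexity trade-off comes in.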
However, in later models (YOLOv10, YOLOv11), the auxiliary branch appears to be completely removed, and I don’t see a successor or evolution of this idea in the newer designs.
Is this because the PGI auxiliary branch did not provide consistent gains in practice, because it conflicted with newer architectural choices (e.g. decoupled heads, end-to-end training, efficiency goals), or is there some other reason?
I’m curious whether this was a conscious design rollback based on empirical results, or if the benefits of PGI were later absorbed implicitly through other architectural or training changes. Maybe someone can share some thoughts on this, because I find PGI quite interesting to research. Thanks!
It’s been a while, but IIRC the PGI addition was found to be minimally beneficial for subsequent models relative to the overhead and complexity it added. The author of YOLOv9 was not Ultralytics, but the R&D team does investigate numerous architectures and developments. Since Ultralytics YOLO is a product, the decision-making process is different from what goes into an academic research model. Academic models don’t always get support, or long-term support, and are primarily focused on the novel methods, not pragmatic usability or maintenance. Ultralytics YOLO gets long-term support and must balance innovative design against overall performance and long-term development and maintenance.
This is generally the case for any industry. R&D and/or academic research may come up with a very beneficial, interesting, or appealing technology. Despite this, when an attempt is made to migrate that technology into common, practical, or industry use, it can fail to meet the promise it showed in a more controlled environment. Of course, that’s not to say the idea will never be practical. As environments change, it may be possible to revisit old ideas. Take deep learning in general: the concept has been around for decades, but the hardware was not capable of running such systems. Fast-forward 40-50 years, and suddenly there’s a boom in deep learning, because the hardware finally caught up! That’s one example, but there are likely many others as well. All of that is to say, your question is valid; PGI may not have been worthwhile when first assessed, but that doesn’t mean it will always be that way.
I’ll mention your post to one of the guys on the research team. I wouldn’t expect a reply here, but if he shares anything with me that’s relevant, I’ll be certain to relay whatever I can! I hope that helps answer your question.