New YOLO12 (YOLOv12) Technical Video Series Released

Hi YOLO Community! :waving_hand:

:rocket: I’m excited to share a project I’ve been working on over the past few months: a technical walkthrough video series on YOLO12.

This time, I focused on unpacking the new innovations introduced in YOLO12. I walk through the updated network architecture, highlight key differences from YOLO11, explore the research paper, and break down the new codebase, especially the parts where Area Attention comes into play.

The series includes in-depth explanations of:

:small_blue_diamond: Area Attention modules and blocks (A2C2f, ABlock, and AAttn)
:small_blue_diamond: Area Attention explained with a detailed, tensor-level walkthrough of an example (see the minimal sketch after this list)
:small_blue_diamond: Visualizations showing Area Attention in action
:small_blue_diamond: An overview of Position Encoding and Perceiver-style mechanisms
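
To give a quick taste of the idea before you watch: below is a minimal, self-contained PyTorch sketch of area attention. It is deliberately simplified (no learned qkv/output projections and no positional convolution, which the real AAttn module has as far as I can tell), and is only meant to show the core trick of restricting attention to fixed areas of the flattened feature map.

```python
import torch
import torch.nn.functional as F


def area_attention(x: torch.Tensor, num_heads: int = 2, num_areas: int = 4) -> torch.Tensor:
    """Toy area attention: split the flattened feature map into equal
    "areas" and run scaled dot-product attention inside each one, so the
    cost scales with (N / num_areas)^2 per area instead of N^2 globally."""
    B, C, H, W = x.shape
    N = H * W
    assert N % num_areas == 0 and C % num_heads == 0
    head_dim = C // num_heads

    # (B, C, H, W) -> (B, N, C) -> (B * num_areas, N / num_areas, C)
    tokens = x.flatten(2).transpose(1, 2)
    tokens = tokens.reshape(B * num_areas, N // num_areas, C)

    # For this sketch we reuse the tokens as q, k, and v; the real module
    # computes qkv with learned layers and adds positional information.
    t = tokens.reshape(B * num_areas, -1, num_heads, head_dim).transpose(1, 2)
    out = F.scaled_dot_product_attention(t, t, t)  # attention never crosses area borders

    # Merge the areas back and restore the (B, C, H, W) layout.
    out = out.transpose(1, 2).reshape(B, N, C)
    return out.transpose(1, 2).reshape(B, C, H, W)


x = torch.randn(1, 64, 16, 16)  # a small feature map
print(area_attention(x).shape)  # torch.Size([1, 64, 16, 16])
```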

:folded_hands: Huge thanks to the research authors Yunjie Tian, Qixiang Ye, and David Doermann for pushing the YOLO series forward, and to Glenn Jocher and the Ultralytics team, whose YOLO11 implementation and codebase serve as the core framework on which YOLO12 is built. And of course, big thanks to the open-source community for the contributions, insights, and discussions that make this kind of progress possible.

:light_bulb: If you’re curious about YOLO12’s architecture or want to explore how it works under the hood, feel free to check it out. The first few videos offer an overview, and the rest go deeper into the technical details.

:play_button: Playlist: https://www.youtube.com/playlist?list=PLTcDXKiPdqrH_NoUQtB054fQeC_D69Pg2

Hope you find it helpful! Happy learning!

Marc Tornero


Hi Marc,

Thank you for creating and sharing this excellent video series on YOLO12. This is a fantastic resource for the community, and we appreciate you taking the time to dive deep into the architecture. Your kind words mean a lot, and we’re thrilled that the foundation of our work could support new research and explorations like this.

It’s important for users to be aware that while YOLO12 introduces interesting concepts like attention-centric processing, it can exhibit training instability, consume significantly more memory, and run 2-3x slower on CPUs compared to other models. For these reasons, we continue to recommend Ultralytics YOLO11 for most production and general use cases.
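
For readers weighing that trade-off, switching between the two in the Ultralytics Python API is just a change of weights file. A quick sketch using the pretrained nano checkpoints:

```python
from ultralytics import YOLO

# YOLO11: our recommended default for production and general use.
model = YOLO("yolo11n.pt")

# YOLO12: swap in the attention-centric weights to experiment.
# model = YOLO("yolo12n.pt")

results = model("https://ultralytics.com/images/bus.jpg")  # run inference
results[0].show()  # visualize the detections
```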

Great work on the videos, and thank you again for contributing to the open-source community!

Hi Paula,

Thank you so much for the kind words and for taking the time to share this! I really appreciate the support from the Ultralytics team. Your work laid the foundation that made this kind of deep dive possible.

I also appreciate you highlighting the practical limitations of YOLO12. In the videos, I mention that choosing between models depends on factors like dataset characteristics, speed requirements, and training preferences. I point out as well that YOLO12 may benefit from Flash-Attention for faster inference, though support is limited to certain NVIDIA GPUs, so not every user will be able to take advantage of it.
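
For anyone curious whether their hardware qualifies, a quick way to check is the GPU's compute capability; as far as I know, FlashAttention 2 requires roughly Ampere or newer (capability 8.0+). A small illustrative snippet in plain PyTorch:

```python
import torch

# Rough eligibility check for Flash-Attention: FlashAttention 2 generally
# needs an NVIDIA GPU with compute capability 8.0+ (Ampere onward).
if torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability()
    name = torch.cuda.get_device_name()
    verdict = "likely supported" if major >= 8 else "likely unsupported"
    print(f"{name}: compute capability {major}.{minor} -> {verdict}")
else:
    print("No CUDA GPU detected; Flash-Attention is unavailable.")
```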

YOLO11 remains a rock-solid choice for production, offering a great balance of performance, speed, and reliability. Throughout the series, I encourage users to check out my videos covering YOLO11 as well.

Thanks again to the whole team for everything you’re building and sharing with the community!

Marc


Hi Marc,

Thank you for the thoughtful reply and for creating such a fantastic resource for the community.

Your balanced perspective on model selection is spot on; it’s crucial to weigh the trade-offs for each specific use case. It’s high-quality content like your video series that makes the open-source community so valuable.

Keep up the great work!