Clarification needed - regarding YOLOv11 and YOLOv12

If I want to add custom attention blocks to a YOLO model for object detection, which model is better to experiment with: YOLOv11 or YOLOv12?

YOLO12 already uses attention

Thanks @Toxite. I will try the YOLOv11 model then.

Sounds good — I’d also start from Ultralytics YOLO11 as the stable baseline and then add/ablate your custom attention blocks on top. YOLO12 is already attention-centric, but it’s a community model and can be less predictable to iterate on (training stability/memory/CPU throughput) as noted in the YOLO12 docs.

If you want to modify the architecture, the usual workflow is to copy a YOLO11 model YAML from ultralytics/cfg/models/11/, add your custom module, and train from that YAML:

```python
from ultralytics import YOLO

# Build the model from your modified YAML, then train from scratch
model = YOLO("path/to/your_yolo11_custom.yaml")
model.train(data="coco8.yaml", imgsz=640, epochs=100)
```
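As a concrete sketch of that workflow (the layer indices and the `MyAttention` module name below are illustrative, not from the official YAML — you would also need to register the module in Ultralytics' code before the YAML parser can find it), a custom block could be inserted into the copied backbone list like this:

```yaml
# Fragment of a copied yolo11.yaml — each entry is [from, repeats, module, args]
backbone:
  - [-1, 1, Conv, [64, 3, 2]]          # 0-P1/2
  - [-1, 1, Conv, [128, 3, 2]]         # 1-P2/4
  - [-1, 2, C3k2, [256, False, 0.25]]  # 2
  - [-1, 1, MyAttention, [256]]        # 3 - hypothetical custom attention block
  # ... remaining backbone layers continue as in the original file
```

Note that inserting a layer shifts the indices of everything after it, so any later `from` entries in the head that reference backbone layers by absolute index have to be updated accordingly.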

If you share what attention block you’re adding (CBAM/SE/ECA/Transformer-style, etc.) and where you want to insert it (backbone vs neck), I can suggest the cleanest place in the YOLO11 graph to start.

Thanks @pderrenger. I would like to add neighborhood attention and deformable attention in the backbone for enhanced performance on UAV images.

@pderrenger, can you suggest where I can insert these attention blocks in YOLO11?
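For reference, this is the kind of simplified block I am prototyping: a single-head neighborhood-attention sketch where each pixel attends only to a k×k window around itself. It is an illustrative simplification (not the optimized NATTEN kernels), and `SimpleNeighborhoodAttention` is my own hypothetical name, not an Ultralytics module:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimpleNeighborhoodAttention(nn.Module):
    """Single-head neighborhood attention sketch: each spatial position
    attends to a kernel_size x kernel_size window centered on itself."""

    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        self.k = kernel_size
        self.scale = channels ** -0.5
        self.qkv = nn.Conv2d(channels, channels * 3, 1)   # joint q/k/v projection
        self.proj = nn.Conv2d(channels, channels, 1)      # output projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=1)
        pad = self.k // 2
        # Gather the k*k neighbors of every position: (b, c, k*k, h*w)
        k_unf = F.unfold(k, self.k, padding=pad).view(b, c, self.k * self.k, h * w)
        v_unf = F.unfold(v, self.k, padding=pad).view(b, c, self.k * self.k, h * w)
        q = q.view(b, c, 1, h * w)
        # Dot-product attention restricted to the local neighborhood
        attn = (q * k_unf).sum(dim=1, keepdim=True) * self.scale  # (b, 1, k*k, h*w)
        attn = attn.softmax(dim=2)
        out = (attn * v_unf).sum(dim=2).view(b, c, h, w)
        return self.proj(out) + x  # residual connection keeps the block drop-in safe
```

Because the output shape matches the input, the block can in principle be dropped between backbone stages without touching channel counts; the `unfold`-based gather is memory-hungry, though, so it is only meant for experimenting at small feature-map sizes.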