YOLO26 experiment: Raw model output to bounding boxes

I did some experimentation with the YOLO26 model. Basically, I wanted to replicate the inference process from raw model outputs to drawn bounding boxes. Here’s what I did:

  • collected the model’s raw predictions
  • scaled them to the original image dimensions
  • plotted the boxes on the original image using matplotlib.patches

I picked up a bit of the internals of the ultralytics codebase in the process. There are fewer abstractions in the code than I had expected. Next, I’d like to build a very simple training/inference pipeline, mostly for learning and experimentation.

Your approach is basically the right mental model: the “gotcha” is that the network predicts in the letterboxed (resized + padded) image space, and you need to undo that exact resize/pad before drawing on the original. In Ultralytics YOLO this undo step is handled for you, so if you just want the final pixel boxes to plot, you can pull them directly from the Results object (they’re already scaled to the original image):

from ultralytics import YOLO

model = YOLO("yolo26n.pt")
r = model("https://ultralytics.com/images/bus.jpg")[0]

xyxy = r.boxes.xyxy.cpu().numpy()   # x1,y1,x2,y2 in original-image pixels
conf = r.boxes.conf.cpu().numpy()
cls  = r.boxes.cls.cpu().numpy().astype(int)

print(xyxy[0], conf[0], cls[0])
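
If you would rather do the undo step by hand, here is a minimal sketch of the arithmetic in plain Python. The function name undo_letterbox is hypothetical (it is not part of the library), and it assumes a single resize factor with equal padding on both sides. The library's own scale_boxes() may differ in rounding details, so expect sub-pixel differences.

```python
def undo_letterbox(box, model_shape, orig_shape):
    """Map one (x1, y1, x2, y2) box from letterboxed space to original pixels.

    model_shape and orig_shape are (height, width) tuples.
    Assumption: the image was resized with one scale factor and the
    padding was split equally between the two sides (centered letterbox).
    """
    model_h, model_w = model_shape
    orig_h, orig_w = orig_shape

    # Resize factor the letterbox applied to the original image.
    gain = min(model_h / orig_h, model_w / orig_w)
    # Padding added on each side after the resize.
    pad_x = (model_w - orig_w * gain) / 2
    pad_y = (model_h - orig_h * gain) / 2

    def clip(value, upper):
        # Keep the coordinate inside the original image bounds.
        return max(0.0, min(float(value), float(upper)))

    x1, y1, x2, y2 = box
    return (
        clip((x1 - pad_x) / gain, orig_w),
        clip((y1 - pad_y) / gain, orig_h),
        clip((x2 - pad_x) / gain, orig_w),
        clip((y2 - pad_y) / gain, orig_h),
    )


# A 100x200 (h x w) image letterboxed into 200x200 gets 50 px of padding
# above and below and no resize, so the box just shifts up by 50.
print(undo_letterbox((10, 60, 50, 100), (200, 200), (100, 200)))
# -> (10.0, 10.0, 50.0, 50.0)
```

Pass the shape of the tensor that actually went into the network as model_shape; it is not always square, since the padding depends on how the image was preprocessed.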

If you’re trying to replicate the full path from raw head outputs → final boxes, the key post-processing steps are “decode → NMS → scale back to original,” implemented in ultralytics/utils/ops.py (look for non_max_suppression() and scale_boxes()). The coordinate formats we expose on Results are summarized in the bounding box glossary, and the expected “absolute vs normalized” behavior is also covered in the common issues guide section on box coordinates.
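
To make the NMS step concrete, here is a generic, class-agnostic greedy NMS sketch in plain Python. It is not the library's non_max_suppression(), which handles batching, per-class logic, and more. The names iou and greedy_nms and the threshold defaults are assumptions chosen for illustration, and the boxes are assumed to be already decoded to (x1, y1, x2, y2). If the output you captured already has one box per object, this step may not apply.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, conf_thres=0.25, iou_thres=0.45):
    """Return indices of boxes to keep, highest score first.

    Drops boxes below conf_thres, then keeps a box only if it does not
    overlap an already-kept box by more than iou_thres.
    """
    candidates = [i for i, s in enumerate(scores) if s >= conf_thres]
    candidates.sort(key=lambda i: scores[i], reverse=True)

    keep = []
    for i in candidates:
        if all(iou(boxes[i], boxes[j]) <= iou_thres for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60), (0, 0, 5, 5)]
scores = [0.9, 0.8, 0.7, 0.1]
print(greedy_nms(boxes, scores))  # -> [0, 2]
```

In the example, the second box overlaps the first with an IoU of about 0.68 and is suppressed, and the last box falls below the confidence threshold.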

If you share what you’re calling “raw predictions” (tensor shape + where you tapped it: PyTorch model forward vs an exported model), I can point you to the exact decode step for that output format, since it differs depending on whether you captured pre- or post-decode outputs.
