Do detected boxes maintain proportionality to the frames of objects in the dataset?

My dataset classes consist of very similar objects with different aspect ratios. The class names are the ratios (w/h): 1/3, 1/2, 2/3, and so on. I use scale=0.5 augmentation. May I assume that detected boxes will maintain the proportional scale, so that I can calculate width as “w = box.h * class”?

They will maintain the scale.

Thanks! I observe this in my tests, but is there any math confirming it? Or can you point me to specific YOLO code ensuring this? Anything I could reference in a paper would help.

But why do you need different classes for each? Can’t you just do w/h from the box alone?

I provided that as an example. In reality, my objects are more complex, and my post-processing math for internal parameters assumes detected boxes keep the GT aspect ratio.

Short answer: not as a model guarantee.

In Ultralytics YOLO, the ground-truth labels are transformed consistently with the image during preprocessing/augmentation, so the annotation geometry is preserved by the pipeline. The two most relevant code references are Instances.scale(), which scales box coordinates with the image transform, and scale_boxes(), which maps predicted boxes back between resized/letterboxed and original image shapes.
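The geometric argument behind this can be illustrated with a tiny sketch (this is not Ultralytics source, just the underlying math): a letterbox-style transform multiplies both coordinates by a single gain and adds padding, so any box's w/h ratio is unchanged by the mapping:

```python
# Minimal sketch (not Ultralytics code): uniform resize + letterbox padding
# preserves a box's aspect ratio, because width and height are multiplied
# by the same gain and padding only translates the coordinates.

def letterbox_box(box, gain, pad_x, pad_y):
    """Map an (x1, y1, x2, y2) box into a letterboxed image:
    scale by a single gain, then shift by the padding offsets."""
    x1, y1, x2, y2 = box
    return (x1 * gain + pad_x, y1 * gain + pad_y,
            x2 * gain + pad_x, y2 * gain + pad_y)

def aspect(box):
    """Return the w/h ratio of an (x1, y1, x2, y2) box."""
    x1, y1, x2, y2 = box
    return (x2 - x1) / (y2 - y1)

gt = (100.0, 50.0, 200.0, 350.0)                 # w/h = 100/300 = 1/3
mapped = letterbox_box(gt, gain=0.5, pad_x=16.0, pad_y=0.0)
assert abs(aspect(gt) - aspect(mapped)) < 1e-9   # ratio unchanged
```

This is exactly why the label pipeline preserves annotation geometry: the gain is scalar, not per-axis. It says nothing about what the regression head predicts, which is the next point.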

That said, the predicted box aspect ratio is still a learned regression output, so it can deviate from GT. For a paper, I’d phrase it as: YOLO preserves label geometry under image resizing/letterboxing, but it does not explicitly enforce that detections keep the exact GT aspect ratio. If you use w = h * class_ratio, that’s an application assumption you should validate empirically on your dataset, rather than a strict guarantee from YOLO itself.
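The empirical check could look something like this sketch. The tuple format for `detections` is an assumption (matched prediction boxes paired with their class ratio); adapt it to however you associate detections with GT:

```python
# Hypothetical validation sketch: given matched (prediction, class-ratio)
# pairs, measure how far the detector's predicted w/h drifts from the
# nominal class aspect ratio. Tuple layout (x1, y1, x2, y2, class_ratio)
# is an assumption for illustration, not a YOLO output format.

def ratio_errors(detections):
    """Relative error between each box's predicted w/h and its class ratio."""
    errors = []
    for x1, y1, x2, y2, class_ratio in detections:
        predicted_ratio = (x2 - x1) / (y2 - y1)
        errors.append(abs(predicted_ratio - class_ratio) / class_ratio)
    return errors

dets = [
    (10.0, 10.0, 40.0, 100.0, 1 / 3),   # predicted w/h = 30/90, exactly 1/3
    (0.0, 0.0, 52.0, 100.0, 1 / 2),     # predicted w/h = 0.52, 4% off
]
errs = ratio_errors(dets)
# if max(errs) stays within your tolerance, w = h * class_ratio is workable
```

Reporting the distribution of these errors on a validation set would also be a defensible thing to cite in the paper in place of a model guarantee.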

If you want, I can help you draft a 2–3 sentence paper-safe wording with the exact repo references.