Hi, I’m currently working on my master’s thesis, where I need to evaluate multiple models for defect detection on synthetic datasets and eventually estimate the position of the defects in the image.
Here’s my setup: I generated various 3D models of parts and wrote a Python script to render images of them. To avoid regenerating the models from scratch, I simulated different part scales by applying multiple zoom levels in the renderer.
My goal is to train a model on this custom dataset to detect defects and estimate their bounding boxes using computer vision.
My question is: will using different zoom levels cause inaccuracies in the bounding box coordinates, and if so, is this something to address during training, or can it simply be corrected at inference time using a basic matrix transformation?
Thanks a lot for your help.
Yes, different zoom levels are fine for Ultralytics YOLO training, and they usually help the model become more robust to scale changes.
The key point is that the label must match the final rendered image. If your zoom is just a post-render resize, then the box can be corrected with a simple scale transform. If it is a true camera zoom / FOV change inside the renderer, then you should recompute the 2D box from the rendered projection, not try to “fix” it later with one generic matrix.
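To make the distinction concrete, here is a minimal sketch in plain Python. It assumes an ideal pinhole camera with the principal point at the image center and defect points already expressed in camera coordinates; the helper names `scale_box` and `project_bbox` are hypothetical and not part of any library. Your renderer's real camera model may differ, so treat this as an illustration rather than a drop-in labeler.

```python
def scale_box(box, sx, sy):
    """Scale an (x1, y1, x2, y2) pixel box for a pure post-render resize."""
    x1, y1, x2, y2 = box
    return (x1 * sx, y1 * sy, x2 * sx, y2 * sy)


def project_bbox(points_cam, focal_px, width, height):
    """Project 3D points (camera coordinates, z > 0 in front of the camera)
    with an assumed pinhole model and return the enclosing 2D box,
    clipped to the image bounds."""
    cx, cy = width / 2.0, height / 2.0  # assumed principal point: image center
    us = [focal_px * x / z + cx for x, y, z in points_cam]
    vs = [focal_px * y / z + cy for x, y, z in points_cam]
    x1, x2 = max(0.0, min(us)), min(float(width), max(us))
    y1, y2 = max(0.0, min(vs)), min(float(height), max(vs))
    return (x1, y1, x2, y2)


# Hypothetical defect outline: four corners, 2 units in front of the camera.
defect = [(-0.1, -0.05, 2.0), (0.1, -0.05, 2.0),
          (0.1, 0.05, 2.0), (-0.1, 0.05, 2.0)]

box_1x = project_bbox(defect, focal_px=800, width=640, height=480)
box_2x = project_bbox(defect, focal_px=1600, width=640, height=480)  # camera zoom
box_resized = scale_box(box_1x, 2.0, 2.0)  # same image upscaled to 1280x960
```

With these numbers the 1x box is about (280, 220, 360, 260). Doubling the focal length gives about (240, 200, 400, 280) in the same 640x480 frame, because the zoom scales around the image center. Upscaling the 1x image by 2 instead gives (560, 440, 720, 520) in a 1280x960 frame. The two cases need different labels, which is why the box should come from the projection used for each render.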
During training, YOLO already handles normal image resizing/letterboxing consistently, and at inference, predictions are mapped back to the original image size. If you ever need to manually rescale boxes between image shapes, use scale_boxes() from the Ultralytics utilities (see the scale_boxes() reference).
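For intuition, the mapping between an original image and its letterboxed version can be sketched in plain Python as below. This only illustrates the idea (uniform scale plus centered padding); it is not the Ultralytics implementation, which may round the padding differently, and the function names are hypothetical.

```python
def letterbox_params(orig_hw, new_hw):
    """Return (gain, pad_x, pad_y) for fitting an image of size orig_hw
    (height, width) into new_hw while keeping the aspect ratio."""
    h0, w0 = orig_hw
    h1, w1 = new_hw
    gain = min(h1 / h0, w1 / w0)   # one scale factor for both axes
    pad_x = (w1 - w0 * gain) / 2   # centered horizontal padding
    pad_y = (h1 - h0 * gain) / 2   # centered vertical padding
    return gain, pad_x, pad_y


def to_letterboxed(box, orig_hw, new_hw):
    """Map an (x1, y1, x2, y2) box from original to letterboxed pixels."""
    gain, pad_x, pad_y = letterbox_params(orig_hw, new_hw)
    x1, y1, x2, y2 = box
    return (x1 * gain + pad_x, y1 * gain + pad_y,
            x2 * gain + pad_x, y2 * gain + pad_y)


def to_original(box, orig_hw, new_hw):
    """Inverse mapping: letterboxed pixels back to the original image,
    clipped to the original image bounds."""
    gain, pad_x, pad_y = letterbox_params(orig_hw, new_hw)
    h0, w0 = orig_hw
    x1, y1, x2, y2 = box
    xs = [(x - pad_x) / gain for x in (x1, x2)]
    ys = [(y - pad_y) / gain for y in (y1, y2)]
    xs = [min(max(x, 0.0), float(w0)) for x in xs]
    ys = [min(max(y, 0.0), float(h0)) for y in ys]
    return (xs[0], ys[0], xs[1], ys[1])
```

For a 1280x960 image letterboxed to 640x640, the gain is 0.5 with 80 pixels of padding above and below, so a box at (200, 400, 600, 800) lands at (100, 280, 300, 480), and `to_original` maps it back.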
So short answer: no inherent inaccuracy from multiple zoom levels, as long as your annotations are generated from the exact final image geometry. If you want, I can also suggest a good synthetic-data setup for defect detection with Ultralytics YOLO26.
Thanks for the support.
So basically your answer covers the scenario where the datasets are already annotated with one constant zoom level and then resized afterward. But if the dataset is generated with various zoom levels and annotated accordingly, then the bounding boxes should be accurate — however, this requires more annotation effort. What would you suggest in this case? I’m also interested in the setup you mentioned.