Hi all,
I’m having trouble understanding how YOLO handles coordinates and image sizes, and as a result I’m getting an out-of-bounds error. Below I describe the problem, what I have tried so far, and my understanding of the relationship between image size and coordinates.
I am training an object detection model with 129 classes and close to 30,000 images of 1920x1200 pixels. Each image was labeled using x1, y1, x2, and y2 coordinates. After creating my yaml file and distributing the images and labels into their directories (test/train/validation), I started training and got the error quoted below. Please notice that nearly all of the scanned images (1067 of 1163; I’m only showing a subset of the warnings here) are reported as corrupt, probably because they all hit the same out-of-bounds problem.
After reading up on potential issues, I converted all labels to the center-x, center-y, w, h format and got the same error message. Then I converted the labels to the normalized center-x, center-y, w, h format. Same error. I then passed the original image size to the model (imgsz=[1920, 1200]) and got the same error again. Below I’m providing the labels for one of the images in each of the formats above.
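For reference, here is a minimal sketch of the conversion as I understand it should work, assuming the corners are in pixels and each value is divided by the matching dimension of the original 1920x1200 image. The function name `corners_to_normalized_center` is my own and not part of any library.

```python
# Sketch only: convert one pixel-corner box (x1, y1, x2, y2) into
# normalized (center-x, center-y, width, height).
# Assumption: x values are divided by the image width and y values by
# the image height of the ORIGINAL image (1920x1200 here).

def corners_to_normalized_center(x1, y1, x2, y2, img_w, img_h):
    """Return (cx, cy, w, h), each scaled by the matching image dimension."""
    cx = (x1 + x2) / 2 / img_w   # box center, fraction of image width
    cy = (y1 + y2) / 2 / img_h   # box center, fraction of image height
    w = (x2 - x1) / img_w        # box width, fraction of image width
    h = (y2 - y1) / img_h        # box height, fraction of image height
    return cx, cy, w, h


# First box of the sample image: class 14, corners 1414 569 1556 682.
box = corners_to_normalized_center(1414, 569, 1556, 682, img_w=1920, img_h=1200)
print("14 " + " ".join(f"{v:.6f}" for v in box))
```

With this scaling every value for the sample box falls between 0 and 1.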
My understanding of image size and box coordinates is that the transformations depend on the image size provided. If no size is provided when transforming coordinates, the image is scaled to width=640 and the height is set to whatever value preserves the aspect ratio. Also, when an image is fed to the model, it gets resized to the size you provide. So, if you format the coordinates based on one image size and feed the model images of another size, the model will at best produce inaccurate results and at worst raise an out-of-bounds error. Is my understanding correct? I also believed that the YOLO format for coordinates was x1, y1, x2, and y2. I read today that the model requires the center-x, center-y, w, h format. Did I read this correctly?
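To make the resizing question concrete, here is a small sketch under one assumption that I have not confirmed: that the image is resized so its longer side equals imgsz while the aspect ratio is preserved. The helper `resized_shape` is hypothetical. It compares the normalized center of the first sample box before and after such a resize.

```python
# Sketch only. Assumption: the longer image side is scaled to `imgsz`
# and the other side follows so the aspect ratio is preserved.

def resized_shape(img_w, img_h, imgsz=640):
    """Return (new_w, new_h, scale) for an aspect-preserving resize."""
    longest = max(img_w, img_h)
    new_w = round(img_w * imgsz / longest)
    new_h = round(img_h * imgsz / longest)
    return new_w, new_h, imgsz / longest


new_w, new_h, scale = resized_shape(1920, 1200, imgsz=640)

# Pixel center of the first sample box in the original image.
cx_px, cy_px = 1485.0, 625.5

# Normalized center computed on the original image...
norm_before = (cx_px / 1920, cy_px / 1200)
# ...and computed again on the resized image.
norm_after = (cx_px * scale / new_w, cy_px * scale / new_h)

print("resized to", new_w, "x", new_h)
print("normalized before:", norm_before)
print("normalized after: ", norm_after)
```

If that assumption holds, coordinates normalized by the original width and height stay the same (up to rounding) after the resize, so they would not need to be recomputed for a different imgsz.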
So, my request is for further information and, maybe, a workaround for my issue. If you can guide me further, that will help a lot. Perhaps a resource, link, or video explaining in more detail what I need to solve my issue. Thank you indeed,
Ralf
Error message:
“
WARNING updating to ‘imgsz=640’. ‘train’ and ‘val’ imgsz must be an integer, while ‘predict’ and ‘export’ imgsz may be a [h, w] list or an integer, i.e. ‘yolo export imgsz=640,480’ or ‘yolo export imgsz=640’
train: Scanning D:\Dev\FishCam\data\train\labels… 1163 images, 0 backgrounds, 1067 corrupt: 100% | 1163/1163 [00:02<00:00, 406.60it/s]
train: WARNING
D:\Dev\FishCam\data\train\images\R167_Dermatolepis_inermis0001.png: ignoring corrupt image/label: non-normalized or out of bounds coordinates [2.3203125 1.9875 1.1929687]
…
”
x1, y1, x2, and y2 coordinates for one sample image (first column is the class):
14 1414 569 1556 682
74 1087 455 1457 1072
113 234 473 709 647
center-x, center-y, w, h for the same sample image:
14 1485.0 625.5 142.0 113.0
74 1272.0 763.5 370.0 617.0
113 471.5 560.0 475.0 174.0
Normalized center-x, center-y, w, h for the same sample image:
14 2.32 0.98 0.22 0.17
74 1.99 1.19 0.57 0.96
113 0.74 0.87 0.74 0.27
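One observation on the normalized values above: the numbers in the error message (2.3203125, 1.9875, 1.1929687) match 1485/640, 1272/640, and 763.5/640, which suggests the pixel values were divided by 640 rather than by the image width and height. The sketch below compares both scalings for the three sample centers; the helper `in_bounds` is my own name, and the 0 to 1 range is my reading of the "non-normalized or out of bounds" warning.

```python
# Sketch only: compare dividing the pixel centers by 640 with dividing
# by the original image dimensions (1920 wide, 1200 high).

centers_px = [(1485.0, 625.5), (1272.0, 763.5), (471.5, 560.0)]


def in_bounds(values):
    """True when every value lies in the closed range [0, 1]."""
    return all(0.0 <= v <= 1.0 for v in values)


results = []
for cx, cy in centers_px:
    by_640 = (cx / 640, cy / 640)        # scaling that matches the error message
    by_image = (cx / 1920, cy / 1200)    # scaling by original width and height
    results.append((by_640, in_bounds(by_640), by_image, in_bounds(by_image)))
    print(by_640, in_bounds(by_640), by_image, in_bounds(by_image))
```

Dividing by 640 pushes the first two centers above 1, while dividing by 1920 and 1200 keeps all three inside the range.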