I have a general question on object detection in computer vision. As you all know, detection requires images and associated text files for labels. If you train a model from scratch (no transfer learning involved) that you construct yourself, does the format of the labels matter? In other words, does it matter whether you use x1/x2/y1/y2, x1/y1/x2/y2, center/width/height, or any other format? My understanding is that as long as the format is used consistently and the coordinates reflect the object's location, the labeling format will not matter. Please clarify. Thank you so much,
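To make the question concrete, here is a minimal sketch showing that the corner format and the center format carry the same information, since each can be converted to the other. The function names `corners_to_center` and `center_to_corners` are made up for illustration and do not come from any particular library.

```python
def corners_to_center(x1, y1, x2, y2):
    """Convert (x1, y1, x2, y2) corners to (cx, cy, w, h)."""
    w = x2 - x1
    h = y2 - y1
    return (x1 + w / 2, y1 + h / 2, w, h)


def center_to_corners(cx, cy, w, h):
    """Convert (cx, cy, w, h) back to (x1, y1, x2, y2) corners."""
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


# Hypothetical box in pixel coordinates, used only as an example.
box = (10.0, 20.0, 50.0, 80.0)
center = corners_to_center(*box)

# Converting there and back recovers the original box.
assert center_to_corners(*center) == box
```

Because the conversion is reversible, no location information is lost in either direction. Whether one format is easier for a given model to learn from is a separate question that the sketch does not answer.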
Right, that is for YOLO. What I meant was any other model. It seems to me that if you create a model that accepts some consistent coordinate representation of an object’s location, the model will learn what the object is after proper training. Am I correct? Thx,
I understand. The bounding-box format will vary by model, and which one is needed or works best might depend on the model's structure. In all likelihood, the coordinates will need to be normalized before going into the model for training.
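As a rough illustration of the normalization step, the sketch below divides pixel coordinates by the image size so every value falls in the range 0 to 1. It assumes boxes in x1/y1/x2/y2 pixel format. The helper names `normalize_box` and `denormalize_box` are hypothetical, and a given model may expect a different scheme.

```python
def normalize_box(box, img_w, img_h):
    """Scale an (x1, y1, x2, y2) pixel box to the 0..1 range."""
    x1, y1, x2, y2 = box
    return (x1 / img_w, y1 / img_h, x2 / img_w, y2 / img_h)


def denormalize_box(box, img_w, img_h):
    """Scale a normalized (x1, y1, x2, y2) box back to pixels."""
    x1, y1, x2, y2 = box
    return (x1 * img_w, y1 * img_h, x2 * img_w, y2 * img_h)


# Hypothetical 640x480 image with one labeled box, for illustration only.
pixel_box = (160, 120, 480, 360)
norm_box = normalize_box(pixel_box, img_w=640, img_h=480)
print(norm_box)
```

Dividing by the image size makes the labels independent of resolution, so the same label still applies if the image is resized before training.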