Training custom dataset

I want to train the YOLO11 model on my custom dataset, so:
1) What size should the images be before annotating?
2) Which image format (jpeg, jpg, etc.) should I use for annotating?
3) Do the class IDs need to match the YOLO class IDs (classes like car, person, etc.)?
I am using CVAT for annotation and exporting in the YOLOv8 object detection txt format. If there is anything I am missing, please let me know.

Hi there! :blush:

Great to hear you’re diving into training a custom dataset with YOLO11. Here’s how you can get started:

  1. Image Size: You can annotate images at their original size. YOLO11 will resize them during training based on the imgsz parameter you set. This flexibility allows you to maintain image quality and detail.

  2. Image Format: JPEG is an excellent choice for annotation (.jpg and .jpeg are the same format with different file extensions). It is widely supported and efficient in terms of storage.

  3. Class IDs: You can define your own class IDs. They don’t need to match YOLO’s predefined classes. Just ensure consistency across your dataset and configuration files.

Since you’re using CVAT, exporting in YOLOv8 format is perfect. Just make sure your annotations align with your custom classes.
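To illustrate why the image size at annotation time does not matter: YOLO detection labels store one line per object as `class_id x_center y_center width height`, with the box values normalized to the 0-1 range by the image width and height. Below is a minimal sketch of that conversion. The helper name `to_yolo_line` is hypothetical and is not part of CVAT or Ultralytics; CVAT does this for you on export.

```python
def to_yolo_line(class_id, x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert a pixel-space box to one YOLO-format label line (hypothetical helper)."""
    x_center = (x_min + x_max) / 2 / img_w   # box center, as a fraction of image width
    y_center = (y_min + y_max) / 2 / img_h   # box center, as a fraction of image height
    width = (x_max - x_min) / img_w          # box width, as a fraction of image width
    height = (y_max - y_min) / img_h         # box height, as a fraction of image height
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# The same object annotated on a 1920x1080 frame and on a 960x540 copy of it
full = to_yolo_line(0, 480, 270, 1440, 810, 1920, 1080)
half = to_yolo_line(0, 240, 135, 720, 405, 960, 540)
print(full)
print(half)
```

Both calls produce the same label line, so resizing an image (with the same aspect ratio) does not invalidate its annotations.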

For more detailed guidance, check out our preprocessing annotated data guide. It’s packed with useful tips!

If you have any more questions, feel free to ask. Happy training! :rocket:

Thanks for guiding!

@pderrenger is there any issue if the same classes have different ID numbers in the training dataset? I created another project in CVAT with other labels/classes, so the IDs got out of order and keep increasing as I add more labels. I edited the labels and tried to start from label ID 0, but it did not work; instead the IDs moved to 24 and higher!
In short, if another annotator starts working in CVAT, their labels will start from 0, but in my case the first label is 57, etc.
My second question: should all the images to be annotated be in the same format, or can they be in any format such as PNG, JPEG, etc.?

  1. Annotation label IDs need to be the same for the same class across the whole dataset, as this is the only way the model distinguishes between objects. You should consult the CVAT documentation or open a support ticket with the CVAT team if you’re unable to use the same label IDs across instances or sessions. I know that some annotation platforms use a unique ID for the annotation record, so make sure that you’re looking at the label/class ID and not the annotation record ID.
  2. You can use different image formats such as PNG or JPEG, as long as they’re listed as supported by Ultralytics (same for prediction and training). I would advise using JPEG to significantly reduce storage requirements, unless you have a specific reason to use another file format. You can use the compress_one_image function to help convert and compress your image dataset.
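If the exported txt files really do contain class IDs like 57 instead of zero-based ones, one possible workaround is to rewrite the IDs after export. This is only a sketch under assumptions: the function names and the `ID_MAP` values are hypothetical, and it assumes standard YOLO label lines where the first token is the class ID. The new IDs must also match the `names` indices in your dataset YAML.

```python
from pathlib import Path

# Hypothetical mapping from the IDs found in the export to the zero-based IDs you want
ID_MAP = {57: 0, 58: 1, 59: 2}


def remap_label_line(line, id_map):
    """Replace the leading class ID of one YOLO label line using id_map."""
    parts = line.split()
    if not parts:
        return line
    # A KeyError here means the file contains an ID you did not map; failing is intentional
    parts[0] = str(id_map[int(parts[0])])
    return " ".join(parts)


def remap_label_file(path, id_map):
    """Rewrite one label file in place (back up your labels before running this)."""
    path = Path(path)
    lines = [l for l in path.read_text().splitlines() if l.strip()]
    remapped = [remap_label_line(l, id_map) for l in lines]
    path.write_text("\n".join(remapped) + ("\n" if remapped else ""))


# Demonstration on a single line; no files are touched here
print(remap_label_line("57 0.5 0.5 0.2 0.3", ID_MAP))
```

To apply it to a folder, you could loop over `Path("labels").glob("*.txt")` (the folder name is an assumption) and call `remap_label_file` on each file.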

Thanks!
I have some questions about training on my custom dataset with YOLO:
1) How many images should I annotate for training (train data) on my custom data, which has 11 classes? How many if I use images with many classes in a single image (such as traffic scenes), and how many if I label single-class images?
2) Do the annotated images need to be clear or high resolution? I have used frames from a low-quality video.
3) How many epochs would be best for training 11 classes with YOLO11?

Kindly help me in this regard.

  1. It’s impossible to know for certain. There are many reasons for this, but it boils down to the fact that you’re the only person who knows the answer to “is it good enough”, and the number of images it takes to get there is something you’ll need to discover for yourself. This older guide has a good summary of information and will still be broadly applicable. A “general” starting rule is to include 1,000 unique images per class. For your 11 classes, that would be up to 11,000 unique images. An image with 2+ classes can count toward each of those classes, but don’t build a dataset only from those types of images.
  2. The images should be as similar as possible to the environment that your model will run in for your application. If the model will be used on high-resolution video frames, you should use those for training; if it will be used on low-resolution video frames, you should use those instead. If the input data could be either high- or low-resolution images/frames, then including both is the only way to be certain that the model will be able to perform on either/both.
  3. Again, there is no “definitive” answer that will tell you the number of epochs to use. You should refer to the guide linked above. If you’re still uncertain, then you should use the default 100 epochs as a starting point.
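To see how far a dataset is from the rough 1,000-images-per-class starting point, it can help to count how many images contain each class. A minimal sketch, assuming one YOLO txt label file per image and a directory that holds only label files; the function names and the `labels/train` path are hypothetical.

```python
from collections import Counter
from pathlib import Path


def classes_in_label_text(text):
    """Return the set of class IDs that appear in one image's label file."""
    return {int(line.split()[0]) for line in text.splitlines() if line.strip()}


def images_per_class(label_dir):
    """Count, for each class ID, how many label files (images) contain it."""
    counts = Counter()
    for label_file in Path(label_dir).glob("*.txt"):
        # Each image counts once per class, no matter how many boxes it has
        counts.update(classes_in_label_text(label_file.read_text()))
    return counts


# Demonstration on in-memory label text for two images
demo = Counter()
for text in ["0 0.5 0.5 0.2 0.2\n3 0.1 0.1 0.05 0.05\n0 0.7 0.7 0.1 0.1\n",
             "3 0.4 0.4 0.3 0.3\n"]:
    demo.update(classes_in_label_text(text))
print(dict(demo))
```

On a real dataset you would call something like `images_per_class("labels/train")` and look for classes with far fewer images than the others.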

Thanks for guiding
