YOLOv5 fine-tuning with a custom dataset

Hi, I would like information on how to set up the validation folder during the fine-tuning phase of a model I have already pretrained. Do I only have to put the new images/labels in the val folder, or do I also have to include the ones used for the previous validation?
Example: if I had 50 val images during the first training and only have 10 new val images for the fine-tuning, should I put in only the 10 new ones, or the 50 old ones plus the 10 new ones? Thank you in advance for your answer.

  1. You mentioned “pretrained” and “fine-tune”, which likely means you think you’re “extending” or “adding to” your model’s earlier training. That is not what happens. You need to train on ALL the data at once.
  2. You should put all the validation data together (same with training data) and set up your data YAML configuration to point to those directories.
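
As a rough sketch of point 2, the snippet below merges several validation folders into one and writes a data YAML that points at the result. It assumes each folder has `images/` and `labels/` subfolders with matching file stems, which is one common YOLOv5 layout but may not match yours. The function names `merge_split` and `write_data_yaml` are made up for illustration, and the YAML keys (`train`, `val`, `nc`, `names`) follow the sample dataset files shipped with YOLOv5 as far as I know, so check them against your version.

```python
import json
import shutil
from pathlib import Path

# Assumption: these are the image types in your dataset; extend as needed.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def merge_split(sources, dest):
    """Copy images and YOLO label files from several folders into one split.

    Each source is assumed to contain 'images/' and 'labels/' subfolders.
    Returns the number of images copied.
    """
    dest = Path(dest)
    (dest / "images").mkdir(parents=True, exist_ok=True)
    (dest / "labels").mkdir(parents=True, exist_ok=True)
    copied = 0
    for src in map(Path, sources):
        for img in sorted((src / "images").iterdir()):
            if img.suffix.lower() not in IMAGE_SUFFIXES:
                continue
            target = dest / "images" / img.name
            if target.exists():
                # Refuse to silently overwrite an image from another source.
                raise FileExistsError(f"duplicate image name: {img.name}")
            shutil.copy2(img, target)
            label = src / "labels" / (img.stem + ".txt")
            if label.exists():  # background images may have no label file
                shutil.copy2(label, dest / "labels" / label.name)
            copied += 1
    return copied


def write_data_yaml(path, train_dir, val_dir, names):
    """Write a minimal data YAML pointing at the combined directories."""
    lines = [
        f"train: {train_dir}",
        f"val: {val_dir}",
        f"nc: {len(names)}",
        # A JSON list is also a valid YAML flow sequence.
        f"names: {json.dumps(list(names))}",
    ]
    Path(path).write_text("\n".join(lines) + "\n")


# Example usage (paths are placeholders):
# merge_split(["old_val", "new_val"], "dataset/val")
# write_data_yaml("data.yaml", "dataset/train/images", "dataset/val/images", ["cat", "dog"])
```

The same `merge_split` call can be used for the training folders. It raises an error on duplicate file names rather than overwriting, so a clash between old and new images is noticed early.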

To reiterate: when you train a model, the starting weights are only a starting point, and the information from the previous training is not guaranteed to remain intact (meaning that the model might perform worse). This means that for the model to “learn” all the data, all of the data must be present during training. Any data left out at training time is considered out of distribution, and there is no assurance the model will retain anything it learned from data it was previously trained on.

When starting with the COCO model weights, if you train on any data that does not include the entire COCO dataset, there is no guarantee that the resulting model will perform as well as the original. In fact, if there is no data with the COCO classes at training time, the model will not detect any of the COCO classes, as this part of the model is essentially reset at the start of a new training session.
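
One way to check that every class you expect the model to detect is actually present at training time is to count class IDs in the label files. This is only a sketch: it assumes YOLO-format `.txt` labels (one object per line, class ID first), and the helper names `count_label_classes` and `missing_classes` are hypothetical.

```python
from collections import Counter
from pathlib import Path


def count_label_classes(labels_dir):
    """Count how many labelled objects exist per class ID in a labels folder."""
    counts = Counter()
    for label_file in Path(labels_dir).glob("*.txt"):
        for line in label_file.read_text().splitlines():
            fields = line.split()
            if fields:  # skip blank lines
                counts[int(fields[0])] += 1
    return counts


def missing_classes(labels_dir, num_classes):
    """Return the class IDs in range(num_classes) that have no labelled objects."""
    counts = count_label_classes(labels_dir)
    return [c for c in range(num_classes) if counts[c] == 0]
```

If `missing_classes` returns a non-empty list for your training labels, those classes have no examples in the combined dataset, and the model should not be expected to detect them after training.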

I recommend avoiding use of the term “fine tuning” when it comes to training CNN models. It’s more appropriate for LLMs and only causes confusion. You can search in the forums (or community where I’m active) and you’ll find I’ve made this statement often. It’s my opinion that this is a highly conflated term and leads to significant misunderstandings.

Thank you very much for your reply, everything is clearer to me now. I am new and inexperienced in this world and don’t really know all these details yet. Thanks again and best regards!

Hello Wuming9472,

For fine-tuning with a custom dataset and setting up the validation folder, you should include all images you want to use for validation during the fine-tuning phase. This means you should combine both the old validation images and the new ones.

In your example, you should put all 50 old validation images plus the 10 new ones in the val folder, for a total of 60 images. The structure should follow the Image Classification Datasets Overview, ensuring that your val directory correctly reflects all images designated for validation.