Setting the validation set the same as the training set

Hi guys,
I trained a model with YOLOv8 using the training set as my validation set, and I get very good results on unseen data.
Now, when I split the data into separate training and validation sets, I get an overfitted model and it doesn't predict well.
Is this chance or something else?
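For context, here is a rough sketch of the two runs with the Ultralytics Python API (the model size, epochs, and YAML file names are placeholders, not exact values):

```python
from ultralytics import YOLO

# Run 1: the data YAML pointed "val:" at the same images as "train:",
# i.e. the validation set was a copy of the training set.
model = YOLO("yolov8n.pt")  # model size is a placeholder
model.train(data="plates_same_val.yaml", epochs=100, imgsz=640)

# Run 2: the data YAML pointed "val:" at a separate, held-out folder of images.
model = YOLO("yolov8n.pt")
model.train(data="plates_split.yaml", epochs=100, imgsz=640)
```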

This is quite an unusual result. That said, it is possible that with a small dataset you don’t have enough samples in the training set when splitting the data. Can you share more information about the dataset size, number of classes, and number of instances?


I am trying to read Persian car plates. My whole dataset consists of 420 images of plates, 60 of which are used as the validation set, and there are about 20 classes.
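For reference, a random 360/60 split in the standard YOLO folder layout might look roughly like this (paths are assumptions):

```python
import random
import shutil
from pathlib import Path

random.seed(0)  # reproducible split

root = Path("datasets/plates")
images = sorted((root / "images" / "all").glob("*.jpg"))
random.shuffle(images)

splits = {"val": images[:60], "train": images[60:]}  # 60 held out, ~360 for training

for split, files in splits.items():
    img_dir = root / "images" / split
    lbl_dir = root / "labels" / split
    img_dir.mkdir(parents=True, exist_ok=True)
    lbl_dir.mkdir(parents=True, exist_ok=True)
    for img in files:
        shutil.copy(img, img_dir / img.name)
        label = root / "labels" / "all" / f"{img.stem}.txt"
        if label.exists():  # copy the matching YOLO label file
            shutil.copy(label, lbl_dir / label.name)
```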

420 images should be considered a starting dataset, but not necessarily a full dataset. I recommend reading through this guide from the Docs:

Even though it's a YOLOv5 guide, the principles are the same for any object detection model. Also be sure to check out the article linked at the end, as it further expands on the particulars of training neural network models.

The critical point here is that the model needs to “discover” the proper filters to correctly detect and classify the features that sort objects into the right classes. To accomplish this, it needs a very large number of samples; otherwise the model will not be able to generalize well. Data annotation is unfortunately still one of the most burdensome aspects of training models, but you can use model-assisted labeling to help speed up your annotation process (basically, use your 400 images to train a model, then use that model to help annotate more images, correcting its predictions as needed).
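As a rough sketch (the weights path, source folder, and confidence threshold are assumptions), model-assisted labeling with the Ultralytics API could look like this:

```python
from ultralytics import YOLO

# Model trained on the initial ~400 annotated images
model = YOLO("runs/detect/train/weights/best.pt")

# Pre-annotate a folder of new, unlabeled plate images.
# save_txt writes YOLO-format label files that you can load into your
# annotation tool and correct by hand before retraining.
model.predict(
    source="new_plate_images/",
    save_txt=True,   # write labels as .txt in YOLO format
    save_conf=True,  # include confidence scores to spot uncertain boxes
    conf=0.25,       # assumed threshold; adjust for your data
)
```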


Thanks for your help.
I have imbalanced data as well, because each plate has only one character (letter) but 7 numbers inside it.
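A quick way to quantify the imbalance is to count instances per class from the YOLO label files; a minimal sketch (the label path is an assumption):

```python
from collections import Counter
from pathlib import Path

label_dir = Path("datasets/plates/labels/train")  # YOLO-format labels, one .txt per image

counts = Counter()
for label_file in label_dir.glob("*.txt"):
    for line in label_file.read_text().splitlines():
        if line.strip():
            class_id = int(line.split()[0])  # first field is the class index
            counts[class_id] += 1

for class_id, n in sorted(counts.items()):
    print(f"class {class_id}: {n} instances")
```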


Some of my classes have around 30 instances (usually letters), while others have around 200 instances (mostly numbers).
But my question is: how does the model trained without a distinct validation set work so well on new data the network hasn't seen during training?

Unfortunately @Nima_Chelongar, it's not feasible to know the exact reason why this occurs. The most likely explanation is that when the validation set mirrors the training set, the model effectively trains on all of your samples instead of only the split-off training portion, and since your overall dataset is small, those extra images matter. Loss is calculated for both training and validation, and with so few samples overall, this has a noticeable impact on the model's performance.

My recommendation: if you need to understand the “why,” you'll have to do a lot of work and analysis to reach that level of understanding, but if you only need a model that performs well, start collecting and annotating more data. I presume most people want a model that performs well and don't necessarily need to understand the “why” in this situation, as digging into it is highly time consuming and doesn't provide any actionable information to improve your model. The best path to improving model performance is to collect more data.


Thank you very much for the great information. 🙂


@Nima_Chelongar strange result. Best practice is to split your data so that validation metrics correctly predict generalization capability on unseen data.
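For example (the weights and data file names are assumptions), evaluating on the held-out split gives the metrics that actually reflect generalization:

```python
from ultralytics import YOLO

model = YOLO("runs/detect/train/weights/best.pt")
metrics = model.val(data="plates_split.yaml", split="val")  # evaluate on the held-out set only
print(metrics.box.map50)  # mAP@0.5 on the validation set
```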
