Very bad validation metrics on a custom dataset with a fine-tuned model

Hi everyone, I'm new to YOLO and I've been fascinated by this world, but I've run into some trouble :pensive_face:.
I fine-tuned yolov10m on 4 classes with 2,000 instances each. After 500 epochs I got mAP50: 90% and mAP50-95: 75%, but when I tested the fine-tuned model on another validation set the results were horrible:

                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95)
                   all        123        674      0.568      0.084      0.126     0.0651
                   car        115        522      0.806    0.00192      0.129     0.0887
         traffic light         17         25          1          0    0.00257   0.000257
               bicycle         20         21       0.46      0.333       0.33       0.16
             crosswalk         70        106    0.00751   0.000921     0.0423      0.012

I've been thinking it might be my data distribution (maybe too many car instances) or the size of my new dataset. What could be the problem? Which experiments should I run to figure it out?
validation from my own dataset:
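One quick experiment for the class-imbalance hypothesis is to count the instances per class in the training labels and in the new validation labels and compare the two. A minimal sketch, assuming YOLO-format .txt label files (one "class x y w h" line per object) and hypothetical directory paths:

from collections import Counter
from pathlib import Path

def class_counts(label_dir):
    # Count instances per class id in a directory of YOLO-format .txt label files
    counts = Counter()
    for label_file in Path(label_dir).glob("*.txt"):
        for line in label_file.read_text().splitlines():
            if line.strip():
                counts[int(line.split()[0])] += 1
    return counts

# Hypothetical paths: adjust to your actual dataset layout
print("train:  ", class_counts("dataset/labels/train"))
print("new val:", class_counts("dataset_new/labels/val"))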

It sounds like overfitting. Does your dataset consist of unique images or did you augment it to reach that number?

What’s the training code you used? Did you start with a pretrained model?


For my fine-tuning dataset I used images taken from these datasets: BDD100k, KITTI.
My training code is:

from ultralytics import YOLO

model = YOLO('yoloModels/yolov10s.pt')  # start from a pretrained checkpoint
print(model.args)
results = model.train(data="dataset/data.yaml", epochs=500, imgsz=640, batch=8, save_period=50,
                      project="entrenamientos", name="yolov10s_500", device=0)
results = model.val()  # validates on the val split defined in data.yaml

Hi Ernesto,

Thanks for sharing the details of your issue. It’s quite common to see a performance drop when validating on a dataset different from the one used during training, especially if the data distributions or characteristics (like image conditions, object sizes, annotation styles) vary significantly between the two sets.

The large difference between your initial validation results (mAP50: 90%) and the results on the new set (mAP50: 12.6%) strongly suggests a domain gap between your training/original validation data (derived from BDD100k, KITTI) and this new validation set.

To investigate further, you could:

  1. Compare the visual characteristics and annotation quality of your new validation set against your training and original validation sets. Are there noticeable differences in lighting, camera angles, object scales, or how objects are labeled?
  2. Analyze the per-class metrics on the new validation set (as you’ve shown). The very low scores for ‘traffic light’ and ‘crosswalk’ might indicate these classes are particularly different or underrepresented in the new set compared to your training data.
  3. Examine the prediction images (val_batch*_pred.jpg) generated during validation on the new set. These visuals can offer direct insight into how the model is failing (e.g., missing objects, incorrect classifications, poor localization). You can find these images in the validation run directory, usually runs/detect/val/ (a sketch covering points 2 and 3 follows this list).
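A minimal sketch for points 2 and 3, assuming the best checkpoint from the training run above and a hypothetical data_new.yaml that points at the new validation set; plots=True writes the val_batch*_pred.jpg images mentioned in point 3:

from ultralytics import YOLO

# Hypothetical paths: adjust to your run directory and the new dataset's yaml
model = YOLO("entrenamientos/yolov10s_500/weights/best.pt")
metrics = model.val(data="data_new.yaml", imgsz=640, plots=True)

print(f"mAP50:    {metrics.box.map50:.3f}")
print(f"mAP50-95: {metrics.box.map:.3f}")

# Per-class mAP50-95, to see which classes drive the drop
for class_idx, class_map in enumerate(metrics.box.maps):
    print(f"{model.names[class_idx]}: {class_map:.3f}")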

Understanding the differences between the datasets is key to addressing the performance gap.
