Hi everyone, I'm new to YOLO and I've been fascinated by this world, but I've run into some trouble.
I fine-tuned yolov10m on 4 classes with 2,000 instances per class. After 500 epochs I got mAP50: 90% and mAP50-95: 75%, but when I tested the fine-tuned model on another validation set the results were horrible.
I think it might be my data distribution (maybe too many car instances) or the size of my new data. What could the problem be, and which experiments should I run to figure it out?
Validation results on my own dataset:
Thanks for sharing the details of your issue. It’s quite common to see a performance drop when validating on a dataset different from the one used during training, especially if the data distributions or characteristics (like image conditions, object sizes, annotation styles) vary significantly between the two sets.
The large difference between your initial validation results (mAP50: 90%) and the results on the new set (mAP50: 12.6%) strongly suggests a domain gap between your training/original validation data (derived from BDD100k, KITTI) and this new validation set.
To investigate further, you could:
Compare the visual characteristics and annotation quality of your new validation set against your training and original validation sets. Are there noticeable differences in lighting, camera angles, object scales, or how objects are labeled?
Analyze the per-class metrics on the new validation set (as you’ve shown). The very low scores for ‘traffic light’ and ‘crosswalk’ might indicate these classes are particularly different or underrepresented in the new set compared to your training data (a quick way to pull these per-class numbers programmatically is sketched after this list).
Examine the prediction images (val_batch*_pred.jpg) generated during validation on the new set. These visuals can offer direct insight into how the model is failing (e.g., missing objects, incorrect classifications, poor localization). You can find these images in the validation run directory, usually runs/detect/val/.
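If it helps, here is a minimal sketch of how you could re-run validation on the new set and print the per-class numbers programmatically. The weights path and `new_val.yaml` are placeholders for your own fine-tuned checkpoint and a data config describing the new validation set.

```python
from ultralytics import YOLO

# Placeholder path -- point this at your own fine-tuned weights
model = YOLO("runs/detect/train/weights/best.pt")

# `new_val.yaml` is a placeholder data config for the new validation set
metrics = model.val(data="new_val.yaml")

# Overall metrics on the new set
print("mAP50:", metrics.box.map50)
print("mAP50-95:", metrics.box.map)

# Per-class precision, recall, AP50 and AP50-95
for i, c in enumerate(metrics.box.ap_class_index):
    p, r, ap50, ap = metrics.box.class_result(i)
    print(f"{model.names[int(c)]}: P={p:.3f} R={r:.3f} AP50={ap50:.3f} AP50-95={ap:.3f}")
```

Running `val` like this also writes the val_batch*_pred.jpg images mentioned above into the run directory it reports at the end.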
Understanding the differences between the datasets is key to addressing the performance gap.
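To check your hunch about class distribution (e.g., too many car instances), a quick count of instances per class in each set is often revealing. A minimal sketch, assuming YOLO-format .txt label files and hypothetical directory paths:

```python
from collections import Counter
from pathlib import Path

def class_counts(label_dir: str) -> Counter:
    """Count object instances per class id across YOLO-format label files."""
    counts = Counter()
    for txt in Path(label_dir).glob("*.txt"):
        for line in txt.read_text().splitlines():
            if line.strip():
                counts[int(line.split()[0])] += 1  # first token is the class id
    return counts

# Hypothetical paths -- replace with your own label directories
train_counts = class_counts("datasets/my_data/labels/train")
new_val_counts = class_counts("datasets/new_val/labels/val")

for cls in sorted(set(train_counts) | set(new_val_counts)):
    print(f"class {cls}: train={train_counts.get(cls, 0)}, new_val={new_val_counts.get(cls, 0)}")
```

If a class that dominates your training data is rare (or looks very different) in the new set, that alone can account for a large part of the gap.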