I’m new to the community and currently training an object detection model on a custom dataset. Things have been going well so far, but I’ve reached the point where I need to experiment with data augmentation to address some slight color imbalances in my dataset.
Because training is time-consuming, I’ve been running it on a server as well as occasionally on my personal computer. However, I’ve noticed something puzzling: despite using the same dataset, the same model (yolo11n.pt), and identical settings, the results differ between the two environments. I’ve double-checked by running the test twice, but I’m still seeing inconsistent outcomes.
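For reference, I launch the runs through the standard Ultralytics Python API, roughly like this (a simplified sketch; the dataset path, epoch count, and image size below are placeholders rather than my exact values):

```python
from ultralytics import YOLO

# Same pretrained checkpoint on both machines
model = YOLO("yolo11n.pt")

# Fixed seed and deterministic mode, identical settings on both machines;
# data path, epochs, and imgsz are placeholders here
model.train(
    data="data.yaml",
    epochs=100,
    imgsz=640,
    seed=0,
    deterministic=True,
    name="deterministic2",
)
```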
Below are the validation results from my runs:
YOLO11n summary (fused): 238 layers, 2,582,542 parameters, 0 gradients, 6.3 GFLOPs
Class Images Instances Box(P R mAP50 mAP50-95)
all 5 126 0.356 0.315 0.305 0.137
1 5 43 0.419 0.209 0.25 0.0974
2 5 83 0.293 0.42 0.359 0.176
Speed: 0.4ms preprocess, 25.0ms inference, 0.0ms loss, 15.0ms postprocess per image
YOLO11n summary (fused): 238 layers, 2,582,542 parameters, 0 gradients, 6.3 GFLOPs
Class Images Instances Box(P R mAP50 mAP50-95)
all 5 126 0.442 0.302 0.379 0.157
1 5 43 0.337 0.302 0.33 0.123
2 5 83 0.546 0.301 0.428 0.192
Validating runs\detect\deterministic2\weights\best.pt...
Ultralytics 8.3.1 🚀 Python-3.12.5 torch-2.4.1+cu118 CUDA:0 (NVIDIA RTX A3000 Laptop GPU, 6144MiB)
YOLO11n summary (fused): 238 layers, 2,582,542 parameters, 0 gradients, 6.3 GFLOPs
Class Images Instances Box(P R mAP50 mAP50-95)
all 5 126 0.0179 0.201 0.0398 0.0108
1 5 43 0.00994 0.186 0.0282 0.00582
2 5 83 0.0259 0.217 0.0514 0.0158
Speed: 0.4ms preprocess, 2.7ms inference, 0.0ms loss, 1.4ms postprocess per image
Results saved to runs\detect\deterministic2
Ultralytics 8.3.2 🚀 Python-3.8.10 torch-2.4.1+cu121 CPU (Intel Xeon E5-2676 v3 2.40GHz)
YOLO11n summary (fused): 238 layers, 2,582,542 parameters, 0 gradients, 6.3 GFLOPs
Class Images Instances Box(P R mAP50 mAP50-95)
all 5 126 0.0127 0.143 0.0285 0.0137
1 5 43 0.00587 0.093 0.00487 0.00136
2 5 83 0.0196 0.193 0.0522 0.026
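For completeness, the per-class tables above come from validating the saved best.pt weights; re-running that validation by hand looks roughly like this (again just a sketch, with the dataset path as a placeholder):

```python
from ultralytics import YOLO

# Load the best checkpoint from one of the runs above
model = YOLO("runs/detect/deterministic2/weights/best.pt")

# Validate on the same split; data.yaml is a placeholder for my dataset config
metrics = model.val(data="data.yaml")
print(metrics.box.map50, metrics.box.map)  # mAP50 and mAP50-95
```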
At first I thought the differences might come from the server versus my computer. But after running the test again (so I now have four results from the same test), all four produced different numbers. Is this expected behavior, or am I missing something?
I’d appreciate any insights or advice on what might be causing these differences. Thank you in advance for your help!