My dataset contains over 200,000 images. After training, the model achieved mAP@0.5 = 0.81 and mAP@0.5:0.95 = 0.51 with imgsz=640, and I did not observe any false detections. However, when I switched the inference image size to imgsz=448, the model started detecting background regions as the target, and the false positives had high confidence (around 0.76).
To address this, I tried a simple sample-balancing approach: extracting additional frames from videos, converting them into images, and adding them to the training set. This helped temporarily, but the issue still occurs when the model is applied to new scenes.
Using the same dataset, I tested both YOLOv5 and YOLOv8 with official/default settings: pretrained weights, batch size 256, 4× RTX 4090 (24GB), 300 epochs, and no other parameter changes.
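(For reference, the YOLOv8 run corresponds roughly to a command like the one below; the checkpoint name is just a placeholder for whichever pretrained weights I used, and the YOLOv5 run used its own train.py script.)
yolo detect train model=yolov8n.pt data=your.yaml imgsz=640 epochs=300 batch=256 device=0,1,2,3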
This behavior is almost always an input-resolution mismatch: you trained and validated at imgsz=640, but at inference with imgsz=448 the whole image is rescaled differently, so background textures can start to “look like” your target at that scale and the model becomes overconfident. You will usually see AP/precision drop if you validate at 448 as well.
First, I’d confirm it by running validation at the same inference size (if metrics drop at 448, it’s expected):
yolo val model=best.pt data=your.yaml imgsz=448
If you need imgsz=448 in production, the fix is to train (or fine-tune) at that size, or to enable multi-scale training so the model learns to be robust across sizes. You can also reduce false positives by increasing your confidence threshold and by adding “hard negative” images from those new scenes (backgrounds that look similar but are not the target); this is the standard approach when precision is low, as described in the Ultralytics note on improving precision by adjusting confidence and adding difficult negatives.
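For example, you can raise the threshold at prediction time; the conf value below is only an illustrative starting point, and the source path is a placeholder:
yolo predict model=best.pt source=your_video.mp4 imgsz=448 conf=0.5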
Example fine-tune (recommended to try Ultralytics YOLO11):
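A minimal sketch, assuming your dataset YAML is your.yaml and using the small YOLO11 checkpoint; adjust epochs and batch to your hardware, and drop multi_scale if your installed Ultralytics version does not accept that argument:
# multi_scale varies the training image size so the model stays robust across input resolutions
yolo detect train model=yolo11n.pt data=your.yaml imgsz=448 epochs=100 batch=256 multi_scale=True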
If you share your results.png (the one with P/R curves) and confirm whether this is single-class or multi-class, I can suggest the best conf starting point and whether this looks more like domain shift vs. label noise.
First, thank you very much for your help! Let me describe my previous workflow in detail: I trained a model with imgsz=640, then set mode=export to export an ONNX model with imgsz=448. When I ran inference with the ONNX model, I found false positives on new videos. If I extracted frames from a new video and added them to the training set (i.e., added “hard negatives”), I could fix the false-positive issue for that specific video, but false positives would appear again when I switched to another new video. Previously, I also tried training directly with imgsz=448 using all the default parameters, without changing the confidence threshold, and I still got false positives.
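For reference, the export step was roughly the following (the weights path is a placeholder):
yolo export model=best.pt format=onnx imgsz=448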
After receiving your message, I just tried what you suggested. When I validated my previously trained imgsz=640 best.pt model with imgsz=448 (using CUDA_VISIBLE_DEVICES=1 torchrun --nproc_per_node=1 val.py), mAP50 did drop significantly, from 0.80 down to 0.73. The PR plots produced during training and validation are shown below. This is a multi-class object detection task. If you need any other result plots, please leave me a message. I’m truly touched by your reply!
If possible, could you recommend a suitable starting confidence threshold, and also comment on whether this looks more like domain shift or label noise, plus how you would tell? I also have one more question: could these false positives be related to data augmentation? The false positives have very high confidence, which feels very strange to me.
I found that when fine-tuning a 448 model from a best.pt trained at 640, enabling multi-scale training causes mAP@0.5 to drop significantly. I’m using the SGD optimizer with lr=0.01, batch size=256, and 8 RTX 4090 GPUs. Should the learning rate be adjusted accordingly?
I am planning to turn off mosaic augmentation. Should I fine-tune from the official pretrained weights or from the best.pt I previously trained at 640? Do I need to modify the learning rate? My previous setup was batch size 256, 8 GPUs, SGD optimizer, lr=0.01.
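Roughly, the fine-tune command I have in mind looks like this (the starting weights and data path are placeholders, and the starting checkpoint is exactly what I am unsure about):
# mosaic=0.0 disables mosaic augmentation; lr0 sets the initial SGD learning rate
yolo detect train model=best.pt data=your.yaml imgsz=448 epochs=100 batch=256 device=0,1,2,3,4,5,6,7 optimizer=SGD lr0=0.01 mosaic=0.0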