YOLO hand gesture model: excellent mAP, poor background rejection!?!

fharder · June 18, 2026, 2:13pm

Hello,

I trained YOLOv8n, YOLO11n, YOLO11s and YOLO26n on a custom hand gesture dataset for robot control.

Classes:

sit
down
up
bang

The validation results are very strong (Precision/Recall ≈ 0.99, mAP50 ≈ 0.99), and the confusion matrices show that the gesture classes themselves are recognized very well.

However, during real-world testing, background scenes were sometimes classified as gestures, resulting in more false positives than expected.

I did not use a separate background class and only included a limited number of background-only images.

My questions:

Can this happen even with very high mAP values?
Could missing background-only images explain the false positives?
Would you recommend adding more negative/background images without labels?
For static hand gestures, would MediaPipe or another keypoint-based approach generally be more suitable than YOLO?

Thank you for any advice.

pderrenger · June 19, 2026, 2:21pm

Yes — this can absolutely happen with Ultralytics YOLO. High mAP only means the model does well on your labeled validation split; it does not guarantee good background rejection in deployment. The model testing guide covers this well: real-world testing often exposes overfitting, leakage, or missing negative cases that validation metrics miss.

In your case, the limited number of background-only images is a very likely cause. If the model mostly saw hands/gestures during training, it may learn “something hand-like = one of 4 classes” and produce false positives on cluttered scenes.

Yes, I’d recommend adding more hard negatives: background-only images, arms without gestures, partial hands, tools, shadows, weird lighting, robot workspace frames, etc. In detection, empty images with no labels are valid and useful negatives.
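A minimal sketch of adding such negatives, assuming the common Ultralytics dataset layout with parallel `images/<split>` and `labels/<split>` folders. Each copied image gets an empty `.txt` label file, which YOLO treats as a background image (zero boxes). The function name, the `bg_` prefix and the folder names in the usage line are hypothetical.

```python
import shutil
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp"}


def add_background_images(src_dir, dataset_root, split="train", prefix="bg_"):
    """Copy gesture-free images into a YOLO dataset as negative samples.

    Assumes dataset_root/images/<split> and dataset_root/labels/<split>.
    Every copied image gets an empty label file, i.e. no objects.
    """
    src = Path(src_dir)
    img_dir = Path(dataset_root) / "images" / split
    lbl_dir = Path(dataset_root) / "labels" / split
    img_dir.mkdir(parents=True, exist_ok=True)
    lbl_dir.mkdir(parents=True, exist_ok=True)

    added = []
    for img in sorted(src.iterdir()):
        if img.suffix.lower() not in IMAGE_EXTS:
            continue  # skip non-image files
        # Prefix avoids name clashes with existing gesture images
        name = prefix + img.name
        shutil.copy2(img, img_dir / name)
        # Empty label file: zero boxes, so the image acts as a background
        (lbl_dir / (Path(name).stem + ".txt")).write_text("")
        added.append(name)
    return added
```

Usage would look like `add_background_images("robot_workspace_frames", "datasets/gestures")`. Writing the empty label file explicitly (rather than leaving it missing) makes it obvious in the dataset which images are intentional negatives.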

I’d also do two quick things:
run a true held-out test split with lots of real background frames, and try a higher inference threshold, e.g. conf=0.5 or 0.6, since that often cuts false positives fast.
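One way to pick that threshold from a held-out background split: run the model on frames that contain no gesture, record the highest detection confidence per frame (0.0 when nothing is returned), and see how the fraction of frames with a surviving box changes with the threshold. With Ultralytics, those per-frame scores could be taken from `results[i].boxes.conf` after a low-threshold `model.predict(..., conf=0.01)`; the sketch below only does the bookkeeping, not inference. The function names and the sweep values are illustrative assumptions.

```python
def background_fp_rate(max_conf_per_frame, conf):
    """Fraction of background-only frames with at least one box >= conf.

    max_conf_per_frame: highest detection confidence per frame (0.0 if the
    model returned no boxes). Every frame is assumed to contain no gesture,
    so any surviving box counts as a false positive.
    """
    if not max_conf_per_frame:
        raise ValueError("need at least one frame")
    hits = sum(1 for c in max_conf_per_frame if c >= conf)
    return hits / len(max_conf_per_frame)


def sweep(max_conf_per_frame, thresholds=(0.25, 0.4, 0.5, 0.6)):
    """False-positive rate on background frames for several thresholds."""
    return {t: background_fp_rate(max_conf_per_frame, t) for t in thresholds}
```

Raising the threshold also drops some true gestures, so it is worth running the same sweep on held-out gesture frames and choosing the threshold where both numbers are acceptable for the robot.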

For static gestures, a keypoint approach like MediaPipe can be better if hands are close/visible and your main problem is pose classification. YOLO is usually better when you also need robust hand detection/localization in messy scenes. A hybrid pipeline is often
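For the keypoint route, a minimal sketch, assuming a hand-landmark model such as MediaPipe Hands has already produced 21 (x, y) points per hand with index 0 at the wrist. The landmarks are made translation- and scale-invariant, then matched against one template per gesture. The nearest-template classifier is a deliberately simple stand-in (a k-NN or small MLP trained on many samples per class would be more typical), and the templates and `reject_dist` value are hypothetical and would need tuning. The rejection distance matters for this thread because it gives the pipeline an explicit "no gesture" output.

```python
import math


def normalize(landmarks):
    """Shift so the wrist (index 0) is the origin, then scale to unit size."""
    wx, wy = landmarks[0]
    shifted = [(x - wx, y - wy) for x, y in landmarks]
    scale = max(math.hypot(x, y) for x, y in shifted) or 1.0
    return [(x / scale, y / scale) for x, y in shifted]


def distance(a, b):
    """Euclidean distance between two equally long landmark lists."""
    return math.sqrt(sum((ax - bx) ** 2 + (ay - by) ** 2
                         for (ax, ay), (bx, by) in zip(a, b)))


def classify(landmarks, templates, reject_dist):
    """Nearest-template gesture label, or None if nothing is close enough."""
    query = normalize(landmarks)
    best_label, best_d = None, math.inf
    for label, tmpl in templates.items():
        d = distance(query, normalize(tmpl))
        if d < best_d:
            best_label, best_d = label, d
    # Reject hands that do not resemble any known gesture
    return best_label if best_d <= reject_dist else None
```

In a hybrid setup, the landmarks would be computed on the hand crop returned by the detector, so the detector handles localization in clutter and the keypoint stage decides which gesture (if any) is shown.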

fharder · June 19, 2026, 2:36pm

Hi,

thank you very much for your quick and helpful reply.

That makes sense and confirms my suspicion that the main issue is not necessarily the gesture classes themselves, but the limited amount of real negative/background examples in the dataset.

I’ll follow your suggestions and test more hard negatives, a real held-out background test split, and a higher confidence threshold.

Thanks again for the clear explanation.

Best regards,
Frank

Toxite · June 19, 2026, 3:50pm

You can get a high mAP simply because your validation set is small, but that doesn't mean the performance is good in the real world. You need a large and representative validation set for the mAP to be reliable.
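The effect of set size can be made concrete with a confidence interval. mAP itself is not a simple proportion, so as an illustrative stand-in this sketch uses the Wilson score interval for a proportion-style metric such as precision or the background false-positive rate. The function name and the example counts are assumptions, not results from this thread.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    z2 = z * z
    denom = 1 + z2 / n
    center = (p + z2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z2 / (4 * n * n)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the edges
    return max(0.0, center - half), min(1.0, center + half)
```

For example, `wilson_interval(99, 100)` is much wider than `wilson_interval(990, 1000)` even though both are 99%, and zero false positives on 20 background frames (`wilson_interval(0, 20)`) is still consistent with a true false-positive rate of roughly 16%.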

You could add the images that produced false positives during real-world testing to your training dataset and then retrain.
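A minimal sketch of that hard-negative mining step, assuming an operator has verified which logged frames contain no gesture and the highest detection confidence per frame was recorded. It ranks the frames the model fired on by confidence and caps how many are added relative to the training-set size. The 10% default cap is an assumption, loosely based on the Ultralytics training-tips guidance of roughly 0-10% background images; the function name is hypothetical.

```python
def select_hard_negatives(max_conf_per_frame, conf, n_train_images,
                          max_fraction=0.1):
    """Pick verified gesture-free frames the model fired on, worst first.

    max_conf_per_frame: {frame_path: highest detection confidence}. Any box
    on these frames is a false positive. At most max_fraction *
    n_train_images frames are returned.
    """
    fired = [(c, f) for f, c in max_conf_per_frame.items() if c >= conf]
    # Most confident mistakes first; path as a deterministic tie-breaker
    fired.sort(key=lambda cf: (-cf[0], cf[1]))
    budget = int(max_fraction * n_train_images)
    return [f for _, f in fired[:budget]]
```

The selected frames are then added to the training split with empty label files, like any other background image, and the model is retrained and re-tested on a fresh held-out set so the same frames are not used for both training and evaluation.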
