How can I reduce false recognition in YOLOv8? How do I stop it misrecognizing similar objects?

Hello everyone, I’d like to ask a question and hope for your help; thank you all in advance. For recognizing the scissors gesture, I started with 50,000 samples and expanded them to a dataset of 150,000 by rotating each image ±90°. I first trained on this 150,000-image dataset (no background images, all annotations verified correct) for 100 epochs. After training, the model misrecognized other gestures: for example, the OK gesture, the open palm, the fist, and gestures with only two or three fingers extended were all detected as scissors.
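The rotation was done with a small script along these lines (a simplified sketch, not my exact code): rotating by 90° swaps the image’s width and height, and a normalized YOLO box (cx, cy, w, h) remaps as in the comments.

import cv2

def rotate90(img, boxes, clockwise=True):
    # Rotate an image by 90° and remap its normalized YOLO boxes (cls, cx, cy, w, h).
    rot = cv2.ROTATE_90_CLOCKWISE if clockwise else cv2.ROTATE_90_COUNTERCLOCKWISE
    out = cv2.rotate(img, rot)
    new_boxes = []
    for cls, cx, cy, w, h in boxes:
        if clockwise:  # (x, y) -> (1 - y, x); box width and height swap
            new_boxes.append((cls, 1 - cy, cx, h, w))
        else:          # (x, y) -> (y, 1 - x)
            new_boxes.append((cls, cy, 1 - cx, h, w))
    return out, new_boxes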
My first solution is:
Starting from the last.pt checkpoint of those 100 epochs, I added 10k of the misidentified images to the dataset as background images, and the misrecognition dropped significantly. However, the model still misrecognizes some images that look like a scissors gesture. Moreover, at long range it detects any gesture as scissors, whether it actually is one or not. At first I intended to add these misdetected long-range targets as background images as well, but I was concerned that as the amount of background data grew, recognition of true positives would degrade (I had previously found that once more than 20,000 background images were added, positive recognition became very poor).
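For context, adding background images in YOLO format just means adding images with empty (or absent) label files; roughly what I did (paths are illustrative):

import shutil
from pathlib import Path

src = Path(r"E:\All_Data\Gesture_Data\false_positives")   # misdetected images
img_dst = Path(r"E:\All_Data\Gesture_Data\images\train")
lbl_dst = Path(r"E:\All_Data\Gesture_Data\labels\train")

for img in src.glob("*.jpg"):
    shutil.copy(img, img_dst / img.name)
    (lbl_dst / img.name).with_suffix(".txt").touch()  # empty label file = background
    # (since my dataset uses train.txt lists, the new image paths must also be appended there)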
Later, I tried to change my thinking, which led to the second approach.
My second solution is:
I labeled all the misidentified gestures as “other”, and split the scissors gesture more finely: a scissors gesture seen from the back of the hand is labeled “Reverse_scissor”, and one seen from the palm side is labeled “Forward_scissor”. The sample counts are: other: 90,000 images, Forward_scissor: 60,000 images, Reverse_scissor: 50,000 images. However, after 100 epochs of training, inference showed the model slightly overfitting to “other”: sometimes when I held up a scissors gesture, it was also classified as “other”. I suspect the sample sizes are to blame, so I have now reduced each class to around 50,000 and started a 200-epoch run. I’m worried this training will also turn out poorly, so I sincerely hope for your suggestions.
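To verify the balance, I count instances per class straight from the label files (a quick sketch; the labels directory is assumed):

from collections import Counter
from pathlib import Path

names = ['Forward_scissor', 'Reverse_scissor', 'other']
counts = Counter()
for txt in Path(r"E:\All_Data\Gesture_Data\labels\train").glob("*.txt"):
    for line in txt.read_text().splitlines():
        if line.strip():
            counts[names[int(line.split()[0])]] += 1  # first field is the class id
print(counts)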
I also checked the official Ultralytics documentation and the Ultralytics issues. The links: box, cls, and dfl loss gain · Issue #10375 · ultralytics/ultralytics · GitHub discusses the cls and box loss-gain parameters from the documentation; can I raise cls to make the model pay more attention to classification? And 如何设置多目标检测类别损失权重解决数据集分布不均衡的问题 (How to set per-class loss weights in multi-class detection to handle an imbalanced dataset) · Issue #15615 · ultralytics/ultralytics · GitHub mentions that parameters can be set in the YAML, but I don’t know how to set them up. I’m extremely eager to learn how to reduce the misrecognition of other gestures; this has troubled me for several days. If any friends see this post, please kindly help me. Thank you very much!
Below are my training configuration and my dataset configuration.
Training configuration
from ultralytics import YOLO

if __name__ == '__main__':
    model = YOLO("yolov8n.pt")
    model.train(
        data=r"E:\Project_Gesture\model_script\scissors.yaml",
        imgsz=640,
        device=0,
        lr0=0.01,
        epochs=200,
        batch=64,
        close_mosaic=10,
        name="3clss_sciss_yolo8n",
        fliplr=0.5,
        flipud=0.5,
        degrees=15,
        mosaic=0.5,
        mixup=0.3,
        plots=True,
        scale=0.5,
    )

Dataset YAML configuration:
train: [E:\All_Data\Gesture_Data\train.txt]
val: [E:\All_Data\Gesture_Data\val.txt]

# per-class cls loss weights (from issue #15615); I'm not sure whether this key is read
class_weights: [0.67, 1.0, 1.2]

nc: 3

names: ['Forward_scissor', 'Reverse_scissor', 'other']

Hello! Thanks for the detailed post. It’s great to see the systematic approach you’re taking to solve this problem.

Your second method of creating an explicit other class for negative samples is a very effective strategy for reducing false positives. The issue of overfitting to the other class can often be traced back to class imbalance, so your decision to balance the number of samples across all classes is an excellent step.

To further improve your results, you might consider adjusting your data augmentation. The mosaic augmentation, which is enabled by default (mosaic=1.0), is highly effective for teaching the model about different object scales and contexts. This could be particularly helpful for the issues you’re seeing with long-distance detections. You can find more details on this and other augmentations in our Data Augmentation guide.

You also asked about emphasizing classification. You’re on the right track. You can adjust the weight of the classification loss by modifying the cls hyperparameter in your training command. Increasing its value (the default is 0.5) encourages the model to prioritize correct classification over bounding box precision.

Here’s a quick example of how you might adjust your training call:

from ultralytics import YOLO

if __name__ == '__main__':
    model = YOLO("yolov8n.pt")
    model.train(
        data=r"E:\Project_Gesture\model_script\scissors.yaml",
        epochs=200,
        batch=64,
        imgsz=640,
        close_mosaic=10,  # Good practice to disable mosaic late in training
        mosaic=1.0,       # Ensure mosaic is enabled for most of the training
        cls=0.75,         # Increase classification loss weight (default is 0.5)
        # ... other args
    )

You can find a full list of these training arguments and their descriptions in our Model Training documentation.

Keep up the great work, and let us know how it goes! The community’s strength lies in members like you who share their challenges and solutions.

First, modifying the model/training parameters is worth experimenting with; however, there are other “cheaper” (easier-to-test) methods you should explore first.

  1. Try training without pre-augmenting your dataset; a sketch follows this list. The YOLO training cycle already applies augmentations during training, so your offline augmentation could be redundant and cause overfitting.
    • Additionally, a ±90° rotation is technically a different hand gesture ✌️
  2. Unless you have a specific need to use YOLOv8n, try a larger model, like YOLOv8s. Oftentimes a slightly larger model helps considerably with classification performance.
    • Note: if YOLOv8s causes an out-of-memory error, you can lower the batch size, or you can freeze layers with model.train(..., freeze=10) (freezes the first 10 layers of model weights), which reduces memory usage.
  3. Label the other gestures as what they truly are (‘okay’, ‘palm’, ‘fist’, etc.) instead of just “other”. During training, the model optimizes for the filters that best separate objects into their respective classes. Since you’re working with hand gestures, the model will likely develop filters that are very good at recognizing hands versus any other object, but not necessarily a specific hand gesture. Without any additional information about the other states a hand can appear in (other gestures), the model is more likely to falsely classify an incorrect hand gesture. When you added the “other” category, it overfit to that category, because there are many more variations of “not scissors” gestures than of scissors. Try labeling the other hand gestures you expect the model to see, even the ones you don’t care about, as it will help the model separate those instances from the one you want.
  4. It would take a bit more work, but you could also try a segmentation model instead. A segmentation model needs contour data, which might help it better distinguish the gesture you want from all others, because the contour encodes the shape of the object, and for a gesture like “scissors” that might be enough. I assume you currently have bounding-box data; you can use SAM 2 to help convert the bounding boxes into contour annotations (see the second sketch below).
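For points 1 and 2 together, a minimal sketch (the argument values are suggestions, not tuned settings): train on the original un-rotated images with the built-in pipeline, disabling the geometric augmentations that change gesture identity.

from ultralytics import YOLO

if __name__ == '__main__':
    model = YOLO("yolov8s.pt")  # point 2: the slightly larger "s" model
    model.train(
        data=r"E:\Project_Gesture\model_script\scissors.yaml",
        epochs=100,
        batch=64,
        imgsz=640,
        degrees=0.0,  # no random rotation: ±90° turns scissors into a different gesture
        flipud=0.0,   # no vertical flip, for the same reason
        freeze=10,    # optional: freeze the first 10 layers if memory is tight
    )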
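For point 4, recent ultralytics releases ship a converter that traces contours from existing box labels using a SAM checkpoint; a sketch, assuming your installed version includes yolo_bbox2segment (the directory path is hypothetical):

from ultralytics.data.converter import yolo_bbox2segment

yolo_bbox2segment(
    im_dir=r"E:\All_Data\Gesture_Data\images\train",  # images with YOLO box labels alongside
    save_dir=None,         # defaults to a "-segment" labels directory
    sam_model="sam_b.pt",  # SAM weights; a SAM 2 checkpoint may work on newer versions
)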

Thank you for the valuable suggestions; I will try your methods. I rechecked the dataset today: the labels were all correct, but I found only a few gestures captured from other angles. So I rotated the images and labels with a script to cover as many gesture angles as possible, and then adopted a blogger’s method to address the misrecognition. His approach is to inject negative samples into the Mosaic augmentation, which guarantees that every epoch, and every mosaic image, contains previously misdetected images. This can fix false detections in fewer training epochs. If it succeeds, I will share the results with everyone. The general principle is as follows:
YOLOv8 supports training directly on unlabeled negative samples (called “background” images in v8), but by default they are only mixed in with the positive samples, so newly added false-positive images may not be sampled often enough to be learned. After modifying the code, the false-positive images appear in every training iteration.
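A standalone sketch of the principle (not the blogger’s actual code; the tile size and paths are made up): build each 2×2 mosaic from three positive images plus one known false-positive image, which contributes no labels.

import random
from pathlib import Path

import cv2
import numpy as np

TILE = 320  # each tile is TILE x TILE, so the mosaic is 640 x 640

def load_yolo_labels(txt_path):
    # Read one YOLO label file: "class cx cy w h" per line, normalized coordinates.
    path = Path(txt_path)
    if not path.exists():
        return []
    rows = []
    for line in path.read_text().splitlines():
        if line.strip():
            c, cx, cy, w, h = line.split()
            rows.append((int(c), float(cx), float(cy), float(w), float(h)))
    return rows

def make_mosaic(pos_imgs, neg_img_path):
    # pos_imgs: three (image_path, label_path) pairs; neg_img_path: one negative image.
    canvas = np.zeros((2 * TILE, 2 * TILE, 3), dtype=np.uint8)
    labels = []
    neg_slot = random.randrange(4)  # the negative tile contributes no labels
    tiles = list(pos_imgs)
    for slot in range(4):
        img_path, lbl_path = (neg_img_path, None) if slot == neg_slot else tiles.pop()
        img = cv2.resize(cv2.imread(str(img_path)), (TILE, TILE))
        r, c = divmod(slot, 2)  # quadrant row/column of this tile
        canvas[r * TILE:(r + 1) * TILE, c * TILE:(c + 1) * TILE] = img
        if lbl_path:  # shift and scale normalized boxes into this quadrant
            for cls, cx, cy, w, h in load_yolo_labels(lbl_path):
                labels.append((cls, (cx + c) / 2, (cy + r) / 2, w / 2, h / 2))
    return canvas, labels

Writing these mosaics and their label files out before training reproduces the guarantee offline; the blogger achieves the same effect by patching the Mosaic transform inside the training loop.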