How can I reduce false recognition with YOLOv8, especially false recognition of similar objects?

Modifying the model or training parameters is worth experimenting with; however, there are other “cheaper” (easier to test) methods you should explore first.

  1. Try training without pre-augmenting your dataset. The YOLO training cycle already applies augmentations during training, so augmenting beforehand could be redundant and cause overfitting (see the first sketch after this list for how to control the built-in augmentations).
    • Additionally, a ±90° rotation is technically a different hand gesture :victory_hand:
  2. Unless you have a specific need to use YOLOv8n, try a larger model like YOLOv8s. Oftentimes a slightly larger model helps considerably with classification performance.
    • Note: if using YOLOv8s causes an out-of-memory error, you can lower the batch size, or you can freeze layers with model.train(..., freeze=10) (freezes the first 10 layers of model weights), which will reduce memory usage. See the second sketch after this list.
  3. Label the other gestures as what they truly are, ‘okay’, ‘palm’, ‘fist’, etc., instead of a single “other” class. During training, the model optimizes for the filters that best separate objects into their respective classes. Since you’re working with hand gestures, the model will likely develop filters that are very good at recognizing hands versus any other object, but not necessarily a specific hand gesture. Without any additional information about the other states a hand could appear in (other gestures), the model is more likely to falsely classify an incorrect gesture as the one you want. When you used a catch-all “other” class, the model likely overfit to it, because there are many more variations of “not scissors” gestures than of “scissors”. Try labeling the other gestures you expect the model to see, even if you don’t care about them; it will help the model separate those instances from the one you want (see the third sketch after this list).
  4. It would take a bit more work, but you could also try a segmentation model instead. A segmentation model needs contour data, which might help it better distinguish the gesture you want from all others: embedded in the contour information is the shape of the object, and for a hand gesture like “scissors”, that might be enough. I assume you have bounding box data at the moment; you can use SAM2 to convert those bounding boxes into contour annotations (see the last sketch below).
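
For item 1, a minimal sketch of training on the raw (un-augmented) dataset while controlling YOLO’s built-in augmentations, which are exposed as train arguments. The dataset file gestures.yaml is a hypothetical placeholder:

```python
from ultralytics import YOLO

model = YOLO("yolov8n.pt")

# Rely on YOLO's built-in augmentations instead of pre-augmenting your data,
# and zero out the ones that change the gesture's meaning.
model.train(
    data="gestures.yaml",  # hypothetical dataset config
    epochs=100,
    imgsz=640,
    degrees=0.0,  # no rotation: a rotated gesture can be a different gesture
    fliplr=0.0,   # no horizontal flip, for the same reason
    flipud=0.0,   # no vertical flip
    mosaic=1.0,   # mosaic is usually safe to keep on
)
```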
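For item 2, a sketch of swapping in YOLOv8s with the memory-saving options mentioned above; the batch size and dataset config are placeholder values:

```python
from ultralytics import YOLO

# YOLOv8s instead of YOLOv8n; keep memory usage in check if needed.
model = YOLO("yolov8s.pt")
model.train(
    data="gestures.yaml",  # hypothetical dataset config
    epochs=100,
    imgsz=640,
    batch=8,     # lower the batch size if you hit out-of-memory errors
    freeze=10,   # freeze the first 10 layers to further reduce memory usage
)
```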
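For item 3, a sketch of what that labeling scheme could look like in the dataset config; the file name, paths, and class list are all hypothetical:

```python
# gestures.yaml (hypothetical) -- give each expected gesture its own class
# instead of one catch-all "other":
#
#   path: datasets/gestures
#   train: images/train
#   val: images/val
#   names:
#     0: scissors   # the class you actually care about
#     1: okay
#     2: palm
#     3: fist

from ultralytics import YOLO

# Training is unchanged; only the labels/classes in the dataset differ.
model = YOLO("yolov8s.pt")
model.train(data="gestures.yaml", epochs=100, imgsz=640)
```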
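For item 4, a sketch of prompting SAM 2 with an existing box through the Ultralytics SAM interface; the weights file, image path, and box coordinates are assumptions. Recent Ultralytics releases also include yolo_bbox2segment in ultralytics.data.converter, which automates this box-to-contour conversion over a whole detection dataset:

```python
from ultralytics import SAM

# Prompt SAM 2 with an existing bounding box to get a mask/contour for it.
model = SAM("sam2_b.pt")  # assumed checkpoint name; use whichever SAM 2 weights you have

# xyxy box taken from your current detection labels (hypothetical values).
results = model("hand.jpg", bboxes=[120, 80, 420, 380])

# Polygon contours (pixel coordinates) for the prompted object, which you can
# write out as segmentation labels.
contours = results[0].masks.xy
print(contours)
```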