What is the best approach to detect suspicious people wearing helmets, caps, masks, and hoodies in security cameras?

I want to create a detector for people wearing motorcycle helmets, caps + masks, and hoodies. My question is: what is the best way to approach this? I want to use it with security cameras, but I don’t have many images to annotate, and I want the highest accuracy possible. Would it be a good idea to create just one class for “suspicious individuals” and annotate all images of suspects under this single class? Or should I create four different classes (normal individuals, caps + masks, hoodies, and motorcycle helmets), even with a small number of images?

Hi there! :blush:

Great question! Detecting individuals with specific attributes (like helmets, caps + masks, and hoodies) in a security camera setup can be a challenging yet rewarding task.

Approach 1: Single Class (“Suspicious Individuals”)

This simplifies the task and reduces the need for a large, annotated dataset. However, this approach might not provide granular insights into what makes an individual “suspicious.” For example, you won’t be able to differentiate between someone wearing a hoodie versus a motorcycle helmet. This trade-off might be acceptable if your primary goal is to flag such individuals without needing detailed classifications.

Approach 2: Multiple Classes (e.g., “Normal Individuals,” “Caps + Masks,” “Hoodies,” “Motorcycle Helmets”)

Using distinct classes allows you to collect more detailed and actionable insights, such as tracking specific behaviors or trends. However, with a smaller dataset, this approach may lead to imbalanced classes and reduced accuracy for underrepresented categories.
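Before committing to the multi-class approach, it helps to count how many boxes you actually have per class, since that is where imbalance shows up. Below is a minimal sketch, assuming YOLO-format label files (one `.txt` per image, one `class_id x_center y_center width height` line per box). The class IDs and names in `CLASS_NAMES` are hypothetical; adjust them to your own data YAML.

```python
from collections import Counter
from pathlib import Path

# Hypothetical class IDs for the four-class setup; adjust to your data YAML.
CLASS_NAMES = {0: "normal", 1: "cap_mask", 2: "hoodie", 3: "motorcycle_helmet"}


def count_instances(label_texts):
    """Count boxes per class ID across the contents of YOLO label files."""
    counts = Counter()
    for text in label_texts:
        for line in text.splitlines():
            parts = line.split()
            if parts:  # skip blank lines (and empty label files)
                counts[int(parts[0])] += 1
    return counts


def count_instances_in_dir(label_dir):
    """Convenience wrapper: read every .txt label file in a directory."""
    return count_instances(p.read_text() for p in Path(label_dir).glob("*.txt"))


if __name__ == "__main__":
    demo = ["0 0.5 0.5 0.2 0.4\n2 0.3 0.6 0.1 0.3\n", "0 0.4 0.5 0.2 0.4\n", ""]
    counts = count_instances(demo)
    total = sum(counts.values()) or 1  # avoid dividing by zero on an empty dataset
    for cid, name in CLASS_NAMES.items():
        n = counts.get(cid, 0)
        print(f"{name:>18}: {n:4d} ({n / total:.0%})")
```

If one class has only a handful of boxes compared to the others, that is a signal to collect more examples of it or to merge it with a related class.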

Recommended Strategy for Your Case

Since you want high accuracy and have limited images, here’s a hybrid suggestion:

  1. Start with Fine-Grained Classes: Create separate classes for “Normal Individuals,” “Caps + Masks,” “Hoodies,” and “Motorcycle Helmets.” This will allow your model to learn nuanced differences. If you later find the dataset too limited, you can always merge the classes into a single “Suspicious Individuals” class to simplify the task.
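  2. Data Augmentation: To address your limited dataset, apply techniques like flipping, cropping, rotation, and color adjustments. This artificially increases the variety of your training data and improves model generalization. Ultralytics training already applies several augmentations by default (for example horizontal flips, HSV color shifts, and mosaic). The Ultralytics docs also include a guide on data collection and annotation strategies.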
  3. Use Pretrained Models: Leverage a pretrained YOLO model (like YOLOv8 or YOLO11) and fine-tune it on your dataset. Pretrained models on datasets like COCO already have a strong understanding of common objects, which can significantly boost performance even with limited data.
  4. Active Learning: Use your initial model to predict on new, unlabeled video frames, and iteratively annotate the most relevant or incorrectly classified samples to improve accuracy.
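Steps 1 and 4 are mostly bookkeeping on label files and predictions. The sketch below shows one way to do both, assuming YOLO-format labels. `MERGE_MAP`, the class IDs, and the helper names are hypothetical. Merging classes later does not require re-annotating: you only rewrite the class IDs (and update `names` in your data YAML). For active learning, it uses simple least-confidence sampling, which is only one of several possible selection strategies.

```python
# Hypothetical class IDs: 0 normal, 1 cap_mask, 2 hoodie, 3 motorcycle_helmet.
# Merge the three attire classes into one coarse class (0 person, 1 suspicious).
# Any class missing from a mapping (or mapped to None) is dropped.
MERGE_MAP = {0: 0, 1: 1, 2: 1, 3: 1}


def remap_labels(label_text, mapping):
    """Rewrite the class IDs in one YOLO label file's contents using `mapping`."""
    out = []
    for line in label_text.splitlines():
        parts = line.split()
        if not parts:
            continue
        new_id = mapping.get(int(parts[0]))
        if new_id is None:
            continue  # unmapped class: drop this box
        out.append(" ".join([str(new_id)] + parts[1:]))
    return "\n".join(out) + ("\n" if out else "")


def pick_frames_for_annotation(frame_confidences, k):
    """Least-confidence sampling: return up to k frame IDs to label next.

    `frame_confidences` maps a frame ID to the confidences of the current model's
    detections on that frame. Frames without detections are skipped; the rest are
    ranked by their weakest detection, lowest first.
    """
    scored = [(fid, min(confs)) for fid, confs in frame_confidences.items() if confs]
    scored.sort(key=lambda item: item[1])
    return [fid for fid, _ in scored[:k]]


if __name__ == "__main__":
    print(remap_labels("0 0.5 0.5 0.2 0.4\n3 0.1 0.2 0.05 0.05\n", MERGE_MAP), end="")
    print(pick_frames_for_annotation({"f1": [0.9], "f2": [0.3, 0.8], "f3": [], "f4": [0.6]}, k=2))
```

Because frames with no detections are skipped, this selection will not surface missed detections (false negatives); reviewing a random sample of frames alongside it is a reasonable complement.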

Tools and Resources

  • Annotation Tools: Use tools like Label Studio or CVAT for efficient labeling.
  • YOLOv8 or YOLO11: These models are well-suited for object detection and tracking. You can load a YOLO model, fine-tune it, and use the tracking mode to analyze security camera footage. See the YOLO11 page in the Ultralytics docs for more details.
  • Object Tracking: Use the track mode to follow individuals across camera frames.

Example Code

Here’s an example of how to start training with YOLOv8 or YOLO11:

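```python
from ultralytics import YOLO

# Load a pretrained detection model
model = YOLO("yolov8n.pt")  # use "yolo11n.pt" for YOLO11

# Fine-tune on your custom dataset (image paths and class names are defined in the YAML file)
model.train(data="custom_data.yaml", epochs=50, imgsz=640, batch=16)

# Predict on a video. predict() returns Results objects (one per frame), not a single
# object with a save() method, so let Ultralytics save the annotated output via save=True.
# stream=True yields one result per frame instead of keeping the whole video in memory.
for result in model.predict(source="security_camera_footage.mp4", conf=0.5, save=True, stream=True):
    boxes = result.boxes  # per-frame detections: class IDs, confidences, xyxy boxes
```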
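The `custom_data.yaml` file referenced above tells Ultralytics where the images are and what the classes are called. Below is a minimal sketch that generates one, assuming the usual detection layout (`path`, `train`, `val`, and a `names` mapping). The dataset root and class names are hypothetical placeholders; keep class names simple (no colons or quotes), since this writes YAML by hand rather than through a YAML library.

```python
def make_data_yaml(dataset_root, class_names, train="images/train", val="images/val"):
    """Build the text of an Ultralytics-style detection dataset YAML."""
    lines = [f"path: {dataset_root}", f"train: {train}", f"val: {val}", "names:"]
    # Class IDs are assigned in list order, starting at 0.
    lines += [f"  {i}: {name}" for i, name in enumerate(class_names)]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical dataset root and class list; save the output as custom_data.yaml.
    print(make_data_yaml("datasets/suspects", ["normal", "cap_mask", "hoodie", "motorcycle_helmet"]), end="")
```

The class order must match the IDs used in your label files; if you later merge or drop classes, regenerate this file with the new list.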

Final Thoughts

If you’re constrained by data, consider starting with a smaller scope (e.g., just hoodies or helmets) and expand as you gather more samples. This iterative approach allows you to balance accuracy and dataset size effectively. Lastly, engage with the community or raise any further questions on the Ultralytics Discord for additional support!

Best of luck with your project—it sounds like an exciting and impactful security solution! :rocket:

It’s important to understand that the concept of a “suspicious person” is highly subjective and is not something computer vision will do a good job of identifying. Think of computer vision models as tools for observing what is objectively present: they can only identify objects and their locations in the image. Someone on a motorcycle wearing a helmet isn’t necessarily ‘suspicious’; whether they are depends on the specific context, and the model doesn’t have that context. Reorient your thinking around this principle, as expecting otherwise will only lead to disappointment.

If you don’t have data (annotated images), start by finding or collecting it, as this will be the most important factor in accomplishing your goal. You may find publicly available datasets online that contain relevant images with some annotations (use your favorite search engine), and you can also take a look at our datasets docs page. Check that the license of any dataset you find is compatible with your project, so that you don’t violate its terms for your use case. Most open datasets will need some additional work to better align with your project, so plan to spend some effort updating, fixing, and/or adding annotations. Ultimately, you should collect and annotate images from wherever you plan to use the trained model, as these will be instrumental in evaluating how well your model performs and whether you need additional training or data.
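When you evaluate on footage from your own cameras, consecutive frames from the same clip are nearly identical, so splitting frames randomly can put near-duplicates in both the training and validation sets and make the validation score look better than it is. A minimal sketch of a split that keeps every clip entirely on one side, assuming a hypothetical file naming scheme `<camera>_<clip>_<frame>.jpg`; replace `clip_id` with whatever identifies a source in your data.

```python
import random
from collections import defaultdict


def clip_id(name):
    """Hypothetical naming scheme: 'cam1_clipA_0001.jpg' -> 'cam1_clipA'."""
    return name.rsplit("_", 1)[0]


def split_by_source(image_names, source_of, val_fraction=0.2, seed=0):
    """Split images into train/val so all frames from one source stay together."""
    groups = defaultdict(list)
    for name in image_names:
        groups[source_of(name)].append(name)
    sources = sorted(groups)
    random.Random(seed).shuffle(sources)  # deterministic for a given seed
    n_val = max(1, round(len(sources) * val_fraction))
    val_sources = set(sources[:n_val])
    train = [n for n in image_names if source_of(n) not in val_sources]
    val = [n for n in image_names if source_of(n) in val_sources]
    return train, val


if __name__ == "__main__":
    names = ["cam1_a_001.jpg", "cam1_a_002.jpg", "cam2_b_001.jpg", "cam2_b_002.jpg", "cam3_c_001.jpg"]
    train, val = split_by_source(names, clip_id)
    print("train:", train)
    print("val:  ", val)
```

With very few clips the split is coarse (at least one whole clip always goes to validation), which is another reason to keep collecting footage from the deployment cameras.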

Additionally, objects that are visually distinct need to be labeled as different classes. You should annotate a person class and then each of the objects you wish to detect. The items you listed (hat, full-face helmet, hoodie, and mask) are all visually distinct, so each should be annotated as its own class. Including the person class also helps you tell when an object such as a helmet is detected on its own: without a person, that detection is likely a false positive.
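With a person class plus separate item classes, a small post-processing step can attach each detected item to the person it belongs to and set aside items with no person around them. The sketch below assumes boxes in pixel `(x1, y1, x2, y2)` format and hypothetical class names; with Ultralytics results you could build the `(class_name, box)` pairs from `result.boxes.xyxy`, `result.boxes.cls`, and `result.names`. The 0.5 overlap threshold is an arbitrary starting point, not a recommended value.

```python
def overlap_fraction(inner, outer):
    """Fraction of box `inner`'s area lying inside box `outer`; boxes are (x1, y1, x2, y2)."""
    ix1, iy1 = max(inner[0], outer[0]), max(inner[1], outer[1])
    ix2, iy2 = min(inner[2], outer[2]), min(inner[3], outer[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = (inner[2] - inner[0]) * (inner[3] - inner[1])
    return inter / area if area > 0 else 0.0


def attach_items_to_people(detections, min_overlap=0.5):
    """Group item detections (helmet, hoodie, ...) under the person box containing them.

    `detections` is a list of (class_name, box) pairs. Returns the person boxes, a dict
    from person index to attached item names, and the items not attached to anyone.
    """
    people = [box for cls, box in detections if cls == "person"]
    attached = {i: [] for i in range(len(people))}
    orphans = []
    for cls, box in detections:
        if cls == "person":
            continue
        best_i, best = None, 0.0
        for i, person in enumerate(people):
            f = overlap_fraction(box, person)
            if f > best:
                best_i, best = i, f
        if best_i is not None and best >= min_overlap:
            attached[best_i].append(cls)
        else:
            orphans.append((cls, box))  # no person around it: likely a false positive
    return people, attached, orphans


if __name__ == "__main__":
    dets = [
        ("person", (100, 100, 200, 400)),
        ("helmet", (120, 100, 180, 150)),
        ("hoodie", (100, 140, 200, 300)),
        ("helmet", (500, 500, 540, 540)),
    ]
    people, attached, orphans = attach_items_to_people(dets)
    print(attached)
    print(orphans)
```

Whether a given combination (for example a person with both a helmet and a hoodie) deserves an alert is still a rule you decide on top of these groups; the model only reports what it sees.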
