Improving results of poor detection

I need some help from the YOLO community. I have worked on a couple of YOLO projects previously that all worked quite well, but this project is showing quite poor results.

Our goal is to detect “Cocos” on Brazil nut trees (Castañas) from drone images. Cocos are large, coconut-like fruits on Brazil nut trees that hold all the Brazil nuts. We hope to relate the number of detections to harvest data collected over the previous 7 years for individual trees.

The dataset is available here.

When Roboflow trained the model, it gave me these results:
mAP 7.1%, Precision 12.8%, Recall 1.8%

I have a lot of questions about the best approach for detecting these cocos. All of our images are taken from the same elevation, so all of the detections will be around the same size.

The cocos appear in the drone images in many different ways: fully visible cocos, cocos partially hidden by leaves, cocos a little deeper in the tree canopy that show up a little darker but are still noticeable to the human eye, and some near similar-looking tree trunks and bark. Here are a few examples of annotations and an overall view of all of the cocos in one drone image:

We don’t necessarily need an exact count of all visible cocos, but we do need to be able to establish a relationship between YOLO coco detections and harvest volume per tree.

I have generated a lot of questions so I hope others in the community will find this helpful as well.

What do you think is the best way to annotate the cocos to get a reliable result?

Can we remove partially covered cocos from the annotations and try to detect only fully visible cocos, or will that confuse the model in some way?

Does the fact that each annotation has very few pixels relative to the image size make the detection of cocos in these images difficult or unlikely?

Should I focus on including partial objects (e.g., partially visible cocos) in my annotations, or only fully visible ones?

Should I leave some room between the edges of the cocos and the annotation boxes, or should the annotations fit tightly around the cocos?

What is the best practice for detecting such small objects in an image?

What is the best way to deal with small clusters of cocos?

Would using segmentation masks instead of bounding boxes improve detection for small objects like cocos?

Should I include a significant number of negative samples (images without cocos) in my dataset?

What augmentations are most effective for small-object detection of cocos? (I assume we should not use any color changes, for example.)

Does the dataset require a minimum number of examples per object class for effective detection? (Do we simply need more annotated examples?)

Is it better to downscale or crop images for training?

Are there specific hyperparameters (e.g., anchor box sizes) that should be tuned for small object detection?

How should I adjust the learning rate and batch size for a dataset with many small objects?

Would a larger model variant (e.g., YOLO11x) perform better for detecting small objects compared to smaller ones (e.g., YOLO11s)?

Are there specific YOLO configurations optimized for small object detection?

Is my dataset’s ground sampling distance of 2.1 cm (the ground size of each pixel) appropriate for detecting small objects like cocos?

Any help and guidance you can provide would be greatly appreciated!

Hi there! Thanks for sharing detailed context about your project and dataset—this helps a lot in diagnosing the challenges and suggesting improvements. Detecting the “Cocos” on Brazil nut trees is an interesting task, and while small object detection can be tricky, there are several strategies you can use to improve your results.

Annotations and Dataset Preparation

  1. Annotation Strategy:

    • For Partial Objects: Include partially visible “Cocos” in your annotations if a human can still identify them in the drone imagery. Excluding them can confuse the model, because visually similar regions would then be treated as background. Models learn better when trained on the full variety of real-world appearances.
    • Tight Bounding Boxes: Ensure bounding boxes are as tight as possible around the object edges. Loose annotations introduce noise and can degrade precision. Review the training mosaics (train_batch*.jpg) saved during training to verify label correctness.
    • Negative Samples: Yes, include negative samples (images without “Cocos”). These images help the model learn what is not a “Coco,” reducing false positives.
    • Clustered Annotations: In cases of tightly clustered “Cocos,” annotate each object individually. Clusters can pose difficulties, but clear individual annotations will help generalization.
  2. Dataset Size:
    Detection models benefit from sufficient examples per class. A common rule of thumb is ≥ 1,500 images per class and ≥ 10,000 labeled instances. If possible, expand your dataset with more annotated images.

  3. Image Resolution:
    Your ground sampling distance (2.1 cm) sounds plausible for drone imagery, but verify that the “Cocos” occupy at least ~32x32 pixels at that GSD; a back-of-the-envelope check is sketched below. If they’re smaller, tiling/cropping the images into smaller segments containing objects may help.
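As a quick sanity check, here is a back-of-the-envelope sketch. The ~12 cm pod diameter is purely an assumption; substitute your own field measurements:

```python
# Rough estimate of how many pixels a coco spans at a given GSD.
gsd_cm = 2.1            # ground sampling distance: cm of ground per pixel
pod_diameter_cm = 12.0  # assumed coco diameter; verify against field data

pixels = pod_diameter_cm / gsd_cm
print(f"~{pixels:.1f} px per coco")  # ~5.7 px, well below the ~32 px guideline

# Objects this small argue for flying lower (smaller GSD) or tiling the
# full-resolution images, so cocos are not shrunk further when each image
# is resized down to the network input size.
```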

Training Workflow

  1. Segmentation vs. Box Detection:
    If “Cocos” have irregular shapes and bounding boxes don’t work well, experimenting with instance segmentation models (e.g., YOLO11 with segmentation, yolo11n-seg.pt) could improve accuracy; see the training sketch after this list.

  2. Augmentation for Small Objects:

    • Use Mosaic to improve context by mixing multiple images.
    • Random rotation, scaling, and flipping are especially useful for drone imagery.
    • Avoid heavy color augmentations unless lighting discrepancies are part of deployment conditions.
    • Syntax: Apply these via the augmentation hyperparameters in the Ultralytics training pipeline, as shown in the sketch below.
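A minimal training sketch covering both points above (segmentation weights and augmentation settings), assuming the Ultralytics Python API; the dataset file cocos.yaml and all hyperparameter values are illustrative assumptions, not tuned recommendations:

```python
from ultralytics import YOLO

# Detection weights; swap in "yolo11n-seg.pt" to try instance segmentation.
model = YOLO("yolo11n.pt")

model.train(
    data="cocos.yaml",   # placeholder dataset config (assumption)
    epochs=100,
    imgsz=1280,          # larger input size helps small objects, if VRAM allows
    mosaic=1.0,          # mosaic augmentation on
    degrees=180.0,       # drone imagery has no canonical orientation
    scale=0.2,           # modest scale jitter: all cocos are roughly one size
    fliplr=0.5,
    flipud=0.5,
    hsv_h=0.0, hsv_s=0.0, hsv_v=0.0,  # disable color augmentation
)
```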

Hyperparameters and Model Configuration

  1. Model Size:
    Larger models (e.g., YOLO11x) usually perform better for small object detection due to their increased capacity. However, ensure your hardware can handle the larger model with a batch size large enough to keep batch-normalization statistics stable.

  2. Anchor Tuning:
    Anchor boxes only apply to anchor-based YOLO versions (e.g., YOLOv5), where AutoAnchor runs by default and recomputes anchors to fit your label distribution (it can be disabled with --noautoanchor). Newer models such as YOLO11 are anchor-free, so there are no anchor boxes to tune.

  3. Learning Rate and Batch Size:
    Start with the default hyperparameters, then try a slightly reduced initial learning rate (lr0) if training overfits or becomes unstable. Batch size is usually dictated by GPU memory, especially at larger image sizes; a comparison sketch follows below.
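For example, a comparison sketch using the Ultralytics API; the weights names are real released models, but the data file, batch size, and learning rate are illustrative assumptions:

```python
from ultralytics import YOLO

# Try a small and a large variant on the same data; larger models often
# help with small objects but need more VRAM per image.
for weights in ("yolo11s.pt", "yolo11x.pt"):
    model = YOLO(weights)
    model.train(
        data="cocos.yaml",  # placeholder dataset config (assumption)
        epochs=50,
        imgsz=1280,
        batch=4,            # large imgsz + large model forces a small batch
        lr0=0.005,          # slightly below the default initial learning rate
    )
    metrics = model.val()
    print(weights, metrics.box.map50)  # compare mAP@0.5 across variants
```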

Small Object Detection Tips

  1. Tiling for Small Objects:
    If “Cocos” are small relative to the image, split each large image into tiles and adjust the annotations accordingly. Tiling retains the original resolution and makes small objects occupy more of the network input; a minimal tiling sketch follows this list.

  2. Evaluation:
    Monitor key metrics like Precision, Recall, and mAP@0.5 during validation to check for overfitting or underfitting. Also review results using confusion matrices and prediction visualizations.
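Here is a minimal tiling sketch (image handling only; remapping YOLO label coordinates into each tile is noted but omitted). The tile size, overlap, and filename are assumptions:

```python
from pathlib import Path
from PIL import Image

def tile_image(path, tile=1280, overlap=0.2, out_dir="tiles"):
    """Split one large drone image into overlapping square tiles."""
    img = Image.open(path)
    w, h = img.size
    step = max(1, int(tile * (1 - overlap)))  # stride between tile origins

    def origins(size):
        last = max(size - tile, 0)
        pts = list(range(0, last + 1, step))
        if pts[-1] != last:
            pts.append(last)  # final tile sits flush with the image edge
        return pts

    Path(out_dir).mkdir(parents=True, exist_ok=True)
    stem = Path(path).stem
    for y in origins(h):
        for x in origins(w):
            img.crop((x, y, x + tile, y + tile)).save(
                Path(out_dir) / f"{stem}_{x}_{y}.jpg"
            )
            # NOTE: bounding boxes must be shifted by (-x, -y), clipped to
            # the tile, and dropped when too little of the object remains.

tile_image("drone_image.jpg")  # placeholder filename
```

The overlap matters: without it, a coco sitting on a tile boundary would be cut in half in both tiles and could be missed or double-counted.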

Resources and Suggested Next Steps


Finally, project-specific questions like cropping vs. tiling, augmentations, learning-rate adjustments, and segmentation workflows will benefit from active experimentation. Feel free to share your updated results and we’ll explore further optimizations with you!

Good luck with your project!

Are you using SAHI? If the objects are that small, the model likely won’t be able to detect them otherwise.
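For reference, a minimal SAHI sliced-inference sketch; the weights path, image name, and slice sizes are placeholder assumptions:

```python
from sahi import AutoDetectionModel
from sahi.predict import get_sliced_prediction

# Wrap trained Ultralytics weights for sliced inference.
detection_model = AutoDetectionModel.from_pretrained(
    model_type="ultralytics",  # "yolov8" in older SAHI releases
    model_path="best.pt",      # placeholder path to your trained weights
    confidence_threshold=0.25,
)

# SAHI slices the large image, runs the model per slice, then merges boxes.
result = get_sliced_prediction(
    "drone_image.jpg",         # placeholder image
    detection_model,
    slice_height=1280,
    slice_width=1280,
    overlap_height_ratio=0.2,
    overlap_width_ratio=0.2,
)
print(len(result.object_prediction_list), "cocos detected")
```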

This is very helpful, thank you.


I am reading about SAHI now and planning to implement it. Thank you.