AFAIK, there aren’t any “standards” for training on the VisDrone dataset. There’s no pretrained model either, so you’ll have to take your best guess. You could check the original publication https://arxiv.org/pdf/2001.06303
however, from a quick glance I didn’t see any mention of the training arguments. If you’re using a larger model (large or extra-large), you’ll probably need to train for a long time. You could start with 600–800 epochs and set an early-stopping patience of 50–100 epochs.

For resolution, it will depend on several factors, but given you have an A100 to train on, maybe start with 1280 and then adjust for the optimal batch size. It might be worthwhile to try a few small training runs (20–30 epochs) as a quick check for batch sizing. I’d also consider testing various image sizes: 1280 could work, but you might see just as good performance at 960 or 800, so a few quick tests at different sizes could be useful. These checks may not give an exact answer, but they should give you somewhere reasonable to start from.
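As a rough starting point, here’s a minimal sketch using the Ultralytics Python API. The yolo11l.pt checkpoint and the exact epochs/patience values are assumptions to adjust, not recommendations:

```python
from ultralytics import YOLO

# Assumed checkpoint: a large YOLO11 model; swap in whichever model you actually use
model = YOLO("yolo11l.pt")

# Rough initial settings for VisDrone on an A100; treat the numbers as guesses to tune
model.train(
    data="VisDrone.yaml",  # dataset config bundled with Ultralytics
    epochs=700,            # somewhere in the suggested 600-800 range
    patience=75,           # early stop after 50-100 epochs with no improvement
    imgsz=1280,            # start high for small objects, then experiment lower
    batch=-1,              # AutoBatch: picks the largest batch that fits in memory
)
```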
Thank you @BurhanQ for sharing this research paper and the detailed information!
This is very helpful for my VisDrone training setup.
I’ll review the paper and implement the recommendations for the imgsz configuration and epoch settings.
If I run into any questions or need further clarification while implementing these recommendations, I’ll ask. Thank you for the guidance!
There aren’t official “standards,” but on an A100, imgsz 960–1280 with early stopping (patience 80–120) works well; do a 20–30 epoch sanity run to pick imgsz/batch (a quick sketch below). Prefer YOLO11 for stability and speed; YOLO12 is not recommended due to training instability and heavy attention layers.
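A quick sketch of what that sanity run could look like; the checkpoint and the candidate image sizes are assumptions:

```python
from ultralytics import YOLO

# Short sanity runs across candidate image sizes, starting from fresh weights each time
for size in (960, 1120, 1280):
    model = YOLO("yolo11l.pt")  # assumed checkpoint
    model.train(data="VisDrone.yaml", epochs=25, imgsz=size, batch=-1, name=f"sanity_{size}")
```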
If helpful, see the Ultralytics docs for the VisDrone dataset guide and the train settings reference. Share your results.png if you want tuning suggestions.
Thank you @pderrenger for the detailed guidance! Yes, I conducted a sanity run and confirmed 1280 resolution performs best on my A100 setup for VisDrone small object detection.
Regarding the image size for paper reporting: I’ve noticed an inconsistency in research papers: some use 640×640, others use higher resolutions (960–1280), and many don’t explicitly report training resolution at all. This creates a challenge for fair comparison.
The inconsistency is unfortunate, but for practical purposes you’ll generally have to determine the proper resolution for your own use case. Smaller objects tend to call for higher resolutions, although there is a limit. If you can successfully detect objects at 1280, you can stay with that, or you can test lowering the resolution to find the drop-off point where the model can no longer detect objects. The reason to do this is that lower resolutions generally mean faster inference. If inference speed is not a concern, you don’t have to test lower resolutions, but it could be worth trying as an experiment (in my experience, speed becomes a concern eventually).
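One way to run that kind of sweep is to validate the same trained weights at decreasing resolutions and log both accuracy and inference speed; the weights path here is hypothetical:

```python
from ultralytics import YOLO

# Hypothetical path to the weights from your 1280 training run
model = YOLO("runs/detect/train/weights/best.pt")

# Validate at decreasing resolutions to find the accuracy drop-off point
for size in (1280, 960, 800, 640):
    metrics = model.val(data="VisDrone.yaml", imgsz=size)
    print(f"imgsz={size}: mAP50={metrics.box.map50:.3f}, "
          f"inference={metrics.speed['inference']:.1f} ms/img")
```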
@BurhanQ Thank you for that clarification on the speed-accuracy tradeoff! I’ve now had a chance to run the resolution sweep you suggested, and the results are significant: when I reduced training resolution from 1280 to 640, I observed a 9 percentage point drop in mAP@50.
Is a 9-point mAP drop typical for VisDrone when going from 1280 → 640?
I’d also appreciate suggestions for customizing the model for better mAP!
Given that the objects in the VisDrone dataset are quite small, that’s not terribly surprising. You might try imgsz=800 or imgsz=960 (really, any multiple of 32) to see if the mAP drop is small enough to be acceptable.
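For example, a quick sketch that fine-tunes your existing 1280 weights at 960; the weights path and epoch count are assumptions:

```python
from ultralytics import YOLO

# Hypothetical path to your 1280-trained weights; fine-tune briefly at a lower size
model = YOLO("runs/detect/train/weights/best.pt")
model.train(data="VisDrone.yaml", epochs=100, imgsz=960, batch=-1, name="finetune_960")
```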