YOLO-World Training on Large Datasets

I’m training YOLO-World on a custom pseudo-annotated dataset. To structure it, I created a dataset.yaml file similar to coco.yaml. However, since this is an open-vocabulary setup, I have more than 100K distinct class names listed under the names: field in the YAML. While training proceeds without issue, validation consistently crashes, likely due to the scale and structure of the class definitions.
I’m wondering if this approach is flawed or if there’s a better way to handle open-vocab datasets with large label sets.
Example dataset.yaml:

```yaml
path:
train:
val:
test:
names:
  0: "Label 1"
  1: "Label 2"
  2: "Label 3"
  # ...
  109000: "Label last"
```
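With a label set this large, writing the names: block by hand isn’t practical. A minimal sketch for generating the YAML programmatically — the paths and label list below are placeholders of mine, not from the original post:

```python
import yaml

# Placeholder labels (assumption); in practice, load your real class
# names, e.g. one per line from a text file.
labels = [f"Label {i}" for i in range(109001)]

data = {
    "path": "datasets/openvocab",  # hypothetical paths
    "train": "images/train",
    "val": "images/val",
    "test": "images/test",
    # Map contiguous integer class IDs 0..N-1 to names, matching the
    # YOLO dataset YAML format shown above.
    "names": {i: name for i, name in enumerate(labels)},
}

with open("dataset.yaml", "w") as f:
    yaml.safe_dump(data, f, sort_keys=False, allow_unicode=True)
```

This keeps the IDs contiguous from 0, which is what the annotation files reference.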

Yes, that structure appears correct for your dataset.yaml. A couple of questions will help diagnose the issue you’re facing:

  1. Please share the training command you’re using (CLI or Python), including all non-default training parameters/options.
  2. Let us know what hardware you’re training on: GPU, CPU, and RAM.
  3. Have you tried commenting out half (or any quantity) of the classes to test whether training hits the same issue? I would start with half; if the issue persists, I would test 500-1000 classes, just to see if I could get a training run to proceed further (training for only 3-5 epochs).
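One way to run the class-count experiment from point 3 without hand-editing a 109K-line file — the file names and the assumption that names: is an int-keyed mapping starting at 0 are mine, not from the post:

```python
import yaml

def truncate_names(src="dataset.yaml", dst="dataset_small.yaml", n=1000):
    """Write a copy of src keeping only the first n classes.

    Assumes names: is a mapping of contiguous integer IDs starting at 0,
    as in the example above. Keeping IDs 0..n-1 means any label file
    that only uses those IDs remains valid.
    """
    with open(src) as f:
        data = yaml.safe_load(f)
    data["names"] = {i: data["names"][i] for i in range(n)}
    with open(dst, "w") as f:
        yaml.safe_dump(data, f, sort_keys=False, allow_unicode=True)
    return dst
```

You could then point a short training run (3-5 epochs) at dataset_small.yaml and halve or double n until the validation crash disappears or reappears, which narrows down whether class count alone is the trigger.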