YOLO-World Training on Large Datasets

I’m training YOLO-World on a custom pseudo-annotated dataset. To structure it, I created a dataset.yaml file similar to coco.yaml. However, since this is an open-vocabulary setup, I have more than 100K distinct class names listed under the names: field in the YAML. While training proceeds without issue, validation consistently crashes, likely due to the scale and structure of the class definitions.
I’m wondering if this approach is flawed or if there’s a better way to handle open-vocab datasets with large label sets.
Example dataset.yaml:

```yaml
path:
train:
val:
test:
names:
  0: "Label 1"
  1: "Label 2"
  2: "Label 3"
  # ...
  109000: "Label last"
```
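With a label set this large, writing the names: block by hand isn’t practical. A minimal sketch for generating the YAML programmatically — the paths and label list below are placeholders of mine, not from the original post:

```python
import yaml

# Placeholder labels (assumption); in practice, load your real class
# names, e.g. one per line from a text file.
labels = [f"Label {i}" for i in range(109001)]

data = {
    "path": "datasets/openvocab",  # hypothetical paths
    "train": "images/train",
    "val": "images/val",
    "test": "images/test",
    # Map contiguous integer class IDs 0..N-1 to names, matching the
    # YOLO dataset YAML format shown above.
    "names": {i: name for i, name in enumerate(labels)},
}

with open("dataset.yaml", "w") as f:
    yaml.safe_dump(data, f, sort_keys=False, allow_unicode=True)
```

This keeps the IDs contiguous from 0, which is what the annotation files reference.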

Yes, that structure appears correct for your dataset.yaml. A couple of questions will help diagnose the issue you’re facing:

  1. Please share the training command you’re using (CLI or Python), including all non-default training parameters/options.
  2. Let us know what hardware you’re training on: GPU, CPU, and RAM.
  3. Have you tried commenting out half (or any quantity) of the classes to test whether training hits the same issue? I would start with half; if the issue persists, I would test 500-1000 classes, just to see if I could get a training run to proceed further (training for only 3-5 epochs).
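One way to run the class-count experiment from point 3 without hand-editing a 109K-line file — the file names and the assumption that names: is an int-keyed mapping starting at 0 are mine, not from the post:

```python
import yaml

def truncate_names(src="dataset.yaml", dst="dataset_small.yaml", n=1000):
    """Write a copy of src keeping only the first n classes.

    Assumes names: is a mapping of contiguous integer IDs starting at 0,
    as in the example above. Keeping IDs 0..n-1 means any label file
    that only uses those IDs remains valid.
    """
    with open(src) as f:
        data = yaml.safe_load(f)
    data["names"] = {i: data["names"][i] for i in range(n)}
    with open(dst, "w") as f:
        yaml.safe_dump(data, f, sort_keys=False, allow_unicode=True)
    return dst
```

You could then point a short training run (3-5 epochs) at dataset_small.yaml and halve or double n until the validation crash disappears or reappears, which narrows down whether class count alone is the trigger.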