I’m training YOLO-World on a custom pseudo-annotated dataset. To structure it, I created a dataset.yaml
file similar to coco.yaml
. However, since this is an open-vocabulary setup, I have more than 100K distinct class names listed under the names:
field in the YAML. Training proceeds without issue, but validation consistently crashes, likely due to the scale and structure of the class definitions.
I’m wondering if this approach is flawed, or if there’s a better way to handle open-vocabulary datasets with large label sets.
Example:
dataset.yaml
path:
train:
val:
test:
names:
0: "Label 1"
1: "Label 2"
2: "Label 3"
.
.
.
109000: "Label last"
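For what it’s worth, a file this large is easier to keep consistent if it’s generated rather than hand-edited. Below is a minimal stdlib-only sketch that writes the names: section from a Python list; the dataset paths and class names are hypothetical placeholders, not values from my actual setup.

```python
# Sketch: generate dataset.yaml programmatically so a ~109K-entry names:
# mapping stays consistent. Paths and class names below are placeholders.
class_names = ["Label 1", "Label 2", "Label 3"]  # in practice, ~109K pseudo-labels

lines = [
    "path: datasets/my_openvocab",  # hypothetical dataset root
    "train: images/train",
    "val: images/val",
    "test: images/test",
    "names:",
]
# YOLO-style YAML maps integer class IDs to quoted label strings
lines += [f'  {i}: "{name}"' for i, name in enumerate(class_names)]
yaml_text = "\n".join(lines) + "\n"
print(yaml_text)
```

Regenerating the file this way also makes it trivial to experiment with smaller label subsets when debugging the validation crash.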