I have an unlabeled dataset for an oriented bounding box model with 11 labels. I’ve partitioned it into k pieces, each to be sent to a different annotator. I can distribute a zip file and follow the documented YAML structure.
I passed along instructions to create a dataset and upload the zip file. It works, except the labels aren’t included. If possible, I’d like to keep the label indices and colors consistent for all annotators. Is there a way to facilitate this?
Your ZIP is being split correctly, but Ultralytics Platform isn’t picking up your class list because the uploaded layout doesn’t match the “YOLO dataset structure” it expects (i.e. data.yaml at the ZIP root plus per-split images/ + labels/ folders). The structure shown in the Ultralytics Platform quickstart dataset example is what the importer is built around.
Try repackaging like this (and name the file data.yaml at the ZIP root):
If the dataset is truly unlabeled, you can omit the labels/ folders initially, but the names: in data.yaml should still be detected and populate the Classes tab once the YAML is found.
For OBB later, the Platform parses YOLO-format label .txt files during upload as described in the Platform dataset processing docs.
On colors: there’s currently no YAML field to set class colors on upload. Colors are edited in the UI by clicking the color swatch next to each class name in the Classes tab (see “Edit class colors” in the Datasets page docs).
If you want, share your ZIP (or just confirm whether pilot-obb.yaml is at the root and whether you have any labels/*.txt present), and I can tell you exactly why it’s falling back to that single auto-generated class.
It looks like the platform is parsing the YAML incorrectly, specifically the unquoted string ?. I verified this with a toy dataset of one image and a data.yaml.
I do not get class labels with this:
train: train
names:
  0: x
  1: ?
But the following works – I finally got class labels. (@Toxite Interestingly, the platform seems happy to use even some-random-name.yaml.)
train: train
names:
  0: x
  1: '?' # it's fine if quoted
So, I think it’s just breaking on ?. And, unfortunately, the platform gives no feedback for errors, even for a nonsense data.yaml I tried. This was actually why I wasn’t getting class labels earlier. In my real dataset, I used a class label ? to mean unknown. Just quoting that fixed the problem.
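In case it helps anyone else, here is a minimal sketch of generating the YAML so that every class name is quoted, which sidesteps the bare ? problem. It assumes the same toy structure as above (a train key plus an index-to-name mapping); render_data_yaml is a hypothetical helper name, not part of any Ultralytics API. It relies on json.dumps producing a double-quoted string, which YAML also reads as a string.

```python
import json


def render_data_yaml(names, train="train"):
    """Render a minimal data.yaml with every class name double-quoted."""
    lines = [f"train: {train}", "names:"]
    for index, name in enumerate(names):
        # json.dumps emits a double-quoted string, so names such as ?
        # are never written as bare YAML scalars.
        lines.append(f"  {index}: {json.dumps(name)}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder class names taken from the toy example above.
    print(render_data_yaml(["x", "?"]))
```

Because the indices come from the list order, generating the file once and shipping the same text to every annotator keeps the label indices consistent.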
Note that ? is a valid string literal in YAML 1.1 and 1.2. (Granted, YAML’s a nightmare of a format with a literal hundred-page spec; see Ruud van Asseldonk’s “The YAML document from hell”.)
I am still unclear on the expected YAML structure. I could be misreading, but the docs at /datasets/obb/#dataset-yaml-format and /platform/data/datasets/#preparing-your-dataset both use images/train, not train/images.
The OBB page also names the file dota.yaml and specifies a path. Is there a normative document describing the format that I should refer to?
Both work. Ultralytics currently uses images/train; train/images is the legacy layout, kept for backwards compatibility with YOLOv5.
path is only required if your YAML is in a different location than the images and labels folders. If the YAML sits alongside the images and labels folders, it’s unnecessary.
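For the original use case (one ZIP per annotator, same classes everywhere), a rough sketch of the packaging step might look like the following. It assumes the images/train layout with the YAML at the ZIP root and no path key, as described above; partition, build_annotator_zip, and the class names in DATA_YAML are hypothetical placeholders, and whether the Platform accepts this exact archive is not something the sketch verifies.

```python
import zipfile
from pathlib import Path

# Placeholder YAML shared by every annotator; names are quoted on purpose.
DATA_YAML = """\
train: images/train
names:
  0: "x"
  1: "?"
"""


def partition(items, k):
    """Split items into k round-robin chunks."""
    return [items[i::k] for i in range(k)]


def build_annotator_zip(zip_path, image_paths, data_yaml_text, split="train"):
    """Write one ZIP: data.yaml at the root, images under images/<split>/."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("data.yaml", data_yaml_text)
        for image_path in image_paths:
            image_path = Path(image_path)
            # Only the file name is kept, so names must be unique per chunk.
            archive.write(image_path, arcname=f"images/{split}/{image_path.name}")
    return zip_path


# Example usage (paths are illustrative):
# images = sorted(Path("unlabeled").glob("*.jpg"))
# for i, chunk in enumerate(partition(images, 4)):
#     build_annotator_zip(f"annotator_{i}.zip", chunk, DATA_YAML)
```

Since every archive carries the identical YAML text, the class indices match across annotators; colors would still need to be set in the UI, as noted earlier in the thread.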