It’s a very common problem. Usually the best solutions are:
Use an open-vocabulary model like YOLO-Worldv2, YOLOE, or even SAM2 (SAM3 coming soon) to generate annotations for common objects. See the examples in the docs, but essentially you can use text, point, or box prompts to help speed up annotation.
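For example, here’s a minimal sketch of text prompting with YOLOE via the `ultralytics` package (the weights filename and class names are placeholders; check the docs for the current model names):

```python
from ultralytics import YOLOE

# Load a pretrained YOLOE model (weights filename is a placeholder)
model = YOLOE("yoloe-11s-seg.pt")

# Define the vocabulary with text prompts for the classes you want labeled
names = ["forklift", "pallet"]  # hypothetical classes
model.set_classes(names, model.get_text_pe(names))

# Pre-annotate a folder of images; save_txt writes YOLO-format .txt labels
model.predict(source="unlabeled/", save_txt=True, conf=0.25)
```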
Use the “data flywheel” process. Basically this means you label some data, train a model, then use that trained model to help annotate more data. Early on, you’ll need to do a lot of manual fixes, but as you begin to collect 100s or 1000s of images for each class, it will need less intervention. You can also save any of the YOLOE or YOLO-World models and use them here without any training.
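A minimal sketch of one flywheel iteration with the `ultralytics` package (the dataset YAML, paths, and hyperparameters are placeholders):

```python
from ultralytics import YOLO

# Train an initial model on the small hand-labeled set
model = YOLO("yolo11n.pt")
model.train(data="my_dataset.yaml", epochs=100, imgsz=640)

# Pre-annotate the next batch; label .txt files land under runs/detect/predict*/labels/
model.predict(source="unlabeled_batch/", save_txt=True, conf=0.25)

# Review and fix the predicted labels in your annotation tool,
# add them to the training set, retrain, and repeat
```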
There are lots of open datasets with annotations available. Check Kaggle, Hugging Face, Google Dataset Search, or other platforms for anything that includes the objects you’re looking to train a model on. Hyper-specialized objects could be difficult to find, but you never know until you look.
Synthetic data could work in some cases, but honestly it might end up being more effort than it’s worth, especially for specialized object classes. For instance, I had a project detecting very specific micro-defects in glass, and generating synthetic data would have taken more effort than just labeling the real data.
I used other techniques from traditional computer vision, but only because (some of) the defects could have bounding boxes generated using these methods; however, I still had to apply the class labels manually. That was because there were no open-vocab models available at the time I did that project, but I’d absolutely start with one of those using point prompts if doing it again today. I also used the data flywheel technique to help speed up the annotation process.
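To illustrate the kind of traditional CV approach I mean, here’s a hypothetical OpenCV sketch that proposes boxes for high-contrast defects and writes them as YOLO-format label lines (the filenames, threshold choice, and area cutoff are all made up, and the class ID still needs manual review):

```python
import cv2

img = cv2.imread("glass_sample.jpg", cv2.IMREAD_GRAYSCALE)
h, w = img.shape

# Otsu threshold to separate dark defects from the brighter glass background
_, mask = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

lines = []
for c in contours:
    x, y, bw, bh = cv2.boundingRect(c)
    if bw * bh < 25:  # skip tiny noise blobs (arbitrary cutoff)
        continue
    # YOLO format: class x_center y_center width height, all normalized;
    # class 0 is a placeholder to be corrected during manual review
    cx, cy = (x + bw / 2) / w, (y + bh / 2) / h
    lines.append(f"0 {cx:.6f} {cy:.6f} {bw / w:.6f} {bh / h:.6f}")

with open("glass_sample.txt", "w") as f:
    f.write("\n".join(lines))
```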
If it’s relatively simple to generate synthetic data for your objects, then by all means give it a try. I would still recommend collecting more real data than synthetic to train against, as synthetic data alone is unlikely to help a model generalize well (meaning the model can perform well on new, never-before-seen data).
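As a concrete example, the Ultralytics data YAML accepts a list of training directories, so you can mix real and synthetic sets while keeping validation real-only (the paths and split below are just an illustration):

```yaml
path: datasets/mixed
train: # keep more real images than synthetic
  - images/train_real
  - images/train_synthetic
val: images/val_real # validate on real data only
names:
  0: defect
```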
Hello @Igor, I would suggest collecting real data and doing manual labeling. Synthetic data can be used when real data isn’t available, but I’d still say it’s only suitable for short-term projects.
Great point, Joel. Real, manually labeled data is still the gold standard, but I’d treat synthetic as a strategic complement rather than only a short-term option:
Use synthetic to pretrain or cover rare/unsafe edge cases, then fine-tune on mostly real data (aim for real > synthetic, and keep a real-only val/test set); see the sketch after this list.
Spin the data flywheel: train a small YOLO11n/s, then auto-label more images and quickly fix the predictions.
Minimal starter for pseudo-labels: `yolo predict model=yolo11s.pt source=unlabeled/ save_txt=True conf=0.25`
If classes are generic, open‑vocabulary models (YOLOE/YOLO‑World) can jump‑start labels; then finalize with YOLO11.
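For the pretrain-then-fine-tune flow above, a minimal sketch (both dataset YAMLs and the checkpoint path are placeholders):

```python
from ultralytics import YOLO

# Stage 1: pretrain on the synthetic set
model = YOLO("yolo11s.pt")
model.train(data="synthetic.yaml", epochs=50, imgsz=640)

# Stage 2: fine-tune on the (mostly real) set with a lower learning rate;
# the best.pt path depends on your runs directory
model = YOLO("runs/detect/train/weights/best.pt")
model.train(data="real.yaml", epochs=100, imgsz=640, lr0=0.001)
```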