I’ve been experimenting with YOLO11 on a Raspberry Pi with a camera, and it seems to be working well so far.
I’d now like to create my own custom dataset and retrain the existing model so that it includes all the original pre-trained data plus new objects I want the system to detect.
Could someone advise me on the following?
Where can I find good-quality, free, open-source datasets?
How should I prepare and label the data correctly?
How can I use this new data to retrain or update an existing YOLO model?
If there are any recommended guides, tutorials, or YouTube videos that walk through this process, it would be greatly appreciated.
Annotate the new images using the same class labels as in the data YAML (use whichever labeling software you like most)
Split dataset into train/validation/test (at least train/validation, but test set is useful for “true” performance testing)
Train, validate, test
Annotate more as required to improve performance
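For the split step, here is a minimal sketch of a reproducible train/validation/test split using only the Python standard library. The function names, the 70/20/10 ratio, and the idea of writing one text file of image paths per split are my own assumptions for illustration, not something prescribed by the guides; adjust to your dataset layout.

```python
import random
from pathlib import Path


def split_dataset(image_paths, val_frac=0.2, test_frac=0.1, seed=0):
    """Shuffle image paths deterministically and split them into train/val/test lists."""
    paths = sorted(image_paths)         # sort first so the result does not depend on input order
    random.Random(seed).shuffle(paths)  # seeded shuffle keeps the split reproducible
    n_test = int(len(paths) * test_frac)
    n_val = int(len(paths) * val_frac)
    return {
        "test": paths[:n_test],
        "val": paths[n_test:n_test + n_val],
        "train": paths[n_test + n_val:],
    }


def write_split_files(splits, out_dir):
    """Write train.txt, val.txt and test.txt, each with one image path per line."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for name, paths in splits.items():
        (out_dir / f"{name}.txt").write_text("\n".join(str(p) for p in paths) + "\n")
```

Keeping the seed fixed means that when you annotate more images later, you can rerun the split and compare results without wondering whether the shuffle changed.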
There’s an updated guide about model training on the docs site, and this older guide is still very relevant. Absolutely recommend reading it, as well as the articles linked in the guide. There is also this guide on data labeling that would be a good read.
Download the full COCO dataset, add your new images + labels, and train the model (don’t do this on the Raspberry Pi though, just wanted to ensure that was obvious). You’ll want a computer with a good CPU and a GPU with as much VRAM as possible. If you don’t own one, there are cloud-based options, but for large training runs you’ll likely need to pay to avoid disruptions. Training YOLO11n on the COCO dataset might take a day or so with a reasonably large GPU, and could take longer with a smaller one and/or significantly more data/labels.
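A rough sketch of how the combined dataset config and training call could look with the `ultralytics` Python package. The helper names, the `images/train` style folder layout, the example class names, and the epoch/image-size values are all assumptions on my part; the key idea is that new classes are appended after the existing ones so the original class IDs keep their meaning.

```python
def extend_class_names(base_names, new_names):
    """Append new class names after the existing ones so original class IDs stay unchanged."""
    names = list(base_names)
    for name in new_names:
        if name in names:
            raise ValueError(f"class {name!r} already exists")
        names.append(name)
    return {i: n for i, n in enumerate(names)}


def build_data_yaml(root, names):
    """Return the text of a dataset YAML (folder layout below is an assumed example)."""
    lines = [
        f"path: {root}",
        "train: images/train",
        "val: images/val",
        "test: images/test",
        "names:",
    ]
    lines += [f"  {i}: {n}" for i, n in names.items()]
    return "\n".join(lines) + "\n"


def train(data_yaml_path, epochs=100, imgsz=640):
    """Fine-tune from pretrained YOLO11n weights, then validate. Run on a GPU machine, not the Pi."""
    from ultralytics import YOLO  # imported lazily; needs `pip install ultralytics`

    model = YOLO("yolo11n.pt")  # start from the pretrained checkpoint
    model.train(data=data_yaml_path, epochs=epochs, imgsz=imgsz)
    return model.val()  # metrics on the validation split
```

In practice `base_names` would be the 80 COCO class names in their original order, and the label files for your new images would use the appended IDs (80 and up).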