Custom model with a subset of existing COCO labels plus some new labels

I’m trying to train a custom model with the following labels:

  1. Person (already in COCO)
  2. Bird (already in COCO)
  3. Squirrel (not in COCO)

I was planning to use one of the squirrel datasets from Roboflow Universe to train on squirrels. I would use YOLOv5 v6.0, pretrained on COCO.

I don’t see a way to train on a custom dataset of squirrels while telling the model to also remember the other 2 labels. Is the default behavior of fine-tuning that the model forgets all existing labels and only learns the new training labels?
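Some context on why the old labels disappear: in YOLOv5 each output conv of the Detect layer predicts, per anchor, 4 box values, 1 objectness score and 1 score per class, so its output width depends on the `nc` set in the dataset YAML. As far as I understand, when `nc` changes the pretrained head weights no longer match in shape, so only the shape-compatible weights (backbone and neck) are reused and the head learns whatever classes the new dataset contains. A minimal sketch of that arithmetic, assuming the default 3 anchors per scale; the parameter names and input widths below are hypothetical, for illustration only:

```python
def detect_out_channels(nc, anchors_per_scale=3):
    """Channels of one YOLOv5-style detection conv: anchors * (4 box + 1 obj + nc)."""
    return anchors_per_scale * (nc + 5)


def loadable_keys(ckpt_shapes, model_shapes):
    """Checkpoint keys whose shapes match the new model, i.e. weights that can be reused."""
    return [k for k, s in ckpt_shapes.items() if model_shapes.get(k) == s]


# Hypothetical parameter names and shapes, for illustration only.
coco_ckpt = {
    "backbone.conv1.weight": (32, 3, 6, 6),
    "head.detect0.weight": (detect_out_channels(80), 128, 1, 1),
}
squirrel_model = {
    "backbone.conv1.weight": (32, 3, 6, 6),
    "head.detect0.weight": (detect_out_channels(3), 128, 1, 1),
}

print(detect_out_channels(80), detect_out_channels(3))  # 255 24
print(loadable_keys(coco_ckpt, squirrel_model))  # ['backbone.conv1.weight']
```

So a 3-class head (24 channels) cannot inherit the 80-class head (255 channels); the model only "remembers" person and bird if those classes are present in the new training data.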

Here are a few options I see. I’d appreciate any suggestions!

Option 1: fine-tune on squirrels + the 2 existing COCO categories

Extract the Bird and Person images and labels from the COCO dataset, combine them with the squirrel images and labels, and then fine-tune from the existing pretrained weights. Based on this advice, I might reduce the number of Bird and Person images taken from the original COCO training set to balance them against the number of squirrel images.
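The extraction step could look roughly like this. It is a sketch that assumes the COCO labels are already in YOLOv5 `.txt` format (one `class cx cy w h` row per object) using the 80-class ids from YOLOv5's `coco.yaml`, where person is 0 and bird is 14. The names `COCO_TO_NEW`, `remap_label_lines` and `build_subset` are hypothetical helpers, and `max_images` is the knob for balancing against the squirrel set:

```python
import random
from pathlib import Path

# Assumption: COCO labels in YOLOv5 .txt format with coco.yaml's 80-class ids.
COCO_TO_NEW = {0: 0, 14: 1}  # person -> 0, bird -> 1 (squirrel would be 2)


def remap_label_lines(lines, id_map):
    """Keep only rows whose class id is in id_map, rewriting that id."""
    out = []
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        cls = int(parts[0])
        if cls in id_map:
            out.append(" ".join([str(id_map[cls])] + parts[1:]))
    return out


def build_subset(label_dir, out_dir, id_map, max_images, seed=0):
    """Write remapped label files for a random, size-capped subset.

    Only label files that still contain a kept object are eligible;
    max_images caps how many are written (e.g. to roughly match the
    number of squirrel images). Returns the chosen file names.
    """
    label_dir, out_dir = Path(label_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    kept = {}
    for path in sorted(label_dir.glob("*.txt")):
        rows = remap_label_lines(path.read_text().splitlines(), id_map)
        if rows:
            kept[path.name] = rows
    chosen = sorted(kept)
    random.Random(seed).shuffle(chosen)  # seeded, so the subset is reproducible
    chosen = chosen[:max_images]
    for name in chosen:
        (out_dir / name).write_text("\n".join(kept[name]) + "\n")
    return chosen
```

The matching images would still need to be copied into an `images/` folder alongside `labels/`, since YOLOv5 locates each label file by replacing `/images/` with `/labels/` in the image path. Assuming the squirrel dataset uses class id 0, its labels could go through the same `remap_label_lines` with `{0: 2}`.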

Best option?

Option 2: fine-tune purely on custom datasets

Ignore the existing COCO dataset and instead find datasets in Roboflow Universe for Birds, Persons, and Squirrels, combine them into a single custom dataset, and fine-tune from the existing pretrained weights. This might have some advantages if I can find Bird and Person datasets that match my use case better than the COCO images.
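When combining several Universe datasets, each one usually comes with its own class order and spelling, so the ids have to be unified before merging. A minimal sketch, assuming each dataset is exported in YOLOv5 format with its own `names` list; `UNIFIED`, `ALIASES` and `build_id_map` are hypothetical, and the alias spellings are guesses you would adjust per dataset:

```python
# Unified class order for the merged dataset.
UNIFIED = ["person", "bird", "squirrel"]

# Hypothetical alternate spellings that different exports might use.
ALIASES = {
    "people": "person", "persons": "person", "human": "person",
    "birds": "bird", "squirrels": "squirrel",
}


def build_id_map(source_names, unified=UNIFIED, aliases=ALIASES):
    """Map a source dataset's class ids to unified ids; unlisted classes are dropped."""
    id_map = {}
    for src_id, name in enumerate(source_names):
        key = name.strip().lower()
        key = aliases.get(key, key)  # normalize known alternate spellings
        if key in unified:
            id_map[src_id] = unified.index(key)
    return id_map


print(build_id_map(["Squirrels", "chipmunk"]))  # {0: 2}
print(build_id_map(["car", "Person", "Birds"]))  # {1: 0, 2: 1}
```

Each label row's leading class id would then be rewritten through the dataset's map, with rows for dropped classes removed. Dropping extra classes such as `chipmunk` also drops their boxes; if those animals still appear in the images, they effectively become background, which may or may not be what you want.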

Option 3: fine-tune on the squirrel custom dataset combined with all 80 existing COCO labels

I think it’s possible to train on both the original COCO dataset and the custom dataset (example) while keeping all 80 COCO labels. However, wouldn’t this reduce accuracy compared to a model trained to detect only 3 labels?
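For this option the squirrel class would be appended after the 80 COCO classes, making it class 80 with `nc: 81`, so the squirrel labels' class id would need rewriting from 0 to 80 (assuming they start at 0). A sketch of generating such a dataset YAML; `make_data_yaml` and all paths are hypothetical, and it relies on YOLOv5 data files accepting a list of directories for `train`/`val`:

```python
import json


def make_data_yaml(coco_names, new_names, train_dirs, val_dirs):
    """Return a YOLOv5 dataset YAML that appends new classes after the COCO ones."""
    names = list(coco_names) + list(new_names)
    # JSON flow-style lists are valid YAML, so json.dumps avoids a PyYAML dependency.
    return (
        f"train: {json.dumps(train_dirs)}\n"
        f"val: {json.dumps(val_dirs)}\n"
        f"nc: {len(names)}\n"
        f"names: {json.dumps(names)}\n"
    )


# Placeholder names; in practice copy the 80 names from YOLOv5's data/coco.yaml.
coco_names = [f"coco_class_{i}" for i in range(80)]
text = make_data_yaml(
    coco_names,
    ["squirrel"],
    train_dirs=["../datasets/coco/images/train2017", "../datasets/squirrels/images/train"],
    val_dirs=["../datasets/coco/images/val2017", "../datasets/squirrels/images/valid"],
)
print(text.splitlines()[2])  # nc: 81
```

Keeping the COCO ids unchanged means the COCO label files can be used as-is; only the smaller squirrel set needs its ids shifted.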

Option 4: retrain from scratch

Since I only care about these 3 labels, should I train a new model from scratch on a subset of the COCO labels and data plus the new squirrel labels and custom data? This seems like it would take a long time, so a fine-tuning approach seems better.
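In YOLOv5 the difference between the two approaches mostly comes down to the `train.py` arguments: fine-tuning passes a pretrained checkpoint to `--weights`, while training from scratch passes an empty `--weights ''` plus a model config via `--cfg`. A sketch that builds both command lines; `train_command` is a hypothetical helper, `birds_people_squirrels.yaml` is a hypothetical dataset file, and the epoch count is a placeholder rather than a recommendation:

```python
def train_command(data_yaml, from_scratch, model="yolov5s", epochs=100, img=640):
    """Build a YOLOv5 train.py argv for either fine-tuning or training from scratch."""
    cmd = ["python", "train.py", "--data", data_yaml,
           "--img", str(img), "--epochs", str(epochs)]
    if from_scratch:
        # Random init: the architecture comes from the model config, no weights loaded.
        cmd += ["--cfg", f"models/{model}.yaml", "--weights", ""]
    else:
        # Fine-tune: start from the COCO-pretrained checkpoint.
        cmd += ["--weights", f"{model}.pt"]
    return cmd


print(" ".join(train_command("birds_people_squirrels.yaml", from_scratch=False)))
```

From the YOLOv5 repo root, either list can be passed to `subprocess.run(..., check=True)`. With a dataset of only 3 classes, starting from the pretrained checkpoint usually converges much faster than random initialization, since the backbone features are reused.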

I’d say both options 1 and 2 would work well for you.

Thanks for the tip, @Ayush_Chaurasia. I tried option 2 but wasn’t able to get good results, most likely because the datasets were fairly low quality. I haven’t tried option 1 yet.