I am very new to Ultralytics and YOLO, and I am aware there are many forks of the Ultralytics GitHub repository. Has anyone worked on a model that can detect animals such as cats, foxes, and dogs in urban settings, for example, a garden?
I am looking for a starting point, as the standard model does not always seem to detect every variation.
But does the model need to have been trained on the original object first? For example, if I search for a fox but ‘fox’ wasn’t included in the model’s training data, then it won’t find it—even with prompts, right?
The YOLOE model uses text prompts, such as the words “fox” and/or “vulpes”, to detect objects whose visual features are similar to the embeddings (vector representations) of those words. It’s a bit complex if you’re new to machine learning and computer vision, so I would recommend checking out the docs page on the YOLOE model for more details.
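To make the "similar to the vectorization of these words" idea a bit more concrete, here is a toy sketch. It is not YOLOE's actual implementation: the 3-number vectors are made up for illustration, and the names `cosine_similarity` and `best_prompt` are hypothetical helpers. A real model learns embeddings with hundreds of dimensions. The sketch only shows the matching step, where a region of the image is labeled with whichever prompt its embedding is closest to.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def best_prompt(region_embedding, prompt_embeddings, threshold=0.5):
    """Return (prompt, score) for the closest prompt.

    The prompt is None when even the best score is below the threshold,
    i.e. the region does not look like any of the prompts.
    """
    scores = {
        name: cosine_similarity(region_embedding, vec)
        for name, vec in prompt_embeddings.items()
    }
    name = max(scores, key=scores.get)
    score = scores[name]
    return (name if score >= threshold else None, score)


# Made-up embeddings standing in for what a text encoder would produce.
prompt_embeddings = {
    "fox": [0.9, 0.1, 0.2],
    "cat": [0.2, 0.9, 0.1],
    "dog": [0.5, 0.5, 0.6],
}

# Made-up embedding standing in for one candidate region of an image.
region = [0.8, 0.2, 0.3]

label, score = best_prompt(region, prompt_embeddings)
print(label, round(score, 3))
```

Because the prompts are compared as vectors rather than looked up in a fixed class list, you can swap in a different word without retraining anything. How well a new word works still depends on what the model saw during training.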
You can also just try it out with some example images to see if it works as you expect. The ultralytics library should be fairly simple to use for testing your use case, and the documentation pages have a lot of walkthroughs and info to help you out. When people ask “can YOLO do X” or “will this work for Y”, the answer is very often “you have to test it out”, because it’s rare that someone else has experience with your exact use case or situation. If you get stuck or run into issues, feel free to ask here or in any of our other communities.
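If it helps, a quick test could look something like the sketch below. It follows the text-prompt usage as I understand it from the YOLOE docs page (`YOLOE`, `set_classes`, `get_text_pe`, `predict`), so please check it against the current docs. The weights name `yoloe-11s-seg.pt`, the image path `garden.jpg`, and the helpers `normalize_prompts` and `detect_with_prompts` are my own assumptions for illustration.

```python
def normalize_prompts(raw_prompts):
    """Strip whitespace, lowercase, and drop empty or duplicate prompts."""
    cleaned = []
    for prompt in raw_prompts:
        prompt = prompt.strip().lower()
        if prompt and prompt not in cleaned:
            cleaned.append(prompt)
    return cleaned


def detect_with_prompts(image_path, raw_prompts, weights="yoloe-11s-seg.pt"):
    """Run YOLOE on one image using text prompts as the class list."""
    # Imported here so the helper above can be used without ultralytics installed.
    from ultralytics import YOLOE

    prompts = normalize_prompts(raw_prompts)
    model = YOLOE(weights)  # assumed weights name; downloaded on first use
    # Tell the model which words to look for, using their text embeddings.
    model.set_classes(prompts, model.get_text_pe(prompts))
    return model.predict(image_path)


# Example call (needs ultralytics installed and a real image file):
# results = detect_with_prompts("garden.jpg", ["fox", "cat", "dog"])
# results[0].show()
```

Try a few of your own garden photos and vary the prompt words to see which ones the model responds to.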
Can I try with a video stream from my Raspberry Pi?
So by typing a prompt, does the model somehow train itself, or are the prompts we type already pre-trained into the model? For example, if we were to specify an object it has never been trained on, how would it work?
YOLOE is an open-vocabulary model. It was designed and trained to detect objects based on prompts.
You will have to read about open-vocabulary models to understand how they work. There’s no “self-training”. You’re thinking about traditional closed-set models, which are trained on specific classes. Open-vocabulary models are designed to be able to detect classes they weren’t specifically trained on.
If you want to understand how they work, you will have to read the YOLOE paper.
I think you should attempt running the model first and then ask questions when you are having issues, instead of asking questions that could have been answered if you had just attempted it. Also, a lot of these general questions can be answered through a simple Google search or by reading the Ultralytics Docs.