YOLO model for detecting cats, foxes, and dogs in urban gardens

dj3000 · September 10, 2025, 7:44pm

Hi All,

I am very new to Ultralytics and YOLO. I am aware there are many forks of the Ultralytics GitHub repository. Has anyone worked on a model that can detect animals such as cats, foxes, and dogs in urban settings, for example a garden? With so many forks around, I was wondering whether someone has already started such a model.

I am looking for a starting point, as the standard model does not always seem to detect every variation.

Toxite · September 11, 2025, 4:55am

You can try YOLOE with prompts:

https://docs.ultralytics.com/models/yoloe/#__tabbed_2_1
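
As a rough sketch of what the linked text-prompt example looks like for this use case: the weights file name `yoloe-11s-seg.pt` and the helper names `load_prompted_model` and `summarize` are assumptions chosen for illustration, not something taken from this thread. The `ultralytics` import is done inside the function so the helpers can be read and tested without the library installed.

```python
# Class names taken from the question; any other text label can be tried here.
GARDEN_CLASSES = ["cat", "fox", "dog"]


def load_prompted_model(weights="yoloe-11s-seg.pt", classes=GARDEN_CLASSES):
    """Load a YOLOE checkpoint and set the text prompts it should detect.

    The weights name is an assumption; use whichever YOLOE checkpoint you have.
    """
    from ultralytics import YOLOE  # lazy import: needs `pip install ultralytics`

    model = YOLOE(weights)
    names = list(classes)
    # The prompts are turned into text embeddings and attached as the model's classes.
    model.set_classes(names, model.get_text_pe(names))
    return model


def summarize(result):
    """Return (class_name, confidence) pairs for one Ultralytics Results object."""
    return [
        (result.names[int(cls_id)], float(conf))
        for cls_id, conf in zip(result.boxes.cls, result.boxes.conf)
    ]
```

With that in place, something like `summarize(load_prompted_model().predict("garden.jpg")[0])` should list whatever the model found for the three prompts, where `garden.jpg` is a placeholder for one of your own images.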

dj3000 · September 11, 2025, 8:32am

But the prompts depend on the model first detecting the object?

Toxite · September 11, 2025, 9:44am

If you mean that it detects all objects first and then filters based on prompt, then no. It detects based on the prompt.

dj3000 · September 11, 2025, 9:57am

But does the model need to have been trained on the original object first? For example, if I search for a fox but ‘fox’ wasn’t included in the model’s training data, then it won’t find it—even with prompts, right?

BurhanQ · September 11, 2025, 12:13pm

The YOLOE model uses text prompts like the words “fox” and/or “vulpes” to allow the model to detect objects that have activations that are similar to the vectorization of these words. It’s a bit complex if you’re new to the ideas of machine learning and computer vision, but I would recommend checking out the docs page on the YOLOE model for more details.
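
To make the "similar to the vectorization of these words" idea a little more concrete, here is a toy sketch. It is not how YOLOE is implemented internally; the 3-number vectors, the 0.8 threshold, and the names `PROMPT_EMBEDDINGS` and `best_prompt` are all invented for illustration. It only shows the general principle of comparing an image-region vector against one vector per prompt.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Invented vectors standing in for the text embedding of each prompt.
PROMPT_EMBEDDINGS = {
    "fox": [0.9, 0.1, 0.2],
    "cat": [0.2, 0.9, 0.1],
    "dog": [0.5, 0.5, 0.6],
}


def best_prompt(region_embedding, prompts=PROMPT_EMBEDDINGS, threshold=0.8):
    """Return (prompt_name or None, score) for the most similar prompt."""
    scores = {name: cosine(region_embedding, vec) for name, vec in prompts.items()}
    name = max(scores, key=scores.get)
    score = scores[name]
    # Below the threshold nothing is reported, i.e. no detection for this region.
    return (name if score >= threshold else None, score)
```

A region vector close to the "fox" vector, such as `[0.8, 0.2, 0.2]`, is matched to "fox", while one that is far from every prompt, such as `[0.0, 0.0, 1.0]`, is matched to nothing.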

You can also just try it out with some example images to see if it works as you expect. The ultralytics library should be fairly simple to use for testing your use case, and the documentation pages have a lot of walkthroughs and info to help you out. When questions like "can YOLO do X" or "will this work for Y" are asked, the answer is very often "you have to test it out", because it's not common that someone else will have experience with your exact use case or situation. If you get stuck or run into issues, feel free to ask here or in any of our other communities.

dj3000 · September 11, 2025, 1:08pm

@BurhanQ Thank you for your reply

I will read this document.

Can I try with a video stream from my Raspberry Pi?

So by typing a prompt, does this model sort of self-train itself, or would these prompts we type be somehow pre-trained in the model? For example, if we were to specify an object that has never been trained, how would it work?

Toxite · September 11, 2025, 3:52pm

YOLOE is an open-vocabulary model. It was designed and trained to detect objects based on prompts.

You will have to read about open-vocabulary models to understand how they work. There’s no “self-training”. You’re thinking about traditional closed set models which are trained on specific classes. Open-vocabulary models are designed to be able to detect classes they weren’t specifically trained on.

If you want to understand how they work, you will have to read the YOLOE paper.

dj3000 · September 11, 2025, 7:37pm

@Toxite Thanks, I will read up on it, but there must be some limits on what can be detected with the prompts?

Toxite · September 12, 2025, 12:22am

There are obviously limits, especially for domains entirely different from what the model was pretrained on. But it's not as limited as closed-set models.

dj3000 · September 14, 2025, 6:20pm

@Toxite @BurhanQ When running YOLOE on a Raspberry Pi, do we run the .pt file natively, or do we need to convert it to another format and use that instead?

Also, I have the following camera connected to my Raspberry Pi:

Waveshare IMX290-83 IR-Cut Camera (IMX290 Starlight sensor, 2MP, onboard IR-cut filter that switches between daytime and nighttime modes)

In my other project, I refer to the camera using input="libcamera". Is that compatible with the Ultralytics Python libraries?

Toxite · September 14, 2025, 6:33pm

You can either run the .pt file directly or convert it to another format.

From what I found searching online, you can't fetch frames directly from that camera using OpenCV.

You need to grab the frames from the camera yourself and run model.predict() on each one to do inference.
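
A minimal sketch of that manual loop, assuming the camera is usable through the `picamera2` library on Raspberry Pi OS (not verified for this particular Waveshare module). The function names `frames_from_picamera2` and `detect_stream`, the frame size, and the confidence value are hypothetical choices for illustration.

```python
def frames_from_picamera2(size=(1280, 720), max_frames=None):
    """Yield frames as NumPy arrays from a libcamera-backed Pi camera.

    Assumes picamera2 is installed and supports the attached sensor.
    """
    from picamera2 import Picamera2  # lazy import: only present on the Pi

    cam = Picamera2()
    # Picamera2's "RGB888" format stores pixels in B, G, R order,
    # which is the channel order Ultralytics expects for NumPy input.
    config = cam.create_preview_configuration(main={"size": size, "format": "RGB888"})
    cam.configure(config)
    cam.start()
    try:
        count = 0
        while max_frames is None or count < max_frames:
            yield cam.capture_array()
            count += 1
    finally:
        cam.stop()


def detect_stream(model, frames, conf=0.25):
    """Run model.predict() on each frame and yield (frame_index, class_names)."""
    for index, frame in enumerate(frames):
        result = model.predict(frame, conf=conf, verbose=False)[0]
        yield index, [result.names[int(cls_id)] for cls_id in result.boxes.cls]
```

Given a loaded model, `for i, labels in detect_stream(model, frames_from_picamera2()): print(i, labels)` would print the detected class names frame by frame. Keeping capture and inference in separate functions means the frame source can be swapped for another one without touching the detection code.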

Toxite · September 14, 2025, 6:39pm

I think you should attempt running the model first and then ask questions when you run into issues, instead of asking questions that could have been answered if you had just attempted it. Also, a lot of these general questions can be answered through a simple Google search or by reading the Ultralytics Docs.
