Running Ultralytics YOLO on Raspberry Pi with Hailo 8L

Hi,

I am new to Ultralytics.

I am working with a Raspberry Pi together with the Hailo 8L module.

I see that Ultralytics mainly runs natively with Python. Am I correct in understanding that I can run a script to run the model directly on the Raspberry Pi?

Is there a way to create a model from the Ultralytics framework/environment in HEF format for the Hailo 8L, or is this something that needs to be done using the Hailo toolkit? I assume I need their toolkit, but can I still run the model natively with Python on the Raspberry Pi as a quick way to test it?

You can definitely run directly on the RPi, but it will be a bit slow. I recommend checking out this page from the docs:

which should help get you started. As for the Hailo 8L module, there is currently no way to generate the Hailo-compatible HEF format directly from Ultralytics. You will have to use the Hailo tools and reference their documentation (this one might be a good start). Additionally, if you haven’t yet checked out the integration with the Sony IMX500 camera, it might be worth taking a look at that as well.

@BurhanQ

Our RPi also has the Hailo module, but before working on converting, I guess I can run it natively on the RPi?

Yes, YOLO can run directly on the Raspberry Pi. IIRC, the NCNN models have reasonably good inference times when running on a RPi.

@BurhanQ Is there any advantage to using the Hailo module with YOLO?

Also, if I want to train my own model with custom datasets, can I do this on my PC, RPi, Mac M1, or in the cloud? What would be the best way?

Using the Hailo module should give you much better inference times when running YOLO models. I haven’t tested it personally, but from the Hailo performance data, it should help reduce inference latency quite a bit!

With respect to training a custom model: I would advise against doing it on the RPi; it's not built for that and would be extremely slow. I like to use my own hardware for training, so a PC with a PyTorch-compatible GPU or an M-series Mac with a good amount of memory. The more GPU VRAM you have, the faster you can get through training a model. That said, you can still train with smaller GPUs, it will just take longer; I trained on the VisDrone dataset using a 6 GB GPU and IIRC it took over an hour for one epoch to complete. If you don't want to wait or don't have the hardware, you can use cloud GPU services to train a model, but that's 100% up to you.

@BurhanQ

Great, I will use my PC (GTX 1660 Super) or MacBook with an M1 (8 GB), would that be sufficient? Worst case I can leave it overnight. Unless there is a cloud platform I could use?

For a YOLO model, should the cloud instance run Windows or Unix? What is recommended?

You can try with the GTX 1660, but people have had some issues with model training on those GPUs. The reason is PyTorch’s implementation of automatic mixed precision (AMP). This issue used to be reported quite often, but I haven’t seen reports of it recently, so either use of those GPUs has significantly decreased or the issue has been resolved (unclear). Since the GTX 1660 Super has 6 GB of VRAM, you should configure a batch size of 1 (use the argument batch=1 when training) and use the smallest image size you can, e.g. imgsz=480. I would also recommend trying freeze=10, which freezes the first 10 layers of the model during training; it can help reduce the VRAM needed, although how well the model learns will depend a lot on your dataset.

I haven’t done any model training on a MacBook personally, but I know many people have (including Ultralytics founder Glenn). Since M1 MacBooks use unified memory, meaning the GPU and CPU share the same pool, 8 GB might be a bit low. If you close all other applications and only run training, it might be sufficient, but it’s not obvious whether it will be, so you might need to experiment.

If you want to try cloud training your models, there are lots of options. First I have to call out https://hub.ultralytics.com as an obvious option: you upload your dataset and can train using cloud GPUs (prices and GPU options are shown before training starts). Google Colab is another popular option, as it has a free tier, but that tier comes with a daily time limit and an inactivity timeout, which can interrupt your training; some users have reported difficulty recovering their (partially) trained model. I haven’t used Colab personally, so I can’t speak to the nuances very strongly. There are also many other cloud compute providers out there, where you can select a GPU and pay per use (usually per hour).

Using Windows for training is definitely possible, and I have done it many times. Just like my recommendation for training on your MacBook, and really for any OS on consumer hardware, closing all applications and avoiding use of that computer while training is highly recommended. I will say that I’ve found training on Linux to be a bit faster with the same hardware, likely because Python seems a bit better optimized for Unix-based OSes. That said, it probably won’t make a major difference for you, so if you already have a Windows computer with your GTX 1660 Super, it would be worth training on that to see how it goes. Be certain to install Python on Windows using the installer from the python.org website (or using uv) and NOT the Windows Store (it causes lots of issues). Additionally, review the Getting Started page in the docs for installation help
Install Ultralytics - Ultralytics YOLO Docs
as it will be critical to ensure you get the proper versions of torch and torchvision installed with CUDA support; you’ll need to check which CUDA version your GPU supports. By default, PyTorch may not install with CUDA enabled, and in that case training will only run on the CPU and will be extremely slow. Installation on macOS and Linux should include the correct libraries for using the GPU automatically. If you have Docker on any of your machines, you can also use an Ultralytics Docker container.
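A quick way to verify your install before starting a long training run is to check whether PyTorch can see the GPU:

```python
import torch

# If this prints False on a machine with an NVIDIA GPU, PyTorch was likely
# installed without CUDA support; reinstall using the selector on
# pytorch.org/get-started/locally for your CUDA version
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```

On Apple Silicon the equivalent check is `torch.backends.mps.is_available()`.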