Hi Ultralytics Community,
I hope you’re all doing well!
I am currently working on a project where the client aims to deploy 75 AI-powered cameras. These cameras will perform two tasks:
- Detecting intrusions inside a virtual line.
- Human pose estimation to identify throwing actions.
The cameras will be connected to a computational AI unit, and all 75 need to run simultaneously.
Could anyone please guide me on how to estimate the computational power required to handle this workload? Specifically, I would love to know:
- The type of hardware (e.g., GPUs, edge AI devices) you would recommend for this scale.
- Any software optimizations that might reduce computational demand.
- Examples of similar setups or benchmarks that could help me make an informed decision.
Your insights and expertise would be incredibly helpful, and I really appreciate your taking the time to help me with this challenge.
Thank you so much in advance!
Hi there!
Thanks for reaching out to the Ultralytics community with your exciting project! Deploying 75 AI-powered cameras with tasks like intrusion detection and human pose estimation is ambitious and impactful. Let’s break this down step-by-step to help you estimate the computational power and make informed decisions.
1. Recommended Hardware
- Edge AI Devices: For distributed processing, devices like NVIDIA Jetson series (e.g., Jetson Xavier NX, Orin Nano) are a great choice. These are well-suited for running lightweight AI models like YOLOv8 or YOLO11 and can handle real-time inference with optimized power consumption.
- Centralized GPU Servers: If you prefer a centralized setup, high-performance GPUs like NVIDIA A100 or RTX 4090 are excellent choices. They can process multiple streams simultaneously, but you’ll need to ensure sufficient bandwidth and low latency to handle 75 camera feeds.
- Hybrid Approach: A mix of edge and cloud/on-premise servers could also work. Edge devices can handle simpler tasks (e.g., intrusion detection), while pose estimation, which is more computationally intensive, could be offloaded to a powerful central server.
2. Software Optimizations
- Model Optimization: Use tools like NVIDIA TensorRT or ONNX Runtime to quantize and optimize models for inference, reducing computational demand. For example, YOLOv8 and YOLO11 models can be exported to TensorRT for efficient edge deployment (Export Guide).
- Batch Processing: Process camera feeds in batches if real-time processing is not critical for all streams simultaneously.
- Region of Interest (ROI): Limit processing to specific areas of the frame (e.g., around the virtual line or regions where throwing actions are likely to occur).
- Efficient Models: Use smaller, faster YOLO models like YOLOv8n or YOLO11n for detection, which are lightweight and optimized for edge devices.
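As a rough illustration of the ROI idea above, here's a minimal sketch (the frame shape and ROI coordinates are made-up placeholders, not values from your setup) that crops a frame to the band around a virtual line before handing it to the detector:

```python
import numpy as np

def crop_to_roi(frame: np.ndarray, roi: tuple) -> np.ndarray:
    """Crop a frame to (x1, y1, x2, y2) so the detector only sees
    the region around the virtual line."""
    x1, y1, x2, y2 = roi
    return frame[y1:y2, x1:x2]

# Placeholder 1080p frame; in practice this comes from your camera decoder.
frame = np.zeros((1080, 1920, 3), dtype=np.uint8)

# Hypothetical ROI band around a horizontal virtual line at y ~ 600.
roi_frame = crop_to_roi(frame, (0, 500, 1920, 700))
print(roi_frame.shape)  # (200, 1920, 3)
```

Running the model on the 200-pixel-tall band instead of the full frame cuts the pixels processed by more than 5x in this example.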
3. Benchmarks and Scaling
- Benchmarking Tools: Use the `benchmark` mode in YOLO to profile models on your selected hardware (Benchmark Guide).
- Similar Setups: Projects like video analytics on NVIDIA Jetson devices are great examples. You can check out this blog for insights into deploying YOLO models on edge devices.
- Estimations: For 75 cameras, if each stream processes at ~30 FPS with an optimized YOLO11 model, you need roughly 2,250 FPS of aggregate throughput (75 × 30), which would likely mean multiple edge devices or a centralized server with several GPUs.
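To make the aggregate-throughput estimate above concrete, here's a back-of-envelope sketch; the per-device FPS figures are pure placeholders that you'd replace with your own benchmark results:

```python
import math

def devices_needed(num_cameras: int, fps_per_camera: int, device_fps: float) -> int:
    """Number of devices required to sustain the aggregate frame rate."""
    aggregate_fps = num_cameras * fps_per_camera
    return math.ceil(aggregate_fps / device_fps)

print(75 * 30)  # 2250 FPS aggregate

# Hypothetical measured throughputs -- replace with your benchmark numbers.
print(devices_needed(75, 30, device_fps=120))  # e.g. Jetson-class device -> 19
print(devices_needed(75, 30, device_fps=900))  # e.g. datacenter GPU -> 3
```

The useful part is less the arithmetic than the reminder that `device_fps` must come from benchmarking your actual model export on your actual hardware.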
Suggested Next Steps
- Start with a pilot setup: Deploy 1-2 cameras on your chosen hardware and measure inference times and resource usage.
- Optimize models and test different hardware configurations to find a balance between cost, performance, and scalability.
- Use Ultralytics HUB or similar tools to manage and monitor multiple deployments easily.
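For the pilot-measurement step above, a simple timing loop can give you per-frame latency and effective FPS; `run_inference` here is a stand-in for your actual YOLO call, and the 5 ms sleep is an arbitrary stub:

```python
import time

def measure_fps(run_inference, num_frames: int = 100) -> float:
    """Time repeated inference calls and return the effective frames per second."""
    start = time.perf_counter()
    for _ in range(num_frames):
        run_inference()
    elapsed = time.perf_counter() - start
    return num_frames / elapsed

# Stand-in for model(frame); replace with your real inference call.
def fake_inference():
    time.sleep(0.005)  # pretend inference takes ~5 ms

fps = measure_fps(fake_inference, num_frames=50)
print(f"~{fps:.0f} FPS")
```

Run this with a warmed-up model on the target device (the first few inferences are usually slower) and compare the result against the 30 FPS-per-stream budget.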
This is a challenging but rewarding project, and we’d love to hear how it progresses! Feel free to share updates or ask further questions.
Best of luck,
Ultralytics Team
- Compute estimates are extremely difficult to make for any given situation, even more so when they are for someone else. Temper your expectations, as it’s highly unlikely anyone will have a reasonably accurate estimate.
- You may want to reach out to organizations like NVIDIA directly, as they will likely be able to help you determine your needs. With so many inputs for inference, there are two obvious options. The first is using edge devices for each camera (or small cluster of cameras) to ensure “close to the source” inferencing, though you’d still have to sort out how to do the additional processing. This would be the cheaper option to test, since you could purchase one or two devices for evaluation. The second option is an inference server, something like NVIDIA Triton, where all input streams flow back to the server for inference and processing; alternatively, you could use cloud compute, but that might impact your latency in a variable way.
- Lower the frame rate, lower the resolution, use hardware decoding, export YOLO to the fastest format for a given device (TensorRT has the fastest inference speeds overall, but requires NVIDIA hardware), and apply model quantization (where compatible). Additionally, some users write their production code in C++ since it can be considerably faster, but that’s a much bigger effort.
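The “lower frame rate” point above often just means processing every Nth frame; here's a minimal sketch of that decimation logic (the stride of 3 is an arbitrary example):

```python
def decimate(frames, stride: int = 3):
    """Yield every `stride`-th frame, dropping the rest to cut inference load."""
    for i, frame in enumerate(frames):
        if i % stride == 0:
            yield frame

# A 30 FPS stream processed with stride=3 is effectively inferred at 10 FPS.
kept = list(decimate(range(30), stride=3))
print(len(kept))  # 10
```

Whether 10 FPS is enough depends on how fast a throwing action or line crossing happens; that trade-off is exactly what the pilot testing should answer.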
- Similar examples are unlikely to be publicly shared. I’m not aware of any benchmark that closely matches your expected config. With some searching online you might find benchmarks for multi-stream inference, but they’re unlikely to be very informative for your particular situation.
There’s going to be a lot of research and testing you’ll need to do to get a good sense of what’s required for your setup. It really does have to be something you do yourself, because you know all the nuances, constraints, and details of the project; even if you could share everything, it would take a huge amount of anyone’s time to try to help, and I think that’s an unfair expectation. Of course, sharing your findings and results will be helpful to the community, so please do!
Personally, I’d go with the edge devices. That makes things very modular and easy to test. It does come with its fair share of overhead, but to me it seems like a decent option going forward.