Hi Ultralytics Community,
I hope you’re all doing well!
I am currently working on a project where the client aims to deploy 75 AI-powered cameras. These cameras will perform two tasks:
- Detecting intrusions inside a virtual line.
- Human pose estimation to identify throwing actions.
The cameras will be connected to a computational AI unit, and all 75 need to run simultaneously.
Could anyone please guide me on how to estimate the computational power required to handle this workload? Specifically, I would love to know:
- The type of hardware (e.g., GPUs, edge AI devices) you would recommend for this scale.
- Any software optimizations that might reduce computational demand.
- Examples of similar setups or benchmarks that could help me make an informed decision.
Your insights and expertise would be incredibly helpful, and I really appreciate your taking the time to help me with this challenge.
Thank you so much in advance!
Hi there!
Thanks for reaching out to the Ultralytics community with your exciting project! Deploying 75 AI-powered cameras with tasks like intrusion detection and human pose estimation is ambitious and impactful. Let’s break this down step-by-step to help you estimate the computational power and make informed decisions.
1. Recommended Hardware
- Edge AI Devices: For distributed processing, devices like NVIDIA Jetson series (e.g., Jetson Xavier NX, Orin Nano) are a great choice. These are well-suited for running lightweight AI models like YOLOv8 or YOLO11 and can handle real-time inference with optimized power consumption.
- Centralized GPU Servers: If you prefer a centralized setup, high-performance GPUs like NVIDIA A100 or RTX 4090 are excellent choices. They can process multiple streams simultaneously, but you’ll need to ensure sufficient bandwidth and low latency to handle 75 camera feeds.
- Hybrid Approach: A mix of edge and cloud/on-premise servers could also work. Edge devices can handle simpler tasks (e.g., intrusion detection), while pose estimation, which is more computationally intensive, could be offloaded to a powerful central server.
2. Software Optimizations
- Model Optimization: Use tools like NVIDIA TensorRT or ONNX Runtime to quantize and optimize models for inference, reducing computational demand. For example, YOLOv8 and YOLO11 models can be exported to TensorRT for efficient edge deployment (Export Guide).
- Batch Processing: Process camera feeds in batches if real-time processing is not critical for all streams simultaneously.
- Region of Interest (ROI): Limit processing to specific areas of the frame (e.g., around the virtual line or regions where throwing actions are likely to occur).
- Efficient Models: Use smaller, faster YOLO models like YOLOv8n or YOLO11n for detection, which are lightweight and optimized for edge devices.
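As a rough illustration of the ROI idea above, here's a minimal sketch (the frame shape and ROI coordinates are made-up placeholders, not values from your setup) that crops a frame to the band around a virtual line before handing it to the detector:

```python
import numpy as np

def crop_to_roi(frame: np.ndarray, roi: tuple) -> np.ndarray:
    """Crop a frame to (x1, y1, x2, y2) so the detector only sees
    the region around the virtual line."""
    x1, y1, x2, y2 = roi
    return frame[y1:y2, x1:x2]

# Placeholder 1080p frame; in practice this comes from your camera decoder.
frame = np.zeros((1080, 1920, 3), dtype=np.uint8)

# Hypothetical ROI band around a horizontal virtual line at y ~ 600.
roi_frame = crop_to_roi(frame, (0, 500, 1920, 700))
print(roi_frame.shape)  # (200, 1920, 3)
```

Running the model on the 200-pixel-tall band instead of the full frame cuts the pixels processed by more than 5x in this example.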
3. Benchmarks and Scaling
- Benchmarking Tools: Use the `benchmark` mode in YOLO to profile models on your selected hardware (Benchmark Guide).
- Similar Setups: Projects like video analytics on NVIDIA Jetson devices are great examples. You can check out this blog for insights into deploying YOLO models on edge devices.
- Estimations: For 75 cameras, if each stream processes at ~30 FPS with an optimized YOLO11 model, you need roughly 2,250 FPS of aggregate throughput (75 × 30), which would likely mean multiple edge devices or a centralized server with several GPUs.
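To make the aggregate-throughput estimate above concrete, here's a back-of-envelope sketch; the per-device FPS figures are pure placeholders that you'd replace with your own benchmark results:

```python
import math

def devices_needed(num_cameras: int, fps_per_camera: int, device_fps: float) -> int:
    """Number of devices required to sustain the aggregate frame rate."""
    aggregate_fps = num_cameras * fps_per_camera
    return math.ceil(aggregate_fps / device_fps)

print(75 * 30)  # 2250 FPS aggregate

# Hypothetical measured throughputs -- replace with your benchmark numbers.
print(devices_needed(75, 30, device_fps=120))  # e.g. Jetson-class device -> 19
print(devices_needed(75, 30, device_fps=900))  # e.g. datacenter GPU -> 3
```

The useful part is less the arithmetic than the reminder that `device_fps` must come from benchmarking your actual model export on your actual hardware.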
Suggested Next Steps
- Start with a pilot setup: Deploy 1-2 cameras on your chosen hardware and measure inference times and resource usage.
- Optimize models and test different hardware configurations to find a balance between cost, performance, and scalability.
- Use Ultralytics HUB or similar tools to manage and monitor multiple deployments easily.
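For the pilot-measurement step above, a simple timing loop can give you per-frame latency and effective FPS; `run_inference` here is a stand-in for your actual YOLO call, and the 5 ms sleep is an arbitrary stub:

```python
import time

def measure_fps(run_inference, num_frames: int = 100) -> float:
    """Time repeated inference calls and return the effective frames per second."""
    start = time.perf_counter()
    for _ in range(num_frames):
        run_inference()
    elapsed = time.perf_counter() - start
    return num_frames / elapsed

# Stand-in for model(frame); replace with your real inference call.
def fake_inference():
    time.sleep(0.005)  # pretend inference takes ~5 ms

fps = measure_fps(fake_inference, num_frames=50)
print(f"~{fps:.0f} FPS")
```

Run this with a warmed-up model on the target device (the first few inferences are usually slower) and compare the result against the 30 FPS-per-stream budget.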
This is a challenging but rewarding project, and we’d love to hear how it progresses! Feel free to share updates or ask further questions.
Best of luck,
Ultralytics Team
- Compute estimates are extremely difficult to make for any given situation, even more so when they are for someone else. Temper your expectations, as it’s highly unlikely anyone will have a reasonably accurate estimate.
- You may want to reach out to organizations like NVIDIA directly, as they will likely be able to help you determine your needs. With so many inputs for inference, there are two obvious options. The first is using edge devices for each camera (or small cluster of cameras) to ensure “close to the source” inferencing, though you’d still have to sort out how to do the additional processing. This would be the cheaper option to test, since you could purchase one or two devices for evaluation. The second option is an inference server, something like NVIDIA Triton, where all input streams flow back to the server for inference and processing; alternatively, you could use cloud compute, but that might impact your latency in a variable way.
- Lower the frame rate, lower the resolution, use hardware decoding, export YOLO to the fastest format for a given device (TensorRT has the fastest inference speeds overall, but requires NVIDIA hardware), and apply model quantization (where compatible). Additionally, some users write their production code in C++ since it can be considerably faster, but that’s a much bigger effort.
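The “lower frame rate” point above often just means processing every Nth frame; here's a minimal sketch of that decimation logic (the stride of 3 is an arbitrary example):

```python
def decimate(frames, stride: int = 3):
    """Yield every `stride`-th frame, dropping the rest to cut inference load."""
    for i, frame in enumerate(frames):
        if i % stride == 0:
            yield frame

# A 30 FPS stream processed with stride=3 is effectively inferred at 10 FPS.
kept = list(decimate(range(30), stride=3))
print(len(kept))  # 10
```

Whether 10 FPS is enough depends on how fast a throwing action or line crossing happens; that trade-off is exactly what the pilot testing should answer.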
- Similar examples are unlikely to be publicly shared. I’m not aware of any benchmark that closely matches your expected config. With some searching online you might find benchmarks for multi-stream inference, but they’re unlikely to be very informative for your particular situation.
There’s going to be a lot of research and testing you’ll need to do to get a good sense of what’s required for your setup. It really does have to be something you do yourself, because you know all the nuances, constraints, and details of the project; even if you could share everything, it would take a huge amount of anyone’s time to try to help, and I think that’s an unfair expectation. Of course, sharing your findings and results will be helpful to the community, so please do!
Personally, I’d go with the edge devices. That makes things very modular and easy to test. It does come with its fair share of overhead, but to me it seems like a decent option going forward.