YOLOv8 model slowing down in real-time detection after 2 hours

I'm using a YOLOv8 model to detect jellyfish from a live video feed. I'm recording at 30 FPS; for each frame I track the jellyfish, calculate the error, and move motors to follow it. Everything works great, but the inference time slows down after long use.

My computer has a 3090, which I select with model.to("cuda:0"), and it does appear to be in use when I check Task Manager's performance tab.

Initially, I get print outs like this, everything working great:
0: 1024x1024 1 Jelly-Fish, 14.0ms
Speed: 5.7ms preprocess, 14.0ms inference, 1.3ms postprocess per image at shape (1, 3, 1024, 1024)

After 2 hrs or so I get:
0: 1024x1024 1 Jelly-Fish, 418.8ms
Speed: 22.9ms preprocess, 418.8ms inference, 41.7ms postprocess per image at shape (1, 3, 1024, 1024)

Code below
with torch.no_grad():
    results = modelJF.predict(frame_resized, imgsz=1024, conf=0.25, iou=0.7, half=True, device="cuda:0", verbose=True)

I’ve tried
torch.cuda.empty_cache()
torch.cuda.synchronize()
or passing tensors to the model directly to avoid .predict(), but none of that seems to be helping.

You can try exporting to TensorRT
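A rough sketch of what that could look like with the Ultralytics export API; the weights filename jellyfish.pt is a placeholder for your actual model file, and the device index should be whichever CUDA index your 3090 maps to:

from ultralytics import YOLO

# Export a TensorRT engine at the same image size and precision used for prediction
model = YOLO("jellyfish.pt")
model.export(format="engine", imgsz=1024, half=True, device=0)

# Load the exported engine and predict with it instead of the .pt weights
trt_model = YOLO("jellyfish.engine")
results = trt_model.predict(frame_resized, imgsz=1024, conf=0.25, iou=0.7, verbose=True)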

with torch.no_grad() shouldn't be used with Ultralytics, and model.to() should also be removed, because Ultralytics handles device placement itself.
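For reference, a minimal sketch of the predict call with both of those removed; the weights path is a placeholder:

from ultralytics import YOLO

modelJF = YOLO("jellyfish.pt")  # placeholder path; no model.to() needed
# Pass the device to predict() directly and let Ultralytics manage inference mode
results = modelJF.predict(frame_resized, imgsz=1024, conf=0.25, iou=0.7, half=True, device="cuda:0", verbose=True)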

Is your device a desktop or a laptop?

It’s a desktop with these specs
OS: Windows 10 (10.0.26100)
CPU: {'Brand': 'AMD Ryzen 9 7950X3D 16-Core Processor', 'Arch': 'X86_64', 'Cores (logical)': 32, 'Cores (physical)': 16, 'Frequency (MHz)': 4201.0}
RAM: 31.1 GB
GPUs: [{'Name': 'NVIDIA GeForce RTX 3060', 'ID': 0, 'Driver': '565.90', 'VRAM (GB)': 12.0, 'GPU Load (%)': 0.0, 'Temperature (C)': 41.0}, {'Name': 'NVIDIA GeForce RTX 3090', 'ID': 1, 'Driver': '565.90', 'VRAM (GB)': 24.0, 'GPU Load (%)': 0.0, 'Temperature (C)': 29.0}]

I will try TensorRT and removing those other things.

I converted to a TensorRT engine, but unfortunately the issue persisted. Another tracking run showed inference slowing down to 0.15 seconds per frame, which was too slow to track the jellyfish.

In addition, you mentioned you "calculate error", which leads me to believe there are other operations running in the code. Have you tried running inference on its own, without any additional operations? The key here is to determine where the slowdown is happening. The first split would be hardware vs. code; after that, model inference vs. the other processing. I recommend running your code with only the inference loop (see the sketch below) to see if you experience the same slowdown.
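Something like the following, as a rough sketch; the frame source and weights path are placeholders for your setup:

import time
import cv2
from ultralytics import YOLO

model = YOLO("jellyfish.pt")   # placeholder weights path
cap = cv2.VideoCapture(0)      # placeholder for your live video feed

while True:
    ok, frame = cap.read()
    if not ok:
        break
    t0 = time.perf_counter()
    # Inference only: no tracking, no error calculation, no motor commands
    model.predict(frame, imgsz=1024, conf=0.25, iou=0.7, half=True, device="cuda:0", verbose=False)
    print(f"{(time.perf_counter() - t0) * 1000:.1f} ms end-to-end")

If this stays flat for hours, the slowdown is coming from the other processing rather than the model.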

You were correct, thank you for reminding me of this fundamental. A queue in another part of the program was ballooning in size and slowing everything down.
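For anyone hitting the same thing, here is a minimal sketch of one way to keep such a queue from growing without bound; the names are placeholders, not my actual code:

import queue

frame_queue = queue.Queue(maxsize=2)

def enqueue_latest(frame):
    # Drop the oldest frame instead of letting a backlog build up when the consumer lags
    try:
        frame_queue.put_nowait(frame)
    except queue.Full:
        try:
            frame_queue.get_nowait()
        except queue.Empty:
            pass
        frame_queue.put_nowait(frame)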

Thank you!


Great to hear you found the culprit and resolved it; nice debugging. For long-running real-time loops, two extra tips often help prevent regressions: enable torch.backends.cudnn.benchmark = True when your input size is fixed to keep kernels optimal, and periodically log end-to-end timings for preprocess/inference/postprocess plus queue lengths so you can catch drift early (sketched below). If you run into anything else, please share a minimal repro and your ultralytics and PyTorch versions so we can help confirm on the latest release. See the Predict and Thread-Safe Inference notes for stable streaming patterns in the Ultralytics Docs.
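As a minimal sketch of those two tips, assuming a fixed 1024x1024 input and a queue feeding the loop (frame_queue and the logging interval are illustrative placeholders):

import time
import torch

torch.backends.cudnn.benchmark = True  # safe to enable when the input shape never changes

def log_health(frame_idx, results, frame_queue, t_start, every=300):
    # Every `every` frames (about 10 s at 30 FPS), print per-stage timings and queue depth
    if frame_idx % every:
        return
    elapsed = time.perf_counter() - t_start
    # Ultralytics Results expose preprocess/inference/postprocess times (ms) in the .speed dict
    print(f"frame {frame_idx}: speed={results[0].speed}, "
          f"queue={frame_queue.qsize()}, avg FPS={frame_idx / elapsed:.1f}")

Call log_health() once per iteration of the tracking loop; a steady climb in any of those numbers points to where the drift is.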