YOLOv8 model slowing down in real-time detection after 2 hours

I'm using a YOLOv8 model to detect jellyfish from a live video feed. I'm recording at 30 FPS; for each frame I track the jellyfish, calculate the error, and move motors to follow it. Everything works great, but the inference time slows down after long use.

My computer has a 3090, which I select with model.to("cuda:0"), and it does appear to be in use when I check Task Manager's performance tab.

Initially, I get print outs like this, everything working great:
0: 1024x1024 1 Jelly-Fish, 14.0ms
Speed: 5.7ms preprocess, 14.0ms inference, 1.3ms postprocess per image at shape (1, 3, 1024, 1024)

After 2 hrs or so I get:
0: 1024x1024 1 Jelly-Fish, 418.8ms
Speed: 22.9ms preprocess, 418.8ms inference, 41.7ms postprocess per image at shape (1, 3, 1024, 1024)

Code below
with torch.no_grad():
    results = modelJF.predict(frame_resized, imgsz=1024, conf=0.25, iou=0.7, half=True, device="cuda:0", verbose=True)

I’ve tried
torch.cuda.empty_cache()
torch.cuda.synchronize()
or passing tensors to the model directly to avoid .predict(), but none of that seems to be helping.

You can try exporting to TensorRT
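A rough sketch of what that could look like with the Ultralytics export API; the weights filename jellyfish.pt is a placeholder for your actual model file, and the device index should be whichever CUDA index your 3090 maps to:

from ultralytics import YOLO

# Export a TensorRT engine at the same image size and precision used for prediction
model = YOLO("jellyfish.pt")
model.export(format="engine", imgsz=1024, half=True, device=0)

# Load the exported engine and predict with it instead of the .pt weights
trt_model = YOLO("jellyfish.engine")
results = trt_model.predict(frame_resized, imgsz=1024, conf=0.25, iou=0.7, verbose=True)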

with torch.no_grad() shouldn't be used with Ultralytics, and model.to() should also be removed, because Ultralytics handles device placement itself.
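For reference, a minimal sketch of the predict call with both of those removed; the weights path is a placeholder:

from ultralytics import YOLO

modelJF = YOLO("jellyfish.pt")  # placeholder path; no model.to() needed
# Pass the device to predict() directly and let Ultralytics manage inference mode
results = modelJF.predict(frame_resized, imgsz=1024, conf=0.25, iou=0.7, half=True, device="cuda:0", verbose=True)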

Is your device a desktop or a laptop?

It’s a desktop with these specs
OS: Windows 10 (10.0.26100)
CPU: {'Brand': 'AMD Ryzen 9 7950X3D 16-Core Processor', 'Arch': 'X86_64', 'Cores (logical)': 32, 'Cores (physical)': 16, 'Frequency (MHz)': 4201.0}
RAM: 31.1 GB
GPUs: [{'Name': 'NVIDIA GeForce RTX 3060', 'ID': 0, 'Driver': '565.90', 'VRAM (GB)': 12.0, 'GPU Load (%)': 0.0, 'Temperature (C)': 41.0}, {'Name': 'NVIDIA GeForce RTX 3090', 'ID': 1, 'Driver': '565.90', 'VRAM (GB)': 24.0, 'GPU Load (%)': 0.0, 'Temperature (C)': 29.0}]

I will try TensorRT and removing those other things.

I converted to a TensorRT engine, but unfortunately the issue persisted. Another tracking run showed inference slowing down to 0.15 seconds per frame, which was too slow to track the jellyfish.

In addition, you mentioned you "calculate error", which leads me to believe there are other operations running in the code. Have you tried running inference on its own, without any additional operations? The key here is to determine where the slowdown is happening. The first split would be hardware vs. code; after that, model inference vs. the other processing. I recommend running your code with only the inference loop (see the sketch below) to see if you experience the same slowdown.
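Something like the following, as a rough sketch; the frame source and weights path are placeholders for your setup:

import time
import cv2
from ultralytics import YOLO

model = YOLO("jellyfish.pt")   # placeholder weights path
cap = cv2.VideoCapture(0)      # placeholder for your live video feed

while True:
    ok, frame = cap.read()
    if not ok:
        break
    t0 = time.perf_counter()
    # Inference only: no tracking, no error calculation, no motor commands
    model.predict(frame, imgsz=1024, conf=0.25, iou=0.7, half=True, device="cuda:0", verbose=False)
    print(f"{(time.perf_counter() - t0) * 1000:.1f} ms end-to-end")

If this stays flat for hours, the slowdown is coming from the other processing rather than the model.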

You were correct, thank you for reminding me of this fundamental. A queue in another part of the program was ballooning in size and slowing everything down.
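For anyone hitting the same thing, here is a minimal sketch of one way to keep such a queue from growing without bound; the names are placeholders, not my actual code:

import queue

frame_queue = queue.Queue(maxsize=2)

def enqueue_latest(frame):
    # Drop the oldest frame instead of letting a backlog build up when the consumer lags
    try:
        frame_queue.put_nowait(frame)
    except queue.Full:
        try:
            frame_queue.get_nowait()
        except queue.Empty:
            pass
        frame_queue.put_nowait(frame)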

Thank you!


Great to hear you found the culprit and resolved it; nice debugging. For long-running real-time loops, two extra tips often help prevent regressions: enable torch.backends.cudnn.benchmark = True when your input size is fixed to keep kernels optimal, and periodically log end-to-end timings for preprocess/inference/postprocess plus queue lengths so you can catch drift early (sketched below). If you run into anything else, please share a minimal repro and your ultralytics and PyTorch versions so we can help confirm on the latest release. See the Predict and Thread-Safe Inference notes for stable streaming patterns in the Ultralytics Docs.
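As a minimal sketch of those two tips, assuming a fixed 1024x1024 input and a queue feeding the loop (frame_queue and the logging interval are illustrative placeholders):

import time
import torch

torch.backends.cudnn.benchmark = True  # safe to enable when the input shape never changes

def log_health(frame_idx, results, frame_queue, t_start, every=300):
    # Every `every` frames (about 10 s at 30 FPS), print per-stage timings and queue depth
    if frame_idx % every:
        return
    elapsed = time.perf_counter() - t_start
    # Ultralytics Results expose preprocess/inference/postprocess times (ms) in the .speed dict
    print(f"frame {frame_idx}: speed={results[0].speed}, "
          f"queue={frame_queue.qsize()}, avg FPS={frame_idx / elapsed:.1f}")

Call log_health() once per iteration of the tracking loop; a steady climb in any of those numbers points to where the drift is.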