Using batch_size in inference doesn't speed it up?

You can try upgrading to the latest Ultralytics and PyTorch.

I upgraded, and I still get the same 25 seconds.

What’s the GPU utilization in nvidia-smi when running inference?

0.8 GB for batch of 1
6 GB for batch of 16
12 GB for batch of 32
20 GB for batch of 64
CUDA out of memory for batch of 128 :smiley:

I mean GPU utilization. The percentage value shown by nvidia-smi.
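For reference, both values can be polled directly without watching the full table (assuming a reasonably recent NVIDIA driver; the query field names below are standard `nvidia-smi` options):

```shell
# Poll GPU utilization and memory use once per second while inference runs
nvidia-smi --query-gpu=utilization.gpu,memory.used,memory.total --format=csv -l 1
```

This prints one CSV row per second, so you can watch utilization specifically during the inference run.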

That is exactly what I wrote. I run nvidia-smi and observe my process taking up some portion of GPU memory.
If you want percentages, they are:

0.8/24 ≈ 3.3% for batch 1
6/24 = 25% for batch 16
12/24 = 50% for batch 32
20/24 ≈ 83.3% for batch 64

and CUDA out of memory for batch 128, which means the 24 GB of my A10 is not enough to hold a batch of 128 images of shape [640, 640, 3].

That’s memory usage. I am asking about GPU utilization (the value in the “GPU-Util” column), which is a different thing. I want to know the GPU utilization while inference is running, to see whether your GPU is being fully utilized.

88% for the batch of 1
100% for the batch of 16 and onwards.

Looks like my GPU is fully utilized.

What size of model are you using? And what imgsz?

I’m using your code, the model is yolo11n.pt, and image size is 640.
Everything is from your code.

This is my nvidia-smi output, maybe this will help a bit.

Looks like your GPU fan isn’t working and GPU is overheating.

This is a rented server, I don’t have any physical access to its hardware.

You should probably switch to a different server, because this one is defective.

I think the A10 has a passive cooling system, so 80°C is probably somewhat expected under load.

OK, thanks. I still have no idea why the code is running 2-3 times slower than expected.

It’s very strange. The only thing I can think of is that the GPU is connected over a very slow PCIe link (a low generation or lane count). That would make transfers from CPU to GPU much slower. I’ve never tested what inference looks like in this situation, but that’s my best guess. Given it’s a rented server, it’s likely not possible to fix this yourself, but you might want to contact your provider to help troubleshoot the issue. You could also try running the test script on Google Colab (you’d need to switch to a T4 instance) to do a direct comparison like Toxite showed above.
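One way to check the PCIe guess, assuming your driver exposes the standard PCIe query fields in `nvidia-smi`:

```shell
# Compare the current PCIe link to the maximum the card supports;
# a current generation/width far below the max means slow CPU-to-GPU copies
nvidia-smi --query-gpu=pcie.link.gen.current,pcie.link.width.current,pcie.link.gen.max,pcie.link.width.max --format=csv
```

Note that some cards downclock the link when idle, so run this while inference is active.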

Hello everyone.
I’ve been researching my problem. As I mentioned in the first post, I’m using an OBB model. From what I can see, the OBB model doesn’t gain any advantage from batched input, because my processing time is the same across different batch sizes.
Can anyone tell me whether there are any tricks to make the YOLO OBB model benefit from batching?

You can export the model to TensorRT for faster inference.
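A minimal sketch using the Ultralytics CLI, assuming TensorRT is installed on your server (the model name and batch value are illustrative; exact export arguments may vary by Ultralytics version):

```shell
# Export to a TensorRT engine with FP16 and a fixed batch size of 16
yolo export model=yolo11n-obb.pt format=engine half=True batch=16

# Then run inference with the exported engine
yolo predict model=yolo11n-obb.engine source=path/to/images imgsz=640
```

The engine is built for the GPU it is exported on, so run the export on the rented A10 itself rather than copying an engine from another machine.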