Speed up inference time for a model trained with YOLO12x

Suppose I have a custom model trained with YOLO12x that I use to detect certain kinds of objects in videos, but the inference time for each frame is extremely slow (above 1000 ms), while my expectation is that it should be below 200 ms. Are there any effective methods to improve this? Thank you very much.

YOLO12x is a very large model, one of the largest in Ultralytics, so inference will be slow. Unless you genuinely need such a large model, you should switch to a smaller one like YOLO11m; otherwise, getting a 5x speedup is difficult. If you only have a CPU, you can try exporting to OpenVINO with int8=True. If you have a GPU, you can export to TensorRT with int8=True. If it's still too slow, you need to switch to a smaller model.
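For reference, a minimal export sketch for both routes, assuming your trained weights are at path/to/best.pt and you have a dataset YAML available for INT8 calibration (both paths are placeholders):

```python
from ultralytics import YOLO

# Load the custom-trained YOLO12x weights
model = YOLO("path/to/best.pt")

# CPU route: OpenVINO export with INT8 quantization
model.export(format="openvino", int8=True)

# GPU route: TensorRT export with INT8 quantization;
# `data` supplies a dataset YAML used for calibration images
model.export(format="engine", int8=True, data="path/to/data.yaml")
```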

@Toxite I have the following three questions:

  1. Can I export the YOLO model to the TensorRT (.engine) format in Google Colab/Kaggle and then use the exported model on my Windows PC?
  2. If I want to use the exported model on my Windows PC, do I need to install any prerequisites in the PC's environment?
  3. If I want to use the model for Live Inference with a Streamlit Application using Ultralytics YOLO11 through the CLI command yolo solutions inference model="path/to/model.pt", how can I pass the device argument to the command?

  1. No. TensorRT engines are hardware-specific, so you need to export on the same system you will run inference on.
  2. Ultralytics installs TensorRT automatically. If that fails, which happens especially with older GPUs, you need to install TensorRT manually from the NVIDIA page.
  3. If you pass the .engine file, it will automatically use the GPU (see the example below).
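For example, once the engine has been exported on the Windows machine itself, the solutions command can point straight at it (placeholder path):

```
yolo solutions inference model="path/to/model.engine"
```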

@Toxite

  1. Can I export the YOLO model to the ONNX or OpenVINO format in Google Colab/Kaggle and then use the exported model on my Windows PC (especially if I want to run inference on the GPU)?
  2. Regarding the third question above, how can I pass the device argument to the command if the model format is ONNX or OpenVINO?

  1. Yes. But ONNX will not provide much of a speedup, and OpenVINO doesn't use NVIDIA GPUs (see the sketch below).
  2. Passing device isn't supported. Ultralytics prioritizes the GPU by default and falls back to the CPU.
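A sketch of that portable workflow (paths are placeholders): export once in Colab/Kaggle, download the resulting file, then load it directly on the Windows PC:

```python
from ultralytics import YOLO

# In Colab/Kaggle: export the trained weights; the resulting files are portable
YOLO("path/to/best.pt").export(format="onnx")        # writes best.onnx
# YOLO("path/to/best.pt").export(format="openvino")  # writes an OpenVINO folder

# On the Windows PC: load the downloaded export directly
model = YOLO("path/to/best.onnx")
results = model.predict("path/to/video.mp4")
```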

@Toxite My PC has an NVIDIA GeForce GTX 1650 GPU. Is this hardware supported by the TensorRT model format? If so, how can I enable CUDA for this GPU?

TensorRT is supported on your GPU, and it should be installed automatically by Ultralytics when you export a model to TensorRT.
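Before exporting, a quick sanity check that PyTorch can actually see the GPU (plain PyTorch, nothing Ultralytics-specific):

```python
import torch

# TensorRT export can only use the GPU if PyTorch detects CUDA
print(torch.cuda.is_available())  # expected: True
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # e.g. "NVIDIA GeForce GTX 1650"
```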

@Toxite But it seems that this GPU is not recognized:

ValueError: Invalid CUDA 'device=0' requested. Use 'device=cpu' or pass valid CUDA device(s) if available, i.e. 'device=0' or 'device=0,1,2,3' for Multi-GPU.
torch.cuda.is_available(): False
torch.cuda.device_count(): 0
os.environ['CUDA_VISIBLE_DEVICES']: None
See https://pytorch.org/get-started/locally/ for up-to-date torch install instructions if no CUDA devices are seen by torch.

What should I do to solve this problem?

You need to install PyTorch with CUDA.
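On Windows, a plain pip install torch gives you a CPU-only build; you need a CUDA wheel from the PyTorch index instead. A sketch (cu121 is only an example CUDA build; use the exact command that https://pytorch.org/get-started/locally/ generates for your setup):

```
pip uninstall -y torch torchvision
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
```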

@Toxite While exporting the model to the TensorRT format, Ultralytics tried to install the onnx package, but the installation failed.

Did you install PyTorch with CUDA?

Can you post the logs?

@Toxite Here are the last lines of the logs (the earlier output was cut off by the terminal):

ERROR: Failed building wheel for onnx
Failed to build onnx
error: failed-wheel-build-for-install
× Failed to build installable wheels for some pyproject.toml based projects
╰─> onnx

(Note: I have installed the latest version of PyTorch with CUDA following the installation guide you provided.)

Can you provide the output of running this command in the terminal: yolo checks?

@Toxite Here is the output after running the command yolo checks:

Ultralytics 8.3.182  Python-3.13.7 torch-2.9.0.dev20250813+cu129 CUDA:0 (NVIDIA GeForce GTX 1650, 4096MiB)
Setup complete  (8 CPUs, 7.8 GB RAM, 180.6/195.3 GB disk)
OS                     Windows-11-10.0.27928-SP0
Environment            Windows
Python                 3.13.7
Install                pip
Path                   D:\Python\Lib\site-packages\ultralytics
RAM                    7.84 GB
Disk                   180.6/195.3 GB
CPU                    Intel Core™ i5-10300H 2.50GHz
CPU count              8
GPU                    NVIDIA GeForce GTX 1650, 4096MiB
GPU count              1
CUDA                   12.9
numpy                   2.3.2>=1.23.0
matplotlib              3.10.5>=3.3.0
opencv-python           4.12.0.88>=4.6.0
pillow                  11.3.0>=7.1.2
pyyaml                  6.0.2>=5.3.1
requests                2.32.5>=2.23.0
scipy                   1.16.1>=1.4.1
torch                   2.9.0.dev20250813+cu129>=1.8.0
torch                   2.9.0.dev20250813+cu129!=2.4.0,>=1.8.0; sys_platform == "win32"
torchvision             0.24.0.dev20250820+cu129>=0.9.0
tqdm                    4.67.1>=4.64.0
psutil                  7.0.0
py-cpuinfo              9.0.0
pandas                  2.3.1>=1.1.4
ultralytics-thop        2.0.15>=2.0.0

Downgrade to Python 3.12 and reinstall Ultralytics and PyTorch with CUDA.

@Toxite Are there any alternative solutions? I need the latest version of Python for my other projects, so I cannot downgrade.

Python 3.13 is a very recent release, and many packages do not yet fully support it, so your only option is to downgrade.
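One way to do the downgrade without disturbing the Python 3.13 your other projects use is a dedicated virtual environment. A sketch for Windows, assuming Python 3.12 is installed alongside 3.13 (cu121 is an example CUDA build):

```
py -3.12 -m venv yolo312
yolo312\Scripts\activate
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
pip install ultralytics
```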

@Toxite Why doesn't Ultralytics support version 1.18 of the onnx package, which is compatible with Python 3.13?

Because TensorFlow export fails with it.