Speed up inference time for a model trained with YOLO12x

Suppose I have a custom model trained with YOLO12x that I use to detect certain kinds of objects in videos, but the inference time for each frame is extremely slow (above 1000 ms), while my expectation is that it should be below 200 ms. Are there any effective methods to improve this? Thank you very much.

YOLO12x is a very large model, one of the largest in Ultralytics, so inference will be slow. Unless you genuinely need such a large model, you should switch to a smaller one like YOLO11m; otherwise, getting a 5x speedup is difficult. If you only have a CPU, you can try exporting to OpenVINO with int8=True. If you have a GPU, you can export to TensorRT with int8=True. If it's still too slow, you need to switch to a smaller model.
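For reference, a minimal export sketch for both routes, assuming your trained weights are at path/to/best.pt and you have a dataset YAML available for INT8 calibration (both paths are placeholders):

```python
from ultralytics import YOLO

# Load the custom-trained YOLO12x weights
model = YOLO("path/to/best.pt")

# CPU route: OpenVINO export with INT8 quantization
model.export(format="openvino", int8=True)

# GPU route: TensorRT export with INT8 quantization;
# `data` supplies a dataset YAML used for calibration images
model.export(format="engine", int8=True, data="path/to/data.yaml")
```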

@Toxite I have the following three questions:

  1. Can I export the YOLO model to the TensorRT (.engine) format in Google Colab/Kaggle and then use the exported model on my Windows PC?
  2. If I want to use the exported model on my Windows PC, do I need to install any prerequisites in the PC's environment?
  3. If I want to use the model for Live Inference with a Streamlit Application using Ultralytics YOLO11 through the CLI command yolo solutions inference model="path/to/model.pt", how can I pass the device argument to the command?

  1. No. TensorRT engines are hardware-specific, so you need to export on the same system you will run inference on.
  2. Ultralytics installs TensorRT automatically. If that fails, which happens especially with older GPUs, you need to install TensorRT manually from the NVIDIA page.
  3. If you pass the .engine file, it will automatically use the GPU (see the example below).
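For example, once the engine has been exported on the Windows machine itself, the solutions command can point straight at it (placeholder path):

```
yolo solutions inference model="path/to/model.engine"
```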

@Toxite

  1. Can I export the YOLO model to the ONNX or OpenVINO format in Google Colab/Kaggle and then use the exported model on my Windows PC (especially if I want to run inference on the GPU)?
  2. Regarding the third question above, how can I pass the device argument to the command if the model format is ONNX or OpenVINO?

  1. Yes. But ONNX will not provide much of a speedup, and OpenVINO doesn't use NVIDIA GPUs (see the sketch below).
  2. Passing device isn't supported. Ultralytics prioritizes the GPU by default and falls back to the CPU.
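A sketch of that portable workflow (paths are placeholders): export once in Colab/Kaggle, download the resulting file, then load it directly on the Windows PC:

```python
from ultralytics import YOLO

# In Colab/Kaggle: export the trained weights; the resulting files are portable
YOLO("path/to/best.pt").export(format="onnx")        # writes best.onnx
# YOLO("path/to/best.pt").export(format="openvino")  # writes an OpenVINO folder

# On the Windows PC: load the downloaded export directly
model = YOLO("path/to/best.onnx")
results = model.predict("path/to/video.mp4")
```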

@Toxite My PC has an NVIDIA GeForce GTX 1650 GPU. Is this hardware supported by the TensorRT model format? If so, how can I enable CUDA for this GPU?

TensorRT is supported on your GPU, and it should be installed automatically by Ultralytics when you export a model to TensorRT.
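Before exporting, a quick sanity check that PyTorch can actually see the GPU (plain PyTorch, nothing Ultralytics-specific):

```python
import torch

# TensorRT export can only use the GPU if PyTorch detects CUDA
print(torch.cuda.is_available())  # expected: True
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # e.g. "NVIDIA GeForce GTX 1650"
```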

@Toxite But it seems that this GPU is not recognized:

ValueError: Invalid CUDA 'device=0' requested. Use 'device=cpu' or pass valid CUDA device(s) if available, i.e. 'device=0' or 'device=0,1,2,3' for Multi-GPU.
torch.cuda.is_available(): False
torch.cuda.device_count(): 0
os.environ['CUDA_VISIBLE_DEVICES']: None
See https://pytorch.org/get-started/locally/ for up-to-date torch install instructions if no CUDA devices are seen by torch.

What should I do to solve this problem?

You need to install PyTorch with CUDA.
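On Windows, a plain pip install torch gives you a CPU-only build; you need a CUDA wheel from the PyTorch index instead. A sketch (cu121 is only an example CUDA build; use the exact command that https://pytorch.org/get-started/locally/ generates for your setup):

```
pip uninstall -y torch torchvision
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
```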

@Toxite While exporting the model to the TensorRT format, Ultralytics tried to install the onnx package, but the installation failed.

Did you install PyTorch with CUDA?

Can you post the logs?

@Toxite Here are the last lines of the logs (the earlier output was cut off by the terminal):

ERROR: Failed building wheel for onnx
Failed to build onnx
error: failed-wheel-build-for-install
× Failed to build installable wheels for some pyproject.toml based projects
╰─> onnx

(Note: I have installed the latest version of PyTorch with CUDA following the installation guide you provided.)

Can you provide the output of running this command in the terminal: yolo checks?

@Toxite Here is the output after running the command yolo checks:

Ultralytics 8.3.182  Python-3.13.7 torch-2.9.0.dev20250813+cu129 CUDA:0 (NVIDIA GeForce GTX 1650, 4096MiB)
Setup complete  (8 CPUs, 7.8 GB RAM, 180.6/195.3 GB disk)
OS                     Windows-11-10.0.27928-SP0
Environment            Windows
Python                 3.13.7
Install                pip
Path                   D:\Python\Lib\site-packages\ultralytics
RAM                    7.84 GB
Disk                   180.6/195.3 GB
CPU                    Intel Core™ i5-10300H 2.50GHz
CPU count              8
GPU                    NVIDIA GeForce GTX 1650, 4096MiB
GPU count              1
CUDA                   12.9
numpy                   2.3.2>=1.23.0
matplotlib              3.10.5>=3.3.0
opencv-python           4.12.0.88>=4.6.0
pillow                  11.3.0>=7.1.2
pyyaml                  6.0.2>=5.3.1
requests                2.32.5>=2.23.0
scipy                   1.16.1>=1.4.1
torch                   2.9.0.dev20250813+cu129>=1.8.0
torch                   2.9.0.dev20250813+cu129!=2.4.0,>=1.8.0; sys_platform == "win32"
torchvision             0.24.0.dev20250820+cu129>=0.9.0
tqdm                    4.67.1>=4.64.0
psutil                  7.0.0
py-cpuinfo              9.0.0
pandas                  2.3.1>=1.1.4
ultralytics-thop        2.0.15>=2.0.0

Downgrade to Python 3.12 and reinstall Ultralytics and PyTorch with CUDA.

@Toxite Are there any alternative solutions? I need the latest version of Python for my other projects, so I cannot downgrade.

Python 3.13 is a very recent release, and many packages do not yet fully support it, so your only option is to downgrade.
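One way to do the downgrade without disturbing the Python 3.13 your other projects use is a dedicated virtual environment. A sketch for Windows, assuming Python 3.12 is installed alongside 3.13 (cu121 is an example CUDA build):

```
py -3.12 -m venv yolo312
yolo312\Scripts\activate
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
pip install ultralytics
```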

@Toxite Why doesn't Ultralytics support version 1.18 of the onnx package, which is compatible with Python 3.13?

Because TensorFlow export fails with it.