I have a custom model trained with YOLO12x, which I use to detect objects in videos, but inference is extremely slow: above 1000 ms per frame, while I need it to be below 200 ms. Are there any effective ways to speed this up? Thank you very much.
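For reference, this is roughly how I measured the per-frame time (a sketch; `infer` is a placeholder standing in for the model's per-frame inference call):

```python
import time

def mean_latency_ms(infer, frames, warmup=3):
    """Average per-frame inference time in milliseconds (excludes warm-up runs)."""
    for frame in frames[:warmup]:
        infer(frame)  # warm-up: the first calls include one-time setup cost
    start = time.perf_counter()
    for frame in frames:
        infer(frame)
    return (time.perf_counter() - start) * 1000 / len(frames)
```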
YOLO12x is one of the largest models in Ultralytics, so it will be slow. Unless you actually need such a large model, you should switch to a smaller one like YOLO11m; otherwise a 5x speedup is difficult. If you're on CPU, you can try exporting to OpenVINO with `int8=True`. If you have a GPU, you can export to TensorRT with `int8=True`. If it's still too slow, switch to a smaller model.
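In Python those exports look roughly like this (a sketch; `best.pt` and `data.yaml` are placeholder paths for your trained weights and a calibration dataset, which int8 TensorRT export requires):

```python
from ultralytics import YOLO

model = YOLO("best.pt")  # hypothetical path to your trained weights

# On an NVIDIA GPU: TensorRT engine; int8 quantization needs calibration data.
model.export(format="engine", int8=True, data="data.yaml")

# On CPU: OpenVINO with int8 quantization.
model.export(format="openvino", int8=True)
```

Each call writes the exported model next to the weights file, and the exported file can then be loaded with `YOLO(...)` just like the `.pt` one.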
@Toxite I have the following three questions:
- Can I export the YOLO model to the TensorRT (`.engine`) format in Google Colab/Kaggle and then use the exported model on my Windows PC?
- If I want to use the exported model on my Windows PC, do I need to install any prerequisites in the PC's environment?
- If I want to apply the model to the Live Inference with Streamlit Application using Ultralytics YOLO11 through the CLI command `yolo solutions inference model="path/to/model.pt"`, how can I pass the `device` argument to the command?
- No. TensorRT engines are hardware-specific; you need to export on your own system.
- Ultralytics installs it automatically, but if that fails, especially for old GPUs, you need to install TensorRT manually from the NVIDIA page.
- If you pass the `.engine` file, it will automatically use the GPU.
- Can I export the YOLO model to the ONNX or OpenVINO format in Google Colab/Kaggle and then use the exported model on my Windows PC (especially if I want to run inference on the GPU)?
- Regarding the third question above, how can I pass the `device` argument to the command if the model format is ONNX or OpenVINO?
- Yes. But ONNX will not provide much speedup, and OpenVINO doesn't use NVIDIA GPUs.
- Passing `device` isn't supported. Ultralytics prioritizes the GPU by default and falls back to CPU.
@Toxite My PC has an NVIDIA GeForce GTX 1650 GPU. Is this hardware supported by the TensorRT model format? If so, how can I enable CUDA for this GPU?
TensorRT supports your GPU, and it should be installed automatically by Ultralytics when you export a model to TensorRT.
@Toxite But it seems that this GPU is not recognized:
ValueError: Invalid CUDA 'device=0' requested. Use 'device=cpu' or pass valid CUDA device(s) if available, i.e. 'device=0' or 'device=0,1,2,3' for Multi-GPU.
torch.cuda.is_available(): False
torch.cuda.device_count(): 0
os.environ['CUDA_VISIBLE_DEVICES']: None
See https://pytorch.org/get-started/locally/ for up-to-date torch install instructions if no CUDA devices are seen by torch.
What should I do to solve this problem?
You need to install PyTorch with CUDA.
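After reinstalling from the selector on the PyTorch page linked in the error, you can sanity-check the build like this (assumes a working NVIDIA driver):

```python
import torch

# A CUDA build carries a +cuXXX suffix (e.g. 2.x.x+cu121) rather than +cpu,
# and is_available() should report True once the driver and wheel match.
print(torch.__version__)
print(torch.cuda.is_available())
```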
@Toxite While exporting the model to the TensorRT format, Ultralytics tried to install the `onnx` package, but the installation failed.
Did you install PyTorch with CUDA?
Can you post the logs?
@Toxite Here are the last lines of the logs (since most of them were cut off by the terminal):
ERROR: Failed building wheel for onnx
Failed to build onnx
error: failed-wheel-build-for-install
× Failed to build installable wheels for some pyproject.toml based projects
╰─> onnx
(Note: I have installed the latest version of PyTorch with CUDA following the installation guide you gave.)
Can you provide the output after running this command in the terminal: `yolo checks`?
@Toxite Here is the output after running the command `yolo checks`:
Ultralytics 8.3.182 Python-3.13.7 torch-2.9.0.dev20250813+cu129 CUDA:0 (NVIDIA GeForce GTX 1650, 4096MiB)
Setup complete (8 CPUs, 7.8 GB RAM, 180.6/195.3 GB disk)
OS Windows-11-10.0.27928-SP0
Environment Windows
Python 3.13.7
Install pip
Path D:\Python\Lib\site-packages\ultralytics
RAM 7.84 GB
Disk 180.6/195.3 GB
CPU Intel Core™ i5-10300H 2.50GHz
CPU count 8
GPU NVIDIA GeForce GTX 1650, 4096MiB
GPU count 1
CUDA 12.9
numpy 2.3.2>=1.23.0
matplotlib 3.10.5>=3.3.0
opencv-python 4.12.0.88>=4.6.0
pillow 11.3.0>=7.1.2
pyyaml 6.0.2>=5.3.1
requests 2.32.5>=2.23.0
scipy 1.16.1>=1.4.1
torch 2.9.0.dev20250813+cu129>=1.8.0
torch 2.9.0.dev20250813+cu129!=2.4.0,>=1.8.0; sys_platform == 'win32'
torchvision 0.24.0.dev20250820+cu129>=0.9.0
tqdm 4.67.1>=4.64.0
psutil 7.0.0
py-cpuinfo 9.0.0
pandas 2.3.1>=1.1.4
ultralytics-thop 2.0.15>=2.0.0
Downgrade to Python 3.12, then reinstall Ultralytics and PyTorch with CUDA.
@Toxite Are there any alternative solutions? I need the latest version of Python for my other projects, so I can't downgrade it.
Python 3.13 is a very recent release, and many packages do not yet fully support it, so your only option is to downgrade.
@Toxite Why does Ultralytics not support version 1.18 of the `onnx` package, which is compatible with Python 3.13?
Because TensorFlow export fails with it.