Speed up inference time for the model trained with YOLO12x

You can try installing onnx manually:

pip install onnx onnxruntime-gpu

then run export.
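A minimal sketch of the export step, assuming the trained YOLO12x weights are saved as best.pt (the TensorRT export produces an ONNX file as an intermediate step, matching the logs later in this thread):

from ultralytics import YOLO

model = YOLO("best.pt")        # trained YOLO12x weights (placeholder path)
model.export(format="engine")  # TensorRT export; ONNX is generated along the way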

@Toxite But when I started the export process, I got the following message:

requirements: Ultralytics requirement ['onnx>=1.12.0,<1.18.0'] not found, attempting AutoUpdate…

Then Ultralytics tried to install version 1.17 of the onnx package.

(Note: when I installed the onnx package manually, pip installed version 1.18 by default.)
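(A compatible version can also be preinstalled by pinning the same constraint from the requirement message above; a sketch:

pip install "onnx>=1.12.0,<1.18.0" onnxruntime-gpu)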

Does it continue after failing?

Can you post the full logs? These small snippets lack context.

@Toxite The process still continued with the following message:

ONNX: starting export with onnx 1.18.0 opset 19…
ONNX: slimming with onnxslim 0.1.64…
ONNX: export success  701.3s, saved as 'best.onnx' (225.9 MB)
TensorRT: starting export with TensorRT 10.13.2.6...

Then it’s fine

@Toxite Regarding the Live Inference with Streamlit Application using Ultralytics YOLO11 through the CLI command yolo solutions inference model="path/to/model.pt": can I pass two models to the command simultaneously?

No, you can’t

@Toxite I have two concerns I need help with:

  1. If I do not use TensorRT’s INT8 or FP16 quantization (i.e., I keep the default FP32 precision), will the inference time for each video frame be below 200 ms?
  2. If I use TensorRT’s INT8 quantization, what should I do to keep (nearly) the same accuracy as the original YOLO12x model? (When I used INT8 quantization and kept the other parameters at their defaults, the accuracy was extremely low.)
  1. You should use at least FP16, if not INT8, if you want a speedup. FP16 hardly changes the accuracy.
  2. INT8 requires a calibration dataset. You need to pass your training dataset via the data argument, as in the sketch after this list. This would already have been shown as a warning in the logs; you should read the logs closely.
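A minimal sketch of an INT8 TensorRT export with calibration, assuming the trained weights are best.pt; the dataset YAML path is a placeholder:

from ultralytics import YOLO

model = YOLO("best.pt")  # trained YOLO12x weights (placeholder path)
# INT8 TensorRT export; data= points the calibration step at the training dataset
model.export(format="engine", int8=True, data="path/to/data.yaml")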

@Toxite

  1. Regarding my second concern above, I used my training dataset for calibration (passing my training dataset’s YAML file to the data argument), but the accuracy was still extremely low.
  2. Should I pass my training dataset for calibration when using TensorRT’s FP16 quantization?
  1. If that’s the case, then it’s probably due to using YOLO12, which is not recommended.
  2. FP16 doesn’t require a calibration dataset; see the sketch after this list.
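For comparison, a sketch of an FP16 TensorRT export; there is no data argument because no calibration is involved (the weights path is a placeholder):

from ultralytics import YOLO

model = YOLO("best.pt")
# FP16 TensorRT export; half=True is enough, no calibration dataset needed
model.export(format="engine", half=True)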

@Toxite
Here is the output of yolo checks in my environment:

Ultralytics 8.3.239 🚀 Python-3.11.13 torch-2.6.0+cu124 CUDA:0 (Tesla T4, 15095MiB)
Setup complete ✅ (4 CPUs, 31.4 GB RAM, 6586.5/8062.4 GB disk)

OS                     Linux-6.6.105+-x86_64-with-glibc2.35
Environment            Colab
Python                 3.11.13
Install                pip
Path                   /usr/local/lib/python3.11/dist-packages/ultralytics
RAM                    31.35 GB
Disk                   6586.5/8062.4 GB
CPU                    Intel Xeon CPU @ 2.00GHz
CPU count              4
GPU                    Tesla T4, 15095MiB
GPU count              2
CUDA                   12.4

numpy                  ✅ 2.2.6>=1.23.0
matplotlib             ✅ 3.7.2>=3.3.0
opencv-python          ✅ 4.12.0.88>=4.6.0
pillow                 ✅ 11.3.0>=7.1.2
pyyaml                 ✅ 6.0.3>=5.3.1
requests               ✅ 2.32.5>=2.23.0
scipy                  ✅ 1.15.3>=1.4.1
torch                  ✅ 2.6.0+cu124>=1.8.0
torch                  ✅ 2.6.0+cu124!=2.4.0,>=1.8.0; sys_platform == "win32"
torchvision            ✅ 0.21.0+cu124>=0.9.0
psutil                 ✅ 7.1.3>=5.8.0
polars                 ✅ 1.25.0>=0.20.0
ultralytics-thop       ✅ 2.0.18>=2.0.18

and I get the following error:

ValueError: Invalid CUDA 'device=1,0' requested. Use 'device=cpu' or pass valid CUDA device(s) if available, i.e. 'device=0' or 'device=0,1,2,3' for Multi-GPU.

torch.cuda.is_available(): True
torch.cuda.device_count(): 1
os.environ['CUDA_VISIBLE_DEVICES']: 1,0

for this train command:

run_results = model.train(data="/kaggle/working/pvelad-2/data.yaml",
                          epochs=100,
                          imgsz=640,
                          device=[1,0],
                          lr0=0.01, lrf=0.01, cos_lr=True, warmup_epochs=3.0, workers=8)

Would you please help me with this?

Right-click on any cell and click Restart Kernel, then rerun training.
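A sketch of the rerun after the restart, assuming both T4s are visible in the fresh session; the GPUs are selected by their default indices via the device argument rather than by setting CUDA_VISIBLE_DEVICES by hand:

import torch
from ultralytics import YOLO

print(torch.cuda.device_count())  # should report 2 in the fresh session

model = YOLO("yolo12x.pt")  # placeholder weights
run_results = model.train(data="/kaggle/working/pvelad-2/data.yaml",
                          epochs=100,
                          imgsz=640,
                          device=[0, 1],  # both GPUs, default index order
                          lr0=0.01, lrf=0.01, cos_lr=True, warmup_epochs=3.0, workers=8)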