Speed up inference time for the model trained with YOLO12x

You can try installing onnx manually:

pip install onnx onnxruntime-gpu

then run export.
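A minimal sketch of the export step, assuming the trained YOLO12x weights are saved as best.pt (the TensorRT export produces an ONNX file as an intermediate step, matching the logs later in this thread):

from ultralytics import YOLO

model = YOLO("best.pt")        # trained YOLO12x weights (placeholder path)
model.export(format="engine")  # TensorRT export; ONNX is generated along the way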

@Toxite But when I started the export process, I got the following message:

requirements: Ultralytics requirement ['onnx>=1.12.0,<1.18.0'] not found, attempting AutoUpdate…

Then Ultralytics tried to install version 1.17 of the onnx package.

(Note: when I installed the onnx package manually, pip installed version 1.18 by default.)
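(A compatible version can also be preinstalled by pinning the same constraint from the requirement message above; a sketch:

pip install "onnx>=1.12.0,<1.18.0" onnxruntime-gpu)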

Does it continue after failing?

Can you post the full logs? These small snippets lack context.

@Toxite The process still continued with the following message:

ONNX: starting export with onnx 1.18.0 opset 19…
ONNX: slimming with onnxslim 0.1.64…
ONNX: export success  701.3s, saved as 'best.onnx' (225.9 MB)
TensorRT: starting export with TensorRT 10.13.2.6...

Then it’s fine

@Toxite Regarding the Live Inference with Streamlit Application using Ultralytics YOLO11 through the CLI command yolo solutions inference model="path/to/model.pt": can I pass two models to the command simultaneously?

No, you can’t

@Toxite I have two concerns I need help with:

  1. If I do not use TensorRT’s INT8 or FP16 quantization (i.e., I keep the default FP32 precision), will the inference time for each video frame be below 200 ms?
  2. If I use TensorRT’s INT8 quantization, what should I do to keep (nearly) the same accuracy as the original YOLO12x model? (When I used INT8 quantization and kept the other parameters at their defaults, the accuracy was extremely low.)
  1. You should use at least FP16, if not INT8, if you want a speedup. FP16 hardly changes the accuracy.
  2. INT8 requires a calibration dataset. You need to pass your training dataset via the data argument, as in the sketch after this list. This would already have been shown as a warning in the logs; you should read the logs closely.
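A minimal sketch of an INT8 TensorRT export with calibration, assuming the trained weights are best.pt; the dataset YAML path is a placeholder:

from ultralytics import YOLO

model = YOLO("best.pt")  # trained YOLO12x weights (placeholder path)
# INT8 TensorRT export; data= points the calibration step at the training dataset
model.export(format="engine", int8=True, data="path/to/data.yaml")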

@Toxite

  1. Regarding my second concern above, I used my training dataset for calibration (passing my training dataset’s YAML file to the data argument), but the accuracy was still extremely low.
  2. Should I pass my training dataset for calibration when using TensorRT’s FP16 quantization?
  1. If that’s the case, then it’s probably due to using YOLO12, which is not recommended.
  2. FP16 doesn’t require a calibration dataset; see the sketch after this list.
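For comparison, a sketch of an FP16 TensorRT export; there is no data argument because no calibration is involved (the weights path is a placeholder):

from ultralytics import YOLO

model = YOLO("best.pt")
# FP16 TensorRT export; half=True is enough, no calibration dataset needed
model.export(format="engine", half=True)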

@Toxite
Here is the output of yolo checks in my environment:

Ultralytics 8.3.239 🚀 Python-3.11.13 torch-2.6.0+cu124 CUDA:0 (Tesla T4, 15095MiB)
Setup complete ✅ (4 CPUs, 31.4 GB RAM, 6586.5/8062.4 GB disk)

OS                     Linux-6.6.105+-x86_64-with-glibc2.35
Environment            Colab
Python                 3.11.13
Install                pip
Path                   /usr/local/lib/python3.11/dist-packages/ultralytics
RAM                    31.35 GB
Disk                   6586.5/8062.4 GB
CPU                    Intel Xeon CPU @ 2.00GHz
CPU count              4
GPU                    Tesla T4, 15095MiB
GPU count              2
CUDA                   12.4

numpy                  ✅ 2.2.6>=1.23.0
matplotlib             ✅ 3.7.2>=3.3.0
opencv-python          ✅ 4.12.0.88>=4.6.0
pillow                 ✅ 11.3.0>=7.1.2
pyyaml                 ✅ 6.0.3>=5.3.1
requests               ✅ 2.32.5>=2.23.0
scipy                  ✅ 1.15.3>=1.4.1
torch                  ✅ 2.6.0+cu124>=1.8.0
torch                  ✅ 2.6.0+cu124!=2.4.0,>=1.8.0; sys_platform == "win32"
torchvision            ✅ 0.21.0+cu124>=0.9.0
psutil                 ✅ 7.1.3>=5.8.0
polars                 ✅ 1.25.0>=0.20.0
ultralytics-thop       ✅ 2.0.18>=2.0.18

and I get the following error:

ValueError: Invalid CUDA 'device=1,0' requested. Use 'device=cpu' or pass valid CUDA device(s) if available, i.e. 'device=0' or 'device=0,1,2,3' for Multi-GPU.

torch.cuda.is_available(): True
torch.cuda.device_count(): 1
os.environ['CUDA_VISIBLE_DEVICES']: 1,0

for this train command:

run_results = model.train(data="/kaggle/working/pvelad-2/data.yaml",
                          epochs=100,
                          imgsz=640,
                          device=[1,0],
                          lr0=0.01, lrf=0.01, cos_lr=True, warmup_epochs=3.0, workers=8)

Would you please help me with this?

Right-click on any cell and click Restart Kernel, then rerun training.
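A sketch of the rerun after the restart, assuming both T4s are visible in the fresh session; the GPUs are selected by their default indices via the device argument rather than by setting CUDA_VISIBLE_DEVICES by hand:

import torch
from ultralytics import YOLO

print(torch.cuda.device_count())  # should report 2 in the fresh session

model = YOLO("yolo12x.pt")  # placeholder weights
run_results = model.train(data="/kaggle/working/pvelad-2/data.yaml",
                          epochs=100,
                          imgsz=640,
                          device=[0, 1],  # both GPUs, default index order
                          lr0=0.01, lrf=0.01, cos_lr=True, warmup_epochs=3.0, workers=8)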