Speed up inference time for the model trained with YOLO12x

You can try installing onnx manually:

pip install onnx onnxruntime-gpu

then run export.
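
If the pip install succeeds, the export itself is a one-liner. A minimal sketch with the Ultralytics Python API, assuming your trained weights are saved as best.pt (adjust the path to your own checkpoint):

    from ultralytics import YOLO

    # Load the trained YOLO12x checkpoint (path is a placeholder).
    model = YOLO("best.pt")

    # Export to ONNX; Ultralytics will try to auto-install a compatible onnx version if needed.
    model.export(format="onnx")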

@Toxite But when I started the export process, I got the following message:

requirements: Ultralytics requirement ['onnx>=1.12.0,<1.18.0'] not found, attempting AutoUpdate…

Then Ultralytics tried to install version 1.17 of the onnx package.

(Note: When I installed the onnx package manually, the version installed by default was 1.18.)

Does it continue after failing?

Can you post the full logs? These small snippets lack context.

@Toxite The process still continued with the following message:

ONNX: starting export with onnx 1.18.0 opset 19…
ONNX: slimming with onnxslim 0.1.64…
ONNX: export success  701.3s, saved as 'best.onnx' (225.9 MB)
TensorRT: starting export with TensorRT 10.13.2.6...

Then it’s fine

@Toxite Regarding the Live Inference with Streamlit Application using Ultralytics YOLO11 through the CLI command yolo solutions inference model="path/to/model.pt", can I pass two models to the command simultaneously?

No, you can’t
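
Not through that CLI command. If you need two models on the same stream, one workaround is to drop to the Python API and run both models yourself. A rough sketch, assuming hypothetical weight files model_a.pt and model_b.pt and a placeholder video.mp4:

    from ultralytics import YOLO
    import cv2

    # Hypothetical paths; replace with your own weights and video source.
    model_a = YOLO("model_a.pt")
    model_b = YOLO("model_b.pt")

    cap = cv2.VideoCapture("video.mp4")
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        # Run each model on the same frame and collect both sets of results.
        results_a = model_a(frame, verbose=False)
        results_b = model_b(frame, verbose=False)
    cap.release()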

@Toxite I have two concerns I need help with:

  1. If I do not use TensorRT’s INT8 or FP16 quantization (i.e., I keep the default FP32 precision), will the inference time for each video frame be below 200 ms?
  2. If I use TensorRT’s INT8 quantization, what should I do to keep (nearly) the same accuracy metric as the YOLO12x model? (When I used INT8 quantization and kept the other parameters at their defaults, the accuracy metric was extremely low.)

  1. You should use at least FP16, if not INT8, if you want a speedup. FP16 hardly changes the accuracy.
  2. INT8 requires passing a calibration dataset. You need to pass your training dataset using the data argument (see the export sketch after this list). This would have already been shown as a warning in the logs. You should read the logs closely.
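
For reference, a minimal export sketch covering both cases, assuming your weights are best.pt and your dataset yaml is data.yaml (both names are placeholders):

    from ultralytics import YOLO

    model = YOLO("best.pt")

    # FP16 TensorRT engine: no calibration data needed.
    model.export(format="engine", half=True)

    # INT8 TensorRT engine: pass a calibration dataset via the data argument.
    model.export(format="engine", int8=True, data="data.yaml")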

@Toxite

  1. Regarding my second concern above, I used my training dataset for calibration (passing my training dataset’s yaml file to the data argument), but the accuracy metric was still extremely low.
  2. Should I pass my training dataset for calibration when using TensorRT’s FP16 quantization?

  1. If that’s the case, then it’s probably due to using YOLO12, which is not recommended.
  2. FP16 doesn’t require a calibration dataset. (A validation sketch for comparing metrics follows below.)
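
One way to check how much accuracy each precision costs is to validate the original model and the exported engine on the same dataset and compare the metrics. A hedged sketch, assuming the engine produced above and a placeholder data.yaml:

    from ultralytics import YOLO

    # Validate the PyTorch model and the exported TensorRT engine on the same data,
    # then compare mAP50-95 to quantify the accuracy drop from quantization.
    baseline = YOLO("best.pt").val(data="data.yaml")
    engine = YOLO("best.engine").val(data="data.yaml")

    print("PyTorch  mAP50-95:", baseline.box.map)
    print("TensorRT mAP50-95:", engine.box.map)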