Hi, I wanted to deploy a YOLO11 model to DeepStream, and I did it successfully, but I have some questions.
The FPS was low, so I want to know: is there anything like pruning or quantization to improve inference time in DeepStream? I ask because the default YOLO model in DeepStream can be quantized to INT8.
Can you give me some guidance on this?
Hello @mohammad_haydari,
Thanks for reaching out to the YOLO community!
To improve inference time when deploying your YOLO11 model to DeepStream, you can indeed optimize your model through quantization. Quantization converts the model's weights and activations to lower precision, such as 8-bit integers (INT8), which reduces model size and speeds up inference. For detailed guidance, the Ultralytics documentation on model optimization techniques explains how quantization can improve performance, particularly on edge devices.
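For example, the Ultralytics Python API can export an INT8-quantized TensorRT engine directly. Here is a minimal sketch, assuming a `yolo11n.pt` checkpoint and `coco8.yaml` as the calibration dataset (substitute your own trained weights and data):

```python
from ultralytics import YOLO

# Load a trained YOLO11 model (yolo11n.pt is a placeholder; use your own weights)
model = YOLO("yolo11n.pt")

# Export an INT8 TensorRT engine; int8=True enables post-training quantization,
# and data= points to a dataset YAML whose images are used for calibration
model.export(format="engine", int8=True, data="coco8.yaml")
```

The resulting `.engine` file can then be used in your DeepStream pipeline in place of the FP32 model.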
Additionally, you can explore TensorRT for model optimization, as it applies techniques such as layer fusion and precision calibration that are particularly effective on NVIDIA GPUs. The TensorRT integration page provides detailed steps for exporting YOLO11 models to TensorRT format.
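There are two common routes here, depending on whether you build the engine yourself or let DeepStream do it. The sketch below (again using `yolo11n.pt` as a placeholder, and assuming a typical nvinfer-based DeepStream pipeline) shows both:

```python
from ultralytics import YOLO

model = YOLO("yolo11n.pt")  # placeholder: substitute your trained weights

# Option A: build a TensorRT engine directly; half=True requests FP16 precision,
# which benefits from TensorRT layer fusion and kernel auto-tuning on NVIDIA GPUs
model.export(format="engine", half=True)

# Option B: export ONNX and let DeepStream's nvinfer plugin build the engine at
# startup; precision is then selected via network-mode in the nvinfer config
model.export(format="onnx", simplify=True)
```

Either way, point your nvinfer configuration at the resulting file (`model-engine-file` for a prebuilt engine, `onnx-file` for the ONNX route).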
I hope this helps you achieve better performance with your DeepStream deployment! Let the Ultralytics team know if you need anything else.