Measuring TOPS for YOLO Model Exported with TensorRT

inadob · September 25, 2025, 2:40pm

Hi everyone,

I’m currently working with Ultralytics to quantize and export a YOLO11n model using TensorRT. My goal is to benchmark its time performance on a Jetson AGX Orin system, utilizing various hardware configurations (GPU/DLA). I’m specifically interested in acquiring the TOPS metric for the model.

From what I understand, the TOPS metric is not directly accessible via Ultralytics or TensorRT. While the model in its original .pt form comes with a measure of FLOPs (e.g., YOLO11n → 6.5 GFLOPs), exporting it to a .engine file for GPU/DLA involves optimizations that alter the FLOP count. This makes the original GFLOPs figure insufficient for calculating TOPS after export.

Here are my questions:

Could anyone advise on how to determine the number of FLOPs or the number of multiply-add operations in the .engine model during inference?
Is there a way to directly extract the TOPS metric for a model after it’s been optimized with TensorRT? I’ve looked through the TensorRT and Ultralytics documentation, but haven’t found a clear method to achieve this.
Can the FLOPs count be used to derive TOPS for models quantized to int8?

I appreciate any insights or guidance on this matter!

BurhanQ · September 25, 2025, 4:00pm

If you want to get a direct value from TensorRT, you might want to try NVIDIA’s Nsight application.

BurhanQ · September 25, 2025, 4:04pm

This GitHub issue might also be useful.

Toxite · September 25, 2025, 6:25pm

As the issue above mentions, TOPS is a metric of the hardware, not of the model. And the FLOPs (floating-point operations) of a model are not affected by conversion to TensorRT.

What does change is FLOPS (floating-point operations per second).

inadob · September 26, 2025, 8:01am

@Toxite Thank you for clarifying the difference between FLOPs and FLOPS. I guess FLOPS was incorrectly used instead of FLOPs in this NVIDIA forum post:
“FLOPS should not change going from TF to TRT (assuming network does not have redundant branches which don’t contribute to the network outputs).”

Also, can you confirm that FLOPS was used correctly in the GitHub issue that @BurhanQ suggested:
“FLOPS in your case is model-specific, and TRT will do a lot of optimization on it, the correct way is to compute MACs of each layer and sum then up.”
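
For readers wondering what “compute MACs of each layer and sum them up” could look like in practice, here is a minimal sketch for plain convolution layers. The `ConvLayer` class and the two example layer shapes are hypothetical and are not taken from YOLO11n or from a TensorRT engine; the sketch only shows the standard per-layer MAC count and the common convention of counting one MAC as two operations.

```python
from dataclasses import dataclass


@dataclass
class ConvLayer:
    """Hypothetical description of one convolution layer (illustrative only)."""
    c_in: int      # input channels
    c_out: int     # output channels
    kernel: int    # square kernel size
    h_out: int     # output feature-map height
    w_out: int     # output feature-map width
    groups: int = 1


def conv_macs(layer: ConvLayer) -> int:
    """Multiply-accumulates for one forward pass of a conv layer (bias ignored)."""
    per_output = (layer.c_in // layer.groups) * layer.kernel ** 2
    return layer.h_out * layer.w_out * layer.c_out * per_output


# Made-up layer shapes, only to demonstrate the summation.
layers = [
    ConvLayer(c_in=3, c_out=16, kernel=3, h_out=320, w_out=320),
    ConvLayer(c_in=16, c_out=32, kernel=3, h_out=160, w_out=160),
]

total_macs = sum(conv_macs(layer) for layer in layers)
total_ops = 2 * total_macs  # one MAC counted as one multiply plus one add

print(f"total MACs: {total_macs:,}")
print(f"total ops:  {total_ops:,}")
```

A real model also contains layers other than convolutions, so a complete count would need a rule for each layer type.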

inadob · September 26, 2025, 8:27am

I see a contradiction in zerollzeng’s answer on how to calculate TOPS. First he says that the correct way is to use MACs instead of FLOPs, which this Qualcomm article also suggests:

“TOPS = 2 × MAC unit count × Frequency / 1 trillion”
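
As a quick illustration of the quoted formula, the sketch below evaluates it in Python. The MAC unit count and clock frequency are placeholder values chosen for easy arithmetic; they are not the specifications of the Jetson AGX Orin or of any other real device.

```python
def peak_tops(mac_units: int, frequency_hz: float) -> float:
    """Theoretical peak TOPS: 2 ops per MAC unit per cycle, scaled to trillions."""
    return 2 * mac_units * frequency_hz / 1e12


# Hypothetical accelerator: 4096 MAC units clocked at 1 GHz.
print(peak_tops(mac_units=4096, frequency_hz=1e9))
```

The formula uses only hardware quantities (unit count and clock), which is why the result describes the device rather than any particular model.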

However, at the end of the issue he confirms that “we cannot say how many TOPS of a model occupies on the orin in inference phase”, referring to that formula.

I understand that TOPS is a hardware-specific metric, but I just want to use the YOLO model as a tool for measuring the hardware capabilities of my device, expressed in TOPS. I could use any other model to derive a baseline TOPS figure.

Toxite · September 26, 2025, 11:52am

He’s referring to FLOPs, unless he accidentally omitted the divide-by-time part.

You get FLOPS by dividing the model’s FLOPs by the time taken for a single forward pass. But that’s just the effective FLOPS you’re reaching with that model. It doesn’t say much about your hardware’s theoretical FLOPS.
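
A minimal sketch of that calculation follows. The 6.5 GFLOPs figure is the YOLO11n value quoted in the first post; the 2 ms latency is a made-up number, and `mean_latency_s` is a hypothetical helper that times any Python callable. On a GPU, the callable would need to wait for the device to finish before returning, otherwise the measured time is too short.

```python
import time
from typing import Callable


def mean_latency_s(run_once: Callable[[], object], warmup: int = 3, iters: int = 20) -> float:
    """Average wall-clock seconds per call, after a few untimed warm-up calls."""
    for _ in range(warmup):
        run_once()
    start = time.perf_counter()
    for _ in range(iters):
        run_once()
    return (time.perf_counter() - start) / iters


def effective_tflops(model_gflops: float, latency_s: float) -> float:
    """Effective throughput: operations per forward pass divided by time per pass."""
    return model_gflops * 1e9 / latency_s / 1e12


# Stand-in workload so the sketch runs anywhere; replace with a real forward pass.
latency = mean_latency_s(lambda: sum(range(10_000)))
print(f"measured stand-in latency: {latency * 1e3:.3f} ms")

# 6.5 GFLOPs from the thread, with an assumed (not measured) 2 ms forward pass.
print(f"effective throughput: {effective_tflops(6.5, 0.002):.2f} TFLOPS")
```

Comparing this number with the device’s advertised peak gives a utilization figure for this one model, not a measurement of the peak itself.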
