I’m currently working with Ultralytics to quantize and export a YOLO11n model using TensorRT. My goal is to benchmark its time performance on a Jetson AGX Orin system, utilizing various hardware configurations (GPU/DLA). I’m specifically interested in acquiring the TOPS metric for the model.
From what I understand, the TOPS metric is not directly accessible via Ultralytics or TensorRT. While the model in its original .pt form comes with a measure of FLOPs (e.g., YOLO11n → 6.5 GFLOPs), exporting it to a .engine file for GPU/DLA involves optimizations that alter the FLOP count. This makes the original GFLOPs figure insufficient for calculating TOPS after export.
Here are my questions:
Could anyone advise on how to determine the number of FLOPs or the number of multiply-add operations in the .engine model during inference?
Is there a way to directly extract the TOPS metric for a model after it’s been optimized with TensorRT? I’ve looked through the TensorRT and Ultralytics documentation, but haven’t found a clear method to achieve this.
Can the FLOPs count be used to derive TOPS for models quantized to int8?
I appreciate any insights or guidance on this matter!
As the issues above mention, TOPS is a metric of the hardware, not of the model. The FLOPs (floating point operations) of a model are not affected by conversion to TensorRT.
What does change is FLOPS (floating point operations per second).
@Toxite Thank you for adding this clarification on the difference between FLOPs and FLOPS. I guess they incorrectly used FLOPS instead of FLOPs in the NVIDIA forum here: “FLOPS should not change going from TF to TRT (assuming network does not have redundant branches which don’t contribute to the network outputs).”
Also, can you confirm whether FLOPS was used correctly in this GitHub issue that @BurhanQ suggested: “FLOPS in your case is model-specific, and TRT will do a lot of optimization on it, the correct way is to compute MACs of each layer and sum then up.”
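To make the "compute MACs of each layer and sum them up" suggestion concrete, here is a minimal sketch that counts MACs for plain convolution layers from their shapes. The function name and the two layer shapes are made up for illustration; they are not taken from YOLO11n or from a TensorRT engine, and fused or otherwise optimized engine layers may not map one-to-one onto layers like these.

```python
def conv2d_macs(c_in, c_out, k_h, k_w, h_out, w_out, groups=1):
    """MACs for one 2D convolution: one multiply-accumulate per kernel
    weight per output element (bias and activation ignored)."""
    return h_out * w_out * c_out * (c_in // groups) * k_h * k_w


# Hypothetical layer shapes, for illustration only.
layers = [
    dict(c_in=3, c_out=16, k_h=3, k_w=3, h_out=320, w_out=320),
    dict(c_in=16, c_out=32, k_h=3, k_w=3, h_out=160, w_out=160),
]

total_macs = sum(conv2d_macs(**layer) for layer in layers)
total_ops = 2 * total_macs  # one MAC = one multiply + one add

print(f"total MACs: {total_macs:,}")
print(f"total ops:  {total_ops:,}")
```

The factor of 2 is the usual convention for turning MACs into operations, which is also where the 2 in the TOPS formula comes from.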
I see a contradiction in zerollzeng’s answer on how to calculate TOPS. First he says that the correct way is to use MACs instead of FLOPs, which this Qualcomm article also suggests:
“TOPS = 2 × MAC unit count × Frequency / 1 trillion”
However, at the end of the issue he confirms that “we cannot say how many TOPS of a model occupies on the orin in inference phase” referring to that formula.
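For reference, the quoted formula is plain arithmetic over hardware parameters, which is why it describes the device rather than any model. A small sketch, assuming a made-up accelerator (the MAC unit count and clock below are hypothetical, not Orin specifications):

```python
def peak_tops(mac_units: int, frequency_hz: float) -> float:
    """Theoretical peak TOPS = 2 x MAC unit count x frequency / 1 trillion.

    The factor of 2 counts each MAC as a multiply plus an add.
    """
    return 2 * mac_units * frequency_hz / 1e12


# Hypothetical accelerator: 4096 MAC units at 1 GHz (illustrative numbers).
print(f"{peak_tops(mac_units=4096, frequency_hz=1.0e9):.3f} TOPS")
```

Nothing about the network enters this calculation, so it gives an upper bound for the hardware and says nothing about what a particular model achieves.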
I understand that TOPS is a hardware-specific metric, but I just want to use the YOLO model as a tool for measuring the hardware capabilities of my device, expressed in TOPS. I could use any other model to derive a baseline TOPS.
He’s referring to FLOPs, unless he accidentally omitted the divide-by-time part.
You get FLOPS by dividing the model’s FLOPs by the time taken for a single forward pass. But that’s only the effective FLOPS you’re reaching with that model. It doesn’t say much about your hardware’s theoretical FLOPS.
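As a rough sketch of that division: the 6.5 GFLOPs figure is the YOLO11n number quoted earlier in the thread, while the 2.0 ms latency and the 100 TOPS peak are placeholder values, not measurements or Orin specifications. Substitute your own benchmarked latency and your device's rated peak.

```python
def effective_tflops(model_gflops: float, latency_ms: float) -> float:
    """Achieved throughput in TFLOPS for a single forward pass."""
    flops = model_gflops * 1e9      # GFLOPs -> FLOPs
    seconds = latency_ms / 1e3      # ms -> s
    return flops / seconds / 1e12   # FLOPs per second -> TFLOPS


def utilization(effective: float, peak: float) -> float:
    """Fraction of the theoretical peak that the model actually reaches."""
    return effective / peak


achieved = effective_tflops(model_gflops=6.5, latency_ms=2.0)  # latency is a placeholder
share = utilization(achieved, peak=100.0)                      # peak is a placeholder

print(f"effective throughput: {achieved:.2f} TFLOPS")
print(f"share of peak:        {share:.1%}")
```

A small model like this usually lands far below the rated peak, which is why the result describes the model-plus-hardware combination and not the hardware's theoretical capability.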