Hi everyone:
I've successfully installed all the necessary packages on my Jetson Orin NX 16GB platform. Then I wrote a simple demo using YOLO11n:
from ultralytics import YOLO

model = YOLO('yolo11n.engine')
results = model(image_path)
The 'yolo11n.engine' model file was exported with half=True, imgsz=640, nms=True.
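For reference, the export step was roughly the following Ultralytics CLI call (I'm assuming the standard yolo11n.pt weights as the source):

```shell
# Export PyTorch weights to a TensorRT engine on-device:
# FP16 precision, 640x640 input, NMS baked into the engine
yolo export model=yolo11n.pt format=engine half=True imgsz=640 nms=True
```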
I've also enabled jetson_clocks.
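Concretely, I ran something like this before benchmarking (I'm assuming mode 0 is the MAXN power mode on the Orin NX; the mode list varies by board and JetPack version, so check `sudo nvpmodel -q` first):

```shell
sudo nvpmodel -m 0   # select the maximum power mode (assumed: mode 0 = MAXN)
sudo jetson_clocks   # lock CPU/GPU/EMC clocks at their maximum
tegrastats           # optional: watch clocks and utilization while the demo runs
```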
Here’s the result:
Loading yolo11n.engine for TensorRT inference…
[07/27/2025-00:39:00] [TRT] [I] Loaded engine size: 8 MiB
[07/27/2025-00:39:00] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +9, now: CPU 0, GPU 14 (MiB)
image 1/1 /home/me/Projects/helloPython/pics/bus.jpg: 640x640 4 persons, 1 bus, 8.0ms
Speed: 17.3ms preprocess, 8.0ms inference, 34.5ms postprocess per image at shape (1, 3, 640, 640)
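One thing I'm considering is timing the raw engine with trtexec, to separate pure TensorRT execution time from any Python-side overhead (the trtexec path below is where JetPack usually installs it on my board):

```shell
# Benchmark the serialized engine directly: 500 ms warmup, then 200 timed iterations
/usr/src/tensorrt/bin/trtexec --loadEngine=yolo11n.engine --warmUp=500 --iterations=200
```

If trtexec reports ~4-5 ms GPU compute time, the gap would be on the Python/pipeline side rather than in the engine itself.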
As you can see above, the inference time is 8.0ms. But according to the benchmark mentioned here, the Jetson Orin NX 16GB should achieve 4-5ms inference time at FP16, which is roughly twice as fast as my result.
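To rule out single-run jitter, I also plan to average the reported inference time over many runs after a warmup. A minimal sketch (the warmup and run counts are arbitrary choices of mine):

```python
import statistics

def mean_latency_ms(latencies_ms, warmup=10):
    """Average per-run latency after discarding the first `warmup` iterations."""
    steady = latencies_ms[warmup:]
    if not steady:
        raise ValueError("need more runs than warmup iterations")
    return statistics.fmean(steady)

# Usage on the Jetson (requires the exported engine, so not run here):
# from ultralytics import YOLO
# model = YOLO('yolo11n.engine')
# times = []
# for _ in range(110):
#     results = model(image_path, verbose=False)
#     times.append(results[0].speed['inference'])  # per-image inference time in ms
# print(f"steady-state inference: {mean_latency_ms(times):.2f} ms")
```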
What can I do to find out the reason?
Thanks a lot!
PS: my env info:
Ultralytics 8.3.169 Python-3.10.12 torch-2.5.0a0+872d972e41.nv24.08 CUDA:0 (Orin, 15656MiB)
Setup complete (8 CPUs, 15.3 GB RAM, 38.1/232.2 GB disk)
OS Linux-5.15.148-tegra-aarch64-with-glibc2.35
Environment Linux
Python 3.10.12
Install pip
Path /home/housebrain/Projects/.yenv/lib/python3.10/site-packages/ultralytics
RAM 15.29 GB
Disk 38.1/232.2 GB
CPU ARMv8 Processor rev 1 (v8l)
CPU count 8
GPU Orin, 15656MiB
GPU count 1
CUDA 12.6
numpy 1.26.4>=1.23.0
matplotlib 3.10.3>=3.3.0
opencv-python 4.12.0.88>=4.6.0
pillow 11.3.0>=7.1.2
pyyaml 6.0.2>=5.3.1
requests 2.32.4>=2.23.0
scipy 1.15.3>=1.4.1
torch 2.5.0a0+872d972e41.nv24.8>=1.8.0
torch 2.5.0a0+872d972e41.nv24.8!=2.4.0,>=1.8.0; sys_platform == "win32"
torchvision 0.20.0a0>=0.9.0
tqdm 4.67.1>=4.64.0
psutil 7.0.0
py-cpuinfo 9.0.0
pandas 2.3.1>=1.1.4
ultralytics-thop 2.0.14>=2.0.0