YOLOv8 model latency on Jetson Orin NX

When I run the YOLOv9 model on my Jetson Orin NX, the delay is at a reasonable level. However, when I run models such as YOLOv8 and YOLOv10, the delay is up to 1 second longer. What could be the reason for this?

I am testing the models in real time with a Raspberry Pi HQ camera connected to my Jetson. Since it is a CSI camera, I am providing the camera input through OpenCV with a GStreamer pipeline.

The issue may be due to the difference between running YOLOv9 from its own scripts and running the other models through the Ultralytics library.

How can I solve this problem?

Information about the system:

  • GStreamer version: 1.16.3
  • Ultralytics version: 8.0.113
  • JetPack version: R35 (release), REVISION: 3.1

You can try the latest Ultralytics. You're using a version that is two years old.

I had the same problem with the latest Ultralytics version, but I found a clue to solving it. In YOLOv9 we can provide the image input via dataloaders.py. There, with some work on the VideoCapture handling, it keeps the camera input synchronized with YOLO's inference FPS. I wonder if this is not available in Ultralytics? The dataloaders.py code I used for YOLOv9 is as follows.

class LoadStreams:

    def gstreamer_pipeline(
        self,
        sensor_id=0,
        capture_width=1600,
        capture_height=900,
        display_width=1600,
        display_height=900,
        framerate=30,
        flip_method=2,
    ):
        return (
            "nvarguscamerasrc sensor-id=%d ! "
            "video/x-raw(memory:NVMM), width=(int)%d, height=(int)%d, framerate=(fraction)%d/1 ! "
            "nvvidconv flip-method=%d ! "
            "video/x-raw, width=(int)%d, height=(int)%d, format=(string)BGRx ! "
            "videoconvert ! "
            "video/x-raw, format=(string)BGR ! appsink"
            % (
                sensor_id,
                capture_width,
                capture_height,
                framerate,
                flip_method,
                display_width,
                display_height,
            )
        )

    # YOLOv5 streamloader, i.e. `python detect.py --source 'rtsp://example.com/media.mp4'  # RTSP, RTMP, HTTP streams`
    def __init__(self, sources='streams.txt', img_size=640, stride=32, auto=True, transforms=None, vid_stride=1):
        torch.backends.cudnn.benchmark = True  # faster for fixed-size inference
        self.mode = 'stream'
        self.img_size = img_size
        self.stride = stride
        self.vid_stride = vid_stride  # video frame-rate stride
        sources = Path(sources).read_text().rsplit() if os.path.isfile(sources) else [sources]
        n = len(sources)
        self.sources = [clean_str(x) for x in sources]  # clean source names for later
        self.imgs, self.fps, self.frames, self.threads = [None] * n, [0] * n, [0] * n, [None] * n
        for i, s in enumerate(sources):  # index, source
            # Start thread to read frames from video stream
            st = f'{i + 1}/{n}: {s}... '
            if urlparse(s).hostname in ('www.youtube.com', 'youtube.com', 'youtu.be'):  # if source is YouTube video
                # YouTube format i.e. 'https://www.youtube.com/watch?v=Zgi9g1ksQHc' or 'https://youtu.be/Zgi9g1ksQHc'
                check_requirements(('pafy', 'youtube_dl==2020.12.2'))
                import pafy
                s = pafy.new(s).getbest(preftype="mp4").url  # YouTube URL
            s = eval(s) if s.isnumeric() else s  # i.e. s = '0' local webcam
            if s == 0:
                assert not is_colab(), '--source 0 webcam unsupported on Colab. Rerun command in a local environment.'
                assert not is_kaggle(), '--source 0 webcam unsupported on Kaggle. Rerun command in a local environment.'
            # cap = cv2.VideoCapture(s)
            cap = cv2.VideoCapture(self.gstreamer_pipeline(), cv2.CAP_GSTREAMER)
            assert cap.isOpened(), f'{st}Failed to open {s}'
            w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
            h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
            fps = cap.get(cv2.CAP_PROP_FPS)  # warning: may return 0 or nan
            self.frames[i] = max(int(cap.get(cv2.CAP_PROP_FRAME_COUNT)), 0) or float('inf')  # infinite stream fallback
            self.fps[i] = max((fps if math.isfinite(fps) else 0) % 100, 0) or 30  # 30 FPS fallback

            _, self.imgs[i] = cap.read()  # guarantee first frame
            self.threads[i] = Thread(target=self.update, args=([i, cap, s]), daemon=True)
            LOGGER.info(f"{st} Success ({self.frames[i]} frames {w}x{h} at {self.fps[i]:.2f} FPS)")
            self.threads[i].start()
        LOGGER.info('')  # newline

        # check for common shapes
        s = np.stack([letterbox(x, img_size, stride=stride, auto=auto)[0].shape for x in self.imgs])
        self.rect = np.unique(s, axis=0).shape[0] == 1  # rect inference if all shapes equal
        self.auto = auto and self.rect
        self.transforms = transforms  # optional
        if not self.rect:
            LOGGER.warning('WARNING ⚠️ Stream shapes differ. For optimal performance supply similarly-shaped streams.')

    def update(self, i, cap, stream):
        # Read stream `i` frames in daemon thread
        n, f = 0, self.frames[i]  # frame number, frame array
        while cap.isOpened() and n < f:
            n += 1
            cap.grab()  # .read() = .grab() followed by .retrieve()
            if n % self.vid_stride == 0:
                success, im = cap.retrieve()
                if success:
                    self.imgs[i] = im
                else:
                    LOGGER.warning('WARNING ⚠️ Video stream unresponsive, please check your IP camera connection.')
                    self.imgs[i] = np.zeros_like(self.imgs[i])
                    cap.open(stream)  # re-open stream if signal was lost
            time.sleep(0.0)  # wait time

    def __iter__(self):
        self.count = -1
        return self

    def __next__(self):
        self.count += 1
        if not all(x.is_alive() for x in self.threads) or cv2.waitKey(1) == ord('q'):  # q to quit
            cv2.destroyAllWindows()
            raise StopIteration

        im0 = self.imgs.copy()
        if self.transforms:
            im = np.stack([self.transforms(x) for x in im0])  # transforms
        else:
            im = np.stack([letterbox(x, self.img_size, stride=self.stride, auto=self.auto)[0] for x in im0])  # resize
            im = im[..., ::-1].transpose((0, 3, 1, 2))  # BGR to RGB, BHWC to BCHW
            im = np.ascontiguousarray(im)  # contiguous

        return self.sources, im, im0, None, ''

    def __len__(self):
        return len(self.sources)  # 1E12 frames = 32 streams at 30 FPS for 30 years

You can predict on the frame that you get using this GStreamer pipeline:

results = model(frame)

What’s the code you’re using with Ultralytics?

This is the code I have been using:

from ultralytics import YOLO
import cv2

def gstreamer_pipeline(
    sensor_id=0,
    capture_width=1920,
    capture_height=1080,
    display_width=1920,
    display_height=1080,
    framerate=15,
    flip_method=0,
):
    return (
        "nvarguscamerasrc sensor-id=%d ! "
        "video/x-raw(memory:NVMM), width=(int)%d, height=(int)%d, framerate=(fraction)%d/1 ! "
        "nvvidconv flip-method=%d ! "
        "video/x-raw, width=(int)%d, height=(int)%d, format=(string)BGRx ! "
        "videoconvert ! "
        "video/x-raw, format=(string)BGR ! appsink"
        % (
            sensor_id,
            capture_width,
            capture_height,
            framerate,
            flip_method,
            display_width,
            display_height,
        )
    )

# Load the YOLO model
# Use the specific model you intend to use (e.g., yolov8n.pt, yolov8s.pt, yolov8m.pt, yolov8l.pt, yolov8x.pt)
# Or your custom trained model path
model = YOLO('yolov8l.pt')

# Initialize video capture using the GStreamer pipeline
cap = cv2.VideoCapture(gstreamer_pipeline(), cv2.CAP_GSTREAMER)

if not cap.isOpened():
    print("Error: Could not open GStreamer pipeline.")
    exit()

print("GStreamer pipeline opened successfully. Starting tracking...")

while True:
    ret, frame = cap.read()
    if not ret:
        print("Error: Failed to grab frame.")
        break

    # --- Perform Tracking instead of Prediction ---
    # Use model.track() for object tracking
    # persist=True is essential to maintain track IDs across frames
    # You can optionally specify a tracker config file e.g., tracker='bytetrack.yaml'
    results = model.track(
        frame,                  # Source frame
        persist=True,           # Keep track IDs between frames
        device=0,               # Use GPU 0
        imgsz=640,              # Inference image size
        half=False,             # Use FP32 (set True for FP16 if supported and needed)
        conf=0.25,              # Confidence threshold for detection
        verbose=False           # Reduce console output
        )

    # --- Visualize the results ---
    # Get the frame annotated with bounding boxes and track IDs
    # results[0].plot() handles drawing for both detection and tracking
    annotated_frame = results[0].plot()

    # --- Display the annotated frame ---
    cv2.imshow("YOLOv8 Tracking", annotated_frame) # Show the frame with tracks

    # --- Exit condition ---
    if cv2.waitKey(1) & 0xFF == ord('q'):
        print("Exiting...")
        break

# Release resources
cap.release()
cv2.destroyAllWindows()
print("Resources released.")

Does the latency exist with prediction instead of tracking?

Did you try TensorRT?

Which YOLOv9 model did you use?

If by latency you’re referring to the inference FPS not being able to keep up with stream FPS, leading to desynchronization, then you can just add max-buffers=1 drop=True to appsink:

! appsink max-buffers=1 drop=True

This will drop old frames automatically and the inference will always use the newest frame.
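
For example, only the final appsink element of the pipeline string built by gstreamer_pipeline needs to change; the rest of the pipeline stays the same:

# Before:
#   "video/x-raw, format=(string)BGR ! appsink"
# After: keep only the newest frame in the appsink queue and drop older ones
"video/x-raw, format=(string)BGR ! appsink max-buffers=1 drop=True"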

Thank you very much. Your suggestion worked. I have been trying to solve this problem for a long time.

Great to hear that the suggestion worked and your latency issue on the Jetson Orin NX is resolved!

For handling video streams efficiently, the ultralytics library provides built-in capabilities, as detailed in the Reference for ultralytics/data/loaders.py.
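
As a rough sketch (not taken from this thread), the streaming interface can be consumed as a generator so frames are processed one at a time; whether a raw GStreamer pipeline string is accepted as a source depends on your Ultralytics version, so a webcam index is used here as a placeholder:

from ultralytics import YOLO

model = YOLO('yolov8l.pt')

# stream=True returns a generator, so results are produced frame by frame
# instead of being collected in memory first. source=0 is a placeholder;
# replace it with whatever source your setup actually uses.
for result in model.track(source=0, stream=True, persist=True, imgsz=640, verbose=False):
    annotated = result.plot()
    # display or post-process `annotated` here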

Should you need further performance enhancements on Jetson in the future, consider exporting to TensorRT, potentially using DeepStream as outlined in our guide on Ultralytics YOLO11 on NVIDIA Jetson using DeepStream SDK and TensorRT.
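
As a starting point (a minimal sketch; exact export arguments may vary between versions), the export and the subsequent inference look like this:

from ultralytics import YOLO

# One-time export of the PyTorch weights to a TensorRT engine
# (this can take several minutes on a Jetson).
model = YOLO('yolov8l.pt')
model.export(format='engine', half=True, device=0)  # writes yolov8l.engine

# Load the exported engine and run inference through the same API.
trt_model = YOLO('yolov8l.engine')
results = trt_model('image.jpg')  # 'image.jpg' is a placeholder source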

Happy coding!