YOLO11 Object Detection on Jetson Orin Nano with Live Video

I have a Jetson Orin Nano devkit with JetPack 6.2.1. YOLO11 object detection works properly with a photo or an mp4 video as input. Now I'm trying to feed in live video from the CSI camera (CAM0), but it's not working. See the Python code below. I'm using the jetson-utils library to capture CAM0 video frames and feeding those frames to the YOLO model() call.

import numpy as np
from ultralytics import YOLO 

import sys
from   jetson_utils import videoSource, videoOutput, Log

# Load YOLO TRT model
model = YOLO("/home/jet/robotics/yolo/example01/yolo11n.engine")

# Jetson_Utils initialize
input  = videoSource()
output = videoOutput()
 
# process frames until EOS or the user exits
while True:
    # capture the next image
    frame = input.Capture()

    if frame is None: # timeout
        continue  
        
    # render the image
    #output.Render(frame)

    # exit on input/output EOS
    if not input.IsStreaming() or not output.IsStreaming():
        break

    # Perform Yolo object detection on the frame
    results = model(frame)

    for resx in results:
        boxes     = resx.boxes  # Boxes object for bounding box outputs
        masks     = resx.masks  # Masks object for segmentation masks outputs
        keypoints = resx.keypoints  # Keypoints object for pose outputs
        probs     = resx.probs  # Probs object for classification outputs
        obb       = resx.obb  # Oriented boxes object for OBB outputs

        print(boxes)

The program output is below. It appears to be a data-type mismatch on the frame passed to the model. Again, the code works just fine with photos and mp4 files. What is the proper way to feed live video to the YOLO model?

jet@sky:~/robotics/yolo/example01$ python3 detect_v04_camera.py 
WARNING ⚠️ Unable to automatically guess model task, assuming 'task=detect'. Explicitly define task for your model, i.e. 'task=detect', 'segment', 'classify','pose' or 'obb'.
[gstreamer] initialized gstreamer, version 1.20.3.0
[gstreamer] gstCamera -- attempting to create device csi://0
[gstreamer] gstCamera pipeline string:
[gstreamer] nvarguscamerasrc sensor-id=0 ! video/x-raw(memory:NVMM), width=(int)1280, height=(int)720, framerate=30/1, format=(string)NV12 ! nvvidconv flip-method=2 ! video/x-raw ! appsink name=mysink
[gstreamer] gstCamera successfully created device csi://0
[video]  created gstCamera from csi://0
------------------------------------------------
gstCamera video options:
------------------------------------------------
  -- URI: csi://0
     - protocol:  csi
     - location:  0
  -- deviceType: csi
  -- ioType:     input
  -- width:      1280
  -- height:     720
  -- frameRate:  30
  -- numBuffers: 4
  -- zeroCopy:   true
  -- flipMethod: rotate-180
------------------------------------------------
[OpenGL] glDisplay -- X screen 0 resolution:  3840x2160
[OpenGL] glDisplay -- X window resolution:    3840x2160
[OpenGL] glDisplay -- display device initialized (3840x2160)
[video]  created glDisplay from display://0
------------------------------------------------
glDisplay video options:
------------------------------------------------
  -- URI: display://0
     - protocol:  display
     - location:  0
  -- deviceType: display
  -- ioType:     output
  -- width:      3840
  -- height:     2160
  -- frameRate:  0
  -- numBuffers: 4
  -- zeroCopy:   true
------------------------------------------------
[gstreamer] opening gstCamera for streaming, transitioning pipeline to GST_STATE_PLAYING
[gstreamer] gstreamer changed state from NULL to READY ==> mysink
[gstreamer] gstreamer changed state from NULL to READY ==> capsfilter1
[gstreamer] gstreamer changed state from NULL to READY ==> nvvconv0
[gstreamer] gstreamer changed state from NULL to READY ==> capsfilter0
[gstreamer] gstreamer changed state from NULL to READY ==> nvarguscamerasrc0
[gstreamer] gstreamer changed state from NULL to READY ==> pipeline0
[gstreamer] gstreamer changed state from READY to PAUSED ==> capsfilter1
[gstreamer] gstreamer changed state from READY to PAUSED ==> nvvconv0
[gstreamer] gstreamer changed state from READY to PAUSED ==> capsfilter0
[gstreamer] gstreamer stream status CREATE ==> src
[gstreamer] gstreamer changed state from READY to PAUSED ==> nvarguscamerasrc0
[gstreamer] gstreamer changed state from READY to PAUSED ==> pipeline0
[gstreamer] gstreamer message new-clock ==> pipeline0
[gstreamer] gstreamer stream status ENTER ==> src
[gstreamer] gstreamer changed state from PAUSED to PLAYING ==> capsfilter1
[gstreamer] gstreamer changed state from PAUSED to PLAYING ==> nvvconv0
[gstreamer] gstreamer changed state from PAUSED to PLAYING ==> capsfilter0
[gstreamer] gstreamer changed state from PAUSED to PLAYING ==> nvarguscamerasrc0
[gstreamer] gstreamer message stream-start ==> pipeline0
GST_ARGUS: Creating output stream
CONSUMER: Waiting until producer is connected...
GST_ARGUS: Available Sensor modes :
GST_ARGUS: 3280 x 2464 FR = 21.000000 fps Duration = 47619048 ; Analog Gain range min 1.000000, max 10.625000; Exposure Range min 13000, max 683709000;

GST_ARGUS: 3280 x 1848 FR = 28.000001 fps Duration = 35714284 ; Analog Gain range min 1.000000, max 10.625000; Exposure Range min 13000, max 683709000;

GST_ARGUS: 1920 x 1080 FR = 29.999999 fps Duration = 33333334 ; Analog Gain range min 1.000000, max 10.625000; Exposure Range min 13000, max 683709000;

GST_ARGUS: 1640 x 1232 FR = 29.999999 fps Duration = 33333334 ; Analog Gain range min 1.000000, max 10.625000; Exposure Range min 13000, max 683709000;

GST_ARGUS: 1280 x 720 FR = 59.999999 fps Duration = 16666667 ; Analog Gain range min 1.000000, max 10.625000; Exposure Range min 13000, max 683709000;

GST_ARGUS: Running with following settings:
   Camera index = 0 
   Camera mode  = 4 
   Output Stream W = 1280 H = 720 
   seconds to Run    = 0 
   Frame Rate = 59.999999 
GST_ARGUS: Setup Complete, Starting captures for 0 seconds
GST_ARGUS: Starting repeat capture requests.
CONSUMER: Producer has connected; continuing.
[gstreamer] gstCamera -- onPreroll
[gstreamer] gstBufferManager -- map buffer size was less than max size (1382400 vs 1382407)
[gstreamer] gstBufferManager recieve caps:  video/x-raw, width=(int)1280, height=(int)720, framerate=(fraction)30/1, format=(string)NV12
[gstreamer] gstBufferManager -- recieved first frame, codec=raw format=nv12 width=1280 height=720 size=1382407
[cuda]   allocated 4 ring buffers (1382407 bytes each, 5529628 bytes total)
[cuda]   allocated 4 ring buffers (8 bytes each, 32 bytes total)
[gstreamer] gstreamer changed state from READY to PAUSED ==> mysink
[gstreamer] gstreamer message async-done ==> pipeline0
[gstreamer] gstreamer message latency ==> mysink
[gstreamer] gstreamer message warning ==> mysink
[gstreamer] gstreamer changed state from PAUSED to PLAYING ==> mysink
[gstreamer] gstreamer changed state from PAUSED to PLAYING ==> pipeline0
[cuda]   allocated 4 ring buffers (2764800 bytes each, 11059200 bytes total)
Loading /home/jet/robotics/yolo/example01/yolo11n.engine for TensorRT inference...
[07/10/2025-00:43:30] [TRT] [I] Loaded engine size: 5 MiB
[07/10/2025-00:43:30] [TRT] [W] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
[07/10/2025-00:43:31] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +12, now: CPU 0, GPU 14 (MiB)

Traceback (most recent call last):
  File "/home/jet/robotics/yolo/example01/detect_v04_camera.py", line 40, in <module>
    results = model(frame)
  File "/home/jet/.local/lib/python3.10/site-packages/ultralytics/engine/model.py", line 181, in __call__
    return self.predict(source, stream, **kwargs)
  File "/home/jet/.local/lib/python3.10/site-packages/ultralytics/engine/model.py", line 559, in predict
    return self.predictor.predict_cli(source=source) if is_cli else self.predictor(source=source, stream=stream)
  File "/home/jet/.local/lib/python3.10/site-packages/ultralytics/engine/predictor.py", line 175, in __call__
    return list(self.stream_inference(source, model, *args, **kwargs))  # merge list of Result into one
  File "/home/jet/.local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 36, in generator_context
    response = gen.send(None)
  File "/home/jet/.local/lib/python3.10/site-packages/ultralytics/engine/predictor.py", line 233, in stream_inference
    self.setup_source(source if source is not None else self.args.source)
  File "/home/jet/.local/lib/python3.10/site-packages/ultralytics/engine/predictor.py", line 205, in setup_source
    self.dataset = load_inference_source(
  File "/home/jet/.local/lib/python3.10/site-packages/ultralytics/data/build.py", line 199, in load_inference_source
    source, stream, screenshot, from_img, in_memory, tensor = check_source(source)
  File "/home/jet/.local/lib/python3.10/site-packages/ultralytics/data/build.py", line 181, in check_source
    raise TypeError("Unsupported image type. For supported types see https://docs.ultralytics.com/modes/predict")
TypeError: Unsupported image type. For supported types see https://docs.ultralytics.com/modes/predict
[gstreamer] gstCamera -- stopping pipeline, transitioning to GST_STATE_NULL
GST_ARGUS: Cleaning up
CONSUMER: Done Success
GST_ARGUS: Done Success
[gstreamer] gstCamera -- pipeline stopped

Probably need to use cudaToNumpy:

from jetson_utils import cudaToNumpy

results = model(cudaToNumpy(frame))

Thank you @Toxite. Good news: I now have working bare-bones YOLO11 object detection Python code with live video input. It's very concise and clean, but very powerful. I deployed it on my Orin Nano devkit with JetPack 6.2.1.

The cudaToNumpy function did the job! The working Python code is shown below, but more features are needed.

import numpy as np
from ultralytics import YOLO 

import sys
from jetson_utils import videoSource, videoOutput, Log
from jetson_utils import cudaToNumpy

# Load YOLO TRT model
model = YOLO("/home/jet/robotics/yolo/example01/yolo11n.engine")

# Jetson_Utils initialize
input  = videoSource()
output = videoOutput()
 
# Run Inference on Video Frames
while True:

    # capture the next image
    frame = input.Capture()

    if frame is None: # timeout
        continue  
        
    # Render the image
    output.Render(frame)

    # Exit on input/output EOS
    if not input.IsStreaming() or not output.IsStreaming():
        break

    # Convert Jetson Cuda image to Numpy array  
    frame_numpy = cudaToNumpy(frame)

    # Run Yolo Inference
    results = model(frame_numpy)
 
    for resx in results:
        boxes     = resx.boxes      # Boxes object for bounding box outputs
        masks     = resx.masks      # Masks object for segmentation masks outputs
        keypoints = resx.keypoints  # Keypoints object for pose outputs
        probs     = resx.probs      # Probs object for classification outputs
        obb       = resx.obb        # Oriented boxes object for OBB outputs

The next feature I need is overlaying the YOLO bounding boxes onto the jetson_utils output window. The output window currently shows the raw video frames and it's very fast! But I need those bounding boxes. Any ideas?


Usually, you’d want to include the argument:

results = model(frame_numpy, show=True)

The other way would be to use the plot() method on the Results object:

for resx in results:
    resx.plot()

Hello @BurhanQ, yes, I have tried results = model(frame_numpy, show=True), but it slows down program execution dramatically. Separately, the resx.plot() call does not show anything.

I want to plot the bounding boxes using jetson_utils functions because the output window is based on jetson_utils. The raw jetson_utils video output is very fast and very responsive. jetson_utils uses the Orin Nano's CUDA cores, so I want to leverage that.

Fast jetson_utils video functions plus robust Yolo object detection is a powerful combination. Just need to get those bounding boxes plotted.
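
A rough, untested sketch of what I have in mind, assuming jetson_utils exposes cudaDrawRect (as in dusty-nv's cuda-examples) and that boxes.xyxy gives pixel coordinates in the captured frame:

from jetson_utils import cudaDrawRect

# ... inside the capture loop, after running inference on frame_numpy ...
for resx in results:
    # boxes.xyxy holds one [left, top, right, bottom] row per detection
    for left, top, right, bottom in resx.boxes.xyxy.tolist():
        # draw a semi-transparent rectangle directly on the CUDA frame
        # (color is RGBA; the alpha value controls the blend)
        cudaDrawRect(frame, (left, top, right, bottom), (0, 255, 0, 75))

# render the CUDA frame with the boxes burned in
output.Render(frame)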

I guess you could do:

from jetson_utils import cudaFromNumpy
output.Render(cudaFromNumpy(resx.plot()))

Thanks @Toxite, your suggestion worked. The YOLO11 object detection Python code is now working on my Orin Nano Super devkit with JetPack 6.2.1. While the code is only half a page long, it streams object detection results and draws bounding boxes in the output window extremely fast. Inference time is ~9 ms, and the output streaming is smooth and swift.

The code uses Ultralytics' YOLO library for inference (installation instructions for the Orin Nano here) and jetson-utils for video input and output. That's it. The hard part was figuring out the frame data-type conversions between YOLO and jetson-utils. A big thanks to @Toxite.

from ultralytics import YOLO 

from   jetson_utils import videoSource, videoOutput, Log
from   jetson_utils import cudaToNumpy
from   jetson_utils import cudaFromNumpy

# Load YOLO TRT model
model = YOLO("/home/jet/robotics/yolo/networks/yolo11n.engine")

# Jetson_Utils initialize
input  = videoSource()
output = videoOutput()
 
# Run Inference on Video Frames
while True:

    # capture the next image
    frame = input.Capture()

    if frame is None: # timeout
        continue  
        
    # Exit on input/output EOS
    if not input.IsStreaming() or not output.IsStreaming():
        break

    # Convert Jetson Cuda image to Numpy array  
    frame_numpy = cudaToNumpy(frame)

    # Run Yolo Inference
    results = model(frame_numpy)  #, show=True)
 
    for resx in results:
        boxes     = resx.boxes      # Boxes object for bounding box outputs
        masks     = resx.masks      # Masks object for segmentation masks outputs
        keypoints = resx.keypoints  # Keypoints object for pose outputs
        probs     = resx.probs      # Probs object for classification outputs
        obb       = resx.obb        # Oriented boxes object for OBB outputs

        # Display image and bounding box in Jetson_Utils output window
        output.Render(cudaFromNumpy(resx.plot()))

        print(boxes)   # Stream object detection results
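
If you need the individual detections rather than the whole Boxes dump, something like this should work (an untested sketch using the documented Ultralytics Boxes attributes):

# inside the "for resx in results" loop, instead of print(boxes):
for box in boxes:
    cls_id = int(box.cls)               # class index
    label  = resx.names[cls_id]         # class name, e.g. "person"
    conf   = float(box.conf)            # confidence score
    x1, y1, x2, y2 = box.xyxy[0].tolist()   # pixel coordinates
    print(f"{label} {conf:.2f} at ({x1:.0f},{y1:.0f})-({x2:.0f},{y2:.0f})")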

This thread should have been submitted in the “Support” category. Can an admin transfer it?

It’s now in the YOLO Support topic!