Overcoming Time Skip Issues in M3U8 Stream Processing with YOLO11 and OpenCV

Hi Ultralytics Team,

I am using YOLO11 (model: yolo11n_openvino_model) for real-time object detection on M3U8 video streams with OpenCV. Here is the code I used:

import cv2
from ultralytics import YOLO

# Load the exported OpenVINO model
model = YOLO("./model/yolo11n_openvino_model")

# Open the HLS (M3U8) video stream
video_path = "https://atcs.banjarmasinkota.go.id/stream/FNZ59BM5VH/s.m3u8"
cap = cv2.VideoCapture(video_path)

# Loop through the video frames
while cap.isOpened():
    success, frame = cap.read()
    if success:
        # Run YOLO inference on the frame and display the annotated result
        results = model(frame)
        annotated_frame = results[0].plot()
        cv2.imshow("YOLO Inference", annotated_frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
    else:
        break

cap.release()
cv2.destroyAllWindows()

Problem:
I’m running into a time-skip problem while processing this M3U8 stream. Playback is not smooth: some frames appear to be skipped, which makes the object detection inconsistent and choppy. The stream plays smoothly in VLC, but with OpenCV, cap.isOpened() often returns False or success suddenly becomes False, stopping the loop or skipping frames.

Questions:

1. Are there any specific settings in OpenCV or YOLO11 (e.g., OpenCV backend parameters, buffering configuration, or model optimization) that you would recommend for addressing the time-skip issue in M3U8 streams, so that processing becomes more stable and closer to real time?
2. Does OpenCV (cv2.VideoCapture) have any limitations in handling M3U8 streams natively, and if so, does Ultralytics have suggestions for improving the stability of stream delivery without additional libraries such as FFmpeg?
3. Are there techniques or best practices in the YOLO11 pipeline to ensure that frames from a real-time stream are processed sequentially without skipping, especially for HLS streams such as M3U8?
4. If OpenCV is not ideal for M3U8 streams, does Ultralytics have official examples or recommendations for integrating YOLO11 with alternatives like GStreamer for real-time streams?

Environment:

Python 3.9.13
OpenCV 4.9.0.80
ultralytics 8.3.36
Windows, CUDA 12.2

The M3U8 stream plays smoothly in VLC.
I would appreciate any suggestions, code examples, or guidance on how to work around this time-skip issue so that object detection on M3U8 streams can run stably in real time. Thanks for your help!

Supported video formats are shown in the docs: Model Prediction with Ultralytics YOLO - Ultralytics YOLO Docs. You could try passing the URL directly to model.predict() to see if it works.
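For example, here is a minimal sketch of passing the stream URL straight to predict() (model path and URL taken from your post; stream=True makes predict() yield results one frame at a time instead of accumulating them):

from ultralytics import YOLO

model = YOLO("./model/yolo11n_openvino_model")

# stream=True returns a generator that yields one Results object per frame;
# show=True displays the annotated frames in a window.
for result in model.predict(
    source="https://atcs.banjarmasinkota.go.id/stream/FNZ59BM5VH/s.m3u8",
    stream=True,
    show=True,
):
    pass  # detections for the current frame are in `result`

Alternatively, try running from the CLI: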

yolo model="./model/yolo11n_openvino_model" \
    source="https://atcs.banjarmasinkota.go.id/stream/FNZ59BM5VH/s.m3u8" \
    show=true

You’ll have to kill the script with Ctrl+C, otherwise it will loop forever.

It could also be due to using OpenVINO, though I’m not 100% certain. If the CLI is still choppy, try using yolo11n.pt instead of the OpenVINO export. Do you have a GPU in the computer you’re using to run this code?

Thank you for your response, sir. Here’s the thing: I use OpenVINO because I only have a CPU with integrated Intel UHD graphics and no discrete GPU, and OpenVINO is very lightweight on CPU.

I’ve tried your suggestion of running it in the CLI, but the result is the same: the frames often stop for a few seconds before continuing, which feels like time skipping.


The same issue occurs with yolo11n.pt.

I also tried running it on Google Colab with Flask and Ngrok, and even with a T4 GPU the frequent time skips remained.

I tried several inputs, such as offline videos and YouTube, and the results were the same.

Do you have any suggestions to help me with this issue? I am a beginner and still don’t know much about this. Thank you and please help.

If the model inference is faster than the video stream, this can happen. Say the video stream runs at 20 FPS while model inference runs at 30 FPS: the model can process 10 more frames per second than the stream delivers, so between frames there is nothing for it to do and it waits until the next frame is available.

From the screenshot, it looks like the model is running at ~5 FPS. The video is probably going to have an FPS of at least ~10 FPS (as a guess). That could mean that there are other factors that are causing the “Waiting for stream” message.
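One way to check is to time the capture and the inference separately. Here is a rough diagnostic sketch (stream URL from your original post; the 100-frame sample size is arbitrary):

import time

import cv2
from ultralytics import YOLO

model = YOLO("yolo11n.pt")  # or the OpenVINO export
cap = cv2.VideoCapture("https://atcs.banjarmasinkota.go.id/stream/FNZ59BM5VH/s.m3u8")

frames = 0
infer_seconds = 0.0
start = time.perf_counter()
while frames < 100 and cap.isOpened():
    success, frame = cap.read()
    if not success:
        break
    t = time.perf_counter()
    model(frame, verbose=False)
    infer_seconds += time.perf_counter() - t
    frames += 1
total = time.perf_counter() - start

if frames and infer_seconds:
    # If the end-to-end rate is much lower than the inference-only rate,
    # capture/decoding (or the network) is the bottleneck, not the model.
    print(f"End-to-end: {frames / total:.1f} FPS")
    print(f"Inference only: {frames / infer_seconds:.1f} FPS")
cap.release()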

It’s possible that the video decoding on the CPU is taking up resources or is slow, so the model will get bursts of frames instead of a steady stream. You mentioned the stream is fairly stable independent of the model, but when the model is also taking up CPU resources, it might cause delays. That’s why I was asking about a GPU, as offloading the model to the GPU might show that running the model on the CPU is causing a bottleneck.
Additionally, there could be network issues/instability that lead to the “Waiting for stream” message. These types of issues are usually harder to track down, as they aren’t always easy to reproduce.
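If decoding and inference keep fighting over the CPU, one common workaround (not an official Ultralytics recipe, just a sketch; LatestFrameReader is a hypothetical helper name) is to decode frames on a background thread and always run inference on the newest available frame, so slow inference never stalls the decoder:

import threading

import cv2
from ultralytics import YOLO

class LatestFrameReader:
    """Reads frames on a background thread and keeps only the most recent one."""

    def __init__(self, source):
        self.cap = cv2.VideoCapture(source)
        self.lock = threading.Lock()
        self.frame = None
        self.running = True
        threading.Thread(target=self._update, daemon=True).start()

    def _update(self):
        # Drain the stream continuously so the decoder never falls behind.
        while self.running and self.cap.isOpened():
            success, frame = self.cap.read()
            if success:
                with self.lock:
                    self.frame = frame

    def read(self):
        with self.lock:
            return None if self.frame is None else self.frame.copy()

    def release(self):
        self.running = False
        self.cap.release()

model = YOLO("./model/yolo11n_openvino_model")
reader = LatestFrameReader("https://atcs.banjarmasinkota.go.id/stream/FNZ59BM5VH/s.m3u8")

while True:
    frame = reader.read()
    if frame is None:
        continue  # stream not ready yet
    annotated = model(frame, verbose=False)[0].plot()
    cv2.imshow("YOLO Inference", annotated)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

reader.release()
cv2.destroyAllWindows()

The trade-off is that frames arriving while inference is busy are dropped rather than queued, which keeps the display close to live at the cost of not processing every frame.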

I also tried running it on Google Colab with Flask and Ngrok

I’ve never tried this, nor heard of anyone running such a setup, but I suspect it could introduce a lot of network overhead; I can’t say for certain.

a CPU with integrated Intel UHD graphics and no discrete GPU

In your original post, you showed

Windows, CUDA 12.2

and the CUDA 12.2 part would only be present/relevant if there were a discrete NVIDIA GPU. So I’m a little confused about the hardware configuration.

Additionally, I would suggest creating a new Python virtual environment, perhaps with Python 3.10 or 3.11, and installing the latest PyTorch, Ultralytics, and OpenCV packages. It’s not a guarantee, but the issues you’re seeing may have been addressed in a more recent release.

A couple of other inference arguments to try:

  1. Use vid_stride to skip frames deliberately. If running the model on CPU adds too much overhead, it can help to run on only every second or third frame. vid_stride=3 runs inference on every third frame instead of on every frame (the default).
  2. Alternatively, you could use stream_buffer=True. This introduces some delay and may consume more memory, since it queues older frames for inference. You won’t miss any frames, but over time the inference results can lag further behind the live video. Both options are sketched below.

Check out the docs section for more details on each argument.
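
For reference, here is a minimal sketch combining the streaming loader with those arguments (model path and stream URL taken from your posts; vid_stride=3 is just an example value):

from ultralytics import YOLO

model = YOLO("./model/yolo11n_openvino_model")

# vid_stride=3 runs inference on every third frame to cut CPU load;
# swap it for stream_buffer=True to queue frames instead of skipping them.
for result in model.predict(
    source="https://atcs.banjarmasinkota.go.id/stream/FNZ59BM5VH/s.m3u8",
    stream=True,
    vid_stride=3,
    show=True,
):
    pass  # one Results object per processed frame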