How to add a ConvLSTM to the detection head of yolov11n for temporal context

@BurhanQ @Jordan_Cooper
I’m working on an object detection system for human intrusion detection. Right now I’m using a YOLO-based model for frame-by-frame detection. The issue I’m facing is false positives when there’s sudden movement in the scene — the model sometimes flags motion artifacts or partial shapes as a person.

To fix this, I want to add temporal context so that the detector remembers information from previous frames and avoids spurious detections. My plan was to:

  • take the YOLO feature maps or head outputs,

  • feed them into an LSTM (or ConvLSTM) to capture temporal information (rough sketch after this list),

  • then output detections that are temporally consistent.
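
To make the idea concrete, here's a rough sketch of the kind of ConvLSTM cell I'd insert on a neck-level feature map before the head. This is placeholder code of my own, not anything from the Ultralytics codebase, and the channel/shape numbers are made up:

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """One ConvLSTM cell: LSTM gating with convolutions over feature maps."""

    def __init__(self, in_ch: int, hid_ch: int, k: int = 3):
        super().__init__()
        self.hid_ch = hid_ch
        # A single conv emits all four gates (input, forget, cell, output) at once.
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)

    def forward(self, x, state=None):
        # x: (B, in_ch, H, W) feature map from one frame
        if state is None:
            h = x.new_zeros(x.size(0), self.hid_ch, x.size(2), x.size(3))
            c = h.clone()
        else:
            h, c = state
        i, f, g, o = self.gates(torch.cat([x, h], dim=1)).chunk(4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, (h, c)

# per-frame use on a neck-level feature map (shapes are illustrative):
cell = ConvLSTMCell(in_ch=256, hid_ch=256)
state = None
for feat in [torch.randn(1, 256, 20, 20) for _ in range(4)]:  # fake frame features
    fused, state = cell(feat, state)  # fused would go on to the detection head
```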

Questions:

  1. Is this approach reasonable for reducing false positives?

  2. Where exactly is the best place to inject an LSTM (backbone features vs detection head)?

  3. Are there simpler or more robust alternatives — e.g. optical flow, temporal smoothing, or post-processing with a tracker (like ByteTrack/DeepSORT) — instead of modifying YOLO internals?

  4. For real-time inference, how do people usually maintain LSTM state between frames?

My end goal is a detector that works better on live video streams, not just individual frames. I’m open to either training-time modifications (YOLO+LSTM end-to-end) or post-processing methods if they’re more practical.

Any guidance or examples would be really appreciated!

Thanks in advance!

  1. It can work.
  2. It's involved enough that a full walkthrough won't fit in a single reply.
  3. Computing optical flow or a frame difference and feeding it in as a 4th input channel when training YOLO is the simplest option, with the fewest modifications required (first sketch below).
  4. From what I know, you reset the state when there's no activity in the scene, or right after a positive detection has been handled (second sketch below).
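
To illustrate point 3, here's a minimal frame-difference sketch using OpenCV. The video path is a placeholder, and note that YOLO's first conv layer would also need to be changed to accept 4 input channels, which isn't shown here:

```python
import cv2
import numpy as np

def add_motion_channel(frame_bgr, prev_gray):
    """Stack a frame-difference map onto a BGR frame as a 4th channel."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    if prev_gray is None:
        diff = np.zeros_like(gray)  # first frame: no motion information yet
    else:
        diff = cv2.absdiff(gray, prev_gray)
    # HWC uint8 with 4 channels: B, G, R, motion
    return np.dstack([frame_bgr, diff]), gray

# usage over a stream
cap = cv2.VideoCapture("intrusion.mp4")  # placeholder input
prev = None
while True:
    ok, frame = cap.read()
    if not ok:
        break
    x, prev = add_motion_channel(frame, prev)
    # x is (H, W, 4); feed into a YOLO variant whose stem accepts 4 channels
```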
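
And for point 4, the usual streaming pattern is to carry the recurrent state in a variable across frames and clear it on a trigger. A sketch, where `cell` is a ConvLSTM cell like the one sketched in the question, and `backbone` / `head` stand in for the per-frame feature extractor and the detection head:

```python
import torch

IDLE_RESET = 30  # frames without a detection before resetting (~1 s at 30 fps, tunable)

class TemporalDetector:
    """Carries ConvLSTM state across frames and resets it on inactivity."""

    def __init__(self, backbone, cell, head):
        self.backbone, self.cell, self.head = backbone, cell, head
        self.state = None
        self.idle = 0

    @torch.no_grad()
    def step(self, frame_tensor):
        feat = self.backbone(frame_tensor)               # per-frame features
        fused, self.state = self.cell(feat, self.state)  # temporal fusion
        dets = self.head(fused)
        if len(dets) == 0:
            self.idle += 1
            if self.idle >= IDLE_RESET:
                self.state = None                        # scene went quiet: reset
                self.idle = 0
        else:
            self.idle = 0                                # activity: keep the context
        return dets
```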