Premise: data collection for some YOLOv11-related highway detection models from a drone's aerial perspective.
Problems: detecting vehicles occupying emergency lanes, illegal parking, wrong-way driving, illegal lane changes, vehicles crossing dashed lines, vehicles crossing lane dividers, and reversing (these detections seem impossible to implement with a single standard YOLOv11 model). I tested emergency-lane detection; the results of annotating the dataset with labelImg are as follows.
The following is an example of how I tested a vehicle driving over the airflow-deflector markings (these are usually difficult to find, so I used Photoshop to cut them out).
It's not clear what you need assistance with specifically; see the rest of this reply to understand what I mean.
"Data collection" is a bit vague. YOLO models can detect objects in images, but the insights or analytics you create from those detections are independent of the detections themselves.
You're correct: these all require contextual information about location, lane count, expected traffic direction (per lane), vehicle type(s), local traffic laws (not just the region, but specifically the area captured by the camera, as certain laws may vary by time/area), etc. Many of these will vary considerably, making a "universal" recognition solution challenging.
What you describe is a massive problem to attempt to solve. Instead of tackling all of it at once, you should start by breaking the problem down. The part YOLO can help with is detecting objects and even tracking them. From what you're describing, you probably need to train your model to detect (note you may want/need separate models for tracking moving objects vs. detecting stationary ones, relative to the ground):
vehicles (separate based on types you want to distinguish: car, truck, van, construction, etc.)
emergency vehicles (separate on types: police, fire, medical, forest, etc.)
road barriers/dividers (not the markings on the road)
bridges
on/off-ramps
shoulders (paved road areas not intended for driving)
etc.
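Once you settle on a class list like the one above, it helps to group the fine-grained classes into the coarse categories your downstream rules care about. A minimal sketch (the class names and groupings here are hypothetical placeholders, not a prescribed taxonomy):

```python
# Hypothetical taxonomy mapping fine-grained detector classes to coarse groups.
CLASS_GROUPS = {
    "vehicle": ["car", "truck", "van", "construction"],
    "emergency": ["police", "fire", "medical", "forest"],
    "infrastructure": ["barrier", "bridge", "ramp", "shoulder"],
}

def group_of(class_name):
    """Return the coarse group a detected class belongs to, or None."""
    for group, names in CLASS_GROUPS.items():
        if class_name in names:
            return group
    return None

print(group_of("fire"))  # emergency
print(group_of("car"))   # vehicle
```

Keeping this mapping outside the model means you can retrain or swap detectors without touching the event logic that consumes the groups.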
You could try to detect road markings (lane markers, HOV symbols, emergency lanes, road shoulder, etc.), but they will be quite small from most aerial views, so detecting them will be difficult or require heavy compute. It's up to you to test and decide whether that's something you can incorporate.
Detections are the first part of your pipeline. After that, you'll have to determine how to use that information to detect the "events" you wish to signal: lane changes, blocking an emergency vehicle, a stopped/parked vehicle, etc. Note that it may be difficult to distinguish between a vehicle driving the wrong direction and a vehicle reversing. You need much more context than "this is a vehicle and this is the direction it's moving" to determine which scenario it is. That's one of many aspects that make what you describe extremely challenging. For instance, in the first photo you shared, humans can infer that the lanes on the right side of the image are traveling towards the top of the image, and the lanes on the left are traveling towards the bottom.
That said, without knowing where this photo is from (I don't need to know, just making a point), I don't know if those vehicles are actually in the correct lanes. I can assume they are, but computers aren't capable of making those assumptions. It's easy to say that if all the vehicles are traveling in the same direction, that's the correct direction of traffic, but there are absolutely situations where opposite-direction traffic could be routed into one of those lanes. Which means that assumption doesn't hold 100% of the time. That may or may not be acceptable for the problem(s) you're attempting to solve.
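To make the direction point concrete: once you have object tracks, a vehicle's heading falls out of its positions over time, but the expected per-lane heading must come from external context (a map, manual configuration, or learned trends), never from the detector itself. A minimal sketch, with the expected heading as an assumed input:

```python
import numpy as np

def heading_degrees(track):
    """Dominant heading of a track (list of (x, y) centers), in degrees.
    0 = moving right (+x), 90 = moving down (+y, image coordinates)."""
    pts = np.asarray(track, dtype=float)
    dx, dy = pts[-1] - pts[0]  # net displacement over the track
    return float(np.degrees(np.arctan2(dy, dx)) % 360)

def is_against_expected(track, expected_heading, tol=45.0):
    """Flag a track whose heading deviates from the lane's expected heading.
    Note this cannot tell wrong-way driving apart from reversing."""
    diff = abs(heading_degrees(track) - expected_heading) % 360
    diff = min(diff, 360 - diff)  # shortest angular distance
    return diff > tol

# A vehicle moving up the image in a lane expected to flow downwards:
print(is_against_expected([(100, 400), (102, 300), (101, 200)], 90))  # True
```

As noted above, this check fires identically for a wrong-way driver and a reversing vehicle; separating those two needs additional cues (e.g. vehicle orientation vs. motion direction).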
A lot of the things you outlined as problems, which I assume you mean you want to solve, will require specific context for every view, and if the images come from a moving drone, that context is always changing. Even from a fixed camera perspective, what you describe is not a simple task; it's easier, but still very difficult. Capturing images from a moving drone makes contextual information exceptionally harder to determine.
Advice on what you can do going forward: start by breaking down your problem. This will help considerably with figuring out what steps you need to take to create a reasonable solution. There's even a guide with advice on this, including a step-by-step overview for working on computer vision projects. Consider separating the different tasks/problems into individual solutions.
As an example, you could start with detecting traffic and keeping a history of traffic trends (direction, counts, flow rate, etc.); that's one solution and generates a lot of data. From there, you can use the trend data to look for anomalies. That might help you find instances of what you'd need for some of the other things you're looking to provide solutions to. Breaking your problem apart also gives you modular pieces, which are easier to manage. Building a monolithic system to solve numerous problems can become a nightmare to maintain: if one part breaks or needs retraining, it may take a very long time to resolve. Separating the features lets you fix individual components much faster and with fewer headaches.
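The trend-then-anomaly idea can be sketched with nothing more than rolling per-interval vehicle counts and a z-score check; the window size and threshold here are illustrative assumptions, not tuned values:

```python
from collections import deque
import statistics

class FlowMonitor:
    """Keeps a rolling window of per-interval vehicle counts and flags
    counts that deviate strongly from the recent baseline."""

    def __init__(self, window=60, z_thresh=3.0, min_baseline=10):
        self.history = deque(maxlen=window)
        self.z_thresh = z_thresh
        self.min_baseline = min_baseline

    def update(self, count):
        """Record one interval's count; return True if it looks anomalous."""
        anomalous = False
        if len(self.history) >= self.min_baseline:  # need a baseline first
            mean = statistics.mean(self.history)
            stdev = statistics.pstdev(self.history) or 1.0
            anomalous = abs(count - mean) / stdev > self.z_thresh
        self.history.append(count)
        return anomalous

mon = FlowMonitor()
for c in [20, 22, 19, 21, 20, 23, 18, 20, 21, 22]:
    mon.update(c)
print(mon.update(2))  # sudden drop in flow -> True (possible blockage)
```

A flagged interval is only a signal to look closer (stopped vehicle? rerouted traffic?), not a verdict; the per-lane context problems described above still apply.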
Thank you so much for your detailed answer; it has been a great help. As my online name suggests, I'm a YOLO beginner, and I've only mastered the basics: how to label YOLO datasets and train YOLO models. I may not fully understand the methods you provided, and I still need to learn a lot, but my work doesn't allow me much time. I also tried asking an AI, and it told me that when labeling things like vehicles crossing dashed lines, I need to label both the vehicles and the dashed lines, and then use whether the vehicle and dashed line overlap as the criterion for a vehicle crossing a dashed line. I will also try using the methods the AI provided to train the model. In short, thank you so much for your answer; it's like a fish finding water for me.
Yep, what the AI told you is the right idea conceptually: YOLO should detect things, and then you use geometry/rules to decide whether an event happened (crossed a dashed line, entered the emergency lane, wrong-way driving, etc.). Trying to train a single YOLO11 detector to directly output "illegal lane change" almost always produces high false-positive rates, because it's a context + time problem, not a single-frame object problem.
For "vehicle crosses dashed line", I'd avoid boxing every dash segment (they're tiny, inconsistent, and labeling them is painful). A much more robust path: detect vehicles with YOLO11, and get lane markings / the emergency-lane area as a segmentation mask (either a small custom YOLO11-seg model or even a classical lane-marking method). Then check whether the vehicle box overlaps the lane-marking mask.
Here's the simplest possible overlap-check pattern:
import cv2
from ultralytics import YOLO

veh = YOLO("yolo11n.pt")     # vehicle detector (or your trained model)
lane = YOLO("lane_best.pt")  # your YOLO11-seg lane/marking model

v = veh.track(source="video.mp4", imgsz=1280)
l = lane.predict(source="video.mp4", imgsz=1280)

boxes = v[0].boxes.xyxy.cpu().numpy().astype(int)
# masks.data is at inference resolution, while boxes are in original-frame
# coordinates, so resize the mask to the frame size before indexing.
# (This assumes the seg model found a marking; guard against masks being None.)
h, w = l[0].orig_shape
mask = l[0].masks.data[0].cpu().numpy()  # single-class mask example
mask = cv2.resize(mask, (w, h)) > 0.5

for (x1, y1, x2, y2) in boxes:
    x1, y1 = max(x1, 0), max(y1, 0)  # clamp any negative coordinates
    crop = mask[y1:y2, x1:x2]
    if crop.size and crop.mean() > 0.03:  # >3% of the box covers a marking
        print("vehicle intersects lane marking")
If your dashed lines are small from a drone view, bumping imgsz (and/or training with tiled crops) usually helps a lot; also make sure you include plenty of "hard negatives" (vehicles near lines but not crossing) to reduce false positives. For the "event" side, using tracking + regions is often the quickest route; this is exactly the kind of logic covered in the docs guide on tracking in zones, and it pairs well with traffic workflows like the YOLO11 traffic management overview.
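On the tiled-crops point: the core of tiling is just generating overlapping crop windows that cover the frame, running the detector on each crop at native resolution, and offsetting the resulting boxes back by the crop origin. A minimal sketch of the window generator (tile size and overlap are illustrative defaults):

```python
def tile_windows(w, h, tile=1280, overlap=0.2):
    """Yield (x1, y1, x2, y2) crop windows covering a w x h image,
    with the given fractional overlap between neighboring tiles."""
    step = max(1, int(tile * (1 - overlap)))
    xs = list(range(0, max(w - tile, 0) + 1, step))
    ys = list(range(0, max(h - tile, 0) + 1, step))
    if xs[-1] + tile < w:  # make sure the right edge is covered
        xs.append(w - tile)
    if ys[-1] + tile < h:  # make sure the bottom edge is covered
        ys.append(h - tile)
    for y in ys:
        for x in xs:
            yield x, y, min(x + tile, w), min(y + tile, h)

print(len(list(tile_windows(4000, 3000))))  # 12
```

Detections that land in a tile's overlap region will be duplicated across neighboring tiles, so you still need a cross-tile NMS/merge step before any event logic.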
If you share one short clip plus your class list (what you're labeling right now), I can suggest whether you should model the road context as segmentation (recommended) or as a few coarse polygons (fastest).
Thank you very much for your advice, which has brought a lot of surprises into my boring life. I will always keep your reply in my memo. Your advice is very detailed and of great help to me. I will take my time to follow the steps and complete it, but currently I need to focus on another work task, so I may need some time to verify and improve it. As for the video, I don't have time to record and share one now. Thank you for letting me feel the kindness of the YOLO community forum. This is the first time I've posted on the Internet, and I will always remember this day and your replies. I also hope to become a technical expert one day. Thank you again!