I need help with training my own dataset.

Premise: Data collection of some YOLOv11-related detection models for highways from a drone aerial perspective.

Problems: Detection of vehicles occupying emergency lanes, illegal parking, driving in the wrong direction, illegal lane changes, vehicles crossing dashed lines, vehicles crossing lane dividers, and reversing (these detections seem impossible to implement using a single standard YOLOv11 model). I tested emergency lane detection, and the results of using LabelImg to annotate the dataset are as follows.

The following is a record of the test vehicle crossing the dashed line:

The following is an example of how I tested a vehicle crossing the flow-guide (chevron) markings (these cases are usually difficult to find, so I used Photoshop to composite them).

But the false positive rate is too high after training!

I’m completely stuck on training it now. Can anyone help me? :frowning:

It’s not clear what you need assistance with specifically. See the rest of this reply to understand what I mean.

“Data collection” is a bit vague. YOLO models can detect objects in images, but what insights or analytics you create from those detections is independent of the detections themselves.

You’re correct, these are all things that require contextual information: location, lane count, expected traffic direction (per lane), vehicle type(s), local traffic laws (not just the region, but the specific area captured by the camera, as certain laws may vary by time/area), etc. Many of these will vary considerably, which makes a ‘universal’ recognition solution challenging.

What you describe is a massive problem to attempt to solve. Instead of tackling all of it at once, you really should start by breaking the problem down. The part YOLO can help with is detecting objects and even tracking them. From what you’re describing, you probably need to train your model to detect (note you may want/need separate models for tracking moving objects versus detecting stationary ones, relative to the ground):

  • vehicles (separate based on types you want to distinguish: car, truck, van, construction, etc.)
  • emergency vehicles (separate on types: police, fire, medical, forest, etc.)
  • road barriers/dividers (not the markings on the road)
  • bridges
  • on/off-ramps
  • shoulders (paved road areas not intended for driving)
  • etc.
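For the detection side, a class list like the one above maps directly onto a standard Ultralytics dataset config. A sketch of what that data.yaml might look like (all names and paths here are illustrative placeholders, not from the original post):

```yaml
# hypothetical data.yaml for an aerial highway dataset
path: datasets/highway-aerial   # dataset root (placeholder)
train: images/train
val: images/val
names:
  0: car
  1: truck
  2: van
  3: construction_vehicle
  4: police_vehicle
  5: fire_vehicle
  6: medical_vehicle
  7: barrier
  8: bridge
  9: ramp
  10: shoulder
```

Splitting emergency vehicles into their own classes up front keeps you from having to relabel later if you decide "blocking an emergency vehicle" is one of the events you want.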

You could try to detect road markings (lane markers, HOV symbols, emergency lanes, road shoulder, etc.), but they will be quite small from most aerial views, so detection of these will be difficult or require heavy compute. It’s up to you to test and decide if that’s something that you can incorporate or not.
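One common workaround for small objects in large aerial frames is to slice each frame into overlapping tiles and run detection per tile (at the cost of extra compute). A minimal tiling sketch, assuming NumPy arrays for frames; the tile and overlap sizes are illustrative, not tuned:

```python
import numpy as np

def tile_image(img, tile=640, overlap=128):
    """Slice an image into overlapping tiles for per-tile inference.

    Returns a list of ((x, y), tile_array) pairs, where (x, y) is the
    tile's top-left corner in the original image (needed to map
    per-tile detections back to full-frame coordinates).
    Edge tiles may be smaller than `tile` x `tile`.
    """
    step = tile - overlap
    h, w = img.shape[:2]
    tiles = []
    for y in range(0, max(h - overlap, 1), step):
        for x in range(0, max(w - overlap, 1), step):
            tiles.append(((x, y), img[y:y + tile, x:x + tile]))
    return tiles
```

The overlap exists so that a marking or vehicle cut by one tile boundary still appears whole in a neighboring tile; you then deduplicate detections (e.g. by NMS in full-frame coordinates) after mapping them back.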

Detections are the first part of your pipeline. After that, you’ll have to determine how you can use that information to identify the “events” you wish to create a signal about: lane changes, blocking an emergency vehicle, a stopped/parked vehicle, etc. Note that it may be difficult to distinguish between a vehicle driving in the wrong direction and a vehicle reversing. You need much more context than “this is a vehicle and this is the direction it’s moving” to determine which scenario it is. That’s one of many aspects that make what you describe extremely challenging. For instance, in the first photo you shared, humans can infer that the lanes on the right side of the image travel towards the top of the image, and the lanes on the left travel towards the bottom.
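As a toy illustration of that direction logic: once you have per-track centroid histories from a tracker, you can compare each track’s heading against the expected direction you’ve assigned to a lane. The expected angle per lane is exactly the context that has to come from somewhere else; this sketch only shows the comparison step:

```python
import numpy as np

def heading_deg(track):
    """Overall heading of a track, in degrees, from first to last centroid.

    `track` is a list of (x, y) image coordinates, oldest first.
    """
    (x0, y0), (x1, y1) = track[0], track[-1]
    return np.degrees(np.arctan2(y1 - y0, x1 - x0))

def is_wrong_way(track, expected_deg, tol_deg=90.0):
    """Flag a track whose heading differs from the lane's expected direction.

    The wrap-safe difference keeps the result in [0, 180] degrees.
    """
    diff = abs((heading_deg(track) - expected_deg + 180.0) % 360.0 - 180.0)
    return diff > tol_deg
```

Note this alone still cannot separate “wrong-way” from “reversing”: both produce the same heading, which is the point made above about needing extra context (e.g. vehicle orientation).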

That said, without the context of where this photo is from (I don’t need to know, just making a point), I don’t know if those vehicles are actually in the correct lane. I can assume they are, but computers aren’t capable of making those assumptions. It’s easy to say, if all the vehicles are traveling in the same direction, that’s the correct direction of traffic, but there are absolutely situations where opposite direction traffic could be routed into one of those lanes. Which means that assumption doesn’t hold 100% of the time. That may or may not be acceptable for the problem(s) you’re attempting to solve.

A lot of the things you outlined as problems, which I assume you want to solve, will require specific context for every view, and if the images come from a moving drone, that context is always changing. Even from a fixed camera perspective, what you describe is not a simple task; it’s easier, but still very difficult. Capturing images from a moving drone makes the contextual information exceptionally harder to determine.

Advice on what you can do going forward: start by breaking down your problem. This will help considerably with figuring out what steps you need to take to create a reasonable solution; there’s even a guide with advice on this and a step-by-step overview for working on computer vision projects. Separating different tasks/problems into individual solutions will also help.

As an example, you could start with detecting traffic and keep a history of traffic trends (direction, counts, flow rate, etc.); that’s one solution on its own and generates a lot of data. From there, you can use the trend data to look for anomalies. That might help you find instances of what you’d need for some of the other things you’re looking to provide solutions to. Breaking your problem apart also gives you modular pieces, which makes the system easier to manage. Building a monolith to solve numerous problems can become a nightmare to maintain, since if one part breaks or needs retraining, it may take a very long time to resolve. Separating the features will let you fix individual components much quicker and with fewer headaches.
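As a sketch of the trend-then-anomaly idea, assuming you have already aggregated per-interval vehicle counts: compare each new count against a rolling window of recent history and flag large deviations. The window size and z-threshold here are placeholders to tune on real data:

```python
import numpy as np

def flag_anomalies(counts, window=12, z=3.0):
    """Flag counts deviating > z std-devs from the preceding rolling window.

    `counts` is a sequence of per-interval vehicle counts; returns one
    boolean per count starting at index `window` (earlier counts have
    no full history to compare against).
    """
    counts = np.asarray(counts, dtype=float)
    flags = []
    for i in range(window, len(counts)):
        ref = counts[i - window:i]
        # max(..., 1e-6) avoids division-free but zero-std flat history
        flags.append(abs(counts[i] - ref.mean()) > z * max(ref.std(), 1e-6))
    return flags
```

The same pattern works for flow rate or dominant direction per lane; each signal you log becomes another series you can screen this way.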

I would also recommend taking a look at the Solutions in the Ultralytics documentation. Something like tracking objects in a specific region and/or generating traffic heatmaps might be useful for what you’re attempting to do.
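The heatmap idea can also be prototyped without extra tooling: accumulate each frame’s detection boxes into a per-pixel counter and visualize it later. A minimal sketch, assuming box coordinates are integer pixel values already clipped to the frame:

```python
import numpy as np

def update_heatmap(heat, boxes):
    """Accumulate detection boxes into a per-pixel visit-count heatmap.

    `heat` is an HxW float array (one per camera view); `boxes` is an
    iterable of (x1, y1, x2, y2) integer corners for one frame.
    """
    for x1, y1, x2, y2 in boxes:
        heat[y1:y2, x1:x2] += 1.0
    return heat
```

After enough frames, hot regions are normal traffic paths; vehicles detected in persistently cold regions (shoulders, emergency lanes) are candidates for the events you care about.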


Thank you so much for your detailed answer; it has been a great help. As my online name suggests, I’m a YOLO beginner, and I’ve only mastered the basics: how to label YOLO datasets and train YOLO models. I may not fully understand the methods you provided, and I still have a lot to learn, but my work doesn’t leave me much time. I also tried asking an AI, and it told me that when labeling things like vehicles crossing dashed lines, I need to label both the vehicles and the dashed lines, and then use whether the vehicles and dashed lines overlap as the criterion for a crossing. I will also try using the methods the AI provided to train the model. In short, thank you so much for your answer; it’s like a fish finding water for me.

Yep—what the AI told you is the right idea conceptually: YOLO should detect things, and then you use geometry/rules to decide if an event happened (crossed a dashed line, entered emergency lane, wrong-way, etc.). Trying to train a single YOLO11 detector to directly output “illegal lane change” almost always turns into high false-positives because it’s a context + time problem, not a single-frame object problem.

For “vehicle crosses dashed line”, I’d avoid boxing every dash segment (it’s tiny, inconsistent, and labeling is painful). A much more robust path is: detect vehicles with YOLO11, and get lane markings / emergency-lane area as a segmentation mask (either a small custom YOLO11-seg model or even a classical lane-marking method). Then check whether the vehicle box overlaps the lane-marking mask.

Here’s a minimal overlap-check pattern:

import cv2
import numpy as np
from ultralytics import YOLO

veh = YOLO("yolo11n.pt")          # vehicle detector (or your trained model)
lane = YOLO("lane_best.pt")       # your YOLO11-seg lane/marking model

# Stream both models frame by frame so the whole video never sits in memory
for v, l in zip(veh.track(source="video.mp4", persist=True, imgsz=1280, stream=True),
                lane.predict(source="video.mp4", imgsz=1280, stream=True)):
    if v.boxes is None or l.masks is None:
        continue
    # Merge all marking instances into one mask, then resize it to the original
    # frame size (masks come back at inference resolution, not frame resolution)
    m = l.masks.data.any(0).cpu().numpy().astype(np.uint8)
    h, w = l.orig_shape
    mask = cv2.resize(m, (w, h), interpolation=cv2.INTER_NEAREST).astype(bool)
    for x1, y1, x2, y2 in v.boxes.xyxy.cpu().numpy().astype(int):
        x1, y1 = max(x1, 0), max(y1, 0)   # clip boxes to the frame
        if x2 > x1 and y2 > y1 and mask[y1:y2, x1:x2].mean() > 0.03:
            print("vehicle intersects lane marking")

If your dashed lines are small from a drone view, bumping imgsz (and/or training with tiled crops) usually helps a lot; also make sure you include plenty of “hard negatives” (vehicles near lines but not crossing) to reduce false positives. For the “event” side, using tracking + regions is often the quickest route—this is exactly the kind of logic covered in the docs guide on tracking in zones and it pairs well with traffic workflows like the YOLO11 traffic management overview.
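For the tracking + regions side, the core of most zone logic is just a point-in-polygon test on each tracked vehicle’s reference point (e.g. box center) against a zone polygon you define once per view. A pure-Python ray-casting sketch with a placeholder rectangular zone:

```python
def point_in_polygon(x, y, poly):
    """Ray-casting test: is point (x, y) strictly inside polygon `poly`?

    `poly` is a list of (x, y) vertices in order (clockwise or not).
    Casts a horizontal ray from the point and counts edge crossings;
    an odd count means the point is inside.
    """
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside
```

For example, an emergency-lane polygon traced once on a fixed view turns “vehicle occupying emergency lane” into “track center inside zone for N consecutive frames”, which filters out brief clips from momentary box jitter.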

If you share one short clip + your class list (what you’re labeling right now), I can suggest whether you should model the road context as segmentation (recommended) or as a few coarse polygons (fastest).


Thank you very much for your advice, which has brought a lot of pleasant surprises to my boring life. I will always keep your reply in my memo. Your advice is very detailed and of great help to me. I will take my time to follow the steps and complete it, but currently I need to focus on another work task, so I may need some time to verify and improve it. As for the video, I don’t have time to record and share one right now. Thank you for letting me feel the kindness of the YOLO community forum. This is the first time I’ve posted on the Internet, and I will always remember this day and your replies. I also hope to become a technical expert one day. Thank you again!
