Changing bounding boxes to polygons

Hi all,
I’m working on a project that requires me to quickly infer the four corners of a 2D projection of a square surface. The exact shape matters to me, so I do not want to derive the corners analytically from the output of a bounding-box model. I decided to adapt YOLOv5 instead, which means changing the model itself to output 8 numbers (one x,y pair for each of the four corners) instead of the standard x,y,w,h. I have some questions:

First things first, is this possible? By possible I mean, possible without an unreasonable amount of work to the point where it’d be faster to just write and train a model from scratch.
Second of all, is there perhaps a model you know which is better suited for my task?
Third of all, is there anything to look out for when changing YOLOv5’s output shape, so as not to break the model?
Thanks for reading this. I’d appreciate any input, as I am still very green at object-detection NNs.

Hello there! :blush:

It’s great to see the enthusiasm for adapting YOLO to your project! Let me address your questions step-by-step:

  1. Is this possible?
    Yes, adapting YOLOv5 to output corner coordinates instead of the standard x, y, w, h bounding box format is technically possible. However, this will involve modifying the architecture of the model, including:

    • Head adjustments: You’ll need to redesign the YOLOv5 head to output 8 coordinates for the corners (4 pairs of x, y values) instead of the 4 box parameters.
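    • Loss function: YOLOv5’s box loss is IoU-based (CIoU) and defined on x, y, w, h, so it does not carry over to four free corners. You would need a regression loss on the 8 corner coordinates (for example L1 or smooth L1), with the corners always listed in a consistent order.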
    • Dataset changes: Your training dataset should be annotated with the 4 corners of each object instead of bounding boxes.

    While this is feasible within YOLOv5, it can be fairly complex and would require detailed knowledge of the underlying architecture.
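To make the head adjustment concrete, here is a small sketch of the channel arithmetic. In YOLOv5 each detection scale outputs anchors × (4 box parameters + 1 objectness + number of classes) channels. The helper name `detect_head_channels` and its `box_params` argument are made up for illustration; they are not part of YOLOv5.

```python
def detect_head_channels(num_classes: int, num_anchors: int = 3, box_params: int = 4) -> int:
    """Output channels per detection scale.

    Each anchor predicts: box parameters + 1 objectness score + class scores.
    `box_params` is a hypothetical knob: 4 for x, y, w, h; 8 for four corners.
    """
    return num_anchors * (box_params + 1 + num_classes)


# Single-class model ("square surface"), 3 anchors per scale.
standard = detect_head_channels(num_classes=1)                # x, y, w, h
corners = detect_head_channels(num_classes=1, box_params=8)   # x1, y1, ..., x4, y4
print(standard, corners)
```

Every piece of code that slices the output tensor by fixed indices (loss, decoding, NMS) would have to agree with the new layout.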

  2. Are there better-suited models for this task?
    Instead of modifying YOLOv5, you might want to explore models explicitly designed for polygon or oriented bounding box (OBB) detection. For example:

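    • YOLO11 with OBB support: YOLO11 natively supports oriented bounding boxes and can return each box as four corner points. Note that an OBB is a rotated rectangle, so it only approximates a square seen under perspective, whose projection is a general quadrilateral.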
    • Detectron2: Known for its flexibility, you can customize it for tasks like polygon prediction.
    • DOTA dataset models: Models trained on DOTA (an aerial-imagery dataset annotated with oriented bounding boxes) might align well with your needs.
  3. Things to look out for when changing YOLOv5’s output shape:
    When modifying YOLOv5’s output:

    • Anchor management: YOLOv5 matches targets to anchors by comparing width and height, so a corner-only label still needs a width and height (for example from the enclosing box) for matching, or the matching logic has to be rethought.
    • Box-specific code: The backbone itself does not depend on the output format, but the head, the loss, and non-maximum suppression all assume x, y, w, h; make sure your changes do not leave any of them expecting the old layout.
    • Inference pipeline: Updating the output structure will also require changes in the post-processing logic that translates raw model outputs into usable results.
    • Validation and debugging: Testing thoroughly on a custom dataset to ensure predictions align with your objectives is critical.
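For the post-processing point, one possible approach (an assumption on my part, not something YOLOv5 provides) is to derive an axis-aligned enclosing box from the 8 predicted numbers, so that existing box-based steps such as NMS or anchor matching can still be reused. The function name `quad_to_xyxy` is hypothetical.

```python
from typing import Sequence, Tuple


def quad_to_xyxy(corners: Sequence[float]) -> Tuple[float, float, float, float]:
    """Return (x_min, y_min, x_max, y_max) enclosing a quadrilateral.

    `corners` is a flat sequence of 8 numbers: x1, y1, x2, y2, x3, y3, x4, y4.
    """
    if len(corners) != 8:
        raise ValueError("expected 8 numbers: x1, y1, x2, y2, x3, y3, x4, y4")
    xs = corners[0::2]  # every even index is an x coordinate
    ys = corners[1::2]  # every odd index is a y coordinate
    return min(xs), min(ys), max(xs), max(ys)


# A slightly skewed quadrilateral in pixel coordinates.
box = quad_to_xyxy([10, 20, 50, 15, 55, 60, 5, 58])
print(box)
```

The enclosing box is only a helper for suppression and matching; the 8 corner values themselves remain the actual prediction.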

Suggestions for Moving Forward:

  • If this is your first time working with object detection NNs, I recommend starting with a model that already supports OBBs like YOLO11 rather than modifying YOLOv5 heavily. YOLO11 makes it easier to detect rotated objects and work with non-standard bounding box formats.
  • To try YOLO11 for OBB detection, you can follow the YOLO11 training guide for OBB. You might find its existing functionality sufficient for your needs without major adjustments.
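If you already have the four corners annotated in pixels, the sketch below converts them to the Ultralytics OBB label format, which is one line per object: `class_index x1 y1 x2 y2 x3 y3 x4 y4`, with coordinates normalized to the 0 to 1 range. The helper name `to_obb_label` and the sample image size are assumptions for illustration.

```python
from typing import Sequence, Tuple


def to_obb_label(
    class_index: int,
    corners_px: Sequence[Tuple[float, float]],
    img_w: int,
    img_h: int,
) -> str:
    """Build one label line: class_index x1 y1 x2 y2 x3 y3 x4 y4 (normalized)."""
    if len(corners_px) != 4:
        raise ValueError("expected exactly four (x, y) corners")
    parts = [str(class_index)]
    for x, y in corners_px:
        parts.append(f"{x / img_w:.6f}")  # normalize x by image width
        parts.append(f"{y / img_h:.6f}")  # normalize y by image height
    return " ".join(parts)


# Hypothetical 640x480 image with one annotated surface (class 0).
line = to_obb_label(0, [(64, 48), (576, 48), (576, 432), (64, 432)], 640, 480)
print(line)
```

Keep in mind that, as far as I know, OBB training fits a rotated rectangle to these four points, so a strongly skewed quadrilateral will not be reproduced exactly.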

Let me know if you’d like specific examples or need further guidance on any step! :rocket:


You could try OBB models with ultralytics.

You can get the four corners of each detection from the result via result.obb.xyxyxyxy.
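A minimal sketch of reading those corners, assuming the `ultralytics` package is installed and that `result.obb.xyxyxyxy` holds one 4×2 set of (x, y) points per detection. The function names, the image path argument, and the `yolo11n-obb.pt` weights name are my choices for illustration; a model trained on your own data would replace the weights.

```python
from typing import Any, List, Tuple

Quad = List[Tuple[float, float]]


def corners_from_result(result: Any) -> List[Quad]:
    """Convert result.obb.xyxyxyxy into plain Python (x, y) tuples per detection."""
    obb = getattr(result, "obb", None)
    if obb is None:  # not an OBB model, so there are no oriented boxes
        return []
    return [[(float(x), float(y)) for x, y in quad] for quad in obb.xyxyxyxy]


def detect_corners(image_path: str, weights: str = "yolo11n-obb.pt") -> List[List[Quad]]:
    """Run an OBB model on one image and return the corners of every detection."""
    from ultralytics import YOLO  # imported lazily so the helper above works without it

    model = YOLO(weights)
    results = model(image_path)  # one result object per input image
    return [corners_from_result(r) for r in results]
```

Calling `detect_corners("your_image.jpg")` would then give, for each image, a list of detections with four (x, y) points each.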
