How does training an object detection model work with segmentation datasets, and how are the transforms handled?

In Ultralytics HUB, the COCO dataset includes segmentation polygons rather than bounding boxes, yet it can still be used to train an object detection model.

  1. Are the polygon masks simply converted to bounding boxes during training?
  2. How are the Albumentations spatial transforms, such as arbitrary rotations, handled? Is the bounding box used for training derived from the post-transform segmentation mask?
  1. They are converted to bounding boxes (a minimal sketch of that conversion follows below this list).
  2. Ultralytics’ native affine transform augmentations are applied to the polygons when a segmentation dataset is used for object detection:
    ultralytics/ultralytics/data/augment.py at 23d79250e7420945792362f07b8d818320fdce49 · ultralytics/ultralytics · GitHub
    But Albumentations uses bounding boxes. It doesn’t support polygons.
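
For reference, here is a minimal sketch of what that polygon-to-box conversion amounts to, assuming a YOLO-style normalized polygon; the `polygon_to_xyxy` helper is my own illustration, not an Ultralytics function:

```python
import numpy as np

def polygon_to_xyxy(polygon: np.ndarray) -> np.ndarray:
    """Illustrative helper: axis-aligned box (x1, y1, x2, y2) enclosing a polygon.

    `polygon` is an (N, 2) array of (x, y) vertices, e.g. a segmentation label
    reshaped from its flat [x1, y1, x2, y2, ...] form.
    """
    x_min, y_min = polygon.min(axis=0)
    x_max, y_max = polygon.max(axis=0)
    return np.array([x_min, y_min, x_max, y_max])

# Example: a flat YOLO-style segmentation polygon (class id already stripped).
flat = np.array([0.30, 0.40, 0.55, 0.35, 0.60, 0.70, 0.35, 0.75])
print(polygon_to_xyxy(flat.reshape(-1, 2)))  # -> [0.3  0.35 0.6  0.75]
```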

If Ultralytics’ affine transform is applied to a dataset that contains segmentation masks, and that dataset is used to train an object detection model, does the conversion from segmentation masks to bounding boxes happen (1) before the affine transform, or (2) after the affine transform?

This is relevant because, if it is (2), the bounding box could potentially be more accurate.

Note: Albumentations does support masks as a target (see its “Targets by Transform” table), though I believe that applies to dense masks, not polygon masks.
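
In case it’s useful, a small sketch of what that dense-mask route could look like (my own example, not Ultralytics code): rasterize the polygon with OpenCV and pass it to an Albumentations pipeline through the `masks` argument.

```python
import albumentations as A
import cv2
import numpy as np

h, w = 480, 640
image = np.zeros((h, w, 3), dtype=np.uint8)

# Rasterize a polygon (illustrative pixel coordinates) into a dense binary mask.
polygon = np.array([[100, 150], [300, 120], [350, 300], [120, 320]], dtype=np.int32)
mask = np.zeros((h, w), dtype=np.uint8)
cv2.fillPoly(mask, [polygon], 1)

# Albumentations applies the same spatial transform to the image and the dense mask.
transform = A.Compose([A.Rotate(limit=30, p=1.0)])
out = transform(image=image, masks=[mask])
rotated_mask = out["masks"][0]

# A box recomputed from the transformed mask stays tight around the object.
ys, xs = np.nonzero(rotated_mask)
print(xs.min(), ys.min(), xs.max(), ys.max())
```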

Thanks!

Just to help clarify (because I’m confused), are you asking about using segmentation annotations to train a standard “detect” (bounding boxes) model? I presume you’re asking about using segmentation annotations to train a segmentation model, but I want to be 100% certain in my understanding.

I am asking about training an object detection model with segmentation annotations

See the code here:

After the affine transform. And it’s done that way to get accurate boxes, like you mentioned.
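
To make that concrete, here is a small numpy sketch (my own illustration, not the actual Ultralytics implementation) comparing the two orderings for a thin diagonal object under a 45° rotation: recomputing the box from the transformed polygon, option (2), gives a much tighter box than taking the axis-aligned hull of the transformed pre-computed box, option (1).

```python
import numpy as np

def aabb(points):
    """Axis-aligned (x1, y1, x2, y2) box enclosing a set of 2D points."""
    return np.array([*points.min(axis=0), *points.max(axis=0)])

def rotate(points, degrees):
    """Rotate 2D points about the origin."""
    t = np.deg2rad(degrees)
    r = np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])
    return points @ r.T

def area(box):
    return (box[2] - box[0]) * (box[3] - box[1])

# A thin diagonal strip: its axis-aligned box is much larger than the object itself.
polygon = np.array([[0.0, 0.0], [10.0, 10.0], [9.0, 10.0], [0.0, 1.0]])

# (1) Convert to a box first, then transform the box's corners.
x1, y1, x2, y2 = aabb(polygon)
corners = np.array([[x1, y1], [x2, y1], [x2, y2], [x1, y2]])
box_before = aabb(rotate(corners, -45))

# (2) Transform the polygon first, then convert to a box.
box_after = aabb(rotate(polygon, -45))

print("convert before transform:", area(box_before))  # ~200: loose box
print("convert after transform: ", area(box_after))   # ~10: tight box
```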