How does training an object detection model work with segmentation datasets, and how are the transforms handled?

In Ultralytics HUB, the COCO dataset includes segmentation polygons rather than bounding boxes, yet it can still be used to train an object detection model.

  1. Are the polygon masks simply converted to bounding boxes during training?
  2. How are the Albumentations spatial transforms, such as arbitrary rotations, handled? Is the bounding box used for training derived from the post-transform segmentation mask?
  1. They are converted to bounding boxes (a minimal sketch of that conversion follows below this list).
  2. Ultralytics’ native affine transform augmentations are applied to the polygons when a segmentation dataset is used for object detection:
    ultralytics/ultralytics/data/augment.py at 23d79250e7420945792362f07b8d818320fdce49 · ultralytics/ultralytics · GitHub
    But Albumentations uses bounding boxes. It doesn’t support polygons.
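
For reference, here is a minimal sketch of what that polygon-to-box conversion amounts to, assuming a YOLO-style normalized polygon; the `polygon_to_xyxy` helper is my own illustration, not an Ultralytics function:

```python
import numpy as np

def polygon_to_xyxy(polygon: np.ndarray) -> np.ndarray:
    """Illustrative helper: axis-aligned box (x1, y1, x2, y2) enclosing a polygon.

    `polygon` is an (N, 2) array of (x, y) vertices, e.g. a segmentation label
    reshaped from its flat [x1, y1, x2, y2, ...] form.
    """
    x_min, y_min = polygon.min(axis=0)
    x_max, y_max = polygon.max(axis=0)
    return np.array([x_min, y_min, x_max, y_max])

# Example: a flat YOLO-style segmentation polygon (class id already stripped).
flat = np.array([0.30, 0.40, 0.55, 0.35, 0.60, 0.70, 0.35, 0.75])
print(polygon_to_xyxy(flat.reshape(-1, 2)))  # -> [0.3  0.35 0.6  0.75]
```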

If Ultralytics’ affine transform is applied to a dataset that contains segmentation masks, and that dataset is used to train an object detection model, does the conversion from segmentation masks to bounding boxes happen (1) before the affine transform, or (2) after the affine transform?

This is relevant because, if it is (2), the bounding box could potentially be more accurate.

Note: Albumentations does support masks as a target (see its “Targets by Transform” table), though I believe that applies to dense masks, not polygon masks.
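
In case it’s useful, a small sketch of what that dense-mask route could look like (my own example, not Ultralytics code): rasterize the polygon with OpenCV and pass it to an Albumentations pipeline through the `masks` argument.

```python
import albumentations as A
import cv2
import numpy as np

h, w = 480, 640
image = np.zeros((h, w, 3), dtype=np.uint8)

# Rasterize a polygon (illustrative pixel coordinates) into a dense binary mask.
polygon = np.array([[100, 150], [300, 120], [350, 300], [120, 320]], dtype=np.int32)
mask = np.zeros((h, w), dtype=np.uint8)
cv2.fillPoly(mask, [polygon], 1)

# Albumentations applies the same spatial transform to the image and the dense mask.
transform = A.Compose([A.Rotate(limit=30, p=1.0)])
out = transform(image=image, masks=[mask])
rotated_mask = out["masks"][0]

# A box recomputed from the transformed mask stays tight around the object.
ys, xs = np.nonzero(rotated_mask)
print(xs.min(), ys.min(), xs.max(), ys.max())
```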

Thanks!

Just to help clarify (because I’m confused), are you asking about using segmentation annotations to train a standard “detect” (bounding boxes) model? I presume you’re asking about using segmentation annotations to train a segmentation model, but I want to be 100% certain in my understanding.

I am asking about training an object detection model with segmentation annotations

See the code here:

After the affine transform. And it’s done that way to get accurate boxes, like you mentioned.
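
To make that concrete, here is a small numpy sketch (my own illustration, not the actual Ultralytics implementation) comparing the two orderings for a thin diagonal object under a 45° rotation: recomputing the box from the transformed polygon, option (2), gives a much tighter box than taking the axis-aligned hull of the transformed pre-computed box, option (1).

```python
import numpy as np

def aabb(points):
    """Axis-aligned (x1, y1, x2, y2) box enclosing a set of 2D points."""
    return np.array([*points.min(axis=0), *points.max(axis=0)])

def rotate(points, degrees):
    """Rotate 2D points about the origin."""
    t = np.deg2rad(degrees)
    r = np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])
    return points @ r.T

def area(box):
    return (box[2] - box[0]) * (box[3] - box[1])

# A thin diagonal strip: its axis-aligned box is much larger than the object itself.
polygon = np.array([[0.0, 0.0], [10.0, 10.0], [9.0, 10.0], [0.0, 1.0]])

# (1) Convert to a box first, then transform the box's corners.
x1, y1, x2, y2 = aabb(polygon)
corners = np.array([[x1, y1], [x2, y1], [x2, y2], [x1, y2]])
box_before = aabb(rotate(corners, -45))

# (2) Transform the polygon first, then convert to a box.
box_after = aabb(rotate(polygon, -45))

print("convert before transform:", area(box_before))  # ~200: loose box
print("convert after transform: ", area(box_after))   # ~10: tight box
```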