YOLOv11-OBB custom dataloader

I'm using a custom dataloader to train YOLO in order to override how YOLO reads files, as I need to load them from a server rather than the OS.

I'm using a YOLO collate function:

import torch

def yolo_collate_fn(batch):
    # Stack images into a single (B, C, H, W) tensor.
    imgs = torch.stack([sample['img'] for sample in batch], dim=0)
    cls_list = [sample['cls'] for sample in batch]
    bboxes_list = [sample['bboxes'] for sample in batch]
    img_paths = [sample['img_path'] for sample in batch]
    ori_shapes = [sample['ori_shape'] for sample in batch]

    # batch_idx tags every label with the index of the image it came from.
    batch_idx = []
    for i, cls in enumerate(cls_list):
        batch_idx.append(torch.full((cls.shape[0],), i, dtype=torch.long))
    batch_idx = torch.cat(batch_idx, dim=0) if batch_idx else torch.tensor([], dtype=torch.long)

    return {
        'img': imgs,
        'cls': torch.cat(cls_list, dim=0) if cls_list else torch.tensor([], dtype=torch.float32),
        'bboxes': torch.cat(bboxes_list, dim=0) if bboxes_list else torch.tensor([], dtype=torch.float32),
        'batch_idx': batch_idx,
        'img_path': img_paths,
        'im_file': img_paths,
        'ori_shape': ori_shapes,
    }
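For reference, the batch_idx loop above just maps every label back to the image it came from; a self-contained illustration with two samples holding 3 and 1 labels (shapes are made up for the example):

```python
import torch

cls_list = [torch.zeros(3, 1), torch.zeros(1, 1)]  # 3 labels in image 0, 1 in image 1
batch_idx = torch.cat(
    [torch.full((c.shape[0],), i, dtype=torch.long) for i, c in enumerate(cls_list)]
)
print(batch_idx)  # tensor([0, 0, 0, 1])
```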

My bboxes have length 8 to correspond to the xyxyxyxy format:

tensor([0.7008, 0.9064, 0.7146, 0.9037, 0.7170, 0.9157, 0.7032, 0.9185])
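For converting such a box to the 5-value format, I believe ultralytics ships a helper (`xyxyxyxy2xywhr` in `ultralytics.utils.ops`, built on `cv2.minAreaRect`). Here's a simplified numpy sketch of the same idea; the function name is mine, and it assumes the four points are already ordered rectangle corners, which the real helper does not require:

```python
import numpy as np

def corners_to_xywhr(corners):
    """Convert one xyxyxyxy polygon (4 ordered corner points) to
    (cx, cy, w, h, rotation). Simplified sketch: assumes the points
    form a rectangle and are in order; the library helper is more robust."""
    pts = np.asarray(corners, dtype=np.float64).reshape(4, 2)
    cx, cy = pts.mean(axis=0)                        # center = mean of corners
    w = np.linalg.norm(pts[1] - pts[0])              # edge p0 -> p1
    h = np.linalg.norm(pts[2] - pts[1])              # edge p1 -> p2
    r = np.arctan2(pts[1, 1] - pts[0, 1], pts[1, 0] - pts[0, 0])  # edge angle (radians)
    return np.array([cx, cy, w, h, r])

# An axis-aligned 2x1 box maps to center (1, 0.5), w=2, h=1, angle 0.
print(corners_to_xywhr([0, 0, 2, 0, 2, 1, 0, 1]))
```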

I'm training like this:

overrides = {
    'model': "yolo11n-obb.pt",
    'task': 'obb',
    'imgsz': 768,
    'epochs': 1,
    'batch': BATCH_SIZE,
    'device': 0,
    'workers': 0,
    'verbose': True,
    'data': 'data.yaml',
}

trainer = Custom_OBBTrainer(overrides=overrides)

trainer.train()

Yet when I train, it seems like the loss function expects xywhr format, as I get this error:

RuntimeError                              Traceback (most recent call last)
~/.conda/envs/py-cv-client-ipython/lib/python3.9/site-packages/ultralytics/utils/loss.py in __call__(self, preds, batch)
    701             batch_idx = batch["batch_idx"].view(-1, 1)
--> 702             targets = torch.cat((batch_idx, batch["cls"].view(-1, 1), batch["bboxes"].view(-1, 5)), 1)
    703             rw, rh = targets[:, 4] * imgsz[0].item(), targets[:, 5] * imgsz[1].item()

RuntimeError: shape '[-1, 5]' is invalid for input of size 64
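That size of 64 is just the 8 boxes × 8 xyxyxyxy values not dividing evenly into rows of 5: the loss concatenates batch_idx, cls, and bboxes assuming 5-value xywhr boxes, so `view(-1, 5)` fails on polygon data. A minimal reproduction (tensor sizes are illustrative):

```python
import torch

bboxes = torch.zeros(8, 8)   # 8 boxes x 8 xyxyxyxy values = 64 elements
try:
    bboxes.view(-1, 5)       # roughly what loss.py line 702 does
    raised = False
except RuntimeError:
    raised = True            # 64 is not divisible by 5

xywhr = torch.zeros(8, 5)    # same 8 boxes, 5-value xywhr format
ok_shape = xywhr.view(-1, 5).shape  # torch.Size([8, 5])
```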

Is it possible to train with a custom dataloader using the xyxyxyxy format, or should I convert all my data to the 5-value xywhr format?

Thanks

I haven’t tried running a custom dataloader, but there are a few things to point you to that should help. First, the xyxyxyxy annotation format is correct, but it’s treated as a segment, not a bbox.

You’ll want to follow the operations applied to the segments variable in the verify_image_label function (or reuse the function directly). That way you’ll end up with the same input.
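In other words, each polygon label row gets split into a class id and an (n, 2) points array that flows through the pipeline as a segment. A rough sketch of that split (the function name is mine, not the ultralytics internals):

```python
import numpy as np

def split_label_row(row):
    """Split one label row [cls, x1, y1, x2, y2, x3, y3, x4, y4] into the
    class id and a (4, 2) segment array, mirroring how polygon labels are
    handled as segments rather than 5-value boxes (illustrative sketch)."""
    row = np.asarray(row, dtype=np.float64)
    cls = int(row[0])
    segment = row[1:].reshape(-1, 2)  # 8 coords -> 4 (x, y) points
    return cls, segment

cls_id, seg = split_label_row([0, 0.7008, 0.9064, 0.7146, 0.9037,
                               0.7170, 0.9157, 0.7032, 0.9185])
print(cls_id, seg.shape)  # 0 (4, 2)
```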

For the collate_fn, you should check out this line:

and additionally follow the processing steps there. I think that should get you closer to where you need to be (if not all the way there).

Still getting the same errors. Is there something I need to set in the YAML or elsewhere to tell YOLO to use segments vs. bboxes? It is still expecting a length-5 tensor in the loss function.

It seems like line 702 of ~ultralytics/utils/loss.py is hard-coded to length 5; I don't see a length-8 alternative.

I pointed out that segments are used, but boxes are also used, and you can see the conversion in the verify_image_label() function.

I think you’ll have to follow the same logic as, or reuse, the verify_image_label function.

Oh, also there’s the Format class that includes this operation for OBB conversion

It’s used in the build_transforms method of the YOLODataset class
