YOLO with Comet ML raises KeyError during training

Hello there,

I’m trying to train a YOLO11 detector on the COCO128 dataset, but training fails with a KeyError when Comet ML logging is enabled. If I disable Comet ML, training works as intended. I tried YOLOv8 as well and hit the same issue.

from ultralytics import YOLO
model = YOLO("yolo11n.pt")
results = model.train(
    data="coco128.yaml",
    project="comet-example-yolov8-coco128",
    batch=32,
    save_period=1,
    save_json=True,
    epochs=3
)

I’m getting the following output:

Ultralytics 8.3.64  Python-3.11.4 torch-2.5.1+cu118 CUDA:0 (NVIDIA GeForce RTX 3060 Laptop GPU, 6144MiB)
engine\trainer: task=detect, mode=train, model=yolo11n.pt, data=coco128.yaml, epochs=3, time=None, patience=100, batch=32, imgsz=640, save=True, save_period=1, cache=False, device=None, workers=8, project=comet-example-yolov8-coco128, name=train3, exist_ok=False, pretrained=True, optimizer=auto, verbose=True, seed=0, deterministic=True, single_cls=False, rect=False, cos_lr=False, close_mosaic=10, resume=False, amp=True, fraction=1.0, profile=False, freeze=None, multi_scale=False, overlap_mask=True, mask_ratio=4, dropout=0.0, val=True, split=val, save_json=True, save_hybrid=False, conf=None, iou=0.7, max_det=300, half=False, dnn=False, plots=True, source=None, vid_stride=1, stream_buffer=False, visualize=False, augment=False, agnostic_nms=False, classes=None, retina_masks=False, embed=None, show=False, save_frames=False, save_txt=False, save_conf=False, save_crop=False, show_labels=True, show_conf=True, show_boxes=True, line_width=None, format=torchscript, keras=False, optimize=False, int8=False, dynamic=False, simplify=True, opset=None, workspace=None, nms=False, lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=7.5, cls=0.5, dfl=1.5, pose=12.0, kobj=1.0, nbs=64, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, bgr=0.0, mosaic=1.0, mixup=0.0, copy_paste=0.0, copy_paste_mode=flip, auto_augment=randaugment, erasing=0.4, crop_fraction=1.0, cfg=None, tracker=botsort.yaml, save_dir=comet-example-yolov8-coco128\train3

                   from  n    params  module                                       arguments                     
  0                  -1  1       464  ultralytics.nn.modules.conv.Conv             [3, 16, 3, 2]                 
  1                  -1  1      4672  ultralytics.nn.modules.conv.Conv             [16, 32, 3, 2]                
  2                  -1  1      6640  ultralytics.nn.modules.block.C3k2            [32, 64, 1, False, 0.25]      
  3                  -1  1     36992  ultralytics.nn.modules.conv.Conv             [64, 64, 3, 2]                
  4                  -1  1     26080  ultralytics.nn.modules.block.C3k2            [64, 128, 1, False, 0.25]     
  5                  -1  1    147712  ultralytics.nn.modules.conv.Conv             [128, 128, 3, 2]              
  6                  -1  1     87040  ultralytics.nn.modules.block.C3k2            [128, 128, 1, True]           
  7                  -1  1    295424  ultralytics.nn.modules.conv.Conv             [128, 256, 3, 2]              
  8                  -1  1    346112  ultralytics.nn.modules.block.C3k2            [256, 256, 1, True]           
  9                  -1  1    164608  ultralytics.nn.modules.block.SPPF            [256, 256, 5]                 
 10                  -1  1    249728  ultralytics.nn.modules.block.C2PSA           [256, 256, 1]                 
 11                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']          
 12             [-1, 6]  1         0  ultralytics.nn.modules.conv.Concat           [1]                           
 13                  -1  1    111296  ultralytics.nn.modules.block.C3k2            [384, 128, 1, False]          
 14                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']          
 15             [-1, 4]  1         0  ultralytics.nn.modules.conv.Concat           [1]                           
 16                  -1  1     32096  ultralytics.nn.modules.block.C3k2            [256, 64, 1, False]           
 17                  -1  1     36992  ultralytics.nn.modules.conv.Conv             [64, 64, 3, 2]                
 18            [-1, 13]  1         0  ultralytics.nn.modules.conv.Concat           [1]                           
 19                  -1  1     86720  ultralytics.nn.modules.block.C3k2            [192, 128, 1, False]          
 20                  -1  1    147712  ultralytics.nn.modules.conv.Conv             [128, 128, 3, 2]              
 21            [-1, 10]  1         0  ultralytics.nn.modules.conv.Concat           [1]                           
 22                  -1  1    378880  ultralytics.nn.modules.block.C3k2            [384, 256, 1, True]           
 23        [16, 19, 22]  1    464912  ultralytics.nn.modules.head.Detect           [80, [64, 128, 256]]          
YOLO11n summary: 319 layers, 2,624,080 parameters, 2,624,064 gradients, 6.6 GFLOPs

Transferred 499/499 items from pretrained weights
Freezing layer 'model.23.dfl.conv.weight'
AMP: running Automatic Mixed Precision (AMP) checks...
AMP: checks passed 
train: Scanning D:\ml\yolo11code\datasets\coco128\labels\train2017.cache... 126 images, 2 backgrounds, 0 corrupt: 100%|
val: Scanning D:\ml\yolo11code\datasets\coco128\labels\train2017.cache... 126 images, 2 backgrounds, 0 corrupt: 100%|██
Plotting labels to comet-example-yolov8-coco128\train3\labels.jpg... 
optimizer: 'optimizer=auto' found, ignoring 'lr0=0.01' and 'momentum=0.937' and determining best 'optimizer', 'lr0' and 'momentum' automatically... 
optimizer: AdamW(lr=0.000119, momentum=0.9) with parameter groups 81 weight(decay=0.0), 88 weight(decay=0.0005), 87 bias(decay=0.0)
Image sizes 640 train, 640 val
Using 8 dataloader workers
Logging results to comet-example-yolov8-coco128\train3
Starting training for 3 epochs...

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
        1/3      6.64G       1.16      1.346      1.212        536        640: 100%|██████████| 4/4 [00:04<00:00,  1.07
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 2/2 [00:05<0
                   all        128        929      0.644      0.597      0.674      0.507

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[7], line 1
----> 1 results = model.train(
      2     data="coco128.yaml",
      3     project="comet-example-yolov8-coco128",
      4     batch=32,
      5     save_period=1,
      6     save_json=True,
      7     epochs=3
      8 )

File ~\AppData\Roaming\Python\Python311\site-packages\ultralytics\engine\model.py:806, in Model.train(self, trainer, **kwargs)
    803     self.model = self.trainer.model
    805 self.trainer.hub_session = self.session  # attach optional HUB session
--> 806 self.trainer.train()
    807 # Update model and cfg after training
    808 if RANK in {-1, 0}:

File ~\AppData\Roaming\Python\Python311\site-packages\ultralytics\engine\trainer.py:207, in BaseTrainer.train(self)
    204         ddp_cleanup(self, str(file))
    206 else:
--> 207     self._do_train(world_size)

File ~\AppData\Roaming\Python\Python311\site-packages\ultralytics\engine\trainer.py:454, in BaseTrainer._do_train(self, world_size)
    452     self.scheduler.last_epoch = self.epoch  # do not move
    453     self.stop |= epoch >= self.epochs  # stop if exceeded epochs
--> 454 self.run_callbacks("on_fit_epoch_end")
    455 self._clear_memory()
    457 # Early Stopping

File ~\AppData\Roaming\Python\Python311\site-packages\ultralytics\engine\trainer.py:168, in BaseTrainer.run_callbacks(self, event)
    166 """Run all existing callbacks associated with a particular event."""
    167 for callback in self.callbacks.get(event, []):
--> 168     callback(self)

File ~\AppData\Roaming\Python\Python311\site-packages\ultralytics\utils\callbacks\comet.py:360, in on_fit_epoch_end(trainer)
    358     _log_confusion_matrix(experiment, trainer, curr_step, curr_epoch)
    359 if _should_log_image_predictions():
--> 360     _log_image_predictions(experiment, trainer.validator, curr_step)

File ~\AppData\Roaming\Python\Python311\site-packages\ultralytics\utils\callbacks\comet.py:263, in _log_image_predictions(experiment, validator, curr_step)
    260     return
    262 image_path = Path(image_path)
--> 263 annotations = _fetch_annotations(
    264     img_idx,
    265     image_path,
    266     batch,
    267     predictions_metadata_map,
    268     class_label_map,
    269 )
    270 _log_images(
    271     experiment,
    272     [image_path],
    273     curr_step,
    274     annotations=annotations,
    275 )
    276 _comet_image_prediction_count += 1

File ~\AppData\Roaming\Python\Python311\site-packages\ultralytics\utils\callbacks\comet.py:194, in _fetch_annotations(img_idx, image_path, batch, prediction_metadata_map, class_label_map)
    190 """Join the ground truth and prediction annotations if they exist."""
    191 ground_truth_annotations = _format_ground_truth_annotations_for_detection(
    192     img_idx, image_path, batch, class_label_map
    193 )
--> 194 prediction_annotations = _format_prediction_annotations_for_detection(
    195     image_path, prediction_metadata_map, class_label_map
    196 )
    198 annotations = [
    199     annotation for annotation in [ground_truth_annotations, prediction_annotations] if annotation is not None
    200 ]
    201 return [annotations] if annotations else None

File ~\AppData\Roaming\Python\Python311\site-packages\ultralytics\utils\callbacks\comet.py:182, in _format_prediction_annotations_for_detection(image_path, metadata, class_label_map)
    180     cls_label = prediction["category_id"]
    181     if class_label_map:
--> 182         cls_label = str(class_label_map[cls_label])
    184     data.append({"boxes": [boxes], "label": cls_label, "score": score})
    186 return {"name": "prediction", "data": data}

KeyError: 80

Try removing save_json=True from the model.train() call.
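
For anyone wondering why the lookup fails: the traceback ends at str(class_label_map[cls_label]) with KeyError: 80, and the model has 80 classes. Below is a minimal sketch of that failure, not the actual Ultralytics code. It assumes the label map is keyed 0 to 79 and that the category_id stored for JSON output is shifted by one, which this thread does not confirm. The helper names lookup_label and safe_lookup_label are hypothetical.

```python
# Hypothetical stand-in for the label map: 80 classes keyed 0..79 (an assumption).
class_label_map = {i: f"class_{i}" for i in range(80)}


def lookup_label(category_id, label_map):
    """Mirror the lookup from the traceback: str(class_label_map[cls_label])."""
    return str(label_map[category_id])


def safe_lookup_label(category_id, label_map):
    """Hypothetical defensive variant: fall back to the raw id if it is not a key."""
    return str(label_map.get(category_id, category_id))


# Assumed prediction entry whose category_id is shifted by one (79 -> 80).
prediction = {"category_id": 80}

try:
    lookup_label(prediction["category_id"], class_label_map)
except KeyError as err:
    # Same failure mode as in the traceback above.
    print(f"KeyError: {err}")

# The defensive variant returns the raw id instead of raising.
print(safe_lookup_label(prediction["category_id"], class_label_map))
```

If that assumption holds, dropping save_json=True avoids the mismatch because the Comet callback no longer receives the JSON-style predictions to look up.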

It did the trick!

Thank you so much.