YOLO and Comet ML: KeyError when training

Hello there,

I’m trying to train a YOLO11 detector on the COCO128 dataset, but training fails with a KeyError after the first validation pass when the Comet ML integration is enabled. If I disable Comet ML, training works as intended. I tried YOLOv8 as well and got the same error.

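from ultralytics import YOLO

# Load the pretrained YOLO11 nano weights
model = YOLO("yolo11n.pt")

results = model.train(
    data="coco128.yaml",
    project="comet-example-yolov8-coco128",
    batch=32,
    save_period=1,   # save a checkpoint every epoch
    save_json=True,  # save validation results to JSON
    epochs=3
)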

I’m getting the following output:

Ultralytics 8.3.64  Python-3.11.4 torch-2.5.1+cu118 CUDA:0 (NVIDIA GeForce RTX 3060 Laptop GPU, 6144MiB)
engine\trainer: task=detect, mode=train, model=yolo11n.pt, data=coco128.yaml, epochs=3, time=None, patience=100, batch=32, imgsz=640, save=True, save_period=1, cache=False, device=None, workers=8, project=comet-example-yolov8-coco128, name=train3, exist_ok=False, pretrained=True, optimizer=auto, verbose=True, seed=0, deterministic=True, single_cls=False, rect=False, cos_lr=False, close_mosaic=10, resume=False, amp=True, fraction=1.0, profile=False, freeze=None, multi_scale=False, overlap_mask=True, mask_ratio=4, dropout=0.0, val=True, split=val, save_json=True, save_hybrid=False, conf=None, iou=0.7, max_det=300, half=False, dnn=False, plots=True, source=None, vid_stride=1, stream_buffer=False, visualize=False, augment=False, agnostic_nms=False, classes=None, retina_masks=False, embed=None, show=False, save_frames=False, save_txt=False, save_conf=False, save_crop=False, show_labels=True, show_conf=True, show_boxes=True, line_width=None, format=torchscript, keras=False, optimize=False, int8=False, dynamic=False, simplify=True, opset=None, workspace=None, nms=False, lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=7.5, cls=0.5, dfl=1.5, pose=12.0, kobj=1.0, nbs=64, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, bgr=0.0, mosaic=1.0, mixup=0.0, copy_paste=0.0, copy_paste_mode=flip, auto_augment=randaugment, erasing=0.4, crop_fraction=1.0, cfg=None, tracker=botsort.yaml, save_dir=comet-example-yolov8-coco128\train3

                   from  n    params  module                                       arguments                     
  0                  -1  1       464  ultralytics.nn.modules.conv.Conv             [3, 16, 3, 2]                 
  1                  -1  1      4672  ultralytics.nn.modules.conv.Conv             [16, 32, 3, 2]                
  2                  -1  1      6640  ultralytics.nn.modules.block.C3k2            [32, 64, 1, False, 0.25]      
  3                  -1  1     36992  ultralytics.nn.modules.conv.Conv             [64, 64, 3, 2]                
  4                  -1  1     26080  ultralytics.nn.modules.block.C3k2            [64, 128, 1, False, 0.25]     
  5                  -1  1    147712  ultralytics.nn.modules.conv.Conv             [128, 128, 3, 2]              
  6                  -1  1     87040  ultralytics.nn.modules.block.C3k2            [128, 128, 1, True]           
  7                  -1  1    295424  ultralytics.nn.modules.conv.Conv             [128, 256, 3, 2]              
  8                  -1  1    346112  ultralytics.nn.modules.block.C3k2            [256, 256, 1, True]           
  9                  -1  1    164608  ultralytics.nn.modules.block.SPPF            [256, 256, 5]                 
 10                  -1  1    249728  ultralytics.nn.modules.block.C2PSA           [256, 256, 1]                 
 11                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']          
 12             [-1, 6]  1         0  ultralytics.nn.modules.conv.Concat           [1]                           
 13                  -1  1    111296  ultralytics.nn.modules.block.C3k2            [384, 128, 1, False]          
 14                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']          
 15             [-1, 4]  1         0  ultralytics.nn.modules.conv.Concat           [1]                           
 16                  -1  1     32096  ultralytics.nn.modules.block.C3k2            [256, 64, 1, False]           
 17                  -1  1     36992  ultralytics.nn.modules.conv.Conv             [64, 64, 3, 2]                
 18            [-1, 13]  1         0  ultralytics.nn.modules.conv.Concat           [1]                           
 19                  -1  1     86720  ultralytics.nn.modules.block.C3k2            [192, 128, 1, False]          
 20                  -1  1    147712  ultralytics.nn.modules.conv.Conv             [128, 128, 3, 2]              
 21            [-1, 10]  1         0  ultralytics.nn.modules.conv.Concat           [1]                           
 22                  -1  1    378880  ultralytics.nn.modules.block.C3k2            [384, 256, 1, True]           
 23        [16, 19, 22]  1    464912  ultralytics.nn.modules.head.Detect           [80, [64, 128, 256]]          
YOLO11n summary: 319 layers, 2,624,080 parameters, 2,624,064 gradients, 6.6 GFLOPs

Transferred 499/499 items from pretrained weights
Freezing layer 'model.23.dfl.conv.weight'
AMP: running Automatic Mixed Precision (AMP) checks...
AMP: checks passed 
train: Scanning D:\ml\yolo11code\datasets\coco128\labels\train2017.cache... 126 images, 2 backgrounds, 0 corrupt: 100%|
val: Scanning D:\ml\yolo11code\datasets\coco128\labels\train2017.cache... 126 images, 2 backgrounds, 0 corrupt: 100%|██
Plotting labels to comet-example-yolov8-coco128\train3\labels.jpg... 
optimizer: 'optimizer=auto' found, ignoring 'lr0=0.01' and 'momentum=0.937' and determining best 'optimizer', 'lr0' and 'momentum' automatically... 
optimizer: AdamW(lr=0.000119, momentum=0.9) with parameter groups 81 weight(decay=0.0), 88 weight(decay=0.0005), 87 bias(decay=0.0)
Image sizes 640 train, 640 val
Using 8 dataloader workers
Logging results to comet-example-yolov8-coco128\train3
Starting training for 3 epochs...

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
        1/3      6.64G       1.16      1.346      1.212        536        640: 100%|██████████| 4/4 [00:04<00:00,  1.07
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 2/2 [00:05<0
                   all        128        929      0.644      0.597      0.674      0.507

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[7], line 1
----> 1 results = model.train(
      2     data="coco128.yaml",
      3     project="comet-example-yolov8-coco128",
      4     batch=32,
      5     save_period=1,
      6     save_json=True,
      7     epochs=3
      8 )

File ~\AppData\Roaming\Python\Python311\site-packages\ultralytics\engine\model.py:806, in Model.train(self, trainer, **kwargs)
    803     self.model = self.trainer.model
    805 self.trainer.hub_session = self.session  # attach optional HUB session
--> 806 self.trainer.train()
    807 # Update model and cfg after training
    808 if RANK in {-1, 0}:

File ~\AppData\Roaming\Python\Python311\site-packages\ultralytics\engine\trainer.py:207, in BaseTrainer.train(self)
    204         ddp_cleanup(self, str(file))
    206 else:
--> 207     self._do_train(world_size)

File ~\AppData\Roaming\Python\Python311\site-packages\ultralytics\engine\trainer.py:454, in BaseTrainer._do_train(self, world_size)
    452     self.scheduler.last_epoch = self.epoch  # do not move
    453     self.stop |= epoch >= self.epochs  # stop if exceeded epochs
--> 454 self.run_callbacks("on_fit_epoch_end")
    455 self._clear_memory()
    457 # Early Stopping

File ~\AppData\Roaming\Python\Python311\site-packages\ultralytics\engine\trainer.py:168, in BaseTrainer.run_callbacks(self, event)
    166 """Run all existing callbacks associated with a particular event."""
    167 for callback in self.callbacks.get(event, []):
--> 168     callback(self)

File ~\AppData\Roaming\Python\Python311\site-packages\ultralytics\utils\callbacks\comet.py:360, in on_fit_epoch_end(trainer)
    358     _log_confusion_matrix(experiment, trainer, curr_step, curr_epoch)
    359 if _should_log_image_predictions():
--> 360     _log_image_predictions(experiment, trainer.validator, curr_step)

File ~\AppData\Roaming\Python\Python311\site-packages\ultralytics\utils\callbacks\comet.py:263, in _log_image_predictions(experiment, validator, curr_step)
    260     return
    262 image_path = Path(image_path)
--> 263 annotations = _fetch_annotations(
    264     img_idx,
    265     image_path,
    266     batch,
    267     predictions_metadata_map,
    268     class_label_map,
    269 )
    270 _log_images(
    271     experiment,
    272     [image_path],
    273     curr_step,
    274     annotations=annotations,
    275 )
    276 _comet_image_prediction_count += 1

File ~\AppData\Roaming\Python\Python311\site-packages\ultralytics\utils\callbacks\comet.py:194, in _fetch_annotations(img_idx, image_path, batch, prediction_metadata_map, class_label_map)
    190 """Join the ground truth and prediction annotations if they exist."""
    191 ground_truth_annotations = _format_ground_truth_annotations_for_detection(
    192     img_idx, image_path, batch, class_label_map
    193 )
--> 194 prediction_annotations = _format_prediction_annotations_for_detection(
    195     image_path, prediction_metadata_map, class_label_map
    196 )
    198 annotations = [
    199     annotation for annotation in [ground_truth_annotations, prediction_annotations] if annotation is not None
    200 ]
    201 return [annotations] if annotations else None

File ~\AppData\Roaming\Python\Python311\site-packages\ultralytics\utils\callbacks\comet.py:182, in _format_prediction_annotations_for_detection(image_path, metadata, class_label_map)
    180     cls_label = prediction["category_id"]
    181     if class_label_map:
--> 182         cls_label = str(class_label_map[cls_label])
    184     data.append({"boxes": [boxes], "label": cls_label, "score": score})
    186 return {"name": "prediction", "data": data}

KeyError: 80

Try removing save_json=True from the model.train() call.
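
For reference, here is a minimal sketch of the same training call with save_json left out, so it falls back to its default. A likely explanation, which I have not verified against the Ultralytics source, is that save_json=True makes the validator collect JSON predictions whose category_id values do not line up with the keys of the class_label_map used by the Comet callback. TRAIN_KWARGS and train_without_save_json are hypothetical names used only for this example.

```python
# Same arguments as the original call, minus save_json.
TRAIN_KWARGS = {
    "data": "coco128.yaml",
    "project": "comet-example-yolov8-coco128",
    "batch": 32,
    "save_period": 1,
    "epochs": 3,
}


def train_without_save_json():
    """Train YOLO11n with the arguments above (hypothetical helper name)."""
    # Imported lazily so the arguments can be inspected without ultralytics installed.
    from ultralytics import YOLO

    model = YOLO("yolo11n.pt")
    return model.train(**TRAIN_KWARGS)
```

Calling train_without_save_json() starts the run. Comet logging stays enabled, only the JSON export is dropped.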

It did the trick!

Thank you so much.

Hello dourado,

You’ve run into a KeyError: 80 during training with the Comet ML integration enabled. This points to a problem with the class label mapping: Comet ML tried to look up a predicted class ID of 80, and that ID is not a key in the map. COCO128 has 80 classes, and a map of 80 classes indexed from 0 ends at 79, so 80 is one past the last valid index.

Based on the traceback, the error occurs in the Comet ML callback functions, specifically during the logging of image predictions.
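
The failing lookup at comet.py line 182 in the traceback can be reproduced in isolation. This is a minimal sketch, not the Ultralytics code: the placeholder class names and the prediction dict are made up for illustration, and it assumes the label map is keyed 0 to 79.

```python
# Hypothetical stand-in for the class label map: 80 classes keyed 0..79.
class_label_map = {i: f"class_{i}" for i in range(80)}

# Hypothetical prediction whose category_id is one past the last valid key.
prediction = {"category_id": 80, "score": 0.9}


def lookup_label(category_id, label_map):
    """Mirror the lookup in the traceback: str(class_label_map[cls_label])."""
    return str(label_map[category_id])


try:
    lookup_label(prediction["category_id"], class_label_map)
    error = None
except KeyError as exc:
    error = exc

print(repr(error))  # KeyError(80)
```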

To rule out the dataset, check that the class labels defined in coco128.yaml are correct, since the Comet callback uses a class_label_map that is based on them. The file should define 80 classes, and the mapping from class index to class name should be accurate.
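
One way to run that check is sketched below. It assumes the names section of the dataset YAML has already been loaded into a dict mapping class index to class name. check_names and unmapped_ids are hypothetical helpers, not part of Ultralytics or Comet ML.

```python
def check_names(names, expected_count=80):
    """Return a list of problems found in a class index -> name mapping."""
    problems = []
    if len(names) != expected_count:
        problems.append(f"expected {expected_count} classes, found {len(names)}")
    # Indices should run 0..N-1 with no gaps.
    missing = sorted(set(range(len(names))) - set(names))
    if missing:
        problems.append(f"missing class indices: {missing}")
    return problems


def unmapped_ids(category_ids, names):
    """Return the category IDs that have no entry in the mapping."""
    return sorted({cid for cid in category_ids if cid not in names})


# Illustrative data only: placeholder names, not the real COCO labels.
names = {i: f"class_{i}" for i in range(80)}
print(check_names(names))                # []
print(unmapped_ids([0, 79, 80], names))  # [80]
```

If check_names comes back empty but unmapped_ids still reports 80, the dataset YAML is fine and the out-of-range ID is coming from the predictions instead.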

I hope this helps.