NMS input shape expected by detect/val.py after a custom head: is it (B,4+nc,N) or (B,N,4+nc)?

I'm using YOLO11n with a modified detection head and modified output handling in tasks.py. Training and validation run without crashing, but mAP, precision, and recall are all 0, which makes me suspect that my prediction tensor layout doesn't match what val.py's NMS expects.

Can someone confirm the prediction layout expected right before NMS in ultralytics/models/yolo/detect/val.py? Specifically, I'd like to know the expected format (boxes + class scores) and where the transpose/permutation should happen.

Debug prints I see right before NMS:

    Starting training for 1 epochs…

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
        1/1         0G      5.865      8.659      6.368         30        256: 100% ━━━━━━━━━━━━ 1/1 0.5it/s 1.9s
    

    WARNING [HEAD-DEBUG] y shape=(4, 84, 1701)
    WARNING [VAL-DEBUG] preds[0] type=<class 'torch.Tensor'> shape=torch.Size([4, 84, 1701])
    [NMS-DEBUG] total_det=1200 conf_max=480.046 conf_thres=0.001 iou_thres=0.7
    [METRICS-DEBUG] batch_seen=1 img_idx=0 GT=1 PRED(afterNMS)=300 bestIoUmax=0.377 hits@0.5=0 conf_max=480.046
    [METRICS-DEBUG] batch_seen=2 img_idx=1 GT=5 PRED(afterNMS)=300 bestIoUmax=0.123 hits@0.5=0 conf_max=480.046
    [METRICS-DEBUG] batch_seen=3 img_idx=2 GT=9 PRED(afterNMS)=300 bestIoUmax=0.132 hits@0.5=0 conf_max=480.046
    [METRICS-DEBUG] batch_seen=4 img_idx=3 GT=2 PRED(afterNMS)=300 bestIoUmax=0.455 hits@0.5=0 conf_max=480.046
    Class Images Instances Box(P R mAP50 mAP50-95): 100% ━━━━━━━━━━━━ 1/1 0.6it/s 1.5s
    [METRICS-DEBUG][SUMMARY] total_GT=17 total_PRED(afterNMS)=1200 total_hits@0.5=0
    all 4 17 0 0 0 0

    1 epochs completed in 0.001 hours.

The expected layout is (batch_size, 4 box coordinates (cxcywh) + class_scores, num_anchors).

Thanks !

No problem.

For the standard Ultralytics YOLO detect validator path, the tensor passed into NMS is shaped (B, 4 + nc, N), where the 4 are decoded cxcywh boxes and the nc are class probabilities (after sigmoid), with N anchors/predictions.

If you’re using the end-to-end Detect.postprocess() helper, note it expects the permuted layout (B, N, 4 + nc) as documented in the Detect.postprocess reference in ultralytics/nn/modules/head.py.
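
To make the two layouts concrete, here is a minimal PyTorch sketch that normalizes a prediction tensor to (B, 4 + nc, N) before it reaches NMS. `to_channels_first` is a hypothetical helper name, not an Ultralytics function, and `nc = 80` is assumed from the 84 channels in the debug output above. Treat it as a sanity check for a custom head rather than as what val.py does internally.

```python
import torch


def to_channels_first(preds: torch.Tensor, nc: int) -> torch.Tensor:
    """Return preds laid out as (B, 4 + nc, N).

    Hypothetical helper for debugging a custom head; not part of Ultralytics.
    """
    if preds.ndim != 3:
        raise ValueError(f"expected a 3-D tensor, got shape {tuple(preds.shape)}")
    channels = 4 + nc
    if preds.shape[1] == channels:
        # Already (B, 4 + nc, N). Ambiguous only if N happens to equal 4 + nc.
        return preds
    if preds.shape[2] == channels:
        # (B, N, 4 + nc) -> (B, 4 + nc, N)
        return preds.permute(0, 2, 1).contiguous()
    raise ValueError(
        f"neither dim 1 nor dim 2 equals 4 + nc = {channels}: {tuple(preds.shape)}"
    )


if __name__ == "__main__":
    nc = 80  # assumed: 84 channels in the debug log = 4 box values + 80 classes
    y = torch.rand(4, 1701, 4 + nc)  # a head that emits (B, N, 4 + nc)
    y = to_channels_first(y, nc)
    boxes, scores = y.split((4, nc), dim=1)  # (B, 4, N) and (B, nc, N)
    print(tuple(y.shape), tuple(boxes.shape), tuple(scores.shape))
```

Going the other way, for code that wants (B, N, 4 + nc), is the same `permute(0, 2, 1)` applied to the channels-first tensor.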

Also, your conf_max=480 debug value is a strong sign you’re feeding raw logits (missing sigmoid() on class scores) into NMS/metrics, which will nuke mAP even if shapes are right.
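
A quick way to test that suspicion is to look at the range of the class channels just before NMS. The sketch below assumes the (B, 4 + nc, N) layout; `ensure_class_probs` is a hypothetical name, and the range check is only a heuristic, since logits that happen to fall inside [0, 1] would slip through. The proper fix is to apply the sigmoid once in the head's inference path, not to patch it in afterwards.

```python
import torch


def ensure_class_probs(preds: torch.Tensor, nc: int) -> torch.Tensor:
    """Apply sigmoid to the class channels if they look like raw logits.

    Assumes preds is (B, 4 + nc, N). Hypothetical debugging helper,
    not part of Ultralytics.
    """
    boxes, cls = preds.split((4, nc), dim=1)
    # Probabilities must lie in [0, 1]; anything outside suggests logits.
    if cls.min() < 0 or cls.max() > 1:
        cls = cls.sigmoid()
    return torch.cat((boxes, cls), dim=1)


if __name__ == "__main__":
    nc = 80  # assumed from the 84-channel output in the log
    preds = torch.randn(4, 4 + nc, 1701) * 100  # fake logits with a wide range
    print("class max before:", preds[:, 4:].max().item())
    fixed = ensure_class_probs(preds, nc)
    print("class max after: ", fixed[:, 4:].max().item())
```

If the "before" maximum is far above 1, as with conf_max=480.046 here, the class scores are not probabilities at the point where you printed them.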

Thanks! I will check whether the sigmoid is present. I did see the mAP drop by around 5%-7%.