Expected NMS input shape in detect/val.py after a custom head: (B, 4+nc, N) or (B, N, 4+nc)?

I'm using YOLO11n and have modified the detection head and the model output in tasks.py. Training and validation run without crashing, but mAP, precision, and recall are all 0, which makes me suspect that my prediction tensor layout doesn't match what the NMS step in val.py expects.

Can someone confirm the prediction layout that ultralytics/models/yolo/detect/val.py expects right before NMS? Specifically, I'd like to know the expected format (box coordinates + class scores, and in which axis order) and where the transpose/permutation should happen.
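
The two candidate layouts can be told apart from the shape alone by finding the axis equal to `4 + nc`. A minimal, dependency-free sketch, assuming 80 classes (the 84 = 4 + 80 in the debug log below suggests this); `infer_layout` and `permutation_to_channels_first` are hypothetical helpers, not Ultralytics functions, and the returned tuple is the axis order one would pass to `Tensor.permute(...)` in PyTorch:

```python
from typing import Tuple

Shape = Tuple[int, int, int]


def infer_layout(shape: Shape, nc: int) -> str:
    """Guess which axis of a (B, ?, ?) prediction tensor holds 4 box coords + nc scores.

    Returns "channels_first" for (B, 4+nc, N) and "channels_last" for (B, N, 4+nc).
    Raises ValueError when neither axis matches, or when both do (ambiguous).
    """
    if len(shape) != 3:
        raise ValueError(f"expected a 3-D shape, got {shape}")
    _, dim1, dim2 = shape
    channels = 4 + nc
    if dim1 == channels and dim2 == channels:
        raise ValueError(f"ambiguous: both trailing axes equal 4+nc={channels}")
    if dim1 == channels:
        return "channels_first"
    if dim2 == channels:
        return "channels_last"
    raise ValueError(f"no axis equals 4+nc={channels} in {shape}")


def permutation_to_channels_first(shape: Shape, nc: int) -> Tuple[int, int, int]:
    """Axis order that turns the tensor into (B, 4+nc, N), e.g. for torch's permute()."""
    return (0, 1, 2) if infer_layout(shape, nc) == "channels_first" else (0, 2, 1)


# Shape from the [HEAD-DEBUG] line, assuming nc = 80 (84 = 4 + 80).
print(infer_layout((4, 84, 1701), nc=80))                   # channels_first
print(permutation_to_channels_first((4, 1701, 84), nc=80))  # (0, 2, 1)
```

This only checks shape. A tensor can have the right layout and still carry values (box format, score range) that NMS does not expect.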

Debug prints around the NMS step (1-epoch run):

    Starting training for 1 epochs…

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
        1/1         0G      5.865      8.659      6.368         30        256: 100% ━━━━━━━━━━━━ 1/1 0.5it/s 1.9s

    WARNING [HEAD-DEBUG] y shape=(4, 84, 1701)
    WARNING [VAL-DEBUG] preds[0] type=<class 'torch.Tensor'> shape=torch.Size([4, 84, 1701])
    [NMS-DEBUG] total_det=1200 conf_max=480.046 conf_thres=0.001 iou_thres=0.7
    [METRICS-DEBUG] batch_seen=1 img_idx=0 GT=1 PRED(afterNMS)=300 bestIoUmax=0.377 hits@0.5=0 conf_max=480.046
    [METRICS-DEBUG] batch_seen=2 img_idx=1 GT=5 PRED(afterNMS)=300 bestIoUmax=0.123 hits@0.5=0 conf_max=480.046
    [METRICS-DEBUG] batch_seen=3 img_idx=2 GT=9 PRED(afterNMS)=300 bestIoUmax=0.132 hits@0.5=0 conf_max=480.046
    [METRICS-DEBUG] batch_seen=4 img_idx=3 GT=2 PRED(afterNMS)=300 bestIoUmax=0.455 hits@0.5=0 conf_max=480.046
    Class Images Instances Box(P R mAP50 mAP50-95): 100% ━━━━━━━━━━━━ 1/1 0.6it/s 1.5s
    [METRICS-DEBUG][SUMMARY] total_GT=17 total_PRED(afterNMS)=1200 total_hits@0.5=0
    all 4 17 0 0 0 0

    1 epochs completed in 0.001 hours.
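
One number in the log stands out: `conf_max=480.046`. Assuming the convention of the stock `Detect` head, which applies a sigmoid to the class branch before concatenating it with the boxes, every score that reaches NMS should lie in [0, 1]; a maximum of 480 suggests either raw logits or box coordinates sitting in the score rows. A minimal pure-Python sketch of that range check on one image's `(4+nc, N)` slice; `summarize_channels`, `scores_look_like_probabilities`, `sigmoid`, and the toy numbers are hypothetical illustrations, not library code:

```python
import math
from typing import Dict, List, Sequence


def summarize_channels(pred: Sequence[Sequence[float]], nc: int) -> Dict[str, float]:
    """Min/max stats for one image's predictions in channels-first (4+nc, N) layout.

    Rows 0-3 are treated as box coordinates, rows 4..3+nc as class scores.
    """
    if len(pred) != 4 + nc:
        raise ValueError(f"expected {4 + nc} rows, got {len(pred)}")
    box_vals = [v for row in pred[:4] for v in row]
    score_vals = [v for row in pred[4:] for v in row]
    return {
        "box_max": max(box_vals),
        "score_min": min(score_vals),
        "score_max": max(score_vals),
    }


def scores_look_like_probabilities(stats: Dict[str, float]) -> bool:
    """True if every class score lies in [0, 1], as sigmoid outputs would."""
    return stats["score_min"] >= 0.0 and stats["score_max"] <= 1.0


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


# Toy (4+nc, N) slice with nc=2 and N=3; the class rows hold raw logits.
pred: List[List[float]] = [
    [100.0, 50.0, 20.0],  # cx
    [80.0, 40.0, 30.0],   # cy
    [30.0, 10.0, 12.0],   # w
    [20.0, 10.0, 8.0],    # h
    [2.5, -1.0, 0.3],     # class 0 logits
    [-3.0, 0.5, 4.0],     # class 1 logits
]
stats = summarize_channels(pred, nc=2)
print(scores_look_like_probabilities(stats))  # False

# Applying a sigmoid to the class rows (and not the box rows) maps them into [0, 1].
fixed = pred[:4] + [[sigmoid(v) for v in row] for row in pred[4:]]
print(scores_look_like_probabilities(summarize_channels(fixed, nc=2)))  # True
```

The same min/max summary on the real head output would show whether the large values come from the score rows or only from the box rows.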

For reference, the head output in the log above is laid out as (batch_size, 4 box coordinates (cxcywh) + class scores, num_anchors), i.e. (4, 84, 1701).
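
The layout line above also fixes the box format as cxcywh. Assuming the stock NMS path converts the first four channels from center-format xywh to xyxy (via `xywh2xyxy` from `ultralytics.utils.ops`) before boxes are compared, a head that actually emits xyxy would still yield detections after NMS, just with poor overlap, which would be consistent with the `bestIoUmax` values below 0.5 in the log. A minimal pure-Python illustration with a hypothetical ground-truth box; `cxcywh_to_xyxy` and `iou_xyxy` are illustrative helpers, not the library's own functions:

```python
from typing import Tuple

Box = Tuple[float, float, float, float]


def cxcywh_to_xyxy(box: Box) -> Box:
    """(center x, center y, width, height) -> (x1, y1, x2, y2)."""
    cx, cy, w, h = box
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def iou_xyxy(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes in (x1, y1, x2, y2) form."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


gt_xyxy: Box = (40.0, 40.0, 120.0, 100.0)  # hypothetical ground-truth box, pixels

# Head emits cxcywh and it is decoded as cxcywh: perfect overlap.
head_cxcywh: Box = (80.0, 70.0, 80.0, 60.0)
print(round(iou_xyxy(cxcywh_to_xyxy(head_cxcywh), gt_xyxy), 4))  # 1.0

# Head emits the same box as xyxy, but it is decoded as cxcywh: IoU drops below 0.5.
head_xyxy: Box = (40.0, 40.0, 120.0, 100.0)
print(round(iou_xyxy(cxcywh_to_xyxy(head_xyxy), gt_xyxy), 4))  # 0.2174
```

A similar drop would appear if the head emitted normalized instead of pixel coordinates, since the ground truth is compared in pixel space.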