What am I missing when I modify the YOLO11 structure?

I'm sorry to bother everybody again, but I have been debugging all afternoon and still can't figure out what's wrong.

Here is the problem:

I changed the last layer from 'Detect' to 'Detect_DyHead':

 # Parameters
nc: 80 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolo11n.yaml' will call yolo11.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.50, 0.25, 1024] # summary: 181 layers, 2624080 parameters, 2624064 gradients, 6.6 GFLOPs
  s: [0.50, 0.50, 1024] # summary: 181 layers, 9458752 parameters, 9458736 gradients, 21.7 GFLOPs
  m: [0.50, 1.00, 512] # summary: 231 layers, 20114688 parameters, 20114672 gradients, 68.5 GFLOPs
  l: [1.00, 1.00, 512] # summary: 357 layers, 25372160 parameters, 25372144 gradients, 87.6 GFLOPs
  x: [1.00, 1.50, 512] # summary: 357 layers, 56966176 parameters, 56966160 gradients, 196.0 GFLOPs

# YOLO11n backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, Conv, [64, 3, 2]] # 0-P1/2
  - [-1, 1, Conv, [128, 3, 2]] # 1-P2/4
  - [-1, 2, C3k2, [256, False, 0.25]]
  - [-1, 1, Conv, [256, 3, 2]] # 3-P3/8
  - [-1, 2, C3k2, [512, False, 0.25]]
  - [-1, 1, Conv, [512, 3, 2]] # 5-P4/16
  - [-1, 2, C3k2, [512, True]]
  - [-1, 1, Conv, [1024, 3, 2]] # 7-P5/32
  - [-1, 2, C3k2, [1024, True]]
  - [-1, 1, SPPF, [1024, 5]] # 9
  - [-1, 2, C2PSA, [1024]] # 10

# YOLO11n head
head:
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 6], 1, Concat, [1]] # cat backbone P4
  - [-1, 2, C3k2, [512, False]] # 13

  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 4], 1, Concat, [1]] # cat backbone P3
  - [-1, 2, C3k2, [256, False]] # 16 (P3/8-small)

  - [-1, 1, Conv, [256, 3, 2]]
  - [[-1, 13], 1, Concat, [1]] # cat head P4
  - [-1, 2, C3k2, [512, False]] # 19 (P4/16-medium)

  - [-1, 1, Conv, [512, 3, 2]]
  - [[-1, 10], 1, Concat, [1]] # cat head P5
  - [-1, 2, C3k2, [1024, True]] # 22 (P5/32-large)

  - [[16, 19, 22], 1, Detect_DyHead, [nc,128,1]] # Detect(P3, P4, P5)

In tasks.py I added the code below:

    elif m in {Detect_DyHead}:
        args.append([ch[x] for x in f])
        # print(args)
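
If I print args, I believe the head is then built like this (a sketch; ch is parse_model's per-layer output-channel list, and the channel values are what I expect for the n scale):

    f = [16, 19, 22]                 # the three input layers from the YAML head line
    args = [80, 128, 1]              # resolved [nc, hidc, block_num]
    args.append([ch[x] for x in f])  # ch[x] assumed [64, 128, 256] for the n scale
    # -> args == [80, 128, 1, [64, 128, 256]]
    # -> Detect_DyHead(nc=80, hidc=128, block_num=1, ch=(64, 128, 256))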

The Detect_DyHead definition is as follows:

class Detect_DyHead(nn.Module):
    dynamic = False  # force grid reconstruction
    export = False  # export mode
    shape = None
    anchors = torch.empty(0)  # init
    strides = torch.empty(0)  # init

    def __init__(self, nc=80, hidc=256, block_num=2, ch=()):  # detection layer
        super().__init__()
        self.nc = nc  # number of classes
        self.nl = len(ch)  # number of detection layers
        self.reg_max = 16  # DFL channels (ch[0] // 16 to scale 4/8/12/16/20 for n/s/m/l/x)
        self.no = nc + self.reg_max * 4  # number of outputs per anchor
        self.stride = torch.zeros(self.nl)  # strides computed during build
        c2, c3 = max((16, ch[0] // 4, self.reg_max * 4)), max(ch[0], self.nc)  # channels
        self.conv = nn.ModuleList(nn.Sequential(Conv(x, hidc, 1)) for x in ch)
        self.dyhead = nn.Sequential(*[DyHeadBlock(hidc) for _ in range(block_num)])
        self.cv2 = nn.ModuleList(
            nn.Sequential(Conv(hidc, c2, 3), Conv(c2, c2, 3), nn.Conv2d(c2, 4 * self.reg_max, 1)) for _ in ch)
        self.cv3 = nn.ModuleList(nn.Sequential(Conv(hidc, c3, 3), Conv(c3, c3, 3), nn.Conv2d(c3, self.nc, 1)) for _ in ch)
        self.dfl = DFL(self.reg_max) if self.reg_max > 1 else nn.Identity()

    def forward(self, x):
        """Concatenates and returns predicted bounding boxes and class probabilities."""
        # print(x[0].shape)
        # print(x[1].shape)
        # print(x[2].shape)
        for i in range(self.nl):
            x[i] = self.conv[i](x[i])
        x = self.dyhead(x)
        shape = x[0].shape  # BCHW
        for i in range(self.nl):
            x[i] = torch.cat((self.cv2[i](x[i]), self.cv3[i](x[i])), 1)

        if self.training:
            # print(x[0].shape)
            # print(x[1].shape)
            # print(x[2].shape)
            # print(x)
            return x
        elif self.dynamic or self.shape != shape:
            self.anchors, self.strides = (x.transpose(0, 1) for x in make_anchors(x, self.stride, 0.5))
            self.shape = shape

        x_cat = torch.cat([xi.view(shape[0], self.no, -1) for xi in x], 2)
        if self.export and self.format in ('saved_model', 'pb', 'tflite', 'edgetpu', 'tfjs'):  # avoid TF FlexSplitV ops
            box = x_cat[:, :self.reg_max * 4]
            cls = x_cat[:, self.reg_max * 4:]
        else:
            box, cls = x_cat.split((self.reg_max * 4, self.nc), 1)
        dbox = dist2bbox(self.dfl(box), self.anchors.unsqueeze(0), xywh=True, dim=1) * self.strides
        y = torch.cat((dbox, cls.sigmoid()), 1)
        return y if self.export else (y, x)

    def bias_init(self):
        """Initialize Detect() biases, WARNING: requires stride availability."""
        m = self  # self.model[-1]  # Detect() module
        # cf = torch.bincount(torch.tensor(np.concatenate(dataset.labels, 0)[:, 0]).long(), minlength=nc) + 1
        # ncf = math.log(0.6 / (m.nc - 0.999999)) if cf is None else torch.log(cf / cf.sum())  # nominal class frequency
        for a, b, s in zip(m.cv2, m.cv3, m.stride):  # from
            a[-1].bias.data[:] = 1.0  # box
            b[-1].bias.data[:m.nc] = math.log(5 / m.nc / (640 / s) ** 2)  # cls (.01 objects, 80 classes, 640 img)

Since I remember that the output of the forward function during training should have the same shape as the original YOLO11, I debugged that:

    if self.training:
        # print(x[0].shape)
        # print(x[1].shape)
        # print(x[2].shape)
        # print(x)
        return x
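
For imgsz=640, nc=80 and reg_max=16, each level should have 4*16 + 80 = 144 output channels, so I expected:

    # x[0]: [B, 144, 80, 80]  (P3, stride 8)
    # x[1]: [B, 144, 40, 40]  (P4, stride 16)
    # x[2]: [B, 144, 20, 20]  (P5, stride 32)

And the printed shapes do match the original YOLO11.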

But something goes wrong when I train: I only get the cls loss; the other losses are missing.

It seems the code in the bbox-loss branch is never executed, i.e. fg_mask.sum() is zero:

    if fg_mask.sum():
        target_bboxes /= stride_tensor
        loss[0], loss[2] = self.bbox_loss(
            pred_distri, pred_bboxes, anchor_points, target_bboxes, target_scores, target_scores_sum, fg_mask
        )

From loss.py, __call__:

def __call__(self, preds: Any, batch: Dict[str, torch.Tensor]) -> Tuple[torch.Tensor, torch.Tensor]:
    """Calculate the sum of the loss for box, cls and dfl multiplied by batch size."""
    loss = torch.zeros(3, device=self.device)  # box, cls, dfl
    # feats = preds[1] if isinstance(preds, tuple) else preds
    # print(type(preds))
    if isinstance(preds, tuple):
        feats = preds[1]
    else:
        feats = preds

    pred_distri, pred_scores = torch.cat([xi.view(feats[0].shape[0], self.no, -1) for xi in feats], 2).split(
        (self.reg_max * 4, self.nc), 1
    )

    pred_scores = pred_scores.permute(0, 2, 1).contiguous()
    pred_distri = pred_distri.permute(0, 2, 1).contiguous()

    dtype = pred_scores.dtype
    batch_size = pred_scores.shape[0]
    imgsz = torch.tensor(feats[0].shape[2:], device=self.device, dtype=dtype) * self.stride[0]  # image size (h,w)
    anchor_points, stride_tensor = make_anchors(feats, self.stride, 0.5)

    # Targets
    targets = torch.cat((batch["batch_idx"].view(-1, 1), batch["cls"].view(-1, 1), batch["bboxes"]), 1)
    targets = self.preprocess(targets.to(self.device), batch_size, scale_tensor=imgsz[[1, 0, 1, 0]])
    gt_labels, gt_bboxes = targets.split((1, 4), 2)  # cls, xyxy
    mask_gt = gt_bboxes.sum(2, keepdim=True).gt_(0.0)

    # Pboxes
    pred_bboxes = self.bbox_decode(anchor_points, pred_distri)  # xyxy, (b, h*w, 4)
    # dfl_conf = pred_distri.view(batch_size, -1, 4, self.reg_max).detach().softmax(-1)
    # dfl_conf = (dfl_conf.amax(-1).mean(-1) + dfl_conf.amax(-1).amin(-1)) / 2

    _, target_bboxes, target_scores, fg_mask, _ = self.assigner(
        # pred_scores.detach().sigmoid() * 0.8 + dfl_conf.unsqueeze(-1) * 0.2,
        pred_scores.detach().sigmoid(),
        (pred_bboxes.detach() * stride_tensor).type(gt_bboxes.dtype),
        anchor_points * stride_tensor,
        gt_labels,
        gt_bboxes,
        mask_gt,
    )

    target_scores_sum = max(target_scores.sum(), 1)

    # Cls loss
    # loss[1] = self.varifocal_loss(pred_scores, target_scores, target_labels) / target_scores_sum  # VFL way
    loss[1] = self.bce(pred_scores, target_scores.to(dtype)).sum() / target_scores_sum  # BCE
    # print(loss[1] )
    # Bbox loss
    if fg_mask.sum():
        target_bboxes /= stride_tensor
        loss[0], loss[2] = self.bbox_loss(
            pred_distri, pred_bboxes, anchor_points, target_bboxes, target_scores, target_scores_sum, fg_mask
        )

    loss[0] *= self.hyp.box  # box gain
    loss[1] *= self.hyp.cls  # cls gain
    loss[2] *= self.hyp.dfl  # dfl gain

    return loss * batch_size, loss.detach()  # loss(box, cls, dfl)

I don't know what I'm missing. I need your help :heart_hands:

Thanks for sharing the details — you’re very close. The reason you only see cls loss is that no positives are being assigned (fg_mask.sum() == 0). With custom heads this almost always happens because the model strides and anchors are not initialized, so targets/anchors don’t line up and the assigner returns no matches.

Root cause
Your Detect_DyHead does not subclass Detect, so YOLO’s model builder never runs stride discovery or bias_init() for it. In DetectionModel, stride initialization only triggers for Detect subclasses, which means your model.stride is wrong and anchor generation/target scaling break, resulting in empty fg_mask. You can see this behavior in the stride init path described in the DetectionModel reference under stride initialization and bias init logic in the Detect branch.
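
For context, the stride-discovery step in DetectionModel.__init__ looks roughly like this (a paraphrased sketch, not the verbatim source):

    m = self.model[-1]  # the detection head
    if isinstance(m, Detect):  # Detect_DyHead(nn.Module) fails this check
        s = 256  # dummy input size for a probe forward pass
        m.stride = torch.tensor([s / x.shape[-2] for x in self.forward(torch.zeros(1, ch, s, s))])
        self.stride = m.stride
        m.bias_init()  # stride-dependent bias initialization
    else:
        self.stride = torch.Tensor([32])  # fallback: the head's own torch.zeros(nl) is never replaced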

What to change
Make your head inherit from Detect so it’s treated like a standard YOLO head during build:

from ultralytics.nn.modules.head import Detect

class Detect_DyHead(Detect):
    def __init__(self, nc=80, hidc=256, block_num=2, ch=()):
        super().__init__(nc, ch)  # ensures stride discovery + bias_init paths work
        c2 = max(16, ch[0] // 4, self.reg_max * 4)
        c3 = max(ch[0], min(self.nc, 100))
        self.conv = nn.ModuleList(Conv(x, hidc, 1) for x in ch)
        self.dyhead = nn.Sequential(*[DyHeadBlock(hidc) for _ in range(block_num)])
        self.cv2 = nn.ModuleList(nn.Sequential(Conv(hidc, c2, 3), Conv(c2, c2, 3), nn.Conv2d(c2, 4*self.reg_max, 1)) for _ in ch)
        self.cv3 = nn.ModuleList(nn.Sequential(Conv(hidc, c3, 3), Conv(c3, c3, 3), nn.Conv2d(c3, self.nc, 1)) for _ in ch)

    def forward(self, x):
        for i in range(self.nl):
            x[i] = self.conv[i](x[i])
        x = self.dyhead(x)
        # Let Detect.forward apply cv2/cv3 and handle training vs. inference,
        # so those convs are not applied twice (training: list[P3,P4,P5]; inference: Detect path).
        return super().forward(x)
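
With that in place, a quick end-to-end sanity run might look like this (a sketch; yolo11-dyhead.yaml is a hypothetical name for your modified YAML, and coco8.yaml is the small built-in sample dataset):

    from ultralytics import YOLO

    model = YOLO("yolo11-dyhead.yaml")        # build triggers stride discovery + bias_init
    print(model.model.stride)                 # expect tensor([ 8., 16., 32.])
    model.train(data="coco8.yaml", epochs=1)  # box/cls/dfl losses should all be non-zero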

Quick sanity checks

  • After model build, print(model.stride) should be tensor([8., 16., 32.]) — not [32] or zeros. This is required for correct anchor/target scaling as used in the loss path.
  • Training output per level must be [B, 4*reg_max + nc, H, W] (raw logits for the 4*reg_max reg channels — do not apply DFL before loss).
  • The loss only computes bbox/DFL when fg_mask.sum()>0, as shown in the v8DetectionLoss call path; once strides are correct you should see non-zero box/DFL losses (see the sketch after this list).
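
To see why uninitialized strides empty out fg_mask, here is a minimal sketch, assuming the loss picks up self.stride = m.stride from the last module, whose torch.zeros(self.nl) was never overwritten:

    import torch
    from ultralytics.utils.tal import make_anchors

    feats = [torch.zeros(2, 144, s, s) for s in (80, 40, 20)]  # P3/P4/P5 maps at imgsz=640
    anchor_points, stride_tensor = make_anchors(feats, torch.zeros(3), 0.5)
    print(stride_tensor.unique())  # tensor([0.]): imgsz, scaled GT boxes, and decoded
    # pred boxes all collapse to zero, so mask_gt is all False and fg_mask.sum() == 0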

References

  • The detection head training/inference contract is described in the Detect head reference, which expects per-level maps in training.
  • The v8DetectionLoss call shows fg_mask gating of bbox/DFL computation and how anchors/strides are used.

If you’d prefer not to subclass, you must modify DetectionModel to include your head in the stride discovery and call m.bias_init(), but subclassing Detect is the cleanest fix.

Thank you for your assistance. My gratitude is beyond words - everything is working fine now.

Logging results to C:\Users\Hunger\Desktop\ultralytics\runs\detect\test20251022_16_562
Starting training for 10 epochs...

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
       1/10      1.55G      2.817      3.772      3.027         20        640: 100% ━━━━━━━━━━━━ 2400/2400 5.5it/s 7:18
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% ━━━━━━━━━━━━ 53/53 6.2it/s 8.5s
                   all        844       2982      0.218      0.164     0.0829     0.0362

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size