Replacing the YOLO 11 backbone with ResNet 50

The layer indices are all wrong. If you're new to computer vision, you shouldn't be attempting something this advanced without understanding how it works.

nc: 2

scales:
  n: [0.33, 0.25, 1024]
  s: [0.33, 0.50, 1024]
  m: [0.67, 0.75, 768]
  l: [1.00, 1.00, 1024]
  x: [1.00, 1.25, 512]

backbone:
  # [from, repeats, module, args]
  - [-1, 1, TorchVision, [3, resnet50, DEFAULT, True, 2, True]]
  - [0, 1, Index, [256, 5]]   # P3/8
  - [0, 1, Index, [512, 6]]   # P4/16
  - [0, 1, Index, [1024, 7]]  # P5/32

head:
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 2], 1, Concat, [1]] # cat backbone P4
  - [-1, 2, C3k2, [512, False]] # 6

  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 1], 1, Concat, [1]] # cat backbone P3
  - [-1, 2, C3k2, [256, False]] # (P3/8-small)

  - [-1, 1, Conv, [256, 3, 2]]
  - [[-1, 6], 1, Concat, [1]] # cat head P4
  - [-1, 2, C3k2, [512, False]] # (P4/16-medium)

  - [-1, 1, Conv, [512, 3, 2]]
  - [[-1, 3], 1, Concat, [1]] # cat head P5
  - [-1, 2, C3k2, [1024, True]] # (P5/32-large)

  - [[-7, -4, -1], 1, Detect, [nc]] # Detect(P3, P4, P5)
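
If it helps to see why these numbers line up, here is a rough sketch in plain Python that resolves every `from` entry in the config above to an absolute layer index. It assumes the usual convention for these YAML files: layers are numbered from 0 in file order across backbone and head, negative `from` values are relative to the current layer, and non-negative values are absolute. `LAYERS` and `resolve_sources` are names made up for this illustration and are not part of any library.

```python
# (from, module) pairs copied from the config, in file order.
# The position in this list is the absolute layer index.
LAYERS = [
    (-1, "TorchVision"),       # 0  (-1 here means the input image)
    (0, "Index"),              # 1  P3
    (0, "Index"),              # 2  P4
    (0, "Index"),              # 3  P5
    (-1, "nn.Upsample"),       # 4
    ([-1, 2], "Concat"),       # 5
    (-1, "C3k2"),              # 6
    (-1, "nn.Upsample"),       # 7
    ([-1, 1], "Concat"),       # 8
    (-1, "C3k2"),              # 9
    (-1, "Conv"),              # 10
    ([-1, 6], "Concat"),       # 11
    (-1, "C3k2"),              # 12
    (-1, "Conv"),              # 13
    ([-1, 3], "Concat"),       # 14
    (-1, "C3k2"),              # 15
    ([-7, -4, -1], "Detect"),  # 16
]


def resolve_sources(layers):
    """Map each layer index to the absolute indices of the layers feeding it."""
    resolved = {}
    for i, (src, _module) in enumerate(layers):
        sources = src if isinstance(src, list) else [src]
        # Assumed convention: negative = relative to layer i, otherwise absolute.
        resolved[i] = [i + s if s < 0 else s for s in sources]
    return resolved


if __name__ == "__main__":
    for i, sources in resolve_sources(LAYERS).items():
        print(f"{i:2d} {LAYERS[i][1]:<12} <- {sources}")
```

Under those assumptions, `Detect` at layer 16 reads from layers 9, 12 and 15, which are the three `C3k2` outputs, and each `Concat` pairs the previous layer with the backbone or head layer named in its comment. If you add or remove a layer, rerun something like this and check that every absolute index still points where you expect.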