Adding a new head to the YOLO11n model to detect very small objects

Hi Everyone,

I’m trying to add a fourth head to the YOLO11n model that processes a high-resolution feature map (P2) to detect very small objects within the existing model architecture. For this, I extended the neck with a new feature map and added a new head to process it. I tried this in two ways: implementing the changes directly in Python, and adding them to the yolo11.yaml file.

Please find the implementation steps below.

  1. Extended the neck by adding an extra upsample module.
  2. Added a new head module to process the P2 feature map; it consists
    of Conv layers, a C3k module, and a Detect module for predictions.
  3. Modified the forward pass to include the new head.
  4. Initialized the custom model with pretrained weights, then loaded
    and trained it.

Finally, when I try to load the model, I get the following error in the code:
AttributeError: 'CustomYOLO11n' object has no attribute 'extra_upsample'

I tried everything I could think of, but no luck. It seems that the DetectionModel class in YOLO11 dynamically builds the model from the YAML configuration, and I don’t understand how to register the extra_upsample and p2_head modules in the model architecture.

Then I took a different approach: instead of subclassing DetectionModel, I modified the YAML file to add a fourth head and loaded the model using the modified YAML, but still no luck. Please find the YAML below. I'm getting "RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 16 but got size 64 for tensor number 1 in the list."

I’m doing this experiment for my project, an Advanced Driver Monitoring System. I need to add four new heads and modify the neck for multi-task learning.


# Ultralytics YOLO11 object detection model with P3/8 - P5/32

# Parameters
nc: 80 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolo11n.yaml' will call yolo11.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.50, 0.25, 1024] # summary: 181 layers, 2624080 parameters, 2624064 gradients, 6.6 GFLOPs

# YOLO11n backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, Conv, [64, 3, 2]] # 0-P1/2
  - [-1, 1, Conv, [128, 3, 2]] # 1-P2/4
  - [-1, 2, C3k2, [256, False, 0.25]]
  - [-1, 1, Conv, [256, 3, 2]] # 3-P3/8
  - [-1, 2, C3k2, [512, False, 0.25]]
  - [-1, 1, Conv, [512, 3, 2]] # 5-P4/16
  - [-1, 2, C3k2, [512, True]]
  - [-1, 1, Conv, [1024, 3, 2]] # 7-P5/32
  - [-1, 2, C3k2, [1024, True]]
  - [-1, 1, SPPF, [1024, 5]] # 9
  - [-1, 2, C2PSA, [1024]] # 10

# YOLO11n head
head:
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 6], 1, Concat, [1]] # cat backbone P4
  - [-1, 2, C3k2, [512, False]] # 13

  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 4], 1, Concat, [1]] # cat backbone P3
  - [-1, 2, C3k2, [256, False]] # 16 (P3/8-small)

  - [-1, 1, Conv, [256, 3, 2]]
  - [[-1, 13], 1, Concat, [1]] # cat head P4
  - [-1, 2, C3k2, [512, False]] # 19 (P4/16-medium)

  - [-1, 1, Conv, [512, 3, 2]]
  - [[-1, 10], 1, Concat, [1]] # cat head P5
  - [-1, 2, C3k2, [1024, True]] # 22 (P5/32-large)

  - [-1, 1, nn.Upsample, [None, 2, "nearest"]] # 23 added
  - [[-1, 2], 1, Concat, [1]] # cat backbone P2
  - [-1, 3, C3k2, [128, False]] # 25 (P3/8-very small)
  - [-1, 1, Conv, [128, 3, 2]] # new Conv for P2

  - [[16, 19, 22], 1, Detect, [nc]] # Detect(P3, P4, P5)
  - [[23, 26, 27], 1, Detect, [nc]] # Detect(P2, P3, P4, P5)
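
For reference, I'm loading the modified YAML roughly like this (the filename is my own):

from ultralytics import YOLO

model = YOLO("yolo11n-p2-custom.yaml")  # the modified YAML above
model.load("yolo11n.pt")  # transfer pretrained weights where layer shapes match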

You can check this PR that lets you define your custom module directly in the YAML file.
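
On the AttributeError itself: DetectionModel builds all of its layers from the YAML inside parse_model, so an attribute like extra_upsample only exists if your subclass assigns it in __init__ (after calling super().__init__) and before your custom forward references it. Note also that YOLO("some.yaml") instantiates the stock DetectionModel, not your subclass, so the custom class has to be created directly. A minimal sketch using the names from your post (channel sizes are illustrative, not a drop-in implementation):

import torch.nn as nn
from ultralytics.nn.tasks import DetectionModel

class CustomYOLO11n(DetectionModel):
    def __init__(self, cfg="yolo11n.yaml", ch=3, nc=None, verbose=True):
        super().__init__(cfg, ch=ch, nc=nc, verbose=verbose)  # builds self.model from the YAML
        # Register the extra modules as attributes so PyTorch tracks their
        # parameters; anything used in forward() must be assigned here first.
        self.extra_upsample = nn.Upsample(scale_factor=2, mode="nearest")
        self.p2_head = nn.Sequential(
            nn.Conv2d(128, 128, 3, padding=1, bias=False),
            nn.BatchNorm2d(128),
            nn.SiLU(),
        )

model = CustomYOLO11n(cfg="yolo11n.yaml")  # instantiate the subclass directly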

Thank you for the quick inputs.

I will try this. If possible, please provide one complete example with backbone, neck, and head changes.

Best Regards,
Venkat

I would also suggest reviewing the yolov8-p2.yaml as a reference. Note that in your YAML, the once-upsampled P5-level output is concatenated directly with the stride-4 P2 backbone feature, which is likely what triggers the spatial size mismatch in your RuntimeError; the P2 config below instead reaches P2 through successive upsample stages.

# Ultralytics 🚀 AGPL-3.0 License - https://ultralytics.com/license

# Ultralytics YOLOv8 object detection model with P2/4 - P5/32 outputs
# Model docs: https://docs.ultralytics.com/models/yolov8
# Task docs: https://docs.ultralytics.com/tasks/detect

# Parameters
nc: 80 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov8n.yaml' will call yolov8.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.33, 0.25, 1024]
  s: [0.33, 0.50, 1024]
  m: [0.67, 0.75, 768]
  l: [1.00, 1.00, 512]
  x: [1.00, 1.25, 512]

# YOLOv8.0 backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, Conv, [64, 3, 2]] # 0-P1/2
  - [-1, 1, Conv, [128, 3, 2]] # 1-P2/4
  - [-1, 3, C2f, [128, True]]
  - [-1, 1, Conv, [256, 3, 2]] # 3-P3/8
  - [-1, 6, C2f, [256, True]]
  - [-1, 1, Conv, [512, 3, 2]] # 5-P4/16
  - [-1, 6, C2f, [512, True]]
  - [-1, 1, Conv, [1024, 3, 2]] # 7-P5/32
  - [-1, 3, C2f, [1024, True]]
  - [-1, 1, SPPF, [1024, 5]] # 9

# YOLOv8.0-p2 head
head:
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 6], 1, Concat, [1]] # cat backbone P4
  - [-1, 3, C2f, [512]] # 12

  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 4], 1, Concat, [1]] # cat backbone P3
  - [-1, 3, C2f, [256]] # 15 (P3/8-small)

  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 2], 1, Concat, [1]] # cat backbone P2
  - [-1, 3, C2f, [128]] # 18 (P2/4-xsmall)

  - [-1, 1, Conv, [128, 3, 2]]
  - [[-1, 15], 1, Concat, [1]] # cat head P3
  - [-1, 3, C2f, [256]] # 21 (P3/8-small)

  - [-1, 1, Conv, [256, 3, 2]]
  - [[-1, 12], 1, Concat, [1]] # cat head P4
  - [-1, 3, C2f, [512]] # 24 (P4/16-medium)

  - [-1, 1, Conv, [512, 3, 2]]
  - [[-1, 9], 1, Concat, [1]] # cat head P5
  - [-1, 3, C2f, [1024]] # 27 (P5/32-large)

  - [[18, 21, 24, 27], 1, Detect, [nc]] # Detect(P2, P3, P4, P5)
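
If you want to try it directly, loading the bundled config with the scale letter in the filename should work, for example with the small coco8 demo dataset:

from ultralytics import YOLO

model = YOLO("yolov8n-p2.yaml")  # builds the P2 architecture at the 'n' scale
model.train(data="coco8.yaml", epochs=3, imgsz=640)  # quick smoke test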

Thank you for your inputs. Yeah, I tried yolov8-p2.yaml and yolov8-p6.yaml today and had no issues. Now, I need to add three new heads for Facial Landmark Detection, Face Recognition, and DMS states along with the detection head, and I need to modify the neck for multi-task learning.

I’m a little apprehensive about customizing the YOLO11 architecture for Advanced DMS tasks. Please share your thoughts, guide me on how to achieve this goal, and help set my direction.

Best Regards,
Venkat

You can probably use a YOLO pose model for the facial landmark detection; you'll just need a dataset to train on. Adding multiple heads like that is not what I would call a simple task, and most people making such changes usually know what they're doing or are prepared to figure it out themselves.

It might be worth considering that using more than one model to accomplish your task could be a viable solution. Heavy modifications to a YOLO model like you've described would likely cause significant slowdowns in inference speed.

Also, just searching around, you might be able to find some things that could help you out; a quick Google search turned up several relevant results.

Thank you for your inputs.

I was experimenting with the YOLO11 object detection and YOLO11 pose models on DMS datasets and wanted to use one of the two for multi-task learning. Your statement confirms that I can go ahead with the YOLO11 pose model for object and facial landmark detection.

As part of my literature collection, I also reviewed the topics you shared, and the final outcome was a decision to go ahead with YOLO11 alongside other models such as MobileNetV2, SqueezeNet, and AlexNet.

The following methods, identified so far in the research review process, are the more advanced and feasible DMS deployment solutions.

  1. Using a multi-task learning (MTL) CNN architecture for face detection, face recognition, and facial analysis (eye gaze estimation, head pose estimation, face occlusions).

  2. Using a two-stage CNN: the first CNN locates and tracks the face and eyes, while the second CNN estimates head pose, eye gaze, and occlusions in a multi-task learning framework. The first stage utilizes a modified YOLO11n.

  3. Using a customized YOLO11n for object and face detection, incorporating the Mediapipe Face Landmarker and Dlib's face recognition models in the same solution for detecting various DMS states.

  4. Using YOLO11n or another single-task CNN model for object and face detection, deploying it on the target device, and then using the Mediapipe Face Landmarker and Dlib face recognition libraries as part of the DMS application to detect various DMS states (drowsiness, distraction, emotions, and impairment). A rough pipeline along these lines is sketched below.
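
For method 4, the two-stage flow I have in mind looks roughly like this (a sketch only; the stage-2 function is a placeholder for the Mediapipe/Dlib analysis):

from ultralytics import YOLO

detector = YOLO("yolo11n.pt")  # stage 1: object/face detection

def analyze_face(face_crop):
    """Stage 2 placeholder: Mediapipe Face Landmarker / Dlib face
    recognition would run here to derive the DMS states."""
    ...

results = detector("driver_frame.jpg")  # hypothetical input frame
for box in results[0].boxes:
    x1, y1, x2, y2 = map(int, box.xyxy[0].tolist())
    analyze_face(results[0].orig_img[y1:y2, x1:x2])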

Please share your thoughts on this.

Best Regards,
Venkat

This is the proposed Advanced DMS solution.

The standard yolo11n-pose.yaml outputs 17 keypoints for human pose. For 64 facial landmarks, the output layer needs to be modified to handle these points, so I changed kpt_shape: [17, 3] to kpt_shape: [64, 2]. Is that enough, or do we need to add a multi-task loss function for this?

Please share your thoughts on this.

Best Regards,
Venkat

Hi Venkat,

Yes, modifying the kpt_shape parameter in the yolo11n-pose.yaml file is the correct approach to change the number of keypoints the model detects. Changing kpt_shape: [17, 3] to kpt_shape: [64, 2] tells the model to predict 64 keypoints, each with 2 dimensions (x, y coordinates, without the visibility flag).

You generally do not need to add a separate multi-task loss function just for changing the number of keypoints within the pose estimation task. The existing pose loss function should handle the regression for the specified number of keypoints defined by kpt_shape. You will, however, need to train the model on a dataset annotated with your 64 facial landmarks.
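
A training setup could look like this (a sketch; both file names are hypothetical, and the copied model YAML and the dataset YAML must declare the same kpt_shape):

from ultralytics import YOLO

# Copy of yolo11n-pose.yaml with kpt_shape changed to [64, 2]
model = YOLO("yolo11n-pose-face64.yaml")
model.load("yolo11n-pose.pt")  # reuse pretrained weights where shapes match
model.train(data="face-landmarks.yaml", epochs=100, imgsz=640)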

Good luck with your Advanced DMS project!

Thank you for your inputs.

Best Regards,
Venkat

I found a research paper on YOLO architecture design during my literature review. Please find the paper details below, and consider sharing it with new joiners.

Paper Title: YOLOv8 to YOLO11: A Comprehensive Architecture In-depth Comparative Review
Link: https://arxiv.org/abs/2501.13400

Best Regards,
Venkat

Can we still use GhostConv in YOLO11n models for model optimization?

Please share your thoughts.

Best Regards,
Venkat

The GhostConv module is still included. It might be good to reference the yolov8-ghost config and experiment with applying similar changes to YOLO11.

Thank you for confirmation.

Best Regards,
Venkat

Hi Venkat,

Yes, GhostConv is a supported module within the Ultralytics framework and can be integrated into YOLO11 models, including YOLO11n, by modifying the model’s YAML configuration file. This can potentially help optimize the model by reducing parameters and computational cost.
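
As a quick sanity check after editing the YAML, you could compare parameter counts between the baseline and a GhostConv variant (the ghost config filename here is hypothetical):

from ultralytics import YOLO

base = YOLO("yolo11n.yaml")
ghost = YOLO("yolo11n-ghost.yaml")  # your custom GhostConv config

count = lambda y: sum(p.numel() for p in y.model.parameters())
print(f"baseline: {count(base):,}  ghost: {count(ghost):,}")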

Let us know if you have further questions!

Hi,

I modified the YAML file to use the GhostConv module as below.

# YOLO11n-ghost backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, Conv, [64, 3, 2]] # 0-P1/2
  - [-1, 1, GhostConv, [128, 3, 2]] # 1-P2/4
  - [-1, 2, C3Ghost, [256, False, 0.25]]
  - [-1, 1, GhostConv, [256, 3, 2]] # 3-P3/8
  - [-1, 2, C3Ghost, [512, False, 0.25]]
  - [-1, 1, GhostConv, [512, 3, 2]] # 5-P4/16
  - [-1, 2, C3Ghost, [512, True]]
  - [-1, 1, GhostConv, [1024, 3, 2]] # 7-P5/32
  - [-1, 2, C3Ghost, [1024, True]]
  - [-1, 1, SPPF, [1024, 5]] # 9
  - [-1, 2, C2PSA, [1024]] # 10

# YOLO11n head
head:
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 6], 1, Concat, [1]] # cat backbone P4
  - [-1, 2, C3Ghost, [512, False]] # 13

  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 4], 1, Concat, [1]] # cat backbone P3
  - [-1, 2, C3Ghost, [256, False]] # 16 (P3/8-small)

  # Added a new head for extra small
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 2], 1, Concat, [1]] # cat backbone P2
  - [-1, 2, C3Ghost, [128, False]] # 19 (P2/4-xsmall)

  - [-1, 1, GhostConv, [128, 3, 2]]
  - [[-1, 16], 1, Concat, [1]] # cat head P3
  - [-1, 2, C3Ghost, [256, False]] # 22 (P3/8-small)

  - [-1, 1, GhostConv, [256, 3, 2]]
  - [[-1, 13], 1, Concat, [1]] # cat head P4
  - [-1, 2, C3Ghost, [512, False]] # 25 (P4/16-medium)

  - [-1, 1, GhostConv, [512, 3, 2]]
  - [[-1, 10], 1, Concat, [1]] # cat head P5
  - [-1, 2, C3Ghost, [1024, False]] # 28 (P5/32-large)

  - [[19, 22, 25, 28], 1, Detect, [nc]] # Detect(P2, P3, P4, P5)

However, I'm getting the following error:
TypeError: empty(): argument 'size' failed to unpack the object at pos 2 with error "type must be tuple of ints,but got float"

Please share your inputs.

Best Regards,
Venkat

Generally it is helpful to post the entire stack trace of the error:

                   from  n    params  module                                       arguments
  0                  -1  1      1856  ultralytics.nn.modules.conv.Conv             [3, 64, 3, 2]
  1                  -1  1     38720  ultralytics.nn.modules.conv.GhostConv        [64, 128, 3, 2]

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  
  File "ultralytics/nn/tasks.py", line 1223, in parse_model
    m_ = torch.nn.Sequential(*(m(*args) for _ in range(n))) if n > 1 else m(*args)  # module
  
  File "ultralytics/nn/modules/block.py", line 419, in __init__
    super().__init__(c1, c2, n, shortcut, g, e)
  
  File "ultralytics/nn/modules/block.py", line 332, in __init__
    self.m = nn.Sequential(*(Bottleneck(c_, c_, shortcut, g, k=((1, 1), (3, 3)), e=1.0) for _ in range(n)))
  
  File "ultralytics/nn/modules/block.py", line 332, in <genexpr>
    self.m = nn.Sequential(*(Bottleneck(c_, c_, shortcut, g, k=((1, 1), (3, 3)), e=1.0) for _ in range(n)))
  
  File "ultralytics/nn/modules/block.py", line 471, in __init__
    self.cv2 = Conv(c_, c2, k[1], 1, g=g)
  
  File "ultralytics/nn/modules/conv.py", line 65, in __init__
    self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p, d), groups=g, dilation=d, bias=False)
  
  File ".venv/lib/site-packages/torch/nn/modules/conv.py", line 447, in __init__
    super().__init__(
  
  File ".venv/lib/site-packages/torch/nn/modules/conv.py", line 134, in __init__
    self.weight = Parameter(torch.empty(

TypeError: empty(): argument 'size' failed to unpack the object at pos 2 with error "type must be tuple of ints,but got float"   

This points to [-1, 2, C3Ghost, [256, False, 0.25]] being the issue, specifically the 0.25 you've put as the third argument. In the C3k2 lines that value is the expansion factor e, but C3Ghost follows the C3 signature (c1, c2, n, shortcut, g, e), so after parse_model inserts the repeat count, the 0.25 lands in the groups argument g. Groups must be an int, and in_channels // 0.25 yields a float that reaches torch.empty as part of the weight size, producing the error above. Dropping the 0.25, i.e. [-1, 2, C3Ghost, [256, False]], as in the yolov8-ghost reference, should resolve it.

Thank you for the inputs.

I take your point, but I didn't change anything from the YOLOv8 GhostConv reference; I just updated it for YOLO11n. The differences are the C2f vs. C3k2 blocks and the float-to-int conversion.

Please share if you have any working prototype of YOLO11n with GhostConv.

Best Regards,
Venkata Rao

Hi Venkat,

While GhostConv modules can theoretically be integrated into YOLO11 architectures by modifying the YAML file, we don’t have an official, pre-validated yolo11n-ghost.yaml prototype readily available to share.

Creating custom architectures like this involves careful tuning of the YAML definition, ensuring all module arguments, channel dimensions, and layer connections are compatible. The errors you encountered suggest potential mismatches in how the GhostConv/C3Ghost modules are defined or connected within the YOLO11 structure compared to their usage in YOLOv8.

Debugging the YAML often involves comparing your structure against the base yolo11n.yaml and referencing how modules are parsed in the codebase, for example within the parse_model function in ultralytics/nn/tasks.py. This can help identify issues like the type error you mentioned.
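
A simple way to do this is to build the model directly from your YAML: construction runs parse_model and prints the layer table, so the last row printed before the traceback points at the failing layer definition (filename hypothetical):

from ultralytics import YOLO

# Building from a YAML runs parse_model; the printed table stops
# at the layer whose arguments fail to parse.
model = YOLO("yolo11n-ghost.yaml")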

Good luck with your customization!