YOLOv11n Pruning

Hello everyone,

I am trying to implement an object detection model on an MCU with an NPU using several YOLO models. Right now I am working with YOLOv11n, but even after applying INT8 quantization I cannot reach the model size I need. My MCU only accepts INT8, so falling back to floating point is not an option either.

So I wanted to explore how to do pruning: what the important steps are and what to keep in mind while doing it. In particular, I am unsure which layers to prune.

Thank you for your time and help.

On MCUs/NPUs, pruning only helps if it’s structured pruning (removing whole channels/filters so the network is physically smaller). Unstructured pruning (just zeroing individual weights) usually won’t shrink your exported INT8 model or speed it up unless your toolchain has sparse-kernel support. The Ultralytics glossary page on pruning (structured vs unstructured) summarizes this well.

If you’re starting a new deployment, I’d try Ultralytics YOLO26n first (it’s smaller and faster than YOLO11n; see the YOLO26 docs), and only prune if that’s still too big.

For “what to prune”: in practice you typically don’t prune the very first stem layer or the final detection head, and instead prune repeated conv blocks in the backbone + neck (those usually have the most redundancy). The usual workflow is: train a baseline → apply channel pruning gradually (e.g., small % each round) → fine-tune a bit to recover accuracy → repeat until you hit your size target → export INT8 again.
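To make the "structured = physically smaller" point concrete, here is a minimal NumPy sketch of what channel pruning does to one pair of conv layers: rank the filters by L1 magnitude, keep the strongest ones, and shrink the next layer's input channels to match. This is an illustration of the idea only, not an Ultralytics API; the layer shapes and the `prune_conv_pair` helper are made up for the example.

```python
import numpy as np

def prune_conv_pair(w1, b1, w2, ratio):
    """Structurally prune `ratio` of conv1's output channels.

    w1: (out_c, in_c, k, k) weights of the conv being pruned
    b1: (out_c,) bias of that conv
    w2: (next_out, out_c, k, k) weights of the following conv,
        whose input channels must shrink to match.
    """
    out_c = w1.shape[0]
    n_keep = max(1, int(round(out_c * (1 - ratio))))
    # Magnitude importance: L1 norm of each filter's weights.
    importance = np.abs(w1).reshape(out_c, -1).sum(axis=1)
    # Keep the n_keep most important filters (sorted to preserve order).
    keep = np.sort(np.argsort(importance)[-n_keep:])
    # Physically remove channels by slicing -- not just zeroing them.
    return w1[keep], b1[keep], w2[:, keep]

rng = np.random.default_rng(0)
w1 = rng.standard_normal((32, 16, 3, 3))
b1 = rng.standard_normal(32)
w2 = rng.standard_normal((64, 32, 3, 3))

w1p, b1p, w2p = prune_conv_pair(w1, b1, w2, ratio=0.25)
print(w1p.shape, b1p.shape, w2p.shape)
# (24, 16, 3, 3) (24,) (64, 24, 3, 3)
```

In a real network you would fine-tune for a few epochs after each pruning round and repeat until the INT8 export fits; libraries such as torch-pruning automate the cross-layer dependency tracking that this toy example does by hand.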

If you share (1) your target max model size (flash/RAM), (2) your current exported model format/size (e.g., .tflite), and (3) the NPU toolchain (TFLite Micro, vendor SDK, etc.), I can suggest whether pruning will actually move the needle for that stack and what pruning ratio is realistic.

You can read this

Thank you for your reply.

I am aiming for a model size below 2 MB. Right now my exported YOLOv11 model is 2.6 MB for both the full_quant and the int8 exports. We are using the eIQ Neutron SDK to deploy the model on our board.

I have also tried YOLO26n, but I did not get different results with that model either; it is also around 2.6 MB.

If you want to train a smaller model, you can change the model's scale entry to [0.5, 0.25, 512] and train a new model using the YAML:

It will make it smaller than 2 MB after quantization.
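For reference, the change above targets the `scales` section of the model YAML. The excerpt below is a sketch assuming the stock yolo11.yaml layout, where each scale is a [depth, width, max_channels] triple:

```yaml
# yolo11.yaml (excerpt) -- assumed layout; the 'n' row is what YOLO11n uses.
scales:
  # [depth, width, max_channels]
  n: [0.50, 0.25, 512]  # was [0.50, 0.25, 1024]; lowering max_channels shrinks the deepest layers
```

You would then train from the edited YAML, e.g. `YOLO("yolo11-custom.yaml")` in the Ultralytics Python API (the filename here is hypothetical).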

This decreased the size incredibly well. But I have a question: what are the possible values for max_channels? I have a rather small dataset for my specific use case, and this change causes the mAP50-95 value to decrease substantially.
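On how max_channels interacts with the other scale values: to my understanding of the Ultralytics model parser, a layer's nominal channel count is first capped at max_channels, then scaled by the width multiple, then rounded up to a multiple of 8, so the cap only affects layers whose nominal width exceeds it (any positive value works; powers of two like 256/512/1024 are the usual choices). A small sketch of that arithmetic, assuming this capping behavior:

```python
import math

def make_divisible(x, divisor=8):
    """Round up to the nearest multiple of `divisor`."""
    return math.ceil(x / divisor) * divisor

def effective_channels(nominal, width, max_channels):
    """Channel count actually built for a layer whose YAML entry
    says `nominal` channels, under the given scale values
    (assumes channels are capped before the width multiple)."""
    return make_divisible(min(nominal, max_channels) * width, 8)

# The deepest YOLO11 layers are nominally 1024 channels wide;
# the 'n' width multiple is 0.25.
for mc in (1024, 512, 256):
    print(mc, effective_channels(1024, width=0.25, max_channels=mc))
# Dropping max_channels from 1024 to 512 halves the deepest layers
# (256 -> 128 channels), which is where most of the size saving comes
# from; layers with nominal width <= max_channels are unchanged.
```

Because only the deepest layers lose capacity, a large mAP50-95 drop on a small dataset may be easier to recover by training longer or with stronger augmentation than by raising max_channels back up.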