Help Understanding the YOLOv5 Architecture

Hello, I would like to know more about the backbone, neck, and head in YOLOv5.

As far as I know, the main structure of YOLOv5 is:

backbone: CSPDarknet
neck: SPPF
head: PANet

Is this structure only applied during YOLOv5 training, or does it cover the whole method (including training and detect)?

Is there any bounding box structure/method in YOLOv5 detect?


Honestly, backbone, neck, head, etc. are just terms that people have applied to AI architectures; there's no single correct naming scheme. In general, what most people in the vision AI space refer to as the backbone is the portion of the model that reduces the height and width dimensions of an image while stretching the channel dimension. In each YOLOv5 YAML you can see the backbone and head indicated separately, with the model simply being backbone + head:

SPPF is just a module in the backbone. You can find it here:

Is there any diagram with an explanation of the latest architecture of YOLOv5 v6? I saw the diagram you sent in Slack, but in YOLOv5 v6 the Focus layer was replaced with a Conv layer and SPP with SPPF.


@yutrif for SPPF() details see the PR that I implemented it in here:

SPPF() produces a mathematically identical result to SPP() with fewer FLOPs:

Profiling results:


# Profile SPP vs SPPF on identical inputs
import torch

from utils.torch_utils import profile
from models.common import SPP, SPPF

m1 = SPP(1024, 1024)
m2 = SPPF(1024, 1024)
results = profile(input=torch.randn(16, 1024, 64, 64), ops=[m1, m2], n=100)

@yutrif also the v6.0 release notes contain good details on the latest architectural updates:

Hello, thanks for the previous answer. I'm currently learning about the SPPF, and I would like to know the meaning of Mul and Sigmoid in SPPF.

Also, can you explain the numbers in the input (1 x 128 x 80 x 80) and the meaning of the numbers in the Conv layer (W = (64 x 128 x 1 x 1) and B = 64)? Thanks.

@yutrif sorry buddy, I think there might be some confusion here as the SPPF() module contains no sigmoid ops.

The PyTorch tensor dimensions are BCHW = (batch, channels, height, width). See the Conv2d — PyTorch 2.1 documentation, for example:

In the simplest case, the output value of the layer with input size (N,C,H,W)
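To illustrate those numbers with a hand-rolled shape helper (not YOLOv5 code): W = (64 x 128 x 1 x 1) means 64 output channels, 128 input channels, and a 1x1 kernel; B = 64 is one bias per output channel; so a (1, 128, 80, 80) input maps to (1, 64, 80, 80):

```python
def conv2d_out_shape(x_shape, w_shape, stride=1, padding=0):
    """Output shape of a 2D convolution, following the Conv2d formula.

    x_shape: (N, C_in, H, W) input tensor shape
    w_shape: (C_out, C_in, kH, kW) weight tensor shape
    """
    n, c_in, h, w = x_shape
    c_out, c_in_w, kh, kw = w_shape
    assert c_in == c_in_w, "input channels must match weight channels"
    h_out = (h + 2 * padding - kh) // stride + 1
    w_out = (w + 2 * padding - kw) // stride + 1
    return (n, c_out, h_out, w_out)

# A 1x1 conv changes only the channel count: 128 channels in, 64 out.
print(conv2d_out_shape((1, 128, 80, 80), (64, 128, 1, 1)))  # (1, 64, 80, 80)
```

The same formula explains why strided convs in the backbone halve H and W while growing C.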

Hello, thanks for the previous answer. I'm currently learning about YOLOv5 detection. YOLOv5 detects objects using anchor boxes, right?

I found this page: Anchor Boxes for Object Detection - MATLAB & Simulink.

Is this still how anchor boxes work in YOLOv5?

If not, what anchor boxes does YOLOv5 use?

Hello, I want to ask again about SPPF. What makes SPPF different from SPP? Can I get information on how SPPF and SPP differ in their convolutions or their flow? I need this for my thesis: "how can SPPF reduce FLOPs?"

@yutrif SPPF() and SPP() produce mathematically identical outputs; SPPF() just does it with fewer FLOPs. The original SPP paper is here:

Both implementations are in YOLOv5 models/common.py:


Yes, anchor boxes have not changed substantially since YOLOv2, which the MATLAB tutorial covers.

YOLOv5 :rocket: uses a new Ultralytics algorithm called AutoAnchor for anchor verification and generation before training starts.

Autoanchor will analyse your anchors against your dataset and training settings (like --img-size), and will adjust your anchors as necessary if it determines the original anchors are a poor fit, or if an anchor count was specified in your model.yaml rather than anchor values, i.e.

# Specify anchor count (per layer)
anchors: 3

# --OR-- Specify anchor values manually
anchors:
  - [10,13, 16,30, 33,23]  # P3/8
  - [30,61, 62,45, 59,119]  # P4/16
  - [116,90, 156,198, 373,326]  # P5/32

When generating new anchors, autoanchor first applies a kmeans function against your dataset labels (scaled to your training --img-size), and uses kmeans centroids as initial conditions for a Genetic Evolution (GE) algorithm. The GE algorithm will evolve all anchors for 1000 generations under default settings, using CIoU loss (same regression loss used during training) combined with Best Possible Recall (BPR) as its fitness function.
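A highly simplified, hypothetical sketch of that evolve loop (the real autoanchor in utils/autoanchor.py uses kmeans centroids, CIoU and BPR; here a plain width/height IoU stands in as the fitness, just to show the mutate-and-keep-if-better mechanics):

```python
import random

def wh_iou(wh, anchor):
    # IoU of two boxes compared by width/height only (corners aligned)
    inter = min(wh[0], anchor[0]) * min(wh[1], anchor[1])
    union = wh[0] * wh[1] + anchor[0] * anchor[1] - inter
    return inter / union

def fitness(anchors, labels):
    # mean best-anchor IoU over all labels (stand-in for the CIoU/BPR fitness)
    return sum(max(wh_iou(wh, a) for a in anchors) for wh in labels) / len(labels)

random.seed(0)
# label (w, h) pairs, already scaled to --img-size
labels = [(random.uniform(5, 300), random.uniform(5, 300)) for _ in range(200)]
anchors = [(20.0, 20.0), (80.0, 80.0), (200.0, 200.0)]  # e.g. kmeans centroids

best = fitness(anchors, labels)
for _ in range(1000):  # evolve for 1000 generations
    mutated = [(w * random.uniform(0.9, 1.1), h * random.uniform(0.9, 1.1))
               for w, h in anchors]
    f = fitness(mutated, labels)
    if f > best:  # keep a mutation only if it improves fitness
        best, anchors = f, mutated
```

The kmeans step gives a good starting point; the mutation loop then nudges anchors toward whatever best covers your particular label distribution.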


No action is required on your part to use autoanchor. If you would like to force manual anchors for any reason, you can skip autoanchor with the --noautoanchor flag:

python train.py --noautoanchor

Good luck :four_leaf_clover: and let us know if you have any other questions!

Thanks for the information, but when I train on my dataset I see models.yolo.Detect at the end of the visualization output.

What is inside models.yolo.Detect, and what is the purpose or function of this module?

@yutrif the Detect() layer is responsible for turning the YOLO feature maps into detection outputs. You can find it in models/yolo.py, as you said:
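In short, for each anchor at each grid cell, Detect() applies a sigmoid to the raw network outputs and decodes them into box center, size, objectness and class scores. A simplified, self-contained sketch of the YOLOv5 box decoding (the real code in Detect.forward() is vectorized over the whole grid):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def decode_box(tx, ty, tw, th, grid_x, grid_y, anchor_w, anchor_h, stride):
    """Decode one raw prediction into an (x, y, w, h) box in pixels.

    YOLOv5 uses: xy = (2*sigmoid(t) - 0.5 + grid) * stride
                 wh = (2*sigmoid(t))**2 * anchor
    """
    x = (2 * sigmoid(tx) - 0.5 + grid_x) * stride
    y = (2 * sigmoid(ty) - 0.5 + grid_y) * stride
    w = (2 * sigmoid(tw)) ** 2 * anchor_w
    h = (2 * sigmoid(th)) ** 2 * anchor_h
    return x, y, w, h

# Raw outputs of 0 (sigmoid -> 0.5) at grid cell (10, 10), stride 8,
# anchor (16, 30): the box center lands at the cell center, size = anchor.
print(decode_box(0, 0, 0, 0, 10, 10, 16, 30, 8))  # (84.0, 84.0, 16.0, 30.0)
```

Note the (2*sigmoid)**2 form bounds each predicted width/height to at most 4x its anchor, which is why autoanchor fit matters.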

Thanks for the information, but I want to know more about the SPPF. How does SPPF produce fewer FLOPs?

What did you modify from SPP so that it became the new SPPF?

@yutrif SPPF runs smaller max-pooling operations in series, concatenating the intermediate results, rather than running larger-kernel pooling operations in parallel as SPP does.
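A minimal pure-Python illustration of why that works (a 1-D "same" max pool with stride 1, not the actual YOLOv5 modules): chaining k=5 pools grows the effective window, so two in series match one k=9 pool and three match one k=13 pool, exactly the parallel kernel set SPP uses (5, 9, 13), while each serial pool only ever compares 5 elements:

```python
import random

def maxpool1d(x, k):
    # 'same' max pool: stride 1, pad k//2 with -inf on both sides
    p = k // 2
    xp = [float("-inf")] * p + list(x) + [float("-inf")] * p
    return [max(xp[i:i + k]) for i in range(len(x))]

random.seed(0)
x = [random.random() for _ in range(64)]

p5 = maxpool1d(x, 5)     # SPPF computes this once...
p5x2 = maxpool1d(p5, 5)  # ...then reuses it, so k=9 costs only another k=5 pass
p5x3 = maxpool1d(p5x2, 5)

assert p5x2 == maxpool1d(x, 9)   # two serial k=5 == one k=9
assert p5x3 == maxpool1d(x, 13)  # three serial k=5 == one k=13
```

Same outputs, but the serial form replaces the expensive 9x9 and 13x13 windows with cheap reused 5x5 windows, which is where the FLOPs savings come from.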

Thanks for your information. I have a new question about the DetectMultiBackend class and the Detect class: do you have any paper or flow diagram showing how they work?

@yutrif the DetectMultiBackend() class allows YOLOv5 models to be initialized from any of our supported export formats, and then supports inference in those formats through its forward() method.

Formats

YOLOv5 inference is officially supported in 11 formats:

Format                   export.py --include    Model
PyTorch                  -                      yolov5s.pt
TorchScript              torchscript           yolov5s.torchscript
ONNX                     onnx                  yolov5s.onnx
OpenVINO                 openvino              yolov5s_openvino_model/
TensorRT                 engine                yolov5s.engine
CoreML                   coreml                yolov5s.mlmodel
TensorFlow SavedModel    saved_model           yolov5s_saved_model/
TensorFlow GraphDef      pb                    yolov5s.pb
TensorFlow Lite          tflite                yolov5s.tflite
TensorFlow Edge TPU      edgetpu               yolov5s_edgetpu.tflite
TensorFlow.js            tfjs                  yolov5s_web_model/

You can see usage examples for the exported model printed after an export with export.py:


How about the Detect class or detect.py? Do you have any flow diagram or explanation of how it works?

@yutrif the Detect() class simply converts features to detections. Classification models use the Classify() class, which turns features into classifications. BTW, an excellent architecture visualization was posted today at YOLOv5 6.0 Model Structure · Issue #6885 · ultralytics/yolov5 · GitHub by WZMIAOMIAO
