Help Understanding the YOLOv5 Architecture

Hello, I would like to know more about the backbone, neck, and head in YOLOv5.

As far as I know, the main structure of YOLOv5 is:

backbone: CSPDarknet
neck: SPPF
head: PANet

Is this structure only applied during YOLOv5 training, or does it cover the whole method (including training and detect)?

Is there any bounding box structure/method in YOLOv5 detect?


Honestly, backbone, neck, head, etc. are just terms that people have applied to AI architectures; there's no single correct naming scheme. In general, what most people in the vision AI space refer to as the backbone is the portion of the model that reduces the height and width dimensions of an image while stretching the channel dimension. In each YOLOv5 YAML you can see the backbone and head indicated separately, with the model simply being backbone + head:

SPPF is just a module in the backbone. You can find it here:

Is there any diagram with an explanation of the latest architecture of YOLOv5 v6? I saw the diagram you sent in Slack, but in YOLOv5 v6 the Focus layer was replaced with a Conv layer and SPP with SPPF.


@yutrif for SPPF() details see the PR that I implemented it in here:

SPPF() produces a mathematically identical result to SPP() with fewer FLOPs:

Profiling results:


# Profile SPP vs SPPF on identical inputs
import torch

from utils.torch_utils import profile
from models.common import SPP, SPPF

m1 = SPP(1024, 1024)
m2 = SPPF(1024, 1024)
results = profile(input=torch.randn(16, 1024, 64, 64), ops=[m1, m2], n=100)

@yutrif also the v6.0 release notes contain good details on the latest architectural updates:

Hello, thanks for the previous answer. I'm currently learning about the SPPF, and I would like to know the meaning of Mul and Sigmoid in SPPF.

Also, can you explain the numbers in the input (1 x 128 x 80 x 80) and the meaning of the numbers in the Conv layer (W = (64 x 128 x 1 x 1) and B = 64)? Thanks.

@yutrif sorry buddy, I think there might be some confusion here as the SPPF() module contains no sigmoid ops.

The PyTorch tensor dimensions are BCHW = (batch, channels, height, width). See the Conv2d — PyTorch 2.1 documentation, for example:

In the simplest case, the output value of the layer with input size (N,C,H,W)
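To illustrate those numbers with a hand-rolled shape helper (not YOLOv5 code): W = (64 x 128 x 1 x 1) means 64 output channels, 128 input channels, and a 1x1 kernel; B = 64 is one bias per output channel; so a (1, 128, 80, 80) input maps to (1, 64, 80, 80):

```python
def conv2d_out_shape(x_shape, w_shape, stride=1, padding=0):
    """Output shape of a 2D convolution, following the Conv2d formula.

    x_shape: (N, C_in, H, W) input tensor shape
    w_shape: (C_out, C_in, kH, kW) weight tensor shape
    """
    n, c_in, h, w = x_shape
    c_out, c_in_w, kh, kw = w_shape
    assert c_in == c_in_w, "input channels must match weight channels"
    h_out = (h + 2 * padding - kh) // stride + 1
    w_out = (w + 2 * padding - kw) // stride + 1
    return (n, c_out, h_out, w_out)

# A 1x1 conv changes only the channel count: 128 channels in, 64 out.
print(conv2d_out_shape((1, 128, 80, 80), (64, 128, 1, 1)))  # (1, 64, 80, 80)
```

The same formula explains why strided convs in the backbone halve H and W while growing C.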

Hello, thanks for the previous answer. I'm currently learning about YOLOv5 detection. YOLOv5 detects objects using anchor boxes, right?

I found this page: Anchor Boxes for Object Detection - MATLAB & Simulink.

Is this still how anchor boxes work in YOLOv5?

If not, what anchor boxes does YOLOv5 use?

Hello, I want to ask again about SPPF. What makes SPPF different from SPP? Can I get information on how SPPF and SPP differ in their convolutions or their flow? I need this for my thesis: "how can SPPF reduce FLOPs?"

@yutrif SPPF() and SPP() produce mathematically identical outputs; SPPF() just does it with fewer FLOPs. The original SPP paper is here:

Both implementations are in YOLOv5 models/common.py:


Yes, anchor boxes have not changed substantially since YOLOv2, which the MATLAB tutorial covers.

YOLOv5 :rocket: uses a new Ultralytics algorithm called AutoAnchor for anchor verification and generation before training starts.

Autoanchor will analyse your anchors against your dataset and training settings (like --img-size), and will adjust your anchors as necessary if it determines the original anchors are a poor fit, or if an anchor count was specified in your model.yaml rather than anchor values, i.e.

# Specify anchor count (per layer)
anchors: 3

# --OR-- Specify anchor values manually
anchors:
  - [10,13, 16,30, 33,23]  # P3/8
  - [30,61, 62,45, 59,119]  # P4/16
  - [116,90, 156,198, 373,326]  # P5/32

When generating new anchors, autoanchor first applies a kmeans function against your dataset labels (scaled to your training --img-size), and uses kmeans centroids as initial conditions for a Genetic Evolution (GE) algorithm. The GE algorithm will evolve all anchors for 1000 generations under default settings, using CIoU loss (same regression loss used during training) combined with Best Possible Recall (BPR) as its fitness function.
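A highly simplified, hypothetical sketch of that evolve loop (the real autoanchor in utils/autoanchor.py uses kmeans centroids, CIoU and BPR; here a plain width/height IoU stands in as the fitness, just to show the mutate-and-keep-if-better mechanics):

```python
import random

def wh_iou(wh, anchor):
    # IoU of two boxes compared by width/height only (corners aligned)
    inter = min(wh[0], anchor[0]) * min(wh[1], anchor[1])
    union = wh[0] * wh[1] + anchor[0] * anchor[1] - inter
    return inter / union

def fitness(anchors, labels):
    # mean best-anchor IoU over all labels (stand-in for the CIoU/BPR fitness)
    return sum(max(wh_iou(wh, a) for a in anchors) for wh in labels) / len(labels)

random.seed(0)
# label (w, h) pairs, already scaled to --img-size
labels = [(random.uniform(5, 300), random.uniform(5, 300)) for _ in range(200)]
anchors = [(20.0, 20.0), (80.0, 80.0), (200.0, 200.0)]  # e.g. kmeans centroids

best = fitness(anchors, labels)
for _ in range(1000):  # evolve for 1000 generations
    mutated = [(w * random.uniform(0.9, 1.1), h * random.uniform(0.9, 1.1))
               for w, h in anchors]
    f = fitness(mutated, labels)
    if f > best:  # keep a mutation only if it improves fitness
        best, anchors = f, mutated
```

The kmeans step gives a good starting point; the mutation loop then nudges anchors toward whatever best covers your particular label distribution.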


No action is required on your part to use autoanchor. If you would like to force manual anchors for any reason, you can skip autoanchor with the --noautoanchor flag:

python train.py --noautoanchor

Good luck :four_leaf_clover: and let us know if you have any other questions!

Thanks for the information, but when I train on my dataset I see models.yolo.Detect at the end of the visualization output.

What is inside models.yolo.Detect, and what is the purpose or function of this module?

@yutrif the Detect() layer is responsible for turning the YOLO feature maps into detection outputs. You can find it in models/yolo.py, as you said:
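In short, for each anchor at each grid cell, Detect() applies a sigmoid to the raw network outputs and decodes them into box center, size, objectness and class scores. A simplified, self-contained sketch of the YOLOv5 box decoding (the real code in Detect.forward() is vectorized over the whole grid):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def decode_box(tx, ty, tw, th, grid_x, grid_y, anchor_w, anchor_h, stride):
    """Decode one raw prediction into an (x, y, w, h) box in pixels.

    YOLOv5 uses: xy = (2*sigmoid(t) - 0.5 + grid) * stride
                 wh = (2*sigmoid(t))**2 * anchor
    """
    x = (2 * sigmoid(tx) - 0.5 + grid_x) * stride
    y = (2 * sigmoid(ty) - 0.5 + grid_y) * stride
    w = (2 * sigmoid(tw)) ** 2 * anchor_w
    h = (2 * sigmoid(th)) ** 2 * anchor_h
    return x, y, w, h

# Raw outputs of 0 (sigmoid -> 0.5) at grid cell (10, 10), stride 8,
# anchor (16, 30): the box center lands at the cell center, size = anchor.
print(decode_box(0, 0, 0, 0, 10, 10, 16, 30, 8))  # (84.0, 84.0, 16.0, 30.0)
```

Note the (2*sigmoid)**2 form bounds each predicted width/height to at most 4x its anchor, which is why autoanchor fit matters.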

Thanks for the information, but I want to know more about the SPPF. How does SPPF produce fewer FLOPs?

What did you modify from SPP so that it became the new SPPF?

@yutrif SPPF runs smaller max-pooling operations in series, concatenating the intermediate results, rather than running larger-kernel pooling operations in parallel as SPP does.
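A minimal pure-Python illustration of why that works (a 1-D "same" max pool with stride 1, not the actual YOLOv5 modules): chaining k=5 pools grows the effective window, so two in series match one k=9 pool and three match one k=13 pool, exactly the parallel kernel set SPP uses (5, 9, 13), while each serial pool only ever compares 5 elements:

```python
import random

def maxpool1d(x, k):
    # 'same' max pool: stride 1, pad k//2 with -inf on both sides
    p = k // 2
    xp = [float("-inf")] * p + list(x) + [float("-inf")] * p
    return [max(xp[i:i + k]) for i in range(len(x))]

random.seed(0)
x = [random.random() for _ in range(64)]

p5 = maxpool1d(x, 5)     # SPPF computes this once...
p5x2 = maxpool1d(p5, 5)  # ...then reuses it, so k=9 costs only another k=5 pass
p5x3 = maxpool1d(p5x2, 5)

assert p5x2 == maxpool1d(x, 9)   # two serial k=5 == one k=9
assert p5x3 == maxpool1d(x, 13)  # three serial k=5 == one k=13
```

Same outputs, but the serial form replaces the expensive 9x9 and 13x13 windows with cheap reused 5x5 windows, which is where the FLOPs savings come from.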

Thanks for your information. I have a new question about the DetectMultiBackend class and the Detect class: do you have any paper or flow diagram showing how they work?

@yutrif the DetectMultiBackend() class allows YOLOv5 models to be initialized from any of our supported export formats, and then supports inference in those formats through its forward() method.

Formats

YOLOv5 inference is officially supported in 11 formats:

Format                   export.py --include    Model
PyTorch                  -                      yolov5s.pt
TorchScript              torchscript           yolov5s.torchscript
ONNX                     onnx                  yolov5s.onnx
OpenVINO                 openvino              yolov5s_openvino_model/
TensorRT                 engine                yolov5s.engine
CoreML                   coreml                yolov5s.mlmodel
TensorFlow SavedModel    saved_model           yolov5s_saved_model/
TensorFlow GraphDef      pb                    yolov5s.pb
TensorFlow Lite          tflite                yolov5s.tflite
TensorFlow Edge TPU      edgetpu               yolov5s_edgetpu.tflite
TensorFlow.js            tfjs                  yolov5s_web_model/

You can see usage examples for the exported model printed after an export with export.py:


How about the Detect class or detect.py? Do you have any flow diagram or explanation of how it works?

@yutrif the Detect() class simply converts features to detections. Classification models use the Classify() class, which turns features into classifications. BTW, an excellent architecture visualization was posted today at YOLOv5 6.0 Model Structure · Issue #6885 · ultralytics/yolov5 · GitHub by WZMIAOMIAO
