In the architecture diagram I am reproducing, I have temporarily denoted the Backbone blocks as B, the Neck components as T, and the inputs to the Head as U. I realize this notation is likely non-standard. Could you recommend the standard naming conventions typically used in the literature? Please help me!
You’re right that B, T, U are not standard; the good news is that there isn’t a strict universal standard either, just a few common conventions that readers will immediately recognize.
Most papers and libraries simply name the three big blocks explicitly as “Backbone”, “Neck”, and “Detection Head”, exactly as described in the Ultralytics glossary on object detection architectures, with more detail in the entries for the backbone and the detection head.
For per-layer or per-stage notation, the most common patterns in the literature are:
- ResNet/FPN-style detectors (Faster R‑CNN, RetinaNet, many follow-ups) use:
  - C2, C3, C4, C5 for backbone stages (conv2_x … conv5_x outputs).
  - P2, P3, P4, P5, P6 for FPN neck outputs (the multi‑scale features sent to the head); RetinaNet uses P3 to P7 instead.
- YOLO-style one‑stage detectors often just label:
  - Backbone stages as things like Stage 1/2/3 or C1/C2/C3.
  - Neck outputs (head inputs) as P3, P4, P5 or by stride: s = 8, 16, 32.
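The level numbers and the strides are two views of the same thing: level l conventionally has stride 2^l relative to the input image, so P3, P4, P5 correspond to strides 8, 16, 32. Here is a minimal Python sketch of that relationship. The function names are my own, purely for illustration, and the feature-map size is approximate because the exact value depends on a given network's padding and rounding.

```python
def level_stride(level: int) -> int:
    """Stride of pyramid level `level` relative to the input image (2 ** level)."""
    if level < 0:
        raise ValueError("level must be non-negative")
    return 2 ** level


def level_name_from_stride(stride: int, prefix: str = "P") -> str:
    """Build a label such as 'P3' from a power-of-two stride such as 8."""
    if stride < 1 or stride & (stride - 1) != 0:
        raise ValueError(f"stride must be a positive power of two, got {stride}")
    # For a power of two, bit_length() - 1 is exactly log2(stride).
    return f"{prefix}{stride.bit_length() - 1}"


def feature_map_size(image_size: int, level: int) -> int:
    """Approximate side length of the feature map at `level` (ceiling division)."""
    stride = level_stride(level)
    return -(-image_size // stride)


if __name__ == "__main__":
    # Hypothetical 640-pixel square input, chosen only as an example.
    for stride in (8, 16, 32):
        name = level_name_from_stride(stride)
        size = feature_map_size(640, int(name[1:]))
        print(f"{name}: stride {stride}, about {size}x{size}")
```

Passing prefix="C" gives the matching backbone label for the same stride, which is handy if you want the backbone and neck nodes in your diagram to line up by scale.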
So for your diagram, a “standard-looking” choice would be:
- Replace B with either “Backbone” plus stage names like C3, C4, C5 or “Stage 3/4/5”.
- Replace T with “Neck (FPN/PAN)” and label those nodes as P3, P4, P5.
- Replace U with “Head inputs P3/P4/P5” or simply “Multi‑scale features to Detection Head”.
As long as you use the words Backbone / Neck / Detection Head and something like C‑levels for backbone and P‑levels for pyramid features, readers familiar with modern detection papers will feel at home.
I am very grateful for your detailed answer. This information will be highly instrumental in helping me complete my essay well, while also ensuring I understand the general naming conventions for YOLO architectures.
