Standard Naming Conventions for Backbone, Neck, and Head in Object Detection Architectures?

In the architecture diagram I am reproducing, I have temporarily denoted the Backbone blocks as B, the Neck components as T, and the inputs to the Head as U. I realize this notation is likely non-standard. Could you recommend the standard naming conventions typically used in the literature? Please help me!

You’re right that B, T, U are not standard; the good news is that there isn’t a strict universal standard either, just a few common conventions that readers will immediately recognize.

Most papers and libraries label the three main blocks explicitly as “Backbone”, “Neck”, and “Detection Head” (or just “Head”). The Ultralytics glossary entry on object detection architectures uses these terms, with more detail in its entries for the backbone and the detection head.

For per-layer or per-stage notation, the most common patterns in the literature are:

  • ResNet/FPN-style detectors (Faster R‑CNN, RetinaNet, many follow-ups) use:
    • C2, C3, C4, C5 for backbone stages (conv2_x … conv5_x outputs).
    • P2, P3, P4, P5, P6 for FPN neck outputs (the multi‑scale features sent to the head); RetinaNet uses P3 to P7 instead. The index is the pyramid level, so level l has a stride of 2^l relative to the input.
  • YOLO-style one‑stage detectors often just label:
    • Backbone stages as things like Stage 1/2/3 or C1/C2/C3.
    • Neck outputs (head inputs) as P3, P4, P5, or by their stride: s = 8, 16, 32 respectively.
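
To make the link between P‑level names and strides concrete, here is a minimal Python sketch. It assumes the usual convention that level l has stride 2^l. The function name `pyramid_levels` and the 640‑pixel input size are only illustrative choices, not part of any library.

```python
def pyramid_levels(input_size=640, levels=(3, 4, 5)):
    """Map pyramid level names to (stride, feature-map side length).

    Assumes the common convention that level l has stride 2**l;
    input_size=640 is just an illustrative default.
    """
    table = {}
    for level in levels:
        stride = 2 ** level
        table[f"P{level}"] = (stride, input_size // stride)
    return table


if __name__ == "__main__":
    for name, (stride, side) in pyramid_levels().items():
        print(f"{name}: stride {stride}, {side}x{side} feature map")
```

With these assumptions, P3/P4/P5 correspond to strides 8/16/32, so labelling a node either “P4” or “s=16” tells the reader the same thing. The same rule applies to the C‑levels of the backbone.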

So for your diagram, a “standard-looking” choice would be:

  • Replace B with either “Backbone” plus stage names like C3, C4, C5 or “Stage 3/4/5”.
  • Replace T with “Neck (FPN/PAN)” and label those nodes as P3, P4, P5.
  • Replace U with “Head inputs P3/P4/P5” or simply “Multi‑scale features to Detection Head”.

As long as you use the words Backbone / Neck / Detection Head and something like C‑levels for backbone and P‑levels for pyramid features, readers familiar with modern detection papers will feel at home.

I am very grateful for your detailed answer. This information will be highly instrumental in helping me complete my essay well, while also ensuring I understand the general naming conventions for YOLO architectures.