Output of the model in training vs. inference

robin_rob96 · August 7, 2024, 10:35am

Hi,
I just read the YOLO paper from 2015, which states that the predictions are encoded in a tensor of shape S x S x (5*B + C).

So that means that for every cell in the grid there are 5 params per box + probabilities for each class.

This is the output shape for training, right?
I would assume that in inference mode (for task=detection) the output for every grid cell should be B * (5 + C), so that the class probabilities are tied to each box in B.

Or how are the c classes in C connected to b boxes in B?
Thanks

BurhanQ · August 7, 2024, 1:34pm

@robin_rob96 the YOLOv8 model (going to refer to the COCO pretrained model, but custom models will be slightly different) has an output shape of [N, 84, 8400] prior to non-max suppression (NMS), where N is the batch size for inference. The 84 represents the four bounding box attributes + 80 class confidence scores, and the 8400 represents all predictions made by the model.

In the Tensor for a single image (or if batch=1), each prediction, indexed 0 to 8399 along the last dimension, will be [box] + [class-confidence] with 84 values (a row of shape [1, 84] or [84,] once the Tensor is transposed to [8400, 84]).
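
A minimal sketch of how that layout can be split into boxes and class scores. It uses NumPy and a random array as a stand-in for real model output, so the values are meaningless and only the shapes matter. The variable names and the 0.5 threshold are assumptions for illustration, not part of the Ultralytics API.

```python
import numpy as np

# Stand-in for a raw COCO-model output before NMS: random values, not real predictions.
rng = np.random.default_rng(0)
num_classes = 80
raw = rng.random((1, 4 + num_classes, 8400), dtype=np.float32)  # [N, 84, 8400]

# Take the first image and transpose so that each row is one prediction.
preds = raw[0].T  # shape (8400, 84)

boxes = preds[:, :4]         # four bounding box attributes per prediction
class_scores = preds[:, 4:]  # 80 class confidence scores per prediction

# Best class and its score for every prediction.
class_ids = class_scores.argmax(axis=1)
confidences = class_scores.max(axis=1)

# Hypothetical confidence threshold; NMS would normally run on what survives.
conf_threshold = 0.5
keep = confidences > conf_threshold
print(boxes[keep].shape, class_ids[keep].shape)
```

Note that there is no separate objectness value in this layout: the 84 values are just the box plus one score per class, so the class with the highest score gives both the label and the confidence.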

The paper you’re referring to is for the original YOLO model structure, which is different from today’s structure used with Ultralytics YOLO. For a custom model trained on 7 classes with batch=1, the model output before NMS would be [1, 11, 8400], composed of the four bounding box attributes plus the seven class confidence scores.
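
The shape arithmetic can be sketched in a few lines. The helper names below are hypothetical, and the 8400 figure is reproduced under the assumption of a 640x640 input with detection strides of 8, 16 and 32, which is not stated above.

```python
def num_predictions(imgsz: int = 640, strides: tuple[int, ...] = (8, 16, 32)) -> int:
    """Total grid cells over all detection scales (assumed strides, square input)."""
    return sum((imgsz // s) ** 2 for s in strides)


def raw_output_shape(batch_size: int, num_classes: int, imgsz: int = 640) -> tuple[int, int, int]:
    """Shape before NMS: [N, 4 box attributes + num_classes, number of predictions]."""
    return (batch_size, 4 + num_classes, num_predictions(imgsz))


print(raw_output_shape(1, 80))  # (1, 84, 8400) for the COCO model
print(raw_output_shape(1, 7))   # (1, 11, 8400) for a 7-class custom model
```

Under the same assumption, a different input size changes only the last dimension, while the middle dimension depends only on the number of classes.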

During training, the output for a YOLOv8 model will be the same as it is for inference. The major difference is that the predictions are used to calculate the loss and update the model, which does not occur during inference.
