Loss function in classification yolov8

Aster_Tournoy · November 27, 2024, 10:54am

I tried to manually calculate the validation loss of my model by taking the ‘probs’ and applying the negative logarithm (CE loss). My result did not match the reported loss, but when I applied a softmax to the probs first, I got the right answer. Isn’t this softmax redundant, given that I’m using the probabilities (and not the logits)? Are the probs actually the values before softmax is applied? Or how else can this be the case?

I tried the same for the training loss and I achieved similar results when I did not add the extra softmax.

pderrenger · November 27, 2024, 1:27pm

Hello!

Great question—let’s clarify a few things about how the loss function works in YOLO models like YOLOv8, especially for classification tasks. The key here lies in understanding the role of logits, softmax, and probabilities in the loss calculation.

Validating CE Loss Calculation

Logits vs. Probabilities:
- Typically, models output logits, which are unnormalized scores. These logits are then passed through a softmax function during loss computation to convert them into probabilities.
- If you’re working with raw logits, you’ll need to apply the softmax before calculating Cross-Entropy (CE) Loss to ensure you’re working with valid probabilities.
Softmax in YOLOv8 Training:
- When you mention using probs, it’s essential to confirm whether these are already softmax-normalized probabilities. If they are raw logits and you skip the softmax, or they are already normalized and you apply it a second time, your manual CE loss will not match the reported one.
Why Softmax Is Not Redundant:
- Softmax normalizes logits into a probability distribution that sums to 1. Even if you’re using probs, double-check whether they are pre-softmax or post-softmax. If the values you are reading are in fact logits, softmax isn’t redundant: it’s an integral part of the CE loss computation.
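
To make the distinction concrete, here is a minimal plain-Python sketch (no PyTorch required) of CE computed from logits versus from already-normalized probabilities. The function names, the scores, and the target index are made up for illustration and are not taken from the Ultralytics code.

```python
import math

def softmax(logits):
    """Turn unnormalized scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def ce_from_probs(probs, target):
    """Cross-entropy for one sample when `probs` are already normalized."""
    return -math.log(probs[target])

def ce_from_logits(logits, target):
    """Cross-entropy for one sample when the input is raw logits."""
    return ce_from_probs(softmax(logits), target)

logits = [2.0, 0.5, -1.0]  # hypothetical scores for 3 classes
target = 0                 # hypothetical ground-truth class index
probs = softmax(logits)

print(sum(probs))                      # 1 up to float rounding
print(ce_from_logits(logits, target))  # softmax applied exactly once
print(ce_from_probs(probs, target))    # same value, no second softmax
```

The two losses agree because the softmax is applied exactly once in both paths. Which function you need depends only on whether the values you start from are pre-softmax or post-softmax.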

Why Training vs. Validation Behavior Differs

For training, the softmax step is typically part of the framework’s internal implementation: PyTorch’s cross-entropy loss expects raw logits and applies log-softmax itself. That’s why you don’t need to add an extra softmax manually when replicating training losses. Make sure “probs” means the same thing (pre-softmax or post-softmax) in both cases when you do the manual validation calculation.
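
As a rough illustration of what a logits-expecting loss does internally, the sketch below folds the log-softmax into the loss via log-sum-exp and then shows what happens when an extra softmax is added in front of it. This is plain Python written for this thread, not PyTorch’s actual implementation, and the scores are invented.

```python
import math

def softmax(values):
    """Normalize scores into probabilities that sum to 1."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy_with_logits(scores, target):
    """CE that expects logits: log-softmax is computed inside the loss."""
    m = max(scores)
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_sum_exp - scores[target]

logits = [4.0, 0.0, -2.0]  # hypothetical confident, correct prediction
target = 0

# Correct use: pass logits straight in, the loss normalizes them once.
loss_once = cross_entropy_with_logits(logits, target)

# Mistake: normalize first, then let the loss normalize a second time.
loss_twice = cross_entropy_with_logits(softmax(logits), target)

print(f"softmax applied once:  {loss_once:.4f}")
print(f"softmax applied twice: {loss_twice:.4f}")
```

Because probabilities lie in [0, 1], a second softmax flattens them toward uniform. With 3 classes the doubly-softmaxed loss cannot drop below log(1 + 2/e), roughly 0.55, however confident the model is, so a mismatch of this kind shows up as a loss that looks far too high.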

Additional Tips

Feel free to review the Ultralytics YOLOv8 classification documentation for more insights. If you’d like, you can also download the source code and check the model implementation to verify how the logits are processed during training and validation.

Hopefully, this clears things up! Let me know if you have more questions.

Aster_Tournoy · November 27, 2024, 2:24pm

Hey, thank you for the quick and helpful reply! This already clarified a lot. However, I used the training data as input (to find the losses the other way round), and similarly used the validation and test data as input to evaluate the model’s behaviour. Originally I wanted to calculate the training accuracy, so I fed the training data to the model itself to see which samples were labelled right and wrong. I used the probs the same way for the validation and training sets and found that the softmax is not necessary for the training loss but is necessary for the validation loss, hence my confusion. Below are some numbers for clarification, keeping in mind that I extracted the probs in exactly the same way for the two cases:
train loss (by yolov8): 0.05
train loss (probs): 1.5
train loss (softmax(probs)): 0.04

val loss (by yolov8): 1.4
val loss (probs): 0.05
val loss (softmax(probs)): 1.4

Toxite · November 27, 2024, 8:07pm

It’s because when not in training mode, the output is softmaxed.

https://github.com/ultralytics/ultralytics/blob/386a3b7625b3263989650d69eec6ed3b1ce067cd/ultralytics/nn/modules/head.py#L299

            self.conv = Conv(c1, c_, k, s, p, g)
            self.pool = nn.AdaptiveAvgPool2d(1)  # to x(b,c_,1,1)
            self.drop = nn.Dropout(p=0.0, inplace=True)
            self.linear = nn.Linear(c_, c2)  # to x(b,c2)

        def forward(self, x):
            """Performs a forward pass of the YOLO model on input image data."""
            if isinstance(x, list):
                x = torch.cat(x, 1)
            x = self.linear(self.drop(self.pool(self.conv(x)).flatten(1)))
            return x if self.training else x.softmax(1)


    class WorldDetect(Detect):
        """Head for integrating YOLO detection models with semantic understanding from text embeddings."""

        def __init__(self, nc=80, embed=512, with_bn=False, ch=()):
            """Initialize YOLO detection layer with nc classes and layer channels ch."""
            super().__init__(nc, ch)
            c3 = max(ch[0], min(self.nc, 100))
            self.cv3 = nn.ModuleList(nn.Sequential(Conv(x, c3, 3), Conv(c3, c3, 3), nn.Conv2d(c3, embed, 1)) for x in ch)

So the output that goes into the training loss is raw logits, while the output in validation (eval mode) has already been softmaxed.
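
Here is a small plain-Python sketch of that mode-dependent behaviour and what it means for a manual loss calculation. `ToyClassifyHead` is a hypothetical stand-in for the `forward` method quoted above, not the real class, and the scores are made up. It also assumes that the reported validation loss comes from passing the eval-mode output to a logits-expecting loss, which matches the numbers earlier in this thread but is not verified against the validator code here.

```python
import math

def softmax(values):
    """Normalize scores into probabilities that sum to 1."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

class ToyClassifyHead:
    """Hypothetical head: logits in training mode, probabilities otherwise."""

    def __init__(self):
        self.training = True

    def forward(self, logits):
        # Mirrors `return x if self.training else x.softmax(1)`
        return logits if self.training else softmax(logits)

def loss_on_logits(scores, target):
    """CE that applies softmax internally, so it expects logits."""
    return -math.log(softmax(scores)[target])

head = ToyClassifyHead()
raw = [3.0, 0.2, -1.0]  # hypothetical pre-softmax scores for one image
target = 0

head.training = True
train_loss = loss_on_logits(head.forward(raw), target)  # one softmax in total

head.training = False
probs = head.forward(raw)                            # already softmaxed
manual_loss = -math.log(probs[target])               # equals train_loss
double_softmax_loss = loss_on_logits(probs, target)  # softmax applied twice

print(train_loss, manual_loss, double_softmax_loss)
```

So when you take probs from the model in eval mode, `-log(probs[target])` reproduces the training-style loss without any extra softmax, while adding one reproduces a loss that was computed on already-softmaxed output.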
