I tried to manually calculate the validation loss of my model by taking the ‘probs’ and applying the logarithm (CE loss). My result differed from the reported loss, but when I applied softmax to the probs first, I got the right answer. Isn’t this softmax redundant, since I’m using the probabilities (and not the logits)? Or are the probs actually the values before softmax is applied? How can this be the case?
I tried the same for the training loss, and there my result matched without adding the extra softmax.
Great question. Let’s clarify a few things about how the loss function works in YOLO models like YOLOv8, especially for classification tasks. The key lies in understanding the role of logits, softmax, and probabilities in the loss calculation.
Validating CE Loss Calculation
Logits vs. Probabilities:
Typically, models output logits, which are unnormalized scores. These logits are then passed through a softmax function during loss computation to convert them into probabilities.
If you’re working with raw logits, you’ll need to apply the softmax before calculating Cross-Entropy (CE) Loss to ensure you’re working with valid probabilities.
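To make the logits-to-loss path concrete, here is a minimal sketch in plain Python. It is not taken from the YOLOv8 source; the `logits` and `target` values are hypothetical and only illustrate the arithmetic of softmax followed by the negative log of the true-class probability.

```python
import math

def softmax(scores):
    """Convert a list of raw scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, target):
    """Negative log of the probability assigned to the true class index."""
    return -math.log(probs[target])

logits = [2.0, 0.5, -1.0]  # hypothetical raw scores for 3 classes
target = 0                 # hypothetical true class index

probs = softmax(logits)             # valid probability distribution
loss = cross_entropy(probs, target)
print(probs, loss)
```

Calling `cross_entropy` directly on `logits` would not be meaningful here, since the raw scores are not probabilities and can even be negative.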
Softmax in YOLOv8 Training:
When you mention using probs, it’s essential to confirm whether these have already been passed through softmax. If they have not, taking the log of them directly will produce the kind of discrepancy you saw in your manual CE loss calculation.
Why Softmax Is Not Redundant:
Softmax normalizes logits into a probability distribution that sums to 1. Even if you’re using probs, double-check whether they are pre-softmax or post-softmax values. It seems YOLO outputs logits, so softmax isn’t redundant; it’s an integral part of the CE loss computation.
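One rough way to check whether a vector is pre-softmax or post-softmax is to test whether its entries lie in [0, 1] and sum to 1. The sketch below assumes nothing about the YOLOv8 API; `looks_like_probabilities` is a hypothetical helper and the `logits` are made-up values. It also shows how applying softmax a second time to values that are already probabilities changes the loss.

```python
import math

def softmax(scores):
    """Convert a list of raw scores into probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def looks_like_probabilities(values, tol=1e-4):
    """Hypothetical helper: True if all values are in [0, 1] and sum to ~1."""
    in_range = all(0.0 <= v <= 1.0 for v in values)
    return in_range and abs(sum(values) - 1.0) <= tol

logits = [4.0, 1.0, -2.0]       # made-up raw scores, true class is index 0
probs = softmax(logits)         # softmax applied once
double = softmax(probs)         # softmax applied a second time

loss_once = -math.log(probs[0])
loss_twice = -math.log(double[0])

print(looks_like_probabilities(logits))  # raw scores fail the check
print(looks_like_probabilities(probs))   # post-softmax values pass
print(loss_once, loss_twice)             # the second softmax flattens the distribution
```

Note that the output of a double softmax also passes this check, so it can tell raw logits apart from probabilities but cannot tell you how many times softmax was applied.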
Why Training vs. Validation Behavior Differs
For training, the softmax step is typically part of the framework’s internal implementation (e.g., PyTorch). That’s why you might not need to add an extra softmax manually when replicating training losses. Make sure “probs” are defined the same way in both cases when you do your manual validation.
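As an illustration of what "softmax inside the loss" means, here is a plain-Python sketch of a cross-entropy that takes raw logits, applies log-softmax internally, and averages over a batch. It mirrors the general behaviour of logits-based cross-entropy losses in frameworks such as PyTorch, but it is not the Ultralytics or PyTorch implementation; the function names and the batch values are assumptions made for the example.

```python
import math

def log_softmax(scores):
    """Log of the softmax, computed with the log-sum-exp trick for stability."""
    m = max(scores)
    lse = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - lse for s in scores]

def mean_cross_entropy_from_logits(batch_logits, targets):
    """Average negative log-likelihood over a batch; expects raw logits."""
    losses = [-log_softmax(row)[t] for row, t in zip(batch_logits, targets)]
    return sum(losses) / len(losses)

# Hypothetical batch of two samples with three classes each.
batch_logits = [[2.0, 0.5, -1.0], [0.1, 1.2, 0.3]]
targets = [0, 1]

loss = mean_cross_entropy_from_logits(batch_logits, targets)
print(loss)
```

Because the normalization happens inside the loss function, feeding it values that were already passed through softmax would apply softmax twice and give a different number.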
Additional Tips
Feel free to review Ultralytics YOLOv8 classification documentation for more insights: YOLOv8 Classification Docs. If you’d like, you can also look into the model implementation by downloading the source code to verify how the logits are processed during training and validation.
Hopefully, this clears things up! Let me know if you have more questions.
Hey, thank you for the quick and helpful reply! This already clarified a lot. However, I used the training data as input (to find the losses the other way round), and similarly I used the validation and test data as input to evaluate the model’s behaviour. Originally I wanted to calculate the training accuracy, so I fed the training data to the model itself to see which samples were labelled right and wrong. I used the probs the same way for the validation and training sets and found that the softmax is not necessary for the training loss but is necessary for the validation loss, hence my confusion. Below are some numbers for clarification, keeping in mind that I extracted the probs in exactly the same way for the two cases…
train loss (by yolov8): 0.05
train loss (probs): 1.5
train loss (softmax(probs)): 0.04
val loss (by yolov8): 1.4
val loss (probs): 0.05
val loss (softmax(probs)): 1.4