Hi there! I’m a beginner and wanted to ask a general question about the YOLOv8 model (and v11 as well): is it possible to override the loss function (more specifically, BBoxLoss) when training on my dataset while still keeping the other pretrained weights? If not, would I have to implement my model from scratch?
Needing to ask how to do this in most cases means you shouldn’t be attempting it, but that really depends on your goal (which wasn’t stated).
If you’re looking to learn where to make changes to the loss for learning purposes, you can check here in the source code for the v8DetectionLoss class. If, however, your goal is to improve your model’s performance when training, it is very likely that you should not change anything related to the loss as a first or early step, as there are several other, more important things to try first.
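For reference, here is a rough, untested sketch of how a custom loss could be plugged in while keeping the pretrained weights, assuming the current ultralytics layout where DetectionModel.init_criterion() builds v8DetectionLoss (the bbox term, BboxLoss, is constructed inside that class, so that is where you would swap it). The model, dataset, and class names below are placeholders:

```python
# Rough sketch (untested): swap in a custom loss via a custom DetectionModel /
# DetectionTrainer while still loading the pretrained weights as usual.
from ultralytics import YOLO
from ultralytics.models.yolo.detect import DetectionTrainer
from ultralytics.nn.tasks import DetectionModel
from ultralytics.utils.loss import v8DetectionLoss


class CustomDetectionLoss(v8DetectionLoss):
    """Stock v8 detection loss; override only the pieces you want to change."""

    def __call__(self, preds, batch):
        # Start from the stock loss, then re-weight or replace terms as needed.
        loss, loss_items = super().__call__(preds, batch)
        return loss, loss_items


class CustomDetectionModel(DetectionModel):
    def init_criterion(self):
        # This is the hook the stock DetectionModel uses to build its loss.
        return CustomDetectionLoss(self)


class CustomTrainer(DetectionTrainer):
    def get_model(self, cfg=None, weights=None, verbose=True):
        model = CustomDetectionModel(cfg, nc=self.data["nc"], verbose=verbose)
        if weights:
            model.load(weights)  # pretrained weights are kept; only the loss class differs
        return model


if __name__ == "__main__":
    # Train as usual, just pointing the trainer at the custom class.
    YOLO("yolov8n.pt").train(data="coco8.yaml", epochs=1, trainer=CustomTrainer)
```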
My apologies. I realized my last post was extremely vague. Let me provide some clarification:
I am currently working on a project where we are detecting small defects (cracks, missing bolts). The subset of the dataset I’m training on is ~3,000 images. My assigned task was to look into modifying the loss function (more specifically, BBoxLoss) so that it focuses more on smaller objects. Please let me know what steps I should take first / what suggestions you have to improve the model’s performance, as well as any general recommendations (such as which model I should use). Please also let me know if this post is still vague.
The additional context is helpful. A few more details would help; you don’t have to reply to all of them, but they are likely going to be useful to consider:
- Are you using a detection model or a segmentation model? I would suggest using a segmentation model for cracks especially.
- What are the pixel dimensions of the images? What’s considered “small” for your object size? Do you mean something like 20 x 20 pixels for an image that’s 1600 x 1200, or do you mean 4 x 6 pixels for an image that’s 1200 x 900?
- What imgsz are you training at?
- What imgsz are you using for prediction?
- Have you tried image slicing? (A sliced-inference sketch follows after this list.)
- Have you adjusted any of the image augmentation hyperparameters?
- What’s the current performance of the model? Is there anywhere in particular it’s not meeting performance expectations (low mAP, high FP, misclassification, etc.)?
- What is the performance target, or how do you know when you’ve achieved the model performance you need?
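On the slicing point: one common approach is sliced inference with the third-party SAHI package rather than changing the model itself. A rough, untested sketch, where the weights path, image path, and thresholds are placeholders:

```python
# Rough sketch of sliced inference with SAHI (pip install sahi ultralytics).
from sahi import AutoDetectionModel
from sahi.predict import get_sliced_prediction

detection_model = AutoDetectionModel.from_pretrained(
    model_type="yolov8",        # ultralytics-family models
    model_path="yolov8s.pt",    # placeholder: your trained weights
    confidence_threshold=0.25,
    device="cuda:0",            # or "cpu"
)

# Run the detector on overlapping tiles so small defects occupy more of each tile.
result = get_sliced_prediction(
    "path/to/image.jpg",        # placeholder image path
    detection_model,
    slice_height=640,
    slice_width=640,
    overlap_height_ratio=0.2,
    overlap_width_ratio=0.2,
)
print(len(result.object_prediction_list), "detections")
```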
If your end goal is a better model, there are probably many ways to improve model performance before attempting to make custom modifications to the model loss. Something else to consider is that the quality and consistency of your ground-truth annotations will be critical to how well the model is able to detect and classify correctly, so I would recommend verifying these as a primary task. After that, you could work through some of the ideas in the bullets above, as those are the best places to start before attempting to use a custom loss (unless you have some sort of evidence showing that’s what’s needed; but if you have such justification, it’s likely you’d already know how to modify the loss).
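For example, a hypothetical training run that bumps imgsz and adjusts a few augmentation hyperparameters might look like the sketch below; the model choice, dataset file, and values are placeholders to adapt, not recommendations:

```python
from ultralytics import YOLO

# Hypothetical example: larger input size plus a few augmentation tweaks.
# "defects.yaml" is a placeholder for your own dataset config.
model = YOLO("yolov8s.pt")
model.train(
    data="defects.yaml",
    imgsz=1280,      # larger input size so small defects keep more pixels
    epochs=100,
    scale=0.5,       # random scale jitter
    degrees=10.0,    # small random rotations
    translate=0.1,   # random translation
    fliplr=0.5,      # horizontal flips
    mosaic=1.0,      # mosaic augmentation (on by default)
)
```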
Since IoU calculates the overlap with your ground truth (GT) bounding box annotations, as long as your GT boxes are precisely annotated, then height is inherently included. Think of it this way: if the GT box is incorrect in height but correct in width, then the predicted boxes will likely have the same errors, because the model is attempting to make its predicted boxes overlap the GT boxes as closely as possible. In your case, that doesn’t necessarily mean you can/should ignore how precise the GT box width is, since the entire box area is used to help determine the boundaries of your object(s).
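For reference, IoU here is just the intersection area divided by the union area of the predicted and GT boxes; a minimal sketch with boxes in xyxy format:

```python
def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

# An error in GT height shows up directly in the overlap:
print(box_iou((0, 0, 10, 20), (0, 0, 10, 16)))  # ~0.8
```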
In truth, if the precision of the object boundaries is what matters most, you can use a segmentation model. The GT annotation might be a bit more tedious, but using SAM or SAM2 can help a lot. Using segmentation means that you’re defining the boundary of the object against all other objects, which will give you a more precise overall size; you can also easily derive a bounding box from the segmentation contours (YOLO segmentation models automatically output both segmentation contours AND bounding boxes).
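If you already have bounding-box labels, the ultralytics auto_annotate helper can combine a detector with SAM to bootstrap segmentation labels; a rough sketch with placeholder paths:

```python
from ultralytics.data.annotator import auto_annotate

# Placeholders: point these at your own images, detector weights, and SAM weights.
auto_annotate(
    data="path/to/images",        # folder of images to annotate
    det_model="yolov8x.pt",       # detector that proposes boxes
    sam_model="sam_b.pt",         # SAM generates masks inside those boxes
    output_dir="path/to/labels",  # YOLO-format segmentation labels are written here
)
```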
Thank you for clarifying. So the idea is that if the box loss reaches its minimum, that will also be the minimum for the height loss. Will try segmentation then.
I have a similar question. I have just started with YOLO (trying out v8-seg and v11-seg right now). My use case involves segmenting skin issues, acne, lesions, etc. The current results are not satisfactory. What do you suggest?
These are the validation logs from the training of YOLO11x-seg:
| Class | Images | Instances | Mask P | Mask R | Mask mAP50 | Mask mAP50-95 |
|-------|--------|-----------|--------|--------|------------|---------------|
| all   | 630    | 12698     | 0.337  | 0.225  | 0.22       | 0.0732        |
| 0     | 629    | 9567      | 0.434  | 0.432  | 0.399      | 0.133         |
| 1     | 385    | 3131      | 0.24   | 0.0172 | 0.0404     | 0.0137        |
You’re on the right track using YOLO11-seg for this kind of task – skin lesions are exactly the sort of problem where segmentation helps over pure detection.
From your metrics it looks like class 0 is learning reasonably (mAP50 ≈ 0.40) while class 1 is almost not being detected at all (very low recall). That usually points to data/label issues or class imbalance rather than a loss-function problem.
Before touching the loss, I’d suggest checking a few basics:
1. **Visually inspect predictions for class 1.** Open a batch of validation images and see what YOLO11x-seg is doing for that class: are masks completely missing, badly shaped, or just low-confidence? That will tell you if the model “sees” them at all.
2. **Check dataset balance and labels.** Make sure class 1 has enough well-annotated examples and that its masks are clean and consistent. For medical data, small inconsistencies in how annotators draw masks can easily kill performance.
3. **Use higher resolution or tiling if lesions are small.** If your original images are large and lesions are small, try a larger imgsz (for example 1024 or 1280 if your GPU allows), or crop images into patches so each lesion occupies more pixels. YOLO’s segmentation head benefits a lot from that in medical imagery.
An example training command to try with YOLO11-seg might be:
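```bash
yolo segment train model=yolo11x-seg.pt data=your_dataset.yaml imgsz=1024 epochs=100 batch=8
```

(your_dataset.yaml is a placeholder, and imgsz/batch should be scaled to whatever your GPU memory allows.)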
If after cleaning labels, balancing classes, and increasing effective resolution class 1 is still not learned, then we can talk about more advanced things like class re-weighting or custom loss tweaks. In that case, please share:
- The exact yolo train command you used (model, imgsz, epochs, etc.)
- A couple of example images showing typical class 0 and class 1 cases with their masks
That context will make it much easier to suggest concrete next steps.
I have different datasets for this and yes, class imbalance is always going to be an issue here. I have a better dataset that I am going to use from here on, with higher-resolution images (for the run above, imgsz was 640).
Also, I think the Dice coefficient will be a better judgment metric than mAP and other traditional ones in this case.
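For reference, Dice is straightforward to compute yourself once predicted and ground-truth masks are exported as boolean arrays; a rough sketch:

```python
import numpy as np

def dice(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-9) -> float:
    """Dice coefficient between two boolean masks of the same shape."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    return float(2.0 * inter / (pred.sum() + gt.sum() + eps))

# Toy example: each mask has 3 foreground pixels, 2 of which overlap.
p = np.zeros((4, 4), bool); p[0, :3] = True
g = np.zeros((4, 4), bool); g[0, 1:] = True
print(dice(p, g))  # 2*2 / (3+3) ≈ 0.667
```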
I was training with image size (imgsz) 640
98% of the time (possibly even more than that), the answer to improving model performance is the same:
- Collect and annotate more data for training and validation.
- Ensure that your annotations are correct (don’t miss anything), accurate (don’t include pixels that are not relevant for a given class label), and consistent (ensure you are labeling classes the same way; this can be an issue with more abstract class types).
You can try lots of other things, but nothing will improve your model’s performance more/faster than doing these two things. You haven’t shared how much data or how many labeled instances you have, but it’s highly likely that this is what needs to be done.
Currently I’m working on a detection model with 2 classes, tanks and trucks. The tank class has 38.165k instances across 11.5k images and the truck class has 6.802k instances across 4.5k images, and I’m training with YOLOv8m, so there is a huge class imbalance. How do I fix this? Do I need to change the distribution focal loss or the class loss? What steps do I need to take to make it right?