Modifying the YOLOv8 architecture for multi-label classification

Sarvin · November 18, 2025, 6:24pm

Hey Ultralytics community!

I’m working on a multi-label image classification project using the YOLOv8 classification model. As far as I can tell, the built-in YOLOv8 classifier is designed for single-label tasks, so its architecture needs some modifications. The key changes are:

Changing the final activation function in the head from softmax to sigmoid.
Changing the loss function from Cross-Entropy to Binary Cross-Entropy (BCE).
Setting the number of output units in the final layer to match my total number of classes.

My main questions are:

What is the best way to implement these architectural changes? Should I create a custom .yaml file for the model, and if so, which parameters should I modify?
Do I need to override the default train function and implement a custom training loop to use BCE loss? If yes, could you point me to an example or outline the steps?
Are there any existing examples or best practices within the community for adapting YOLOv8 for multi-label tasks?

I am open to any suggestions and support. Thanks in advance for your help!

pderrenger · November 19, 2025, 1:29pm

You’re thinking about this in exactly the right way: for multi‑label classification you want independent probabilities per class (Sigmoid) and a BCE-style loss, rather than a Softmax + CrossEntropy setup. The Ultralytics glossary entries on Sigmoid and Softmax cover the difference between the two.
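
To make the Sigmoid vs Softmax distinction concrete, here is a small pure-Python sketch. The logit values are made up for illustration, and the helper names `softmax` and `sigmoid` are local to this example rather than part of any Ultralytics API.

```python
import math

def softmax(logits):
    """Normalize logits into one distribution: the classes compete and sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(logits):
    """Squash each logit on its own: one independent probability per class."""
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]

# Hypothetical logits for an image where classes 0 and 1 are both present.
logits = [2.0, 2.0, -3.0, -3.0]

soft = softmax(logits)
sig = sigmoid(logits)

# Softmax splits the probability mass, so each present class stays below 0.5.
print([round(p, 3) for p in soft])
# Sigmoid scores each class separately, so both present classes can be high.
print([round(p, 3) for p in sig])
```

With Softmax, two equally strong classes can never both exceed 0.5, which is why it is a poor fit when several labels can be true for the same image.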

A few key points specific to YOLOv8 classification:

Final activation / architecture
In the Ultralytics classifiers, the last layer is typically just a linear layer that outputs logits; the “Softmax” is applied conceptually at loss/metrics time rather than as an explicit module in the model. For multi‑label you actually do not want to add an explicit Sigmoid layer either; you normally keep raw logits and use BCEWithLogitsLoss, which internally combines Sigmoid + BCE. So architecturally you only need to ensure the last linear layer outputs nc units. The simplest way is to copy the yolov8*-cls.yaml, set nc to your number of labels, and train from that.
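
As a rough illustration of why raw logits plus a fused loss are preferred, the sketch below computes binary cross-entropy directly from logits in pure Python and compares it with the explicit Sigmoid-then-BCE route. The function names and values are hypothetical; the fused formula is the standard numerically stable form, which is the quantity `torch.nn.BCEWithLogitsLoss` computes with its default mean reduction.

```python
import math

def bce_with_logits(logits, targets):
    """Mean binary cross-entropy computed directly from raw logits.

    Uses the stable form max(x, 0) - x*y + log(1 + exp(-|x|)), which equals
    -[y*log(s) + (1-y)*log(1-s)] with s = sigmoid(x).
    """
    total = 0.0
    for x, y in zip(logits, targets):
        total += max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
    return total / len(logits)

def sigmoid_then_bce(logits, targets):
    """Two-step version: explicit Sigmoid followed by BCE on the probabilities."""
    total = 0.0
    for x, y in zip(logits, targets):
        s = 1.0 / (1.0 + math.exp(-x))
        total += -(y * math.log(s) + (1.0 - y) * math.log(1.0 - s))
    return total / len(logits)

logits = [1.5, -0.5, 3.0, -2.0]  # hypothetical raw outputs for one image, 4 classes
targets = [0.0, 1.0, 1.0, 0.0]   # multi-hot target

fused = bce_with_logits(logits, targets)
naive = sigmoid_then_bce(logits, targets)
print(fused, naive)  # the two values agree up to floating-point error

# The fused form stays finite for extreme logits, where the two-step version
# would end up taking log(0).
extreme = bce_with_logits([1000.0], [0.0])
print(extreme)
```
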
Loss: CrossEntropy → BCE
Out of the box, the YOLOv8 classification trainer assumes single‑label targets and uses a CrossEntropy loss. There is no current YAML flag to switch just the loss to BCE for multi‑label, so you have two realistic options:

Fork the repo and modify the classification trainer (the file that defines ClassificationTrainer) to:
- replace CrossEntropy with torch.nn.BCEWithLogitsLoss
- change the dataloader/labels to provide multi‑hot targets of shape [B, C] with 0/1 per class
Or, treat the YOLOv8 classifier as a regular PyTorch backbone and write a small custom training loop that:
- loads the model (from ultralytics import YOLO → model = YOLO('yolov8n-cls.yaml').model)
- ensures the final linear layer has out_features = num_labels
- uses BCEWithLogitsLoss and a custom dataset that returns multi‑label targets
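
A minimal sketch of the second option, under stated assumptions: the model here is a tiny stand-in network so the example runs without downloading anything, and the data are random tensors. In practice the stand-in would be replaced by the Ultralytics model loaded as described above, with its final linear layer set to the number of labels. `NUM_LABELS`, the batch size, the optimizer, and the 0.5 threshold are illustrative choices, not Ultralytics defaults.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

NUM_LABELS = 4  # hypothetical number of classes

# Stand-in classifier (assumption): swap in the YOLO classification model here.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, NUM_LABELS),  # outputs raw logits, no Sigmoid layer
)

# Random data in place of a real dataset:
# images [N, 3, 32, 32], multi-hot targets [N, C] with 0/1 per class.
torch.manual_seed(0)
images = torch.rand(16, 3, 32, 32)
targets = torch.randint(0, 2, (16, NUM_LABELS)).float()  # the loss expects floats
loader = DataLoader(TensorDataset(images, targets), batch_size=8, shuffle=True)

criterion = nn.BCEWithLogitsLoss()  # Sigmoid + BCE in one numerically stable op
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

model.train()
for epoch in range(2):
    running = 0.0
    for x, y in loader:
        optimizer.zero_grad()
        logits = model(x)          # shape [B, C]
        loss = criterion(logits, y)
        loss.backward()
        optimizer.step()
        running += loss.item()
    print(f"epoch {epoch}: mean loss {running / len(loader):.4f}")

# Inference: apply Sigmoid explicitly, then threshold each class independently.
model.eval()
with torch.no_grad():
    probs = torch.sigmoid(model(images[:2]))
    preds = (probs > 0.5).int()  # 0.5 is an arbitrary starting threshold
print(preds)
```

If the stand-in is swapped for the Ultralytics model, it is worth checking what the classification head returns in eval mode before applying Sigmoid, since this sketch assumes the model always returns raw logits.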

Examples / best practices
There isn’t an official Ultralytics example for multi‑label classification with YOLOv8 today; people usually either:

fork and tweak the classification trainer as above, or
use YOLO as a feature extractor in a plain PyTorch multi‑label pipeline.

If you’re starting fresh, the same strategy applies to the newer Ultralytics YOLO11 classification models as well; they share the same principles around logits, Sigmoid/BCE, and multi‑label heads.

If you share how you’re currently preparing labels (single index vs multi‑hot), I can outline a minimal training loop tailored to that format.

Sarvin · November 19, 2025, 7:41pm

Dear @pderrenger

Thank you so much for your incredibly detailed and helpful response. To answer your question, here are some details about my data format and labels:

I have moved away from the directory-based structure. I am now using a CSV file where each row contains an image_path and a list of multi-hot encoded labels. For example, for a dataset with 4 classes, a row looks like: ["path/to/img.jpg", [0, 1, 1, 0]], indicating that classes 2 and 3 are present in the image.
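
A CSV in this shape could be parsed along the following lines. This is only a sketch: the column names `image_path` and `labels`, and the choice to store the label list as a quoted JSON-style string, are assumptions about the file layout, and `load_multilabel_rows` is a hypothetical helper.

```python
import csv
import io
import json

def load_multilabel_rows(csv_file, num_classes):
    """Parse rows of (image_path, multi-hot list) and validate each label vector."""
    samples = []
    for row in csv.DictReader(csv_file):
        labels = json.loads(row["labels"])  # "[0, 1, 1, 0]" -> [0, 1, 1, 0]
        if len(labels) != num_classes or any(v not in (0, 1) for v in labels):
            raise ValueError(f"bad label vector for {row['image_path']}: {labels}")
        # Store floats, since BCE-style losses expect float targets.
        samples.append((row["image_path"], [float(v) for v in labels]))
    return samples

# In-memory stand-in for the CSV file; with a real file, pass
# open(path, newline="") instead.
demo_csv = io.StringIO(
    "image_path,labels\n"
    'path/to/img.jpg,"[0, 1, 1, 0]"\n'
    'path/to/other.jpg,"[1, 0, 0, 0]"\n'
)

samples = load_multilabel_rows(demo_csv, num_classes=4)
for path, target in samples:
    # Zero-based indices of the classes marked present in this row.
    present = [i for i, v in enumerate(target) if v == 1.0]
    print(path, target, present)
```

Each `(path, target)` pair can then feed a dataset class that loads the image and returns the target as a float tensor of shape `[C]`.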

I would be immensely grateful if you could provide a sketch of the custom training loop tailored to this data format.

Thank you again for your invaluable time and expertise.

BurhanQ · November 21, 2025, 12:30pm

FWIW, you don’t technically need to modify anything with the model if you’re using yolo11-cls because in the results object, you can access top5 and top5conf for the probability of the top-5 classification results. See the docs here:
