Confusion matrix contains many classes not in dataset

When I run model.val() before training the model on my dataset, the confusion matrix contains several classes that are not specified in the dataset’s YAML file. Issue #16695 explains why the background class is present, but why are all these other classes (person, bicycle, car, etc.) present?

Here’s my code (which I’m running in Google Colab):

!pip install ultralytics
!pip install roboflow

from roboflow import Roboflow
rf = Roboflow(api_key=[api_key_goes_here])
project = rf.workspace("conveyor-550m0").project("conveyor-hhrzw")
version = project.version(3)
dataset = version.download("yolov11")

from ultralytics import YOLO
model = YOLO("yolo11n.pt")

results = model.val(data="/content/conveyor-3/data.yaml")
print(results.confusion_matrix.to_df()) # First confusion matrix
train_results = model.train(data="/content/conveyor-3/data.yaml", epochs=1)
results = model.val()
print(results.confusion_matrix.to_df()) # Second confusion matrix

The YAML file (/content/conveyor-3/data.yaml) contains three classes:

nc: 3
names: ['cardboard box', 'conveyor', 'kartonbox']
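
(For completeness: the rest of the file is the usual Roboflow export layout, roughly the path entries below. These are typical values rather than lines copied from my file; only nc and names above are verbatim.)

train: ../train/images
val: ../valid/images
test: ../test/images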

The first confusion matrix (before training) looks like this:

┌────────────┬────────┬─────────┬─────┬───┬────────────┬────────────┬────────────┬────────────┐
│ Predicted  ┆ person ┆ bicycle ┆ car ┆ … ┆ teddy_bear ┆ hair_drier ┆ toothbrush ┆ background │
│ ---        ┆ ---    ┆ ---     ┆ --- ┆   ┆ ---        ┆ ---        ┆ ---        ┆ ---        │
│ str        ┆ f64    ┆ f64     ┆ f64 ┆   ┆ f64        ┆ f64        ┆ f64        ┆ f64        │
╞════════════╪════════╪═════════╪═════╪═══╪════════════╪════════════╪════════════╪════════════╡
│ person     ┆ 0.0    ┆ 0.0     ┆ 0.0 ┆ … ┆ 0.0        ┆ 0.0        ┆ 0.0        ┆ 5.0        │
│ bicycle    ┆ 0.0    ┆ 0.0     ┆ 0.0 ┆ … ┆ 0.0        ┆ 0.0        ┆ 0.0        ┆ 0.0        │
│ car        ┆ 0.0    ┆ 0.0     ┆ 0.0 ┆ … ┆ 0.0        ┆ 0.0        ┆ 0.0        ┆ 0.0        │
│ motorcycle ┆ 0.0    ┆ 0.0     ┆ 0.0 ┆ … ┆ 0.0        ┆ 0.0        ┆ 0.0        ┆ 0.0        │
│ airplane   ┆ 0.0    ┆ 0.0     ┆ 0.0 ┆ … ┆ 0.0        ┆ 0.0        ┆ 0.0        ┆ 0.0        │
│ …          ┆ …      ┆ …       ┆ …   ┆ … ┆ …          ┆ …          ┆ …          ┆ …          │
│ scissors   ┆ 0.0    ┆ 0.0     ┆ 0.0 ┆ … ┆ 0.0        ┆ 0.0        ┆ 0.0        ┆ 0.0        │
│ teddy_bear ┆ 0.0    ┆ 0.0     ┆ 0.0 ┆ … ┆ 0.0        ┆ 0.0        ┆ 0.0        ┆ 0.0        │
│ hair_drier ┆ 0.0    ┆ 0.0     ┆ 0.0 ┆ … ┆ 0.0        ┆ 0.0        ┆ 0.0        ┆ 0.0        │
│ toothbrush ┆ 0.0    ┆ 0.0     ┆ 0.0 ┆ … ┆ 0.0        ┆ 0.0        ┆ 0.0        ┆ 0.0        │
│ background ┆ 152.0  ┆ 34.0    ┆ 5.0 ┆ … ┆ 0.0        ┆ 0.0        ┆ 0.0        ┆ 0.0        │
└────────────┴────────┴─────────┴─────┴───┴────────────┴────────────┴────────────┴────────────┘

The second confusion matrix (after training) looks like this:

┌───────────────┬───────────────┬──────────┬───────────┬────────────┐
│ Predicted     ┆ cardboard_box ┆ conveyor ┆ kartonbox ┆ background │
│ ---           ┆ ---           ┆ ---      ┆ ---       ┆ ---        │
│ str           ┆ f64           ┆ f64      ┆ f64       ┆ f64        │
╞═══════════════╪═══════════════╪══════════╪═══════════╪════════════╡
│ cardboard_box ┆ 0.0           ┆ 0.0      ┆ 0.0       ┆ 0.0        │
│ conveyor      ┆ 0.0           ┆ 0.0      ┆ 0.0       ┆ 0.0        │
│ kartonbox     ┆ 0.0           ┆ 0.0      ┆ 0.0       ┆ 0.0        │
│ background    ┆ 167.0         ┆ 46.0     ┆ 5.0       ┆ 0.0        │
└───────────────┴───────────────┴──────────┴───────────┴────────────┘

As expected, the second confusion matrix contains the three classes from the dataset’s YAML file plus the background class. I expected the first confusion matrix to contain only those four classes as well. Even before training, I expected model validation to use the classes listed in the dataset’s YAML file.

Where does model.val() get all those other classes (person, bicycle, car, etc.) from?

You can’t use a model trained on a different set of classes to validate on your own dataset. The model will always predict the classes it was trained on.
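
You can see this directly on the checkpoint itself: the class map is stored with the model weights and is independent of whatever data.yaml you pass to val(). A minimal sketch (the printed values are from memory of the COCO class list):

from ultralytics import YOLO

# The pretrained checkpoint carries its own class map (COCO's 80 classes),
# regardless of which data.yaml you validate against.
model = YOLO("yolo11n.pt")
print(len(model.names))                        # 80
print({i: model.names[i] for i in range(3)})   # {0: 'person', 1: 'bicycle', 2: 'car'}

That is presumably also why the unmatched ground-truth counts in your first matrix land under the person, bicycle and car columns: your dataset’s label indices 0, 1 and 2 are being read against the model’s own name list.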

It also doesn’t make sense to run model.val() after model.train(), because model.train() already runs a final validation pass and returns the same metrics that model.val() would, unless you’re training on multiple GPUs.
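
For example (single-GPU; this assumes train() hands back the same DetMetrics object that val() produces, which is my reading of the current API):

from ultralytics import YOLO

model = YOLO("yolo11n.pt")

# On a single GPU, train() runs a final validation pass and returns its metrics,
# so a separate model.val() call afterwards just recomputes the same numbers.
train_results = model.train(data="/content/conveyor-3/data.yaml", epochs=1)
print(train_results.box.map50)                  # mAP@0.5
print(train_results.box.map)                    # mAP@0.5:0.95
print(train_results.confusion_matrix.to_df())   # same as your second matrix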

Does the ultralytics Python package provide a way to evaluate the out-of-the-box performance of a pretrained YOLO model on a given dataset? I want to compare performance before and after training to see how much improvement we get from fine-tuning the pretrained model on our dataset.

This is not possible for a closed-set YOLO model trained on a different set of classes. A model can only predict the classes it was trained on; the accuracy on any other class will be 0. It only learns to predict your classes after training.

The pretrained model was trained on the COCO dataset, so it can only predict the classes in that dataset.
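
If you just want a baseline number for the pretrained checkpoint, the only meaningful evaluation is on data with COCO classes, e.g. the small coco8.yaml config that ships with ultralytics. That gives you a sanity check of the pretrained weights, not a before/after comparison on your own classes:

from ultralytics import YOLO

model = YOLO("yolo11n.pt")

# Meaningful only because coco8 uses the same 80 classes the model was trained on;
# on the 3-class conveyor dataset the pretrained model's mAP is simply 0.
baseline = model.val(data="coco8.yaml")
print(baseline.box.map50)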