What is the ideal imgsz for document classification in YOLO11?

I'm training YOLO11m-cls for document classification on a modest dataset of about 1.5K images across just four categories. What imgsz hits the sweet spot: large enough for the model to pick up the important document features, but not so large that training slows down or the model drowns in unnecessary pixel detail? I'm aiming for a good balance of accuracy and training efficiency.
Also, in my pipeline the classified documents are handed off to an OCR engine. Could the imgsz I choose for classification training turn out to be a performance booster, or a hidden bottleneck, for that downstream OCR step? I'm curious whether the classification image resolution helps or handicaps the text recognition stage at all.

This is something you'll need to test for yourself. There are too many variables for anyone to give you accurate advice.

Most users start with the default 640 x 640 image size for training. If your images contain very small details, you can try increasing the value of imgsz progressively; however, for document classification it's likely that small details won't make a large difference in the result.

In general, the answer to the question "for my use case 'X', what value should I use for 'Y'?" is the same: you have to test. It takes time, but it gives you a definitive answer instead of a speculative one, and it helps you better understand how the model is impacted by your specific use case.
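As a concrete starting point, a minimal sketch of that kind of test with the Ultralytics API could look like the following. The dataset path, epoch count, and run names are placeholders, and the .top1 attribute assumes the classification metrics object returned by recent Ultralytics versions.

from ultralytics import YOLO

# Sketch of an imgsz comparison: short training runs at a few sizes, then compare
# validation top-1 accuracy. "path/to/documents" and the epoch count are placeholders.
for imgsz in (224, 320, 640):
    model = YOLO("yolo11m-cls.pt")  # pretrained classification weights
    metrics = model.train(
        data="path/to/documents",   # classification layout: train/ and val/ folders, one subfolder per class
        imgsz=imgsz,
        epochs=20,                  # short runs are usually enough for a relative comparison
        name=f"docs_imgsz{imgsz}",  # results land under runs/classify/docs_imgsz<size>
    )
    print(imgsz, metrics.top1)      # top-1 accuracy; also logged in the run directory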


Thank you BurhanQ

I’ve trained a classification model with four distinct classes (class_1, class_2, class_3, class_4) that performs well on its training and validation data. During testing, I’m implementing a threshold of 85% confidence: any prediction below this is categorized as ‘Others’.

My client has also provided a separate ‘Others’ test dataset, which ideally should consist of samples that fall below this 85% confidence threshold. However, when I test my model on this ‘Others’ dataset, I’m observing that the model is still making predictions with high confidence for one of the original four classes, rather than consistently falling below the threshold.
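As a rough illustration, inspecting what the model reports on that set looks something like the sketch below; the weights path and the folder name are placeholders, not the actual setup.

from pathlib import Path
from ultralytics import YOLO

model = YOLO("runs/classify/train/weights/best.pt")  # placeholder path to the trained weights

# Print the top-1 class and confidence for every image in the 'Others' test set.
for img_path in Path("others_test_set").glob("*.jpg"):  # placeholder folder name
    probs = model.predict(str(img_path), verbose=False)[0].probs
    print(img_path.name, model.names[probs.top1], float(probs.top1conf))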

This leads to a few key questions:

  1. Why is my model attempting to classify samples from the ‘Others’ dataset with high confidence into one of the defined classes, even though these samples are meant to be inherently different and fall below my confidence threshold?
    Is there something fundamental about the model’s training that’s causing this behavior?
  2. Should I have included an ‘Others’ class during training? I initially decided against this because the definition of ‘Others’ from the client is inconsistent and lacks specific characteristics, making it difficult to define a coherent training class. However, could the absence of an ‘Others’ class during training be forcing the model to try and fit everything into the existing categories?

[ I cannot provide more details on the data; it's classified. ]

When a model produces high-confidence incorrect predictions, it's usually an indicator that there aren't enough samples for the model to generalize to new data. Documents of the same type, for instance invoices, purchase orders, bills of materials, etc., are generally similar in structure. If this is the type of classification you're aiming to accomplish, it could require significantly more data, and there's only one way to find out how much.


Thank you @BurhanQ

So I made a few changes to fix the high-confidence wrong predictions issue.

First, I added a new class called “NotClassified” and set up a simple rule:

If the model predicts “NotClassified” OR the confidence is below 85%, it just returns “Others” instead.
This helps avoid those cases where the model is super sure but totally wrong.
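As a rough sketch, that rule can be expressed as a small post-processing wrapper around the Ultralytics API. The weights path and the classify_document helper name are just illustrative, and "NotClassified" has to match the class name used during training.

from ultralytics import YOLO

model = YOLO("best.pt")   # placeholder path to the retrained weights
CONF_THRESHOLD = 0.85

def classify_document(image_path):
    # Route explicit "NotClassified" predictions and low-confidence ones to "Others".
    probs = model.predict(image_path, verbose=False)[0].probs
    label = model.names[probs.top1]
    conf = float(probs.top1conf)
    if label == "NotClassified" or conf < CONF_THRESHOLD:
        return "Others", conf
    return label, conf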

Before, I was only doing augmentation on the 2 minority classes to balance things out.
Now, I’m running all classes through my Albumentations pipeline, which seems to work better.
After tweaking, the model’s confidence is now in the 80-95% range, which is what I wanted.

Here's the augmentation pipeline I'm using. It's pretty general but works well for most classification tasks, though for specific cases you might need to adjust it:

import albumentations as A
from albumentations.pytorch import ToTensorV2

augmentations = A.Compose([
    A.HorizontalFlip(p=0.5),                    # mirror the page left-right
    A.Rotate(limit=5, p=0.3),                   # small rotations, like a slightly skewed scan
    A.RandomBrightnessContrast(p=0.4),
    A.GaussianBlur(blur_limit=3, p=0.3),
    A.ShiftScaleRotate(shift_limit=0.03, scale_limit=0.05, rotate_limit=5, p=0.3),
    A.HueSaturationValue(p=0.2),
    A.RandomResizedCrop(size=(256, 256), scale=(0.85, 1.0), ratio=(0.9, 1.1), p=0.5),
    A.CoarseDropout(max_holes=6, max_height=24, max_width=24, fill_value=0, p=0.3),
    A.ISONoise(p=0.2),                          # simulate camera sensor noise
    A.MotionBlur(blur_limit=3, p=0.2),
    A.Resize(256, 256),                         # final fixed size for the classifier
    ToTensorV2(),                               # HWC NumPy array -> CHW torch tensor
])
It’s a solid baseline, but for really specific tasks like super strict doc classification, you might need to tweak it further.
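For completeness, applying the pipeline to a single image looks roughly like this (the filename is a placeholder). If you write augmented copies to disk before YOLO classification training, drop ToTensorV2 so the result stays a NumPy array that cv2.imwrite can save.

import cv2

image = cv2.imread("sample_doc.jpg")             # placeholder filename
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)   # Albumentations expects RGB
augmented = augmentations(image=image)["image"]  # a CHW torch tensor here because of ToTensorV2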
