What is the ideal imgsz for document classification in YOLO11?

I'm training YOLO11m-cls for document classification on a modest dataset of about 1.5K images across just four categories. What imgsz hits the sweet spot: large enough for the model to pick up the important document features, but not so large that training slows down or the model drowns in unnecessary pixel detail? I'm aiming for a good balance of accuracy and training efficiency.
Also, in my pipeline the classified documents are handed off to an OCR engine. Could the imgsz I choose for classification training turn out to be a performance booster, or a hidden bottleneck, for that downstream OCR step? I'm curious whether the classification image resolution helps or handicaps the text recognition stage at all.

This is something you'll need to test for yourself. There are too many variables for anyone to give you accurate advice.

Most users start with the default 640 x 640 image size for training. If your images contain very small details, you can try increasing the value of imgsz progressively; however, for document classification it's likely that small details won't make a large difference in the result.

In general, the answer to the question "for my use case 'X', what value should I use for 'Y'?" is the same: you have to test. It takes time, but it gives you a definitive answer instead of a speculative one, and it helps you better understand how the model is impacted by your specific use case.
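As a concrete starting point, a minimal sketch of that kind of test with the Ultralytics API could look like the following. The dataset path, epoch count, and run names are placeholders, and the .top1 attribute assumes the classification metrics object returned by recent Ultralytics versions.

from ultralytics import YOLO

# Sketch of an imgsz comparison: short training runs at a few sizes, then compare
# validation top-1 accuracy. "path/to/documents" and the epoch count are placeholders.
for imgsz in (224, 320, 640):
    model = YOLO("yolo11m-cls.pt")  # pretrained classification weights
    metrics = model.train(
        data="path/to/documents",   # classification layout: train/ and val/ folders, one subfolder per class
        imgsz=imgsz,
        epochs=20,                  # short runs are usually enough for a relative comparison
        name=f"docs_imgsz{imgsz}",  # results land under runs/classify/docs_imgsz<size>
    )
    print(imgsz, metrics.top1)      # top-1 accuracy; also logged in the run directory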


Thank you BurhanQ

I’ve trained a classification model with four distinct classes (class_1, class_2, class_3, class_4) that performs well on its training and validation data. During testing, I’m implementing a threshold of 85% confidence: any prediction below this is categorized as ‘Others’.

My client has also provided a separate ‘Others’ test dataset, which ideally should consist of samples that fall below this 85% confidence threshold. However, when I test my model on this ‘Others’ dataset, I’m observing that the model is still making predictions with high confidence for one of the original four classes, rather than consistently falling below the threshold.
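As a rough illustration, inspecting what the model reports on that set looks something like the sketch below; the weights path and the folder name are placeholders, not the actual setup.

from pathlib import Path
from ultralytics import YOLO

model = YOLO("runs/classify/train/weights/best.pt")  # placeholder path to the trained weights

# Print the top-1 class and confidence for every image in the 'Others' test set.
for img_path in Path("others_test_set").glob("*.jpg"):  # placeholder folder name
    probs = model.predict(str(img_path), verbose=False)[0].probs
    print(img_path.name, model.names[probs.top1], float(probs.top1conf))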

This leads to a few key questions:

  1. Why is my model attempting to classify samples from the ‘Others’ dataset with high confidence into one of the defined classes, even though these samples are meant to be inherently different and fall below my confidence threshold?
    Is there something fundamental about the model’s training that’s causing this behavior?
  2. Should I have included an ‘Others’ class during training? I initially decided against this because the definition of ‘Others’ from the client is inconsistent and lacks specific characteristics, making it difficult to define a coherent training class. However, could the absence of an ‘Others’ class during training be forcing the model to try and fit everything into the existing categories?

[ I cannot provide more details on the data; it's classified. ]

When a model produces high-confidence incorrect predictions, it's usually an indicator that there aren't enough samples for the model to generalize to new data. Documents of the same type, for instance invoices, purchase orders, bills of materials, etc., are generally similar in structure. If this is the type of classification you're aiming to accomplish, it could require significantly more data, and there's only one way to find out how much.


Thank you @BurhanQ

So I made a few changes to fix the high-confidence wrong predictions issue.

First, I added a new class called “NotClassified” and set up a simple rule:

If the model predicts “NotClassified” OR the confidence is below 85%, it just returns “Others” instead.
This helps avoid those cases where the model is super sure but totally wrong.
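As a rough sketch, that rule can be expressed as a small post-processing wrapper around the Ultralytics API. The weights path and the classify_document helper name are just illustrative, and "NotClassified" has to match the class name used during training.

from ultralytics import YOLO

model = YOLO("best.pt")   # placeholder path to the retrained weights
CONF_THRESHOLD = 0.85

def classify_document(image_path):
    # Route explicit "NotClassified" predictions and low-confidence ones to "Others".
    probs = model.predict(image_path, verbose=False)[0].probs
    label = model.names[probs.top1]
    conf = float(probs.top1conf)
    if label == "NotClassified" or conf < CONF_THRESHOLD:
        return "Others", conf
    return label, conf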

Before, I was only doing augmentation on the 2 minority classes to balance things out.
Now, I’m running all classes through my Albumentations pipeline, which seems to work better.
After tweaking, the model’s confidence is now in the 80-95% range, which is what I wanted.

Here's the augmentation pipeline I'm using. It's pretty general but works well for most classification tasks, though for specific cases you might need to adjust it:

import albumentations as A
from albumentations.pytorch import ToTensorV2

augmentations = A.Compose([
    A.HorizontalFlip(p=0.5),                    # mirror the page left-right
    A.Rotate(limit=5, p=0.3),                   # small rotations, like a slightly skewed scan
    A.RandomBrightnessContrast(p=0.4),
    A.GaussianBlur(blur_limit=3, p=0.3),
    A.ShiftScaleRotate(shift_limit=0.03, scale_limit=0.05, rotate_limit=5, p=0.3),
    A.HueSaturationValue(p=0.2),
    A.RandomResizedCrop(size=(256, 256), scale=(0.85, 1.0), ratio=(0.9, 1.1), p=0.5),
    A.CoarseDropout(max_holes=6, max_height=24, max_width=24, fill_value=0, p=0.3),
    A.ISONoise(p=0.2),                          # simulate camera sensor noise
    A.MotionBlur(blur_limit=3, p=0.2),
    A.Resize(256, 256),                         # final fixed size for the classifier
    ToTensorV2(),                               # HWC NumPy array -> CHW torch tensor
])
It’s a solid baseline, but for really specific tasks like super strict doc classification, you might need to tweak it further.
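For completeness, applying the pipeline to a single image looks roughly like this (the filename is a placeholder). If you write augmented copies to disk before YOLO classification training, drop ToTensorV2 so the result stays a NumPy array that cv2.imwrite can save.

import cv2

image = cv2.imread("sample_doc.jpg")             # placeholder filename
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)   # Albumentations expects RGB
augmented = augmentations(image=image)["image"]  # a CHW torch tensor here because of ToTensorV2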
