Given my YOLO11m-cls document classification task with a modest 1.5K image dataset spread across just four categories, what’s the “sweet spot” imgsz that will allow my model to truly see the crucial document features without drowning in unnecessary pixel detail or starving for information? I’m aiming for that perfect balance of accuracy and training efficiency.
Imagine a pipeline where classified documents hand off to an OCR engine. Could the imgsz I meticulously chose for YOLOv8m-cls training be a secret performance booster – or a hidden bottleneck – for the downstream OCR process? I’m curious if my classification image resolution will either empower or inadvertently handicap the text recognition stage.
This is something you’ll need to test for yourself. There are too many variables for anyone to be able to give you accurate advice.
Most users start with the default 640 x 640 image size for training. If you have images with very small details, then you can try increasing the value of imgsz progressively; however, for document classification it’s likely the small details won’t make a large difference in the result.
In general, the question “for my use case ‘X’, what value should I use for ‘Y’?” always has the same answer: you have to test. It takes time to do so, but it will provide a definitive answer instead of a speculative one, and help you better understand how the model is impacted by your specific use case.
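If it helps, the "increase imgsz progressively and compare" approach could be scripted roughly like this. It is only a sketch under assumptions: the candidate sizes, the epoch count, the dataset path, the run names, and the 0.005 tolerance are placeholders I made up, and `train_and_eval` and `pick_smallest_within` are hypothetical helper names, not part of Ultralytics. It assumes the `ultralytics` package is installed and the dataset uses the usual classification folder layout.

```python
from typing import Dict, List

# Candidate sizes are an assumption; adjust them to your documents and GPU memory.
CANDIDATE_SIZES: List[int] = [224, 320, 480, 640]


def train_and_eval(imgsz: int, data_dir: str, epochs: int = 30) -> float:
    """Train one classifier at the given imgsz and return top-1 validation accuracy."""
    from ultralytics import YOLO  # imported lazily so the helper below works without it

    model = YOLO("yolo11m-cls.pt")  # start from fresh pretrained weights for every run
    model.train(data=data_dir, imgsz=imgsz, epochs=epochs, name=f"cls_imgsz{imgsz}")
    metrics = model.val(imgsz=imgsz)  # validate at the same size used for training
    return float(metrics.top1)


def pick_smallest_within(scores: Dict[int, float], tolerance: float = 0.005) -> int:
    """Return the smallest imgsz whose accuracy is within `tolerance` of the best."""
    best = max(scores.values())
    return min(size for size, acc in scores.items() if acc >= best - tolerance)


if __name__ == "__main__":
    data_dir = "path/to/dataset"  # hypothetical path; one subfolder per class in each split
    scores = {size: train_and_eval(size, data_dir) for size in CANDIDATE_SIZES}
    for size, acc in sorted(scores.items()):
        print(f"imgsz={size}: top1={acc:.4f}")
    print("chosen imgsz:", pick_smallest_within(scores))
```

Picking the smallest size that stays within a small tolerance of the best accuracy is just one way to trade accuracy against training cost. With only 1.5K images, differences of a fraction of a percent may be noise, so repeating runs with different seeds would make the comparison more trustworthy.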