Hello,
I have images of size 1980x1080, and I put them into yolo11-cls with imgsz = 1024.
I was wondering how the resize to 1024 happens.
From what I read on the internet, the images get resized to 1024x768 to keep the aspect ratio, and then the image gets padded on the right and left to reach 1024x1024.
The problem is that when I run inference, resizing the image directly to 1024x1024 gives better results than resizing to 1024x768 and then padding to 1024x1024. See the sketch below for what I mean by the resize-and-pad version.
I expected that reproducing the same process YOLO uses would give the best results, so I wonder if my understanding of the resize process is correct?
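To be concrete, here is roughly how I do the resize-and-pad preprocessing (a minimal Pillow sketch; the file name and grey fill value are just examples):

```python
from PIL import Image, ImageOps

img = Image.open("frame.jpg")  # placeholder for one of my 1980x1080 images

# Resize to fit inside 1024x1024 while keeping the aspect ratio,
# then pad the borders to reach a square 1024x1024 image
letterboxed = ImageOps.pad(img, (1024, 1024), color=(114, 114, 114))
print(letterboxed.size)  # (1024, 1024)
```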
Thank you for the fast answer!
Where does scale_size come from? Is it the value we put in imgsz when calling predict on YOLO?
If that's the case, doesn't YOLO classification always force scale_size[0] == scale_size[1]?
Does that mean the preprocess step is only resizing?
thanks!
I believe it’s derived from imgsz, but it’s calculated just above those lines:
where size is an argument to the classify_transforms function, and I presume it would use the imgsz argument when called.
It does resize, but if you look at L2598, T.CenterCrop(size) is included as part of the transformations, so there is some center cropping, which can cut off parts of the image when it isn’t square. I’m not sure why this is used instead of padding (which is used for object detection).
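For anyone who wants to see what that means in practice, here is a minimal torchvision sketch (not the Ultralytics source, just the same two operations) showing how a non-square image ends up cropped; the file name is a placeholder:

```python
import torchvision.transforms as T
from PIL import Image

size = 1024
pipeline = T.Compose([
    T.Resize(size),      # int argument: shortest edge is scaled to 1024, aspect ratio kept
    T.CenterCrop(size),  # square crop from the centre; excess width/height is discarded
])

img = Image.open("frame.jpg")  # placeholder, e.g. a 1980x1080 image
out = pipeline(img)
print(out.size)  # (1024, 1024) -- the left/right edges of a wide image are cropped away
```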
I looked into this while testing yolo11-cls on my own dataset. It seems like the resize step maintains aspect ratio by default and adds padding if needed—kind of like letterboxing. I had to tweak it a bit because my input images were all square and didn’t need padding.
I’m using 224x224 as input size with grayscale images, and it handled them fine.
Check the albumentations section in the data config—resize logic is applied there before batching.
You can override the default transforms in the val or predict functions if needed.
Thanks for your detailed question and for looking into the preprocessing steps.
For YOLO11-cls during prediction/validation with a single integer imgsz (e.g., imgsz=1024), the image preprocessing is typically handled by the classify_transforms function. This involves two main steps:
The image is first resized such that its shortest edge becomes equal to imgsz (1024 in your case), while maintaining the aspect ratio. For your 1980x1080 image, the 1080 dimension (height) would be scaled to 1024, and the width (1980) would be scaled proportionally to approximately 1877. So the image becomes roughly 1877x1024.
A center crop of (imgsz, imgsz) (i.e., 1024x1024) is then taken from this resized image.
This process differs from the letterboxing approach you described (resizing to fit within 1024x1024 and then padding), which explains why you’re observing different results. The scale_size is indeed derived from the imgsz you provide, and if imgsz is an integer, scale_size will effectively be (imgsz, imgsz). The preprocessing isn’t just resizing; it’s a resize followed by a center crop to achieve the final square input.
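If you want to double-check this on your side, you can build the same transforms and inspect the output shape. This sketch assumes classify_transforms is importable from ultralytics.data.augment in your installed version:

```python
from PIL import Image
from ultralytics.data.augment import classify_transforms

tfm = classify_transforms(size=1024)  # the transforms used for classification val/predict
img = Image.new("RGB", (1980, 1080))  # blank stand-in for one of your images

# Step 1: shortest edge 1080 -> 1024, width scales to ~1877 (aspect ratio kept)
# Step 2: 1024x1024 centre crop, trimming roughly 853 px of width in total
out = tfm(img)
print(out.shape)  # torch.Size([3, 1024, 1024])
```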