Imgsize during inference

Javokhir_Abdirashido · November 13, 2024, 11:00am

Hi folks,
I trained my model by setting imgsz=640 and during inference also setting imgsz=640. If my input is just one image, it is maintaining the original aspect ratio. For instance, if the orginal image size is (1381, 912) the mask size becomes (640,448) and this is completely what I expected. But if the source in prediction function takes multiple images, all the mask sizes become (640,640) not the original aspect ratio. Why is this happening if I feed multiple images?

The reason I asked about the above question is that I am getting two different masks even after I resized the masks back to original sizes as follows
resized_mask = cv2.resize(mask, (orig_width, orig_height), interpolation=cv2.INTER_NEAREST)
resized_mask_uint8 = (resized_mask * 255).astype(np.uint8)
Appreciate any response

BurhanQ · November 13, 2024, 1:46pm

During inference, if there’s only one image, then it’s resized on the largest dimension only. When there are multiple images of various sizes/ratios, then the images are resized and padded to be square. This is performed by the pre_transform() method of the BasePredictor class

github.com

ultralytics/ultralytics/blob/7d55aa0422a84631dc3f981c4ad0586c9b4c480b/ultralytics/engine/predictor.py#L145-L157


      
          def pre_transform(self, im):
              """
              Pre-transform input image before inference.
          
              Args:
                  im (List(np.ndarray)): (N, 3, h, w) for tensor, [(h, w, 3) x N] for list.
          
              Returns:
                  (list): A list of transformed images.
              """
              same_shapes = len({x.shape for x in im}) == 1
              letterbox = LetterBox(self.imgsz, auto=same_shapes and self.model.pt, stride=self.model.stride)
              return [letterbox(image=x) for x in im]

Javokhir_Abdirashido · November 13, 2024, 1:56pm

Thanks for your answer, I am just wondering if there is any way to give multiple images that have been resized by original aspect ratio. I am getting wrong masks and wrong segmentation as a result. Or would you consider adding an option to prediction method that we can choose if images should be squared or resized by image ratio

BurhanQ · November 13, 2024, 2:25pm

The aspect ratio of the images is preserved, which is why it uses padding (via the LetterBox class).

If you’re getting incorrect predictions, please share your inference code and examples of images with correct and incorrect predictions, otherwise it’s difficult to help troubleshoot your issue.

If singular images are not giving you any issues, instead of passing the images as a list, you could instead pass them iteratively in a loop. It might be a bit slower, but without any additional information it’s the only other option I can propose.

It’s not possible to pass the model images with differing image aspect ratios at the same time, so this can’t be made as an option how you’re describing. The way it currently functions is the intended design, where images that have different aspect ratios are resized and padded to be compatible with the model, and has not been observed to cause incorrect prediction results.

Javokhir_Abdirashido · November 14, 2024, 6:42am

Here is my inference
Python script:

def find_digits_seg(cropped_met_imgs):
digits_results = digits_seg_model.predict(source=cropped_met_imgs, imgsz=640, save=True, show_conf=True, conf=0.5)

for i, (cropped_meter, seg) in enumerate(zip(cropped_met_imgs, digits_results)):
    if seg.masks is not None:
        mask = seg.masks.data[0].cpu().numpy()
        mask = np.squeeze(mask)
        
        orig_width = seg.orig_shape[1]
        orig_height = seg.orig_shape[0]
        
        resized_mask = cv2.resize(mask, (orig_width, orig_height), interpolation=cv2.INTER_NEAREST)
        resized_mask_uint8 = (resized_mask * 255).astype(np.uint8)

        segmented_object = cv2.bitwise_and(cropped_meter, 
        cropped_meter, mask=resized_mask_uint8)
        display_img(segmented_object)

The first below image is the original water meter image before being passed into the inference. Second one is when passed to inference individually. Third one is when passed along with multiple images.
Second one’s mask size is (640 by 448) but third one’s mask size is 640 by 640

Toxite · November 15, 2024, 9:58am

You can pass retina_masks=True to get masks that are resized to the original size of the image, so you wouldn’t need to resize manually.

BurhanQ · November 15, 2024, 1:42pm

In addition to what Toxite mentioned, you could check out the Isolating Segmentation Objects - Ultralytics YOLO Docs guide for another method for masking segmented objects.

Topic		Replies	Views
YOLO11 classifier not padding during resize? YOLO support	4	164	March 1, 2025
How Yolo11-cls resize preprocess works? Discussion yolo , question	5	70	May 8, 2025
New Release: Ultralytics v8.2.84 Discussion releases , announcements , ultralytics-official	0	20	August 30, 2024
Need Help with 1:1 Image Resizing During Training on YOLOv8 Discussion yolov8 , code	3	1427	August 27, 2024
New Release: Ultralytics v8.3.131 Discussion releases , announcements , ultralytics-official	0	18	May 11, 2025

Imgsize during inference

Related topics