Imgsize during inference

Hi folks,
I trained my model with imgsz=640 and I also set imgsz=640 during inference. If my input is just one image, the original aspect ratio is maintained. For instance, if the original image size is (1381, 912), the mask size becomes (640, 448), which is exactly what I expected. But if the source in the predict function contains multiple images, all the mask sizes become (640, 640) instead of keeping the original aspect ratio. Why does this happen when I feed multiple images?

The reason I asked the question above is that I am getting two different masks, even after resizing the masks back to their original sizes as follows:
resized_mask = cv2.resize(mask, (orig_width, orig_height), interpolation=cv2.INTER_NEAREST)
resized_mask_uint8 = (resized_mask * 255).astype(np.uint8)
I appreciate any response.

During inference, if there’s only one image, it’s resized on the largest dimension only. When there are multiple images of various sizes/ratios, the images are resized and padded to be square. This is performed by the pre_transform() method of the BasePredictor class.
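As a rough illustration (this is a simplified sketch, not the actual Ultralytics code), the preprocessing behaves roughly like this:

import cv2
import numpy as np

def letterbox_sketch(img, imgsz=640, square=True, stride=32, pad_value=114):
    # Scale so the longest side equals imgsz while keeping the aspect ratio
    h, w = img.shape[:2]
    scale = imgsz / max(h, w)
    new_w, new_h = round(w * scale), round(h * scale)
    resized = cv2.resize(img, (new_w, new_h), interpolation=cv2.INTER_LINEAR)

    # Batched images of mixed shapes are padded out to a full imgsz x imgsz
    # square; a single image only has its shorter side rounded up to a stride
    # multiple (which is why a 1381x912 image comes out as 640x448 on its own)
    if square:
        target_w, target_h = imgsz, imgsz
    else:
        target_w = int(np.ceil(new_w / stride) * stride)
        target_h = int(np.ceil(new_h / stride) * stride)

    pad_w, pad_h = target_w - new_w, target_h - new_h
    top, bottom = pad_h // 2, pad_h - pad_h // 2
    left, right = pad_w // 2, pad_w - pad_w // 2
    return cv2.copyMakeBorder(resized, top, bottom, left, right,
                              cv2.BORDER_CONSTANT, value=(pad_value,) * 3)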

Thanks for your answer. I am just wondering if there is any way to pass multiple images that are resized while keeping their original aspect ratios. I am getting wrong masks and wrong segmentations as a result. Or would you consider adding an option to the predict method so we can choose whether images should be squared or resized by aspect ratio?

The aspect ratio of the images is preserved, which is why it uses padding (via the LetterBox class).

If you’re getting incorrect predictions, please share your inference code and examples of images with correct and incorrect predictions, otherwise it’s difficult to help troubleshoot your issue.

If single images are not giving you any issues, then instead of passing the images as a list, you could pass them one at a time in a loop. It might be a bit slower, but without any additional information it’s the only other option I can propose.
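Something along these lines, where model and images stand in for your own objects:

# Rough sketch: predict one image at a time instead of as a batch, so each
# image keeps its own rectangular (aspect-ratio-preserving) input shape
results = []
for img in images:
    results.extend(model.predict(source=img, imgsz=640, conf=0.5))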

It’s not possible to pass the model images with differing aspect ratios at the same time, so this can’t be made an option in the way you’re describing. The way it currently functions is the intended design: images with different aspect ratios are resized and padded to be compatible with the model, and this has not been observed to cause incorrect prediction results.
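If you do want to map a square (640, 640) mask back to an original image yourself, the padding has to be stripped off before resizing. A minimal sketch, assuming centred padding and reconstructing the scale factor from the original shape rather than reading it from the predictor:

import cv2
import numpy as np

def unletterbox_mask(mask, orig_h, orig_w, imgsz=640):
    # Reconstruct how the original image was scaled into the square input
    scale = imgsz / max(orig_h, orig_w)
    new_w, new_h = round(orig_w * scale), round(orig_h * scale)

    # Strip the (assumed centred) padding, then resize back to the original size
    top = (imgsz - new_h) // 2
    left = (imgsz - new_w) // 2
    cropped = np.ascontiguousarray(mask[top:top + new_h, left:left + new_w])
    return cv2.resize(cropped, (orig_w, orig_h), interpolation=cv2.INTER_NEAREST)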

Here is my inference Python script:

import cv2
import numpy as np

# digits_seg_model and display_img are defined elsewhere in the script
def find_digits_seg(cropped_met_imgs):
    # Run segmentation on the batch of cropped meter images
    digits_results = digits_seg_model.predict(source=cropped_met_imgs, imgsz=640, save=True, show_conf=True, conf=0.5)

    for i, (cropped_meter, seg) in enumerate(zip(cropped_met_imgs, digits_results)):
        if seg.masks is not None:
            # Take the first predicted mask and drop any singleton dimensions
            mask = seg.masks.data[0].cpu().numpy()
            mask = np.squeeze(mask)

            orig_width = seg.orig_shape[1]
            orig_height = seg.orig_shape[0]

            # Resize the mask back to the original image size
            resized_mask = cv2.resize(mask, (orig_width, orig_height), interpolation=cv2.INTER_NEAREST)
            resized_mask_uint8 = (resized_mask * 255).astype(np.uint8)

            # Keep only the segmented region of the original crop
            segmented_object = cv2.bitwise_and(cropped_meter, cropped_meter, mask=resized_mask_uint8)
            display_img(segmented_object)

The first image below is the original water meter image before being passed to inference. The second is when it is passed to inference individually. The third is when it is passed along with multiple images.
The second one’s mask size is (640, 448), but the third one’s mask size is (640, 640).

You can pass retina_masks=True to get masks that are resized to the original size of the image, so you wouldn’t need to resize manually.
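For example, a quick sketch using your variable names:

# With retina_masks=True the returned masks should already match each image's
# original height and width, so the manual cv2.resize step can be dropped
digits_results = digits_seg_model.predict(source=cropped_met_imgs, imgsz=640, retina_masks=True, conf=0.5)
mask = digits_results[0].masks.data[0].cpu().numpy()  # shape matches orig_shape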

In addition to what Toxite mentioned, you could check out the Isolating Segmentation Objects - Ultralytics YOLO Docs guide for another method of masking segmented objects.
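Roughly, that guide builds a binary mask from the predicted polygon points (masks.xy) at the original image size instead of resizing the raw mask tensor; a simplified sketch of the idea:

import cv2
import numpy as np

for res in digits_results:
    if res.masks is None:
        continue
    img = res.orig_img  # original BGR image stored on the result
    b_mask = np.zeros(img.shape[:2], np.uint8)
    # Draw the first mask's polygon, filled, onto the blank canvas
    contour = res.masks.xy[0].astype(np.int32).reshape(-1, 1, 2)
    cv2.drawContours(b_mask, [contour], -1, 255, cv2.FILLED)
    isolated = cv2.bitwise_and(img, img, mask=b_mask)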