When the prototype masks are added up, are they cropped by the corresponding bounding box before being applied to the input image? I know they are cropped during training, but the box I drew with cv2.rectangle was smaller than the masked area.
Hello Andrew_Qian,
That’s a great question. Yes, during the segmentation inference process, the generated masks are indeed cropped by their corresponding bounding boxes.
This is handled by the `process_mask` function, which takes the mask prototypes and bounding boxes as input. It then calls the `crop_mask` function to zero out the mask area that falls outside of the predicted bounding box. This step occurs before the final mask is upsampled to the original image size.
You can review the implementation of both `process_mask` and `crop_mask` in the `ops.py` utility reference.
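For intuition, the cropping boils down to a broadcasted inside-the-box test over the mask grid. Here's a minimal sketch of that idea (not the exact library code; it assumes masks of shape (n, h, w) and boxes in xyxy format at the same scale as the masks):

```python
import torch

def crop_mask_sketch(masks: torch.Tensor, boxes: torch.Tensor) -> torch.Tensor:
    """Zero out mask pixels that fall outside each instance's bounding box."""
    n, h, w = masks.shape
    # Split each box into its corners, shaped (n, 1, 1) for broadcasting.
    x1, y1, x2, y2 = torch.chunk(boxes[:, :, None], 4, dim=1)
    # Column and row coordinate grids.
    c = torch.arange(w, device=masks.device, dtype=x1.dtype)[None, None, :]  # (1, 1, w)
    r = torch.arange(h, device=masks.device, dtype=x1.dtype)[None, :, None]  # (1, h, 1)
    # The inside-the-box test broadcasts to (n, h, w); outside pixels multiply to zero.
    return masks * ((c >= x1) * (c < x2) * (r >= y1) * (r < y2))
```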
Hope this helps clarify the process!
Thanks for the response. My next question is: why doesn't the box (xyxy), when drawn on the image, match the mask?
I noticed that when I use the Python API, the output mask is relative to the padded version of the image I passed in, so I unpadded the mask. This gave me the correct mask for my image, albeit one that reaches slightly outside the object region. The box, however (whether padded or unpadded), tightly encapsulates the object just like a normal detection would, and is therefore smaller than the mask.
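Roughly what I did to unpad, in case it helps anyone else (a sketch; it assumes the standard symmetric letterbox, and `unpad_masks` is just my own helper name):

```python
import torch
import torch.nn.functional as F

def unpad_masks(masks: torch.Tensor, orig_shape, input_shape) -> torch.Tensor:
    """Strip letterbox padding from masks predicted at the padded input size.

    masks: (n, H, W) at the model's padded input resolution.
    orig_shape: (h0, w0) of the original image.
    input_shape: (H, W) the masks are relative to.
    """
    H, W = input_shape
    h0, w0 = orig_shape
    gain = min(H / h0, W / w0)                 # letterbox scale factor
    pad_h = (H - h0 * gain) / 2                # vertical padding per side
    pad_w = (W - w0 * gain) / 2                # horizontal padding per side
    top, left = int(round(pad_h)), int(round(pad_w))
    bottom, right = int(round(H - pad_h)), int(round(W - pad_w))
    masks = masks[:, top:bottom, left:right]   # drop the padded border
    # Resize back up to the original image size.
    return F.interpolate(masks[None].float(), size=(h0, w0),
                         mode="bilinear", align_corners=False)[0]
```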
How are you plotting the boxes/masks? If you are using the `Results` object method `result.plot()`, the boxes and masks should align properly. See the documentation on plotting results.
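For example, a minimal sketch (the checkpoint and image path are placeholders):

```python
import cv2
from ultralytics import YOLO

model = YOLO("yolov8n-seg.pt")      # any segmentation checkpoint
results = model("image.jpg")
annotated = results[0].plot()       # BGR numpy array with boxes and masks drawn
cv2.imwrite("annotated.jpg", annotated)
```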
Upon further investigation, this is what I discovered:
`results[0].boxes.xyxy` gives you the correct unpadded box coordinates, and `results[0].plot()` is always correct. But the mask output (`.masks.data`) is not unpadded, so it needs correction to restrict the mask to the unpadded region.
`results[0].boxes.xyxy`: unpadded and in the correct original coordinates
`results[0].masks.data`: not unpadded; relative to the padded input coordinates
`results[0].plot()`: plotting is correct
But the plot function is not very useful for custom multi-stage pipelines that use the output of one model as the input to another.
If you use `retina_masks=True` in `model.predict()`, `result.masks.data` should also be correctly resized to match the original image.