Ambiguous prompt with SAM, return top 3 masks

Is there a way to get Ultralytics SAM or SAM2 to return the top 3 masks when prompted with a single point, the same way it is done in the original paper?

The code appears to have this capability, but I can't figure out how to use it: ultralytics/ultralytics/models/sam/modules/sam.py at a05186329989c2ea46c5c13b1b55ca37cbeed17b · ultralytics/ultralytics · GitHub

How do I update this simple code?

from ultralytics import SAM

model = SAM("sam2.1_t.pt")

# This returns a list of length==0
results = model.predict("path/to/image.jpg", points=[900, 370], labels=[1])

You can try passing multimask_output=True to model.predict().

That does not work:

from ultralytics import SAM

model = SAM("sam2.1_t.pt")

# This returns a list of length==0
results = model.predict("debug/images/cars_people_720p.jpg", points=[200, 370], labels=[1])

# This does not work
results = model.predict("debug/images/cars_people_720p.jpg", points=[200, 370], labels=[1], multimask_output=True)
$ py test_sam.py
...
SyntaxError: 'multimask_output' is not a valid YOLO argument. 

    Arguments received: ['yolo']. Ultralytics 'yolo' commands use the following syntax:

        yolo TASK MODE ARGS

        Where   TASK (optional) is one of ['obb', 'pose', 'detect', 'classify', 'segment']
                MODE (required) is one of ['train', 'val', 'benchmark', 'export', 'predict', 'track']
                ARGS (optional) are any number of custom 'arg=value' pairs like 'imgsz=320' that override defaults.
                    See all ARGS at https://docs.ultralytics.com/usage/cfg or with 'yolo cfg'

    1. Train a detection model for 10 epochs with an initial learning_rate of 0.01
        yolo train data=coco8.yaml model=yolo11n.pt epochs=10 lr0=0.01

    2. Predict a YouTube video using a pretrained segmentation model at image size 320:
        yolo predict model=yolo11n-seg.pt source='https://youtu.be/LNwODJXcvt4' imgsz=320

    3. Val a pretrained detection model at batch-size 1 and image size 640:
        yolo val model=yolo11n.pt data=coco8.yaml batch=1 imgsz=640

    4. Export a YOLO11n classification model to ONNX format at image size 224 by 128 (no TASK required)
        yolo export model=yolo11n-cls.pt format=onnx imgsz=224,128

    5. Ultralytics solutions usage
        yolo solutions count or in ['crop', 'blur', 'workout', 'heatmap', 'isegment', 'visioneye', 'speed', 'queue', 'analytics', 'inference', 'trackzone'] source="path/to/video.mp4"

    6. Run special commands:
        yolo help
        yolo checks
        yolo version
        yolo settings
        yolo copy-cfg
        yolo cfg
        yolo solutions help

    Docs: https://docs.ultralytics.com
    Solutions: https://docs.ultralytics.com/solutions/
    Community: https://community.ultralytics.com
    GitHub: https://github.com/ultralytics/ultralytics

You can try this:

from ultralytics import SAM
model = SAM("sam2.1_t.pt")
model()  # creates predictor. run once after model load

results = model.predictor("ultralytics/assets/bus.jpg", points=[200, 370], labels=[1], multimask_output=True)

I get the same error:

SyntaxError: 'multimask_output' is not a valid YOLO argument. 

    ...

It works for me:

In [1]: from ultralytics import SAM
   ...: model = SAM("sam2.1_t.pt")
   ...: model()  # creates predictor. run once after model load
   ...:
   ...: results = model.predictor("ultralytics/assets/bus.jpg", points=[200, 370], labels=[1], multimask_output=True)
WARNING ⚠️ 'source' is missing. Using 'source=/ultralytics/ultralytics/assets'.

image 1/2 /ultralytics/ultralytics/assets/bus.jpg: 1024x1024 1 0, 1 1, 1 2, 1 3, 1 4, 1 5, 1 6, 1 7, 1 8, 1 9, 1 10, 1 11, 1 12, 5689.2ms
image 2/2 /ultralytics/ultralytics/assets/zidane.jpg: 1024x1024 1 0, 1 1, 1 2, 2378.3ms
Speed: 29.1ms preprocess, 4033.7ms inference, 1.9ms postprocess per image at shape (1, 3, 1024, 1024)

image 1/1 /ultralytics/ultralytics/assets/bus.jpg: 1024x1024 1 0, 1 1, 1 2, 98.8ms
Speed: 9.3ms preprocess, 98.8ms inference, 1.1ms postprocess per image at shape (1, 3, 1024, 1024)

In [2]: results[0].masks.shape
Out[2]: torch.Size([3, 1080, 810])
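With multimask_output=True the predictor returns three candidate masks for a single point prompt (the torch.Size([3, 1080, 810]) above), which in SAM typically correspond to subpart/part/whole of the object. A minimal sketch of choosing one of them by pixel area; the results[0].masks.data attribute follows the session above, while the area heuristic and helper name are my own, not part of the Ultralytics API:

```python
import torch

def pick_mask_by_area(masks: torch.Tensor, largest: bool = True) -> torch.Tensor:
    """Select one mask from an (N, H, W) stack of candidates by pixel area.

    SAM's three candidates usually differ in granularity, so area is a
    simple way to pick, e.g., the whole-object mask vs. a subpart.
    """
    areas = masks.flatten(1).sum(dim=1)  # pixel count per candidate mask
    idx = areas.argmax() if largest else areas.argmin()
    return masks[idx]

# Synthetic example: three masks of increasing area
masks = torch.zeros(3, 4, 4)
masks[0, 0, 0] = 1    # area 1
masks[1, :2, :2] = 1  # area 4
masks[2] = 1          # area 16
print(pick_mask_by_area(masks).sum())  # tensor(16.)
```

In the session above this would be called as pick_mask_by_area(results[0].masks.data).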

Testing it again, it appears to work. I did not notice that you need to use predictor instead of predict.

BTW, is there a less clunky way of initializing model.predictor?

That's the shortest way.