The inference process of SAM has two main stages: (1) generating the image embedding, and (2) generating masks based on prompts.
Stage (1) takes up most of the compute time, so for interactive GUIs it is common practice to run (1) once and (2) multiple times.
Does the ultralytics package support this use case?
Toxite
July 26, 2025, 5:30pm
You can use `set_image`:
```python
def setup_source(self, source):
    """
    >>> predictor.setup_source(None)  # Uses default source if available

    Notes:
        - If source is None, the method may use a default source if configured.
        - The method adapts to different source types and prepares them for subsequent inference steps.
        - Supported source types may include local files, directories, URLs, and video streams.
    """
    if source is not None:
        super().setup_source(source)

def set_image(self, image):
    """
    Preprocess and set a single image for inference.

    This method prepares the model for inference on a single image by setting up the model if not already
    initialized, configuring the data source, and preprocessing the image for feature extraction. It
    ensures that only one image is set at a time and extracts image features for subsequent use.

    Args:
        image (str | np.ndarray): Path to the image file as a string, or a numpy array representing
            an image read by cv2.
    """
```

After setting the image, the embedding is available on the predictor:

```python
model.predictor.set_image("image.jpg")
embeddings = model.predictor.features
```
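Putting that together, here is a rough sketch of the encode-once, prompt-many pattern. `chunked` and `segment_with_prompts` are helper names introduced here for illustration (not part of the Ultralytics API); the predictor is assumed to expose `set_image`, prompt-based calling, and `reset_image` as in the excerpt above — check the current Ultralytics docs for the exact predictor construction.

```python
# Sketch of an encode-once / prompt-many loop for SAM-style inference.
# Assumptions: `predictor` exposes set_image(), __call__(bboxes=...), and
# reset_image(); `chunked` and `segment_with_prompts` are hypothetical
# helpers defined here, not Ultralytics functions.

def chunked(items, size):
    """Yield successive slices of `items`, each at most `size` long."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def segment_with_prompts(predictor, image_path, bboxes, chunk_size=64):
    """Encode the image once, then decode masks chunk by chunk."""
    predictor.set_image(image_path)  # heavy image encoder runs only here
    results = []
    for batch in chunked(bboxes, chunk_size):
        # Only the lightweight prompt/mask decoder runs per batch.
        results.extend(predictor(bboxes=batch))
    predictor.reset_image()  # release the cached embedding
    return results
```

The chunk size is a tuning knob: larger chunks reduce per-call overhead at the cost of memory in the mask decoder.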
Check this too:
(GitHub issue, opened 23 Jan 2025, closed 14 Feb 2025; labels: enhancement, question, segment)
### Search before asking
- [x] I have searched the Ultralytics YOLO [issues](https://github.com/ultralytics/ultralytics/issues) and [discussions](https://github.com/orgs/ultralytics/discussions) and found no similar questions.
### Question
# Optimizing SAM Inference for High-Resolution Image Segmentation with Hundreds of Prompts
## **Description**
I'm currently working on segmenting objects within high-resolution images using SAM 2.1 and MobileSAM. My workflow involves the following steps:
1. **Bounding Box Detection:**
- I utilize another model to perform bounding box (bbox) detection.
- A single image typically contains **hundreds of objects**, resulting in **hundreds of bbox annotations**.
2. **Segmentation with SAM:**
- Mainly using bbox prompts, I employ SAM to segment each detected object.
- **Challenge:** While SAM's inference time is relatively fast, I followed the method described in the [Ultralytics SAM Documentation](https://docs.ultralytics.com/ko/models/sam-2/) (`results = model("path/to/image.jpg", bboxes=[100, 100, 200, 200])`). As a result, there are **hundreds of high-resolution image inputs** and **repeated image encodings**, which cause **significant time delays**.
## **Problem Statement**
My primary goal is to efficiently segment hundreds of objects in a single high-resolution image. Specifically:
- **Multiple Prompts Handling:**
- For a single image, I need to input **hundreds of bbox or point prompts for different objects**.
- **Performance Bottleneck:**
- Current approaches cause significant time lags due to hundreds of high-resolution image inputs and repeated image encoding, even though there is only a single input image.
- **Segmentation Accuracy:**
- Whole-image segmentation without specific prompts does not distinguish objects accurately, so bbox prompts must be used.
## **Desired Outcome**
To optimize the segmentation process, I aim to:
1. **Batch Processing of Prompts:**
- If possible, I want to input all (or a large batch of) bbox or point prompts **simultaneously** to minimize processing time.
2. **Single Image Encoding:**
- Ensure the **image encoding step is performed only once**, even if prompts are processed sequentially.
## **Request for Assistance**
I'm seeking guidance or potential solutions to address the following:
- **Efficient Prompt Handling:**
- How can I input hundreds of prompts for a single high-resolution image without incurring significant processing delays?
- **Optimizing Image Encoding:**
- Is there a way to reuse the image encoding across multiple prompt inputs, or to pre-encode the image and supply the embedding when needed?
## **Additional Information**
- **Tools & Versions:**
- SAM 2.1 / MobileSAM
- Ultralytics' SAM implementation
- **Image Specifications:**
- Ultra-high resolution (>10MB per image)
- Hundreds of objects per image
I would like to know whether an official feature for this is provided. If you have any reference materials, insights, or suggestions for similar implementations, I would greatly appreciate them. Thank you for your support!
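On the "pre-encode and enter it when needed" question above, one pattern is to persist the embedding after the first encode and restore it on later runs. This is a sketch only: it assumes the predictor stores the embedding in a `features` attribute after `set_image` (as the excerpt earlier in this thread suggests) and that restoring that attribute is sufficient for subsequent prompt calls — verify against the predictor internals before relying on it. `cache_embedding` and `load_embedding` are hypothetical helper names, and `pickle` stands in for `torch.save`/`torch.load` to keep the sketch dependency-free.

```python
import pickle
from pathlib import Path

# Hypothetical helpers: persist and restore a predictor's image embedding so
# the expensive encoder pass can be skipped on later runs. Assumes the
# embedding lives in `predictor.features` after set_image(); for real tensor
# embeddings, torch.save/torch.load would be the natural choice instead.

def cache_embedding(predictor, image_path, cache_path):
    """Encode the image once and write the resulting embedding to disk."""
    predictor.set_image(image_path)
    Path(cache_path).write_bytes(pickle.dumps(predictor.features))

def load_embedding(predictor, cache_path):
    """Restore a previously cached embedding, skipping the encoder."""
    predictor.features = pickle.loads(Path(cache_path).read_bytes())
```

Whether restoring `features` alone is enough depends on what other state `set_image` configures, so treat this as a starting point rather than a drop-in solution.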