Given this simple benchmark of inference speed:
import os
import time
import psutil
from PIL import Image
from ultralytics import YOLO
detector = YOLO("yolo12m.pt", "cpu")
image_path = "debug/images/face_4096.jpg"
image = Image.open(image_path).convert("RGB")
process = psutil.Process(os.getpid())
while True:
    start_time = time.time()
    results = detector.predict(image)
    end_time = time.time()
    print(f"Detection took: {(end_time - start_time) * 1000:.4f} milliseconds")
I get sample output like this:
...
0: 640x640 1 person, 2 chairs, 52.4ms
Speed: 3.4ms preprocess, 52.4ms inference, 1.2ms postprocess per image at shape (1, 3, 640, 640)
Detection took: 226.1436 milliseconds
0: 640x640 1 person, 2 chairs, 52.4ms
Speed: 3.3ms preprocess, 52.4ms inference, 1.1ms postprocess per image at shape (1, 3, 640, 640)
Detection took: 227.5164 milliseconds
0: 640x640 1 person, 2 chairs, 52.5ms
Speed: 3.7ms preprocess, 52.5ms inference, 1.3ms postprocess per image at shape (1, 3, 640, 640)
Detection took: 226.4342 milliseconds
0: 640x640 1 person, 2 chairs, 52.5ms
Speed: 3.4ms preprocess, 52.5ms inference, 1.2ms postprocess per image at shape (1, 3, 640, 640)
Detection took: 231.9701 milliseconds
0: 640x640 1 person, 2 chairs, 52.4ms
Speed: 3.7ms preprocess, 52.4ms inference, 1.3ms postprocess per image at shape (1, 3, 640, 640)
Detection took: 235.8549 milliseconds
0: 640x640 1 person, 2 chairs, 52.4ms
Speed: 3.8ms preprocess, 52.4ms inference, 1.3ms postprocess per image at shape (1, 3, 640, 640)
Detection took: 234.9603 milliseconds
0: 640x640 1 person, 2 chairs, 52.5ms
Speed: 3.6ms preprocess, 52.5ms inference, 1.2ms postprocess per image at shape (1, 3, 640, 640)
Detection took: 232.1908 milliseconds
0: 640x640 1 person, 2 chairs, 52.4ms
Speed: 3.8ms preprocess, 52.4ms inference, 1.4ms postprocess per image at shape (1, 3, 640, 640)
Detection took: 237.7448 milliseconds
...
Here we can see that the total execution time (~226-238 ms) is much longer than the sum of the reported preprocess, inference, and postprocess times (~57-58 ms).
Note: I have also tested with the stream=True argument, and I get the same behavior.
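
To put a number on that gap, here is a minimal sketch that only redoes the arithmetic on the figures printed in the sample output. It does not call ultralytics at all, and the helper name `unaccounted_ms` is made up for illustration.

```python
# Figures copied from the sample output above, all in milliseconds:
# (preprocess, inference, postprocess, measured wall-clock time)
samples = [
    (3.4, 52.4, 1.2, 226.1436),
    (3.3, 52.4, 1.1, 227.5164),
    (3.7, 52.5, 1.3, 226.4342),
    (3.4, 52.5, 1.2, 231.9701),
    (3.7, 52.4, 1.3, 235.8549),
    (3.8, 52.4, 1.3, 234.9603),
    (3.6, 52.5, 1.2, 232.1908),
    (3.8, 52.4, 1.4, 237.7448),
]


def unaccounted_ms(pre, inf, post, wall):
    """Wall-clock time not covered by the three reported stages (hypothetical helper)."""
    return wall - (pre + inf + post)


gaps = [unaccounted_ms(*s) for s in samples]
mean_gap = sum(gaps) / len(gaps)
mean_wall = sum(s[3] for s in samples) / len(samples)

print(f"Unaccounted time per call: min {min(gaps):.1f} ms, max {max(gaps):.1f} ms")
print(f"Mean unaccounted time: {mean_gap:.1f} ms ({mean_gap / mean_wall:.0%} of wall-clock)")
```

So in every iteration, most of the measured time falls outside the three stages that the "Speed:" line reports.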