I’m investigating some inference discrepancies between a YOLO11m-cls model and its exported ONNX equivalent. In the course of debugging, I added a few lines to the BasePredictor inference code, around line 260 of ultralytics/ultralytics/engine/predictor.py (main branch):
import time
import matplotlib.pyplot as plt
import numpy as np

# Dump the preprocessed tensor to disk so I can see exactly what the model receives
now = time.strftime('%H-%M-%S')
imnp = np.uint8(255 * im[0].cpu().numpy().transpose((1, 2, 0)))  # CHW float [0, 1] -> HWC uint8
plt.imsave(f'some/directory/check{now}.png', imnp)
And this confirmed that when I pass a non-square image for inference with code like:
yolo_model = YOLO(yolo_checkpoint)
yolo_res = yolo_model(test_image_file, imgsz=input_shape, verbose=False)
The image gets cropped, not padded.
From reading the Ultralytics documentation, I was under the impression that the intended behavior is for images to be square-padded when an `imgsz` argument is passed. Have I misunderstood something? Is there a built-in way to enforce square padding during classifier inference?
Thanks!
Oh yeah, and after discovering this, I went back to inspect some output from custom classifier training more closely, and the sample ‘train_batch0.jpg’-type files showed the same thing was happening during training, probably to the detriment of that particular training job. So a way to enforce square padding during training would be greatly appreciated as well.
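In the meantime I’m leaning towards just square-padding the dataset offline before launching training, so the crop can’t discard anything. A rough sketch of what I mean (the pad_to_square helper and the directory paths are mine, not anything from Ultralytics; the grey fill just mimics the usual letterbox colour):

```python
from pathlib import Path

from PIL import Image, ImageOps


def pad_to_square(src_dir: str, dst_dir: str, fill=(114, 114, 114)):
    """Copy a classification dataset, letterboxing every image onto a square canvas."""
    dst = Path(dst_dir)
    for img_path in Path(src_dir).rglob("*.jpg"):
        im = Image.open(img_path).convert("RGB")
        side = max(im.size)
        # ImageOps.pad keeps the aspect ratio and centres the image on a side x side canvas
        squared = ImageOps.pad(im, (side, side), color=fill)
        out = dst / img_path.relative_to(src_dir)
        out.parent.mkdir(parents=True, exist_ok=True)
        squared.save(out)


# e.g. pad_to_square("datasets/my_cls/train", "datasets/my_cls_padded/train")
```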
Toxite
March 1, 2025, 4:26am
3
This does not work during inference. I’m looking at modifying the classify_transforms function (ultralytics/ultralytics/data/augment.py at 3fceec57be02140c5d6c5779b0310d4496c3aa90) in a similar manner, roughly along the lines sketched below.
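Roughly what I have in mind, as a sketch only (SquarePad and padded_classify_transforms are my own names, and any normalisation should mirror whatever the classify_transforms version you are replacing applies):

```python
import torchvision.transforms as T
from PIL import Image, ImageOps


class SquarePad:
    """Letterbox a PIL image onto a square canvas instead of cropping it."""

    def __init__(self, fill=(114, 114, 114)):
        self.fill = fill

    def __call__(self, im: Image.Image) -> Image.Image:
        side = max(im.size)
        return ImageOps.pad(im, (side, side), color=self.fill)


def padded_classify_transforms(size=224, mean=None, std=None):
    """Replacement idea: pad to square first, then resize (no centre crop)."""
    tfms = [SquarePad(), T.Resize(size), T.ToTensor()]
    if mean is not None and std is not None:
        # Only normalise if your checkpoint expects it; match the values
        # used by the classify_transforms version you are patching.
        tfms.append(T.Normalize(mean=mean, std=std))
    return T.Compose(tfms)
```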
Beyond the immediate fix, it seems like this is something that should be made very clear in the documentation – right now, if you search for how `imgsz` works, all doc references just say images are padded and resized. If classifier transforms work differently, it should be noted here: Classify - Ultralytics YOLO Docs
Toxite
March 1, 2025, 6:00am
5
You could preprocess the image manually and pass a tensor instead; when the input is already a tensor, the predictor skips its own preprocessing and uses yours as-is.
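Something like this sketch (letterbox_square, the file names, and the 224 input size are just illustrative, and it assumes the checkpoint expects 0–1 RGB input with no extra normalisation):

```python
import cv2
import numpy as np
import torch

from ultralytics import YOLO


def letterbox_square(img: np.ndarray, size: int = 224, fill: int = 114) -> np.ndarray:
    """Resize the long side to `size` and pad the short side, keeping the image centred."""
    h, w = img.shape[:2]
    scale = size / max(h, w)
    nh, nw = round(h * scale), round(w * scale)
    resized = cv2.resize(img, (nw, nh), interpolation=cv2.INTER_LINEAR)
    canvas = np.full((size, size, 3), fill, dtype=np.uint8)
    top, left = (size - nh) // 2, (size - nw) // 2
    canvas[top:top + nh, left:left + nw] = resized
    return canvas


model = YOLO("yolo11m-cls.pt")
bgr = cv2.imread("test_image.jpg")                               # OpenCV loads BGR
padded = letterbox_square(bgr, size=224)
rgb = padded[..., ::-1].copy()                                   # BGR -> RGB, contiguous copy
tensor = torch.from_numpy(rgb).permute(2, 0, 1).float() / 255.0  # CHW, 0-1
res = model(tensor.unsqueeze(0), verbose=False)                  # batch of 1
```

Since a ready-made tensor is only moved to the right device and dtype, whatever padding you bake in is exactly what the model sees.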
There’s a PR for non-square inference with classification models.
main ← Y-T-G:classify_args_metadata_fix (opened 11:57 AM, 02 Oct 24 UTC)
Closes https://github.com/ultralytics/ultralytics/issues/16026
Closes https://github.com/ultralytics/ultralytics/issues/18788
**Classification task currently:**
1. Doesn't allow `imgsz` override for `.pt` models because `transforms` stored in the model are used over `self.args.imgsz`
https://github.com/ultralytics/ultralytics/blob/db3c0400c5d59376a5ba5fe1c6c02ecdff8329e3/ultralytics/engine/predictor.py#L189-L196
```python
In [3]: model = YOLO("yolo11n-cls.pt")
In [4]: result = model("ultralytics/assets/bus.jpg", imgsz=640)
image 1/1 /ultralytics/ultralytics/assets/bus.jpg: 224x224 minibus 0.57, police_van 0.34, trolleybus 0.04, recreational_vehicle 0.01, streetcar 0.01, 22.4ms
Speed: 8.3ms preprocess, 22.4ms inference, 0.1ms postprocess per image at shape (1, 3, 224, 224)
```
2. Doesn't use the `imgsz` from the metadata.
```python
In [5]: model = YOLO("yolo11n-cls.onnx")
In [6]: result = model("ultralytics/assets/bus.jpg")
Loading yolo11n-cls.onnx for ONNX Runtime inference...
---------------------------------------------------------------------------
InvalidArgument Traceback (most recent call last)
...
...
...
InvalidArgument: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Got invalid dimensions for input: images for the following indices
index: 2 Got: 640 Expected: 224
index: 3 Got: 640 Expected: 224
Please fix either the inputs/outputs or the model.
```
3. Doesn't support non-square `imgsz`
https://github.com/ultralytics/ultralytics/blob/db3c0400c5d59376a5ba5fe1c6c02ecdff8329e3/ultralytics/engine/predictor.py#L193
even though `classify_transforms` has handling for it:
https://github.com/ultralytics/ultralytics/blob/db3c0400c5d59376a5ba5fe1c6c02ecdff8329e3/ultralytics/data/augment.py#L2379-L2385
**After the changes:**
1. `imgsz` override and non-square sizes:
```python
In [1]: from ultralytics import YOLO
In [2]: model = YOLO("yolo11n-cls.pt")
In [3]: result = model("ultralytics/assets/bus.jpg", imgsz=(224, 256))
image 1/1 /ultralytics/ultralytics/assets/bus.jpg: 224x256 police_van 0.57, minibus 0.35, minivan 0.02, limousine 0.01, amphibian 0.01, 4.2ms
Speed: 9.7ms preprocess, 4.2ms inference, 0.1ms postprocess per image at shape (1, 3, 224, 256)
```
2. `imgsz` from metadata. No need to explicitly specify `imgsz`:
```python
In [4]: model = YOLO("yolo11n-cls.onnx")
In [5]: result = model("ultralytics/assets/bus.jpg")
Loading yolo11n-cls.onnx for ONNX Runtime inference...
image 1/1 /ultralytics/ultralytics/assets/bus.jpg: 224x224 minibus 0.57, police_van 0.34, trolleybus 0.04, recreational_vehicle 0.01, streetcar 0.01, 3.4ms
Speed: 7.1ms preprocess, 3.4ms inference, 0.1ms postprocess per image at shape (1, 3, 224, 224)
```
## 🛠️ PR Summary
<sub>Made with ❤️ by [Ultralytics Actions](https://github.com/ultralytics/actions)</sub>
### 🌟 Summary
Enhanced flexibility in image size and transformations for prediction.
### 📊 Key Changes
- Modified how image sizes (`imgsz`) are retrieved, prioritizing model-defined sizes.
- Updated transformation handling for classification tasks, simplifying logic.
### 🎯 Purpose & Impact
- 🛠 Provides greater flexibility and reliability by using model-specific image sizes when available.
- 🔄 Streamlines transformation setup, potentially improving performance during classification tasks.