Ultralytics v8.3.204 — smoother multi‑GPU, stabler exports, broader device support 
Quick summary: this release sharpens training ergonomics, stabilizes ONNX/TensorFlow exports, broadens device coverage (MPS, Rockchip), trims checkpoints for lean inference, and refreshes docs with a YOLO26 preview and updated YOLO11 metrics. It's a quality-of-life update that makes multi-GPU projects, ONNX workflows, and edge deployments easier than ever.
To upgrade, install the latest package with pip install -U ultralytics.
Highlights
- Batch handling moved from device selection into the Trainer for clearer multi-GPU behavior and error messages.
- YOLOE segmentation training streamlined with a dedicated from-scratch trainer and cleaner loss logic.
- Export refinements: CUDA-aware ONNX opset choice, explicit NMS guards, and correct image-size propagation.
- Device compatibility: safer MPS transfers and working RKNN/OpenVINO in Streamlit live inference (a short MPS sketch follows below).
- Leaner checkpoints: AMP scaler state is stripped for smaller, faster inference weights.
- Docs refresh: YOLO26 preview and updated YOLO11 FLOPs/params, plus YAML guide fixes.
YOLO11 remains the latest stable and recommended model for all use cases, and you can explore the preview of YOLO26 on the new docs page. For segmentation and open‑vocabulary experiments, the upgraded YOLOE workflows are now cleaner and more torch.compile‑friendly.
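As a minimal sketch of the MPS item above (the weight and image paths are illustrative, and YOLO11 inference is otherwise unchanged):

```python
from ultralytics import YOLO

# Run inference on Apple Silicon; this release disables non_blocking device
# transfers on MPS to prevent output corruption. Paths below are illustrative.
model = YOLO("yolo11n.pt")
results = model.predict("bus.jpg", device="mps")
print(len(results[0].boxes), "detections")
```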
New Features
- YOLOE segmentation training is improved with a dedicated `YOLOESegTrainerFromScratch`, streamlined preprocessing, and corrected loss initialization for simpler from-scratch and fine-tune paths (a rough sketch follows this list).
- ONNX export now picks CUDA-aware opsets and propagates `imgsz` correctly, reducing runtime mismatches.
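A rough sketch of the streamlined YOLOE segmentation path is below. The YOLOE class has shipped in recent 8.3.x releases, but the weight file, dataset, and plain train() call are illustrative assumptions; the dedicated `YOLOESegTrainerFromScratch` is wired up internally for from-scratch runs, and your setup may need a YOLOE-specific trainer (see the YOLOE docs).

```python
from ultralytics import YOLOE  # YOLOE is available in recent 8.3.x releases

# Fine-tune a YOLOE segmentation model; weight and dataset names are
# illustrative placeholders, not values prescribed by this release.
model = YOLOE("yoloe-11s-seg.pt")
model.train(data="coco128-seg.yaml", epochs=10, imgsz=640)
```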
Improvements
- Multi-GPU training now requires an explicit `batch` in the Trainer, preventing silent defaults and helping you balance GPU workloads.
- Safer device transfers on Apple MPS avoid output corruption.
- Streamlit live inference now supports RKNN/OpenVINO more reliably for Rockchip deployments.
- Checkpoints have the AMP scaler state stripped for lighter inference assets (a quick way to verify this is sketched after this list).
- CI and docs quality improvements, including a wider PyTorch test matrix and updated YOLO11 profiling.
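To verify the leaner checkpoints, you can load one and inspect its keys. This is a sketch under assumptions: the run directory is illustrative, and the "scaler" key is the conventional spot for AMP GradScaler state rather than something documented in this release.

```python
import torch

# Load a training checkpoint on CPU; weights_only=False is needed on recent
# PyTorch to unpickle the full checkpoint dict (path is illustrative).
ckpt = torch.load("runs/detect/train/weights/best.pt", map_location="cpu", weights_only=False)

# The AMP GradScaler state ("scaler" is an assumed key name) should be absent
# from checkpoints saved by v8.3.204+.
print(sorted(ckpt.keys()))
print("scaler present:", "scaler" in ckpt)
```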
Bug Fixes
- Fixed CUDA slow tests in CI and `--slow` handling in pipelines.
- Resolved an AttributeError when validating with `visualize=True` (example after this list).
- Addressed RKNN issues in Streamlit live inference.
- Ensured `imgsz` is correctly used for YOLOE visual prompt extraction.
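For example, a validation call with visualization enabled (weights and dataset below are illustrative) should now complete without the AttributeError:

```python
from ultralytics import YOLO

# Validate with visualize=True; this previously raised an AttributeError.
model = YOLO("yolo11n.pt")
metrics = model.val(data="coco128.yaml", visualize=True)
print(metrics.box.map)  # mAP50-95
```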
What’s Changed
- CI performance issues are fixed in Fix --slow CUDA tests by @glenn-jocher.
- Classification training is safer thanks to a directory validation added in Verify passed dataset is a directory and not a file for classification training by @Y-T-G.
- The test matrix now covers more PyTorch versions via Expand PyTorch version test matrix by @glenn-jocher.
- Validation stability improves with a fix in Fix AttributeError error for validation with visualize by @Y-T-G.
- CI consistency is improved in Fix --slow flag in ci.yml pip install command by @Laughing-q.
- Packaging constraints are simplified in Remove uv version 0.8.19 restraint by @Laughing-q.
- MPS reliability increases through Disable non_blocking device transfers for MPS to prevent output corruption by @Y-T-G.
- CI debugging is easier after Enable Python Fault Handler in CI by @Y-T-G.
- torch.compile friendliness improves with Remove the compromise of criterion initialization for torch.compile by @Laughing-q.
- YOLOE preprocessing is cleaner thanks to YOLOE: Cleanup duplicate preprocess_batch by @Laughing-q.
- Checkpoints are leaner after Strip scaler from model checkpoints by @Y-T-G.
- You can preview future models in the docs through Docs: Add new page for YOLO26 by @Laughing-q.
- Edge deployments benefit from Add RKNN model support in streamlit live-inference solution by @RizwanMunawar.
- Visual prompts align with your configuration after YOLOE: Use passed imgsz for visual prompt extraction by @Y-T-G.
- RKNN stability is improved in Fix RKNN model support in streamlit live-inference solution by @lakshanthad.
- YOLO11 metrics are up to date after Docs: Update YOLO11 profiling data in README by @lmycross.
- Documentation authorship is accurate thanks to Add Mengyu to mkdocs_github_authors.yaml by @glenn-jocher.
- The YAML guide is clearer following Fix Docs YAML guide missing docstrings by @glenn-jocher.
- Multi‑GPU behavior is clearer due to ultralytics 8.3.204 Scope batch_size check from select_device() to Trainer by @Laughing-q.
You can browse the release details on the v8.3.204 release page, and you can compare versions in the full changelog view.
New Contributors
- Thanks to @lmycross for their first contribution in Docs: Update YOLO11 profiling data in README. We appreciate the community’s continued support!
Try it now
- Recommended model: get started with YOLO11 docs and preview what’s next on the YOLO26 page.
- YOLOE users can explore the improved workflow in the YOLOE model guide.
Minimal examples:
```bash
# Upgrade
pip install -U ultralytics

# Multi-GPU training now requires an explicit batch size
yolo train model=yolo11n.pt data=coco128.yaml imgsz=640 batch=64 device=0,1

# ONNX export with automatic CUDA-aware opset selection and imgsz propagation
yolo export model=yolo11n.pt format=onnx imgsz=640
```
Python usage:
```python
from ultralytics import YOLO

model = YOLO("yolo11n.pt")
model.train(data="coco128.yaml", imgsz=640, batch=64, device="0,1")  # explicit batch for multi-GPU
model.export(format="onnx", imgsz=640)  # CUDA-aware opset + imgsz propagation
```
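To sanity-check the exported model, a minimal onnxruntime run can confirm the graph loads and the 640 input size was propagated. This sketch assumes onnxruntime is installed and that the export wrote yolo11n.onnx next to the weights:

```python
import numpy as np
import onnxruntime as ort  # assumes: pip install onnxruntime

# Load the exported model and push a dummy 1x3x640x640 input through it.
session = ort.InferenceSession("yolo11n.onnx", providers=["CPUExecutionProvider"])
inp = session.get_inputs()[0]
dummy = np.zeros((1, 3, 640, 640), dtype=np.float32)
outputs = session.run(None, {inp.name: dummy})
print(inp.name, inp.shape, [o.shape for o in outputs])
```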
We’d love your feedback. Please try v8.3.204 in your projects and share issues or suggestions in Discussions and Issues so we can keep polishing the experience for everyone. Thank you to the YOLO community and the Ultralytics team for the continuous contributions and testing!