New Release: Ultralytics v8.3.204

Ultralytics v8.3.204 — smoother multi‑GPU training, more stable exports, broader device support :rocket:

Quick summary: this release sharpens training ergonomics, stabilizes ONNX/TensorFlow exports, broadens device coverage (MPS, Rockchip), trims checkpoints for lean inference, and refreshes docs with a YOLO26 preview and updated YOLO11 metrics. It’s a quality-of-life update that makes multi‑GPU projects, ONNX workflows, and edge deployments easier than ever. :glowing_star:

To upgrade, install the latest package with pip install -U ultralytics.

Highlights

  • :counterclockwise_arrows_button: Batch handling moved into the Trainer for clearer multi‑GPU behavior and error messages.
  • :brain: YOLOE segmentation training streamlined with a dedicated from‑scratch trainer and cleaner loss logic.
  • :straight_ruler: Export refinements: CUDA‑aware ONNX opset choice, explicit NMS guards, and correct image‑size propagation.
  • :gear: Device compatibility: safer MPS transfers and reliable RKNN/OpenVINO support in Streamlit live inference.
  • :package: Leaner checkpoints: AMP scaler state is stripped for smaller, faster inference weights.
  • :books: Docs refresh: YOLO26 preview and updated YOLO11 FLOPs/params, plus YAML guide fixes.

YOLO11 remains the latest stable and recommended model for all use cases, and you can explore the preview of YOLO26 on the new docs page. For segmentation and open‑vocabulary experiments, the upgraded YOLOE workflows are now cleaner and more torch.compile‑friendly.
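
For a quick look at the open‑vocabulary workflow, here is a minimal sketch using the YOLOE API; the checkpoint name, prompt classes, and image URL below are illustrative, not prescriptive:

from ultralytics import YOLOE

# Load a YOLOE segmentation checkpoint (the name here is an example)
model = YOLOE("yoloe-11s-seg.pt")

# Prompt the model with the text classes you want to segment
names = ["person", "bus"]
model.set_classes(names, model.get_text_pe(names))

results = model.predict("https://ultralytics.com/images/bus.jpg")
results[0].show()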

New Features

  • YOLOE segmentation training is improved with a dedicated YOLOESegTrainerFromScratch, streamlined preprocessing, and corrected loss initialization for simpler from‑scratch and fine‑tune paths (see the sketch after this list).
  • ONNX export now picks CUDA‑aware opsets and propagates imgsz correctly, reducing runtime mismatches.
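
As a minimal sketch of the from‑scratch path, assuming YOLOESegTrainerFromScratch is importable from ultralytics.models.yolo.yoloe; the dataset names, paths, and hyperparameters below are placeholders:

from ultralytics import YOLOE
from ultralytics.models.yolo.yoloe import YOLOESegTrainerFromScratch  # dedicated from-scratch trainer

# Placeholder detection + grounding data; swap in your own datasets
data = dict(
    train=dict(
        yolo_data=["Objects365.yaml"],
        grounding_data=[
            dict(img_path="flickr/full_images/", json_file="flickr/annotations/final_flickr_separateGT_train_segm.json"),
        ],
    ),
    val=dict(yolo_data=["lvis.yaml"]),
)

model = YOLOE("yoloe-11s-seg.yaml")
model.train(data=data, epochs=30, imgsz=640, batch=128, trainer=YOLOESegTrainerFromScratch)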

Improvements

  • Multi‑GPU training now requires an explicit batch size in the Trainer, preventing silent defaults and helping you balance GPU workloads.
  • Safer device transfers on Apple MPS avoid output corruption (example after this list).
  • Streamlit live inference now supports RKNN/OpenVINO more reliably for Rockchip deployments.
  • Checkpoints have the AMP scaler stripped for lighter inference assets.
  • CI and docs quality improvements, including a wider PyTorch test matrix and updated YOLO11 profiling.
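
On Apple silicon, a quick way to exercise the safer MPS path is to pass device="mps" explicitly; the model and image here are just examples:

from ultralytics import YOLO

model = YOLO("yolo11n.pt")
# Run inference on the Apple MPS backend (requires a Metal-capable Mac and a recent PyTorch)
results = model.predict("https://ultralytics.com/images/bus.jpg", device="mps")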

Bug Fixes

  • Fixed CUDA slow tests in CI and --slow handling in pipelines.
  • Resolved AttributeError when validating with visualize=True (example after this list).
  • Addressed RKNN issues in Streamlit live inference.
  • Ensured imgsz is correctly used for YOLOE visual prompt extraction.
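
To confirm the visualize fix in your own setup, a minimal validation call might look like this (the dataset name is just an example):

from ultralytics import YOLO

model = YOLO("yolo11n.pt")
# Validation with visualizations enabled no longer raises AttributeError
metrics = model.val(data="coco128.yaml", visualize=True)
print(metrics.box.map)  # mAP50-95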

What’s Changed

Browse the release details on the v8.3.204 release page, or compare versions in the full changelog view.

Try it now

Minimal examples:

# Upgrade
pip install -U ultralytics

# Multi-GPU training now requires an explicit batch size
yolo train model=yolo11n.pt data=coco128.yaml imgsz=640 batch=64 device=0,1

# ONNX export with automatic CUDA-aware opset and imgsz propagation
yolo export model=yolo11n.pt format=onnx imgsz=640

Python usage:

from ultralytics import YOLO

model = YOLO("yolo11n.pt")
model.train(data="coco128.yaml", imgsz=640, batch=64, device="0,1")  # explicit batch for multi-GPU
model.export(format="onnx", imgsz=640)  # CUDA-aware opset + imgsz propagation

We’d love your feedback. Please try v8.3.204 in your projects and share issues or suggestions in Discussions and Issues so we can keep polishing the experience for everyone. Thank you to the YOLO community and the Ultralytics team for the continuous contributions and testing!