Ultralytics v8.3.196 is out: faster training with torch.compile, smoother IO, and sturdier exports
Short version: v8.3.196 brings optional torch.compile acceleration across train/val/predict for up to ~30% faster runs, plus dataloader throughput boosts, unified device transfers, more robust CoreML exports, and smoother plotting/config handling.
YOLO11 remains our recommended default for all use cases.
Summary
- Up to ~30% faster training and snappier inference thanks to opt-in torch.compile across CLI and Python.
- Faster, safer data loading with higher prefetch and smarter batch handling.
- Centralized device transfers reduce CPU/GPU mismatch issues across tasks.
- CoreML export pipeline cleaned up for more reliable deployments on Apple platforms.
- Plotting and configuration paths are more CI- and container-friendly.
You can review the release details in the Ultralytics v8.3.196 release notes and see every change in the full changelog diff from v8.3.195.
New Features
- torch.compile acceleration (primary)
  - New compile arg available in train, val, and predict via CLI, config, and Python API.
  - New helpers make adoption safe and granular, including attempt_compile(...) to enable compilation when supported and disable_dynamo(...) to exclude sensitive paths.
  - End-to-end integrations:
    - Trainer compiles models after loss initialization, marks dynamic tensors for stability, and unwraps models for EMA/checkpointing.
    - Validator can compile for standalone validation; training’s final eval skips compile for stability/speed.
    - Predictor supports compile=True for accelerated inference on CUDA/CPU/MPS when supported.
  - Utility rename: de_parallel ➝ unwrap_model to handle both parallel and compiled models consistently.
- Documentation updated to reflect the new compile argument and torch utility functions.
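The "compile when supported" and "unwrap before EMA/checkpointing" ideas above can be illustrated with a minimal sketch. The names try_compile and unwrap are hypothetical and are not the Ultralytics attempt_compile or unwrap_model helpers, whose signatures are not given here. The sketch assumes only that torch.compile exists from PyTorch 2.0 onward, that a compiled module keeps the original model in its _orig_mod attribute, and that DataParallel-style wrappers keep it in module.

```python
def try_compile(model, enabled=True):
    """Hypothetical helper: return a compiled model when possible, else the original."""
    if not enabled:
        return model
    try:
        import torch  # imported lazily so this sketch also loads without PyTorch
    except ImportError:
        return model
    if not hasattr(torch, "compile"):  # torch.compile was added in PyTorch 2.0
        return model
    try:
        # Compilation is lazy, so some failures only surface on the first forward pass.
        return torch.compile(model)
    except Exception:
        return model


def unwrap(model):
    """Hypothetical helper: peel off compile and parallel wrappers."""
    parallel_names = {"DataParallel", "DistributedDataParallel"}
    while True:
        if hasattr(model, "_orig_mod"):  # wrapper created by torch.compile
            model = model._orig_mod
        elif type(model).__name__ in parallel_names and hasattr(model, "module"):
            # Real code would use isinstance checks; names avoid importing torch here.
            model = model.module
        else:
            return model
```

Unwrapping before saving keeps checkpoints loadable without the wrapper, since the wrapped and unwrapped models expose different attribute paths.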
Improvements
- Faster data loading
  - Default prefetch_factor doubled to 4 when num_workers > 0, and gracefully omitted on older PyTorch releases to avoid errors.
  - Safer drop_last behavior with compile-enabled training to improve shape stability.
- Unified device handling
  - Centralized logic for moving batch tensors to the correct device across detection, pose, segmentation, and YOLOE reduces code duplication and “tensor on CPU vs GPU” issues.
- CoreML export robustness
  - Cleaned up NMS pipeline with direct use of spec outputs, explicit shapes where needed, consistent IO names, and simpler wiring for more reliable exports across macOS/Linux/Windows.
- Plotting stability
  - @plt_settings() wraps feature_visualization(...) for backend-safe, non-blocking plots in headless/CI environments.
- Config directory resolution
  - Smarter get_user_config_dir() honors YOLO_CONFIG_DIR, follows OS conventions (XDG on Linux), and falls back to writable paths like /tmp when needed.
- Compatibility and CI polish
  - TorchVision compatibility matrix updated for PyTorch 2.8 / TorchVision 0.23 and PyTorch 2.9 / TorchVision 0.24.
  - GitHub Actions bumped to setup-python v6 and actions/stale v10.
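As an illustration of the config directory resolution described above, here is a minimal sketch of the lookup order: explicit override first, then the OS convention, then a temporary directory. The function name resolve_config_dir, the "Ultralytics" subdirectory name, and the exact per-OS base paths are assumptions made for this example, not the actual get_user_config_dir() implementation.

```python
import os
import platform
import tempfile
from pathlib import Path


def resolve_config_dir(sub_dir="Ultralytics"):
    """Hypothetical sketch: return the first writable config directory."""
    env_dir = os.getenv("YOLO_CONFIG_DIR")
    if env_dir:
        # An explicit override is used as given.
        candidates = [Path(env_dir)]
    else:
        system = platform.system()
        if system == "Windows":
            base = Path.home() / "AppData" / "Roaming"
        elif system == "Darwin":
            base = Path.home() / "Library" / "Application Support"
        else:
            # XDG convention on Linux: $XDG_CONFIG_HOME, else ~/.config
            base = Path(os.getenv("XDG_CONFIG_HOME") or Path.home() / ".config")
        candidates = [base / sub_dir]

    # Last resort: the system temp directory (for example /tmp on Linux).
    candidates.append(Path(tempfile.gettempdir()) / sub_dir)

    for path in candidates:
        try:
            path.mkdir(parents=True, exist_ok=True)
        except OSError:
            continue  # not creatable, try the next candidate
        if os.access(path, os.W_OK):
            return path
    raise OSError("no writable config directory found")
```

In a read-only container, setting YOLO_CONFIG_DIR to a mounted writable path is the most predictable option, since the temporary fallback does not persist between runs.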
Bug Fixes
- Fixed tensors not being moved to the device in the Trainer and Validator preprocess_batch methods.
- Reduced overly verbose user config directory checks.
- Corrected a minor typo in a deprecation warning.
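The device-transfer fix and the unified device handling mentioned earlier both come down to moving every tensor in a batch in one place. A minimal sketch of that idea follows, assuming the batch is a dict and that tensor-like values expose a .to(device, non_blocking=...) method as torch.Tensor does. The function name batch_to_device and its keys parameter are hypothetical and do not reflect the actual preprocess_batch code.

```python
def batch_to_device(batch, device, keys=None):
    """Hypothetical sketch: return a copy of batch with tensor-like values on device.

    keys optionally restricts which entries are moved; other entries
    (file paths, shapes, and so on) are passed through unchanged.
    """
    moved = {}
    for name, value in batch.items():
        wanted = keys is None or name in keys
        if wanted and hasattr(value, "to"):
            # torch.Tensor.to accepts non_blocking for asynchronous copies
            moved[name] = value.to(device, non_blocking=True)
        else:
            moved[name] = value
    return moved
```

Routing every task through one function like this means a newly added batch key cannot silently stay on the CPU in one task while being moved in another.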
Quick start with compile
CLI example:
yolo train model=yolo11n.pt data=coco8.yaml epochs=100 compile=True
Python example:
from ultralytics import YOLO
model = YOLO("yolo11n.pt")
model.train(data="coco8.yaml", epochs=100, compile=True)
# Also supported:
model.val(compile=True)
model.predict("img.jpg", compile=True)
Tip: torch.compile requires PyTorch 2.0 or later and works best on recent releases. If your stack or device doesn’t support it, leave compile=False (default).
Upgrade
Install or upgrade with:
pip install -U ultralytics
For guidance on tasks and usage, the YOLO11 documentation and mode guides for Train, Val, and Predict are great starting points.
PRs included in v8.3.196
- Add @plt_settings() decorator to feature_visualization() by Glenn Jocher
- Cleanup CoreML NMS pipeline code by Y-T-G
- Double default Dataloader prefetch_factor to 4 by Glenn Jocher
- Update TorchVision compat matrix with 2.8 and 2.9 by Glenn Jocher
- Fix overly verbose USER_CONFIG_DIR checks by Glenn Jocher
- Fix missing tensors on device in preprocess_batch by Glenn Jocher
- Bump actions/setup-python from 5 to 6 by Dependabot
- Bump actions/stale from 9 to 10 by Dependabot
- Fix typo in deprecation_warn by Rizwan Munawar
- ultralytics 8.3.196 torch.compile acceleration by Glenn Jocher
You can browse all differences in the full changelog compare view and read the highlights on the v8.3.196 release page.
Thanks and feedback
Big thanks to everyone in the YOLO community and the Ultralytics team for ideas, testing, and contributions. We’d love your feedback—please share results, questions, or issues by opening an Ultralytics GitHub issue or joining the discussion in Ultralytics Discussions.
Happy training with YOLO11 and enjoy the speedups!