New Release: Ultralytics v8.3.196

Ultralytics v8.3.196 is out :tada: — faster training with torch.compile, smoother IO, and sturdier exports

Short version: v8.3.196 brings optional torch.compile acceleration across train/val/predict for up to ~30% faster runs, plus dataloader throughput boosts, unified device transfers, more robust CoreML exports, and smoother plotting/config handling. :high_voltage:

YOLO11 remains our recommended default for all use cases.

:glowing_star: Summary

  • Up to ~30% faster training and snappier inference thanks to opt-in torch.compile across CLI and Python.
  • Faster, safer data loading with higher prefetch and smarter batch handling.
  • Centralized device transfers reduce CPU/GPU mismatch issues across tasks.
  • CoreML export pipeline cleaned up for more reliable deployments on Apple platforms.
  • Plotting and configuration paths are more CI- and container-friendly.

You can review the release details in the Ultralytics v8.3.196 release notes and see every change in the full changelog diff from v8.3.195.

:rocket: New Features

  • torch.compile acceleration (primary)
    • New compile arg available in train, val, and predict via CLI, config, and Python API.
    • New helpers make adoption safe and granular, including attempt_compile(...) to enable compilation when supported and disable_dynamo(...) to exclude sensitive paths.
    • End-to-end integrations:
      • Trainer compiles models after loss initialization, marks dynamic tensors for stability, and unwraps models for EMA/checkpointing.
      • Validator can compile for standalone validation; training’s final eval skips compile for stability/speed.
      • Predictor supports compile=True for accelerated inference on CUDA/CPU/MPS when supported.
    • Utility rename: de_parallel is now unwrap_model, which handles both parallel and compiled models consistently.
  • Documentation updated to reflect the new compile argument and torch utility functions.
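
To illustrate why a single unwrap utility is handy: a model returned by torch.compile exposes the original module as _orig_mod, while DataParallel and DistributedDataParallel expose it as module. The snippet below is a minimal sketch of that idea, not the library's unwrap_model implementation. The names unwrap, Net, and Wrapper are hypothetical stand-ins so the example runs without PyTorch installed.

```python
def unwrap(model):
    """Hypothetical stand-in (not Ultralytics' unwrap_model): peel wrapper layers off a model."""
    while True:
        if hasattr(model, "_orig_mod"):  # wrapper produced by torch.compile
            model = model._orig_mod
        elif hasattr(model, "module"):  # DataParallel / DistributedDataParallel wrapper
            model = model.module
        else:
            return model  # nothing left to unwrap


class Net:
    """Stand-in for a plain model with no wrapper attributes."""


class Wrapper:
    """Stand-in for a compile or parallel wrapper; stores whatever attributes it is given."""

    def __init__(self, **attrs):
        self.__dict__.update(attrs)


net = Net()
wrapped = Wrapper(_orig_mod=Wrapper(module=net))  # compiled wrapper around a parallel wrapper
assert unwrap(wrapped) is net  # both layers are removed
assert unwrap(net) is net  # an unwrapped model passes through unchanged
```

Looping until neither attribute is present means the order of wrapping does not matter, which is useful when saving checkpoints or updating EMA weights from the underlying model.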

:hammer_and_wrench: Improvements

  • Faster data loading
    • Default prefetch_factor doubled to 4 when num_workers > 0, and gracefully omitted on older PyTorch releases to avoid errors.
    • Safer drop_last behavior with compile-enabled training to improve shape stability.
  • Unified device handling
    • Centralized logic for moving batch tensors to the correct device across detection, pose, segmentation, and YOLOE reduces code duplication and “tensor on CPU vs GPU” issues.
  • CoreML export robustness
    • Cleaned up NMS pipeline with direct use of spec outputs, explicit shapes where needed, consistent IO names, and simpler wiring for more reliable exports across macOS/Linux/Windows.
  • Plotting stability
    • @plt_settings() wraps feature_visualization(...) for backend-safe, non-blocking plots in headless/CI environments.
  • Config directory resolution
    • Smarter get_user_config_dir() honors YOLO_CONFIG_DIR, follows OS conventions (XDG on Linux), and falls back to writable paths like /tmp when needed.
  • Compatibility and CI polish
    • TorchVision compatibility matrix updated for PyTorch 2.8 with TorchVision 0.23 and PyTorch 2.9 with TorchVision 0.24.
    • GitHub Actions workflows bumped to setup-python v6 and actions/stale v10.
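
As a rough illustration of the config directory resolution order described above (environment override first, then the OS convention, then a temp-directory fallback), here is a minimal sketch. The function name resolve_config_dir and the app_name default are hypothetical, the macOS and Windows locations are assumed conventions, and the real get_user_config_dir() may differ in its details.

```python
import os
import platform
import tempfile
from pathlib import Path


def resolve_config_dir(app_name: str = "Ultralytics") -> Path:
    """Hypothetical sketch (not the library's get_user_config_dir): pick a writable config dir."""
    override = os.environ.get("YOLO_CONFIG_DIR")
    if override:
        candidates = [Path(override)]  # an explicit user override wins
    else:
        system = platform.system()
        if system == "Windows":
            base = Path(os.environ.get("APPDATA", Path.home() / "AppData" / "Roaming"))
        elif system == "Darwin":
            base = Path.home() / "Library" / "Application Support"
        else:  # Linux and other Unix-likes: XDG base directory convention
            base = Path(os.environ.get("XDG_CONFIG_HOME") or Path.home() / ".config")
        candidates = [base / app_name]

    # Last resort for read-only homes, e.g. in containers or CI runners.
    candidates.append(Path(tempfile.gettempdir()) / app_name)

    for path in candidates:
        try:
            path.mkdir(parents=True, exist_ok=True)  # side effect: creates the directory
            if os.access(path, os.W_OK):
                return path
        except OSError:
            continue  # not creatable here, try the next candidate
    raise OSError("no writable configuration directory found")
```

In a container, setting YOLO_CONFIG_DIR to a mounted, writable path is the most predictable option because it bypasses the per-OS guesses entirely.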

:bug: Bug Fixes

  • Fixed tensors that were not being moved to the target device in the Trainer and Validator preprocess_batch methods.
  • Reduced overly verbose user config directory checks.
  • Corrected a minor typo in a deprecation warning.
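
The preprocess_batch fix and the unified device handling both come down to one idea: every tensor in a batch should be moved through a single code path. The sketch below shows that pattern under stated assumptions. move_batch_to_device is a hypothetical helper, not the Ultralytics implementation, and it uses duck typing (anything with a callable .to method) so it runs without PyTorch installed.

```python
from typing import Any, Dict


def move_batch_to_device(batch: Dict[str, Any], device: Any) -> Dict[str, Any]:
    """Hypothetical helper: return a copy of the batch with tensor-like values moved to device."""
    moved = {}
    for key, value in batch.items():
        # torch.Tensor exposes .to(device); entries without it (file paths, shapes) are kept as-is.
        if callable(getattr(value, "to", None)):
            moved[key] = value.to(device)
        else:
            moved[key] = value
    return moved
```

With real tensors this would be called as move_batch_to_device(batch, "cuda:0"). Routing every task through one helper like this means a newly added batch key cannot be forgotten in one task while being handled in another.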

:high_voltage: Quick start with compile

CLI example:

yolo train model=yolo11n.pt data=coco8.yaml epochs=100 compile=True

Python example:

from ultralytics import YOLO

model = YOLO("yolo11n.pt")
model.train(data="coco8.yaml", epochs=100, compile=True)

# Also supported:
model.val(compile=True)
model.predict("img.jpg", compile=True)

Tip: torch.compile requires PyTorch 2.0 or later and generally works best on recent releases. If your stack or device doesn’t support it, leave compile=False (the default).
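
If one script has to run on machines with different PyTorch versions, a small guard can decide whether to pass compile=True. This is a minimal sketch: compile_available is a hypothetical helper, and it only checks that the installed PyTorch exposes torch.compile, not that your device or backend will compile successfully.

```python
import importlib.util


def compile_available() -> bool:
    """Hypothetical check: True when an installed PyTorch exposes torch.compile."""
    if importlib.util.find_spec("torch") is None:
        return False  # PyTorch is not installed at all
    import torch

    return hasattr(torch, "compile")  # torch.compile was introduced in PyTorch 2.0


# Example usage with the compile argument from this release:
# model.train(data="coco8.yaml", epochs=100, compile=compile_available())
```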

:package: Upgrade

Install or upgrade with:

pip install -U ultralytics

For guidance on tasks and usage, the YOLO11 documentation and mode guides for Train, Val, and Predict are great starting points.

:white_check_mark: PRs included in v8.3.196

You can browse all differences in the full changelog compare view and read the highlights on the v8.3.196 release page.

:raising_hands: Thanks and feedback

Big thanks to everyone in the YOLO community and the Ultralytics team for ideas, testing, and contributions. We’d love your feedback—please share results, questions, or issues by opening an Ultralytics GitHub issue or joining the discussion in Ultralytics Discussions.

Happy training with YOLO11 and enjoy the speedups! :rocket: