Ultralytics v8.3.213 - Stability first: automatic NaN recovery, faster Objects365, unified dataset URLs


A stability-focused release that automatically recovers from NaN/Inf training issues, cleans up resume logic, speeds up Objects365 setup, and unifies dataset download URLs. Upgrade with confidence and keep training moving.
- Version: v8.3.213
- Focus: Training robustness, dataset reliability, and setup speed
- Release: See the full notes on the v8.3.213 release page
TL;DR
- Auto-recovery from NaN/Inf losses and metric collapse during training (DDP-aware, capped retries)
- Safer resumes with centralized checkpoint loading and scheduler reset
- Objects365 preparation runs significantly faster with multithreading
- Unified `ASSETS_URL` across dataset YAMLs for more reliable downloads
Highlights
Training robustness and resume improvements
- Automatically detects NaN/Inf losses and fitness collapse, then safely restores from the latest checkpoint with capped retries (up to 3).
- DDP-aware broadcasting keeps distributed training in sync during recovery.
- Validates checkpoint weights to avoid reloading corrupted EMA states.
- Centralizes checkpoint loading via `_load_checkpoint_state()` and uses it in `resume_training()` for consistency and less drift.
- Resets the scheduler state after recovery to maintain the intended LR schedule.
- Includes a new test, `test_nan_recovery`, that injects a NaN to verify the recovery path.
Details are in the NaN epoch recovery PR (#22352) by Glenn Jocher.
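The recovery behavior described above can be pictured as a small loop: if an epoch produces a non-finite loss, roll back to the last good checkpoint and retry, giving up after a capped number of attempts. This is a minimal sketch of that pattern, not the actual Ultralytics implementation; the function and callback names are assumptions.

```python
import math

MAX_RECOVERIES = 3  # capped retries, matching the release notes


def train_with_recovery(step_fn, load_checkpoint, save_checkpoint, epochs):
    """Hypothetical recovery loop: retry an epoch from the last good
    checkpoint whenever its loss comes back NaN/Inf, up to a cap."""
    recoveries = 0
    epoch = 0
    while epoch < epochs:
        loss = step_fn(epoch)
        if not math.isfinite(loss):
            if recoveries >= MAX_RECOVERIES:
                raise RuntimeError("NaN/Inf loss persisted after max recoveries")
            recoveries += 1
            load_checkpoint()   # roll back weights/EMA/optimizer state
            continue            # retry the same epoch
        save_checkpoint(epoch)  # good epoch: refresh the checkpoint
        epoch += 1
    return recoveries
```

In real DDP training the rollback decision would also be broadcast to all ranks so every worker restores the same state, as the release notes describe.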
Dataset YAMLs: unified asset URLs
- Replaced hardcoded links with a centralized `ASSETS_URL` across VOC, COCO, COCO-Pose, VisDrone, and LVIS for consistent, maintainable downloads.
See the Use ASSETS_URL in dataset YAMLs PR (#22361) by Glenn Jocher.
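The benefit of a single shared constant is that a hosting change becomes a one-line edit instead of a hunt through every dataset YAML. This tiny sketch illustrates the idea; the URL and archive names are placeholders, not Ultralytics' actual hosting paths.

```python
# Placeholder base URL; in the real project this is the shared ASSETS_URL constant.
ASSETS_URL = "https://example.com/assets"

# Placeholder archive names keyed by dataset.
DATASET_ARCHIVES = {
    "VOC": "voc.zip",
    "VisDrone": "visdrone.zip",
}


def download_url(dataset: str) -> str:
    # Every entry composes its link from the one shared base,
    # so swapping hosts touches a single constant.
    return f"{ASSETS_URL}/{DATASET_ARCHIVES[dataset]}"
```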
Objects365 setup speedups
- Parallelized downloads, file moves, and label generation using `ThreadPoolExecutor`.
- Increased thread counts and refactored annotation handling to boost throughput.
See the Improve Objects365.yaml PR (#22362) by Glenn Jocher.
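The speedup comes from a standard fan-out pattern: I/O-bound tasks such as downloads, file moves, and label writes spend most of their time waiting, so a thread pool overlaps them. A generic sketch of the pattern (the helper name is an assumption):

```python
from concurrent.futures import ThreadPoolExecutor


def process_all(items, worker, max_workers=8):
    """Run an I/O-bound worker over many items concurrently.

    ThreadPoolExecutor.map preserves input order in its results,
    so callers can zip items with outputs safely.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(worker, items))
```

For CPU-bound label parsing a process pool would be the better fit; threads pay off here because the dominant cost is network and disk waiting.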
CI and tests
- Temporarily disabled Jetson JetPack 5 Docker build while the new recovery path stabilizes.
- Skips training tests on Jetson/Raspberry Pi to reflect that edge devices are not training targets.
Improvements
- More resilient training at scale with automatic recovery from NaN/Inf losses or sudden metric collapse.
- Safer, more consistent resume logic for optimizer, scaler, EMA, and best-fitness states.
- Faster dataset preparation, especially for Objects365, thanks to multithreading.
- More reliable dataset downloads via centralized hosting and unified URLs.
Bug Fixes
- Prevents reloading corrupted EMA states by validating checkpoint weights during recovery.
- Avoids scheduler desynchronization by resetting LR scheduler after recovery.
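The first fix above amounts to a sanity pass over a checkpoint before it is restored: if any stored value is NaN/Inf, the state is treated as corrupted and skipped. A minimal sketch of that check, assuming weights flattened to plain floats (the function name is hypothetical):

```python
import math


def weights_are_finite(state_dict) -> bool:
    """Hypothetical pre-load validation: reject an EMA/weights state
    containing any NaN or Inf value (values here are flat float lists)."""
    return all(
        math.isfinite(v)
        for values in state_dict.values()
        for v in values
    )
```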
Quick start
- Upgrade: `pip install -U ultralytics`
- Train (recovery is automatic; no extra flags needed): `yolo detect train model=yolo11n.pt data=coco128.yaml epochs=100 imgsz=640`
- Resume anytime: `yolo detect train resume=True`
Learn more about training options in the Train mode documentation.
Model guidance
- YOLO11 is our latest stable and recommended default for all tasks; explore the YOLO11 model docs.
- Community models YOLO12 and YOLO13 are not recommended; see notes in the YOLO12 model docs.
- Ultralytics R&D for YOLO26 is underway; follow the YOLO26 R&D preview.
What's changed (PRs and authors)
- Unified dataset URLs with `ASSETS_URL` in YAMLs; see the dataset URL unification PR (#22361) by Glenn Jocher.
- Objects365 preparation speedups; see the Objects365 improvements PR (#22362) by Glenn Jocher.
- Automatic NaN epoch recovery and resume logic cleanup, see the NaN epoch recovery PR (#22352) by Glenn Jocher.
Review the complete changelog for v8.3.212 → v8.3.213.
Try it and share feedback
Please upgrade, put the new recovery path through its paces, and let us know how it performs in your workflows. You can start a conversation in Ultralytics GitHub Discussions or open issues with reproducible examples. Your feedback helps the YOLO community and Ultralytics team keep improving.