New Release: Ultralytics v8.3.218

Ultralytics v8.3.218 :rocket: - True multi-GPU validation, contiguous sampler, and accurate cross-GPU metrics

Ultralytics v8.3.218 delivers more reliable, faster multi-GPU training. This release enables true multi-GPU validation during training, with correct cross-GPU metric aggregation and a new contiguous distributed sampler for stable evaluation. Upgrade with pip install -U ultralytics and enjoy a smoother DDP experience.

:glowing_star: Summary

  • Multi-GPU validation now runs correctly on all ranks with proper aggregation.
  • New ContiguousDistributedSampler preserves sample order and batch alignment across GPUs.
  • Cleaner trainer flow and synchronized EMA for consistent metrics.

:new_button: New Features

  • Multi-GPU validation during training
    • Validation DataLoader and Validator are created on all ranks for proper DDP execution.
    • Rank-aware device selection ensures each process validates on its own GPU.
  • ContiguousDistributedSampler
    • Contiguous, batch-aligned chunks per GPU preserve dataset order and determinism.
    • Automatically used when shuffle=False (e.g., rect=True or size-grouped evaluation); falls back to PyTorch DistributedSampler when shuffle=True.

Learn more in the implementing PR, Enable multi-GPU validation during training (#22377) by Y-T-G. The sketch below illustrates the contiguous split.
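This is a minimal, illustrative sketch rather than the actual Ultralytics class: each rank receives a contiguous, batch-aligned block of indices instead of the interleaved indices produced by PyTorch's DistributedSampler. Class and parameter names here are placeholders.

    # Illustrative only: contiguous, batch-aligned chunks per rank (placeholder names).
    import math
    from torch.utils.data import Sampler

    class ContiguousChunkSampler(Sampler):
        """Give each rank a contiguous, batch-aligned block of dataset indices."""

        def __init__(self, dataset_len, num_replicas, rank, batch_size):
            self.dataset_len = dataset_len
            self.rank = rank
            # Round the per-rank share up to whole batches so chunk boundaries stay batch-aligned.
            per_rank = math.ceil(dataset_len / num_replicas)
            self.per_rank = math.ceil(per_rank / batch_size) * batch_size

        def __iter__(self):
            start = self.rank * self.per_rank
            return iter(range(start, min(start + self.per_rank, self.dataset_len)))

        def __len__(self):
            start = self.rank * self.per_rank
            return max(0, min(start + self.per_rank, self.dataset_len) - start)

    # 8 samples, 2 GPUs, batch size 2:
    #   contiguous:  rank 0 -> [0, 1, 2, 3], rank 1 -> [4, 5, 6, 7]  (dataset order preserved)
    #   interleaved: rank 0 -> [0, 2, 4, 6], rank 1 -> [1, 3, 5, 7]  (PyTorch DistributedSampler)

Keeping each rank's chunk contiguous means size-grouped batches (e.g., with rect=True) stay together, which is what preserves ordering and determinism during evaluation.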

:gear: Improvements

  • Correct cross-GPU metric aggregation
    • Validation losses are properly reduced across GPUs.
    • Detection and classification validators gather stats from all ranks and compute results on rank 0 only.
    • EMA buffers are synchronized from rank 0 to all GPUs for consistent validation.
  • Trainer flow cleanup
    • Validation is executed outside the inner training step for cleaner DDP behavior.
    • Final evaluation performs only the necessary work on rank 0, with safe synchronization for the other ranks.
  • Documentation
    • Reference docs now include ContiguousDistributedSampler.

These changes address reported issues with multi-GPU validation during training, cross-GPU aggregation correctness, and sampler ordering consistency.
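As a rough sketch of the aggregation pattern, assuming an initialized torch.distributed process group; aggregate_validation, val_loss, stats, and ema_model below are placeholder names, not Ultralytics internals:

    # Sketch only: reduce losses across GPUs, gather stats to rank 0, and sync EMA buffers.
    import torch
    import torch.distributed as dist

    def aggregate_validation(val_loss, stats, ema_model, device):
        world_size = dist.get_world_size()
        rank = dist.get_rank()

        # 1. Average the validation loss over all ranks so every GPU logs the same value.
        loss = torch.as_tensor(val_loss, device=device)
        dist.all_reduce(loss, op=dist.ReduceOp.SUM)
        loss /= world_size

        # 2. Gather each rank's prediction stats; only rank 0 computes the final metrics.
        gathered = [None] * world_size
        dist.all_gather_object(gathered, stats)
        all_stats = [s for rank_stats in gathered for s in rank_stats] if rank == 0 else None

        # 3. Broadcast EMA buffers (e.g., BatchNorm running stats) from rank 0 so every
        #    rank validates with identical weights.
        for buf in ema_model.buffers():
            dist.broadcast(buf, src=0)

        return loss, all_stats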

:bullseye: Why it matters

  • More reliable multi-GPU results
    • Proper aggregation ensures metrics reflect the full distributed dataset instead of per-rank fragments.
  • Faster and more stable validation
    • Contiguous sampling reduces padding/overhead and improves determinism, especially with rect=True.
  • Seamless distributed training
    • No extra setup required; single-GPU behavior is unchanged.

:rocket: Quick start (DDP)

  • CLI (recommended, with YOLO11):
    yolo detect train data=coco128.yaml model=yolo11n.pt device=0,1,2,3
    
  • Python:
    from ultralytics import YOLO
    
    model = YOLO("yolo11n.pt")
    model.train(data="coco128.yaml", device=[0, 1], imgsz=640, epochs=50)
    

:package: What's Changed

You can browse the highlights in the v8.3.218 release notes or dive into the details in the full changelog between v8.3.217 and v8.3.218.

:raising_hands: Try it and share feedback

Please update, run your multi-GPU workflows, and let us know how it goes. Open a discussion or issue with your findings; your feedback helps the YOLO community and the Ultralytics team keep improving. Happy training and validating across GPUs! :tada: