Hi,
My task is object detection on 1 class, using grayscale images (created from annotated videos). I have ~400k images in my dataset.
My YOLOv8m model converges after only a few epochs. I tried lowering lr0 and lrf to very low values (lr0=0.00125, lrf=0.00000625), as well as cos_lr and the Adam optimizer, but it does not seem to help.
From what I have read, it usually takes ~100-300 epochs to converge.
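To make the learning-rate settings concrete, here is a minimal sketch of what lr0 and lrf would imply over 100 epochs. It assumes the linear schedule that, as far as I understand, Ultralytics applies when cos_lr=False, in which lrf is a final multiplier on lr0 rather than an absolute value. The function name `linear_lr` is my own, and warmup is ignored.

```python
def linear_lr(epoch, epochs, lr0, lrf):
    """Learning rate at a given epoch under a linear decay.

    Assumption: the schedule scales lr0 by a factor that goes
    linearly from 1.0 (epoch 0) down to lrf (last epoch).
    """
    factor = (1 - epoch / epochs) * (1.0 - lrf) + lrf
    return lr0 * factor


# Values taken from the training parameters in this post.
lr0, lrf, epochs = 0.00125, 6.25e-06, 100

for epoch in (0, 25, 50, 75, 100):
    print(f"epoch {epoch:3d}: lr = {linear_lr(epoch, epochs, lr0, lrf):.3e}")
```

If that assumption holds, the final learning rate would be lr0 * lrf (about 7.8e-09) rather than 6.25e-06, so the schedule decays almost to zero.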
Here are my training parameters:
Ultralytics YOLOv8.0.55 🚀 Python-3.8.5 torch-2.0.0+cu117 CUDA:0 (NVIDIA RTX A5000, 24114MiB)
yolo/engine/trainer: task=detect, mode=train, model=yolov8m.pt, data=/workspace/ultralytics/compile_yolov8m/my_ds.yaml, epochs=100, patience=50, batch=64, imgsz=(640, 480), save=True, save_period=1, cache=False, device=0, workers=8, project=yolov8m_drone, name=retrain_yolov8m_30-lr0=0.00125-lrf=0.00000625-warmup_epochs=7-epochs=100-batch=64-cos_lr=Default, exist_ok=False, pretrained=True, optimizer=SGD, verbose=True, seed=0, deterministic=True, single_cls=False, image_weights=False, rect=False, cos_lr=False, close_mosaic=10, resume=False, overlap_mask=True, mask_ratio=4, dropout=0.0, val=True, split=val, save_json=False, save_hybrid=False, conf=None, iou=0.7, max_det=300, half=False, dnn=False, plots=True, source=None, show=False, save_txt=False, save_conf=False, save_crop=False, hide_labels=False, hide_conf=False, vid_stride=1, line_thickness=3, visualize=False, augment=False, agnostic_nms=False, classes=None, retina_masks=False, boxes=True, format=torchscript, keras=False, optimize=False, int8=False, dynamic=False, simplify=False, opset=None, workspace=4, nms=False, lr0=0.00125, lrf=6.25e-06, momentum=0.937, weight_decay=0.0005, warmup_epochs=7, warmup_momentum=0.8, warmup_bias_lr=0.1, box=7.5, cls=0.5, dfl=1.5, fl_gamma=0.0, label_smoothing=0.0, nbs=64, hsv_h=0, hsv_s=0, hsv_v=0, degrees=10, translate=0.1, scale=0.3, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=0.0, mixup=0.0, copy_paste=0.0, cfg=None, v5loader=False, tracker=botsort.yaml, save_dir=yolov8m/retrain_yolov8m_30-lr0=0.00125-lrf=0.00000625-warmup_epochs=7-epochs=100-batch=64-cos_lr=Default
Overriding model.yaml nc=80 with nc=1
- What else can I do to improve my model?
- Can you suggest a strategy to make it as accurate as possible?
- Is this behavior normal, or might it suggest a problem with the data, etc.? If so, can you provide a strategy to fix it?
- Also note that I load the grayscale images as RGB (because of the YOLO architecture). Would you recommend changing that? If so, how?
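To make the last question concrete, here is a minimal sketch of what loading grayscale as RGB amounts to: every pixel value is copied into three identical channels. The helper names and the filter weights are hypothetical and only for illustration. The check at the end shows that, on such an image, a three-channel filter responds exactly like a single-channel filter whose weight is the sum of the three.

```python
def gray_to_rgb(gray):
    """Replicate a 2-D grayscale image (list of rows) into H x W x 3."""
    return [[[px, px, px] for px in row] for row in gray]


def pixel_response_rgb(pixel, weights):
    """Response of a 1x1 three-channel filter at one RGB pixel."""
    return sum(channel * w for channel, w in zip(pixel, weights))


gray = [[0, 128], [255, 64]]      # tiny example image
rgb = gray_to_rgb(gray)

weights = (0.2, 0.5, 0.3)         # hypothetical first-layer filter weights
summed_weight = sum(weights)      # equivalent single-channel weight

for r, row in enumerate(gray):
    for c, px in enumerate(row):
        three_channel = pixel_response_rgb(rgb[r][c], weights)
        one_channel = px * summed_weight
        assert abs(three_channel - one_channel) < 1e-9
```

So the replicated channels add no information; whether switching to a true single-channel input is worth it is exactly what I am unsure about.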
For reference, that's my CLI output (note that it overfits within the very first epochs):
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
1/100 23.7G 1.91 1.213 0.8506 26 640: 100%|██████████| 5524/5524 [58:01<00:00, 1.59it/s]
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 284/284 [02:26<00:00, 1.93it/s]
all 36298 25870 0.778 0.62 0.636 0.224
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
2/100 23.1G 1.648 0.7622 0.821 25 640: 100%|██████████| 5524/5524 [57:48<00:00, 1.59it/s]
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 284/284 [02:03<00:00, 2.30it/s]
all 36298 25870 0.843 0.624 0.677 0.243
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
3/100 23.1G 1.575 0.7043 0.8166 30 640: 100%|██████████| 5524/5524 [57:55<00:00, 1.59it/s]
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 284/284 [02:37<00:00, 1.80it/s]
all 36298 25870 0.846 0.6 0.661 0.226
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
4/100 23.1G 1.522 0.6724 0.8135 19 640: 100%|██████████| 5524/5524 [59:55<00:00, 1.54it/s]
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 284/284 [02:02<00:00, 2.33it/s]
all 36298 25870 0.85 0.6 0.66 0.229
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
5/100 23.1G 1.496 0.6596 0.8127 29 640: 100%|██████████| 5524/5524 [57:05<00:00, 1.61it/s]
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 284/284 [02:00<00:00, 2.35it/s]
all 36298 25870 0.86 0.598 0.672 0.247
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
6/100 23.1G 1.489 0.6603 0.8128 27 640: 100%|██████████| 5524/5524 [57:08<00:00, 1.61it/s]
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 284/284 [01:58<00:00, 2.40it/s]
all 36298 25870 0.893 0.562 0.679 0.272
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
7/100 23.1G 1.496 0.6662 0.8148 26 640: 100%|██████████| 5524/5524 [57:09<00:00, 1.61it/s]
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 284/284 [01:56<00:00, 2.44it/s]
all 36298 25870 0.885 0.504 0.656 0.293
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
8/100 23.2G 1.478 0.659 0.8132 23 640: 100%|██████████| 5524/5524 [57:06<00:00, 1.61it/s]
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 284/284 [01:54<00:00, 2.47it/s]
all 36298 25870 0.878 0.461 0.632 0.294
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
9/100 23.2G 1.437 0.6347 0.8096 24 640: 100%|██████████| 5524/5524 [57:08<00:00, 1.61it/s]
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 284/284 [01:54<00:00, 2.48it/s]
all 36298 25870 0.901 0.456 0.623 0.276
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
10/100 23.2G 1.404 0.6174 0.8066 29 640: 100%|██████████| 5524/5524 [57:08<00:00, 1.61it/s]
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 284/284 [01:54<00:00, 2.49it/s]
all 36298 25870 0.938 0.463 0.629 0.256
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
11/100 23.2G 1.377 0.6024 0.8047 29 640: 100%|██████████| 5524/5524 [57:08<00:00, 1.61it/s]
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 284/284 [01:53<00:00, 2.50it/s]
all 36298 25870 0.931 0.466 0.63 0.236
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
12/100 23.2G 1.357 0.5903 0.8028 31 640: 100%|██████████| 5524/5524 [57:07<00:00, 1.61it/s]
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 284/284 [01:54<00:00, 2.49it/s]
all 36298 25870 0.919 0.486 0.636 0.229
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
13/100 23.2G 1.338 0.5813 0.8007 23 640: 100%|██████████| 5524/5524 [57:09<00:00, 1.61it/s]
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 284/284 [01:54<00:00, 2.49it/s]
all 36298 25870 0.909 0.494 0.63 0.219
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
14/100 23.2G 1.319 0.5721 0.8001 28 640: 100%|██████████| 5524/5524 [57:08<00:00, 1.61it/s]
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 284/284 [01:53<00:00, 2.49it/s]
all 36298 25870 0.898 0.489 0.621 0.213
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
15/100 23.2G 1.306 0.5656 0.799 26 640: 100%|██████████| 5524/5524 [57:08<00:00, 1.61it/s]
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 284/284 [01:53<00:00, 2.50it/s]
all 36298 25870 0.887 0.494 0.614 0.204
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
16/100 23.2G 1.292 0.559 0.7979 23 640: 100%|██████████| 5524/5524 [56:59<00:00, 1.62it/s]
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 284/284 [01:53<00:00, 2.51it/s]
all 36298 25870 0.863 0.485 0.595 0.189
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
17/100 23.2G 1.279 0.5525 0.7967 26 640: 100%|██████████| 5524/5524 [57:00<00:00, 1.61it/s]
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 284/284 [01:54<00:00, 2.49it/s]
all 36298 25870 0.847 0.476 0.578 0.184
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
18/100 23.2G 1.267 0.5457 0.7958 28 640: 100%|██████████| 5524/5524 [57:06<00:00, 1.61it/s]
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 284/284 [01:54<00:00, 2.49it/s]
all 36298 25870 0.84 0.478 0.567 0.181
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
19/100 23.2G 1.257 0.5412 0.795 24 640: 100%|██████████| 5524/5524 [57:08<00:00, 1.61it/s]
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 284/284 [01:54<00:00, 2.49it/s]
all 36298 25870 0.848 0.475 0.57 0.183
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
20/100 23.2G 1.248 0.5377 0.7947 31 640: 100%|██████████| 5524/5524 [57:07<00:00, 1.61it/s]
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 284/284 [01:53<00:00, 2.50it/s]
all 36298 25870 0.845 0.464 0.555 0.171
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
21/100 23.2G 1.24 0.5331 0.7939 33 640: 100%|██████████| 5524/5524 [57:08<00:00, 1.61it/s]
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 284/284 [01:54<00:00, 2.49it/s]
all 36298 25870 0.828 0.464 0.551 0.171
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
22/100 23.2G 1.231 0.528 0.7939 25 640: 100%|██████████| 5524/5524 [57:57<00:00, 1.59it/s]
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 284/284 [01:54<00:00, 2.48it/s]
all 36298 25870 0.814 0.456 0.546 0.171
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
23/100 23.2G 1.222 0.5248 0.7928 30 640: 100%|██████████| 5524/5524 [57:27<00:00, 1.60it/s]
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 284/284 [01:54<00:00, 2.49it/s]
all 36298 25870 0.794 0.454 0.543 0.172
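
To make the trend in the log above easier to see, here is a minimal sketch that pulls the validation rows (the lines starting with `all`) out of the output and reports which epoch scored best. The helper names `parse_val_rows` and `best_epoch` are my own, and it assumes one `all` row per epoch, in order.

```python
import re

# Validation summary rows copied from the log above, one per epoch.
VAL_LOG = """\
all 36298 25870 0.778 0.62 0.636 0.224
all 36298 25870 0.843 0.624 0.677 0.243
all 36298 25870 0.846 0.6 0.661 0.226
all 36298 25870 0.85 0.6 0.66 0.229
all 36298 25870 0.86 0.598 0.672 0.247
all 36298 25870 0.893 0.562 0.679 0.272
all 36298 25870 0.885 0.504 0.656 0.293
all 36298 25870 0.878 0.461 0.632 0.294
all 36298 25870 0.901 0.456 0.623 0.276
all 36298 25870 0.938 0.463 0.629 0.256
all 36298 25870 0.931 0.466 0.63 0.236
all 36298 25870 0.919 0.486 0.636 0.229
all 36298 25870 0.909 0.494 0.63 0.219
all 36298 25870 0.898 0.489 0.621 0.213
all 36298 25870 0.887 0.494 0.614 0.204
all 36298 25870 0.863 0.485 0.595 0.189
all 36298 25870 0.847 0.476 0.578 0.184
all 36298 25870 0.84 0.478 0.567 0.181
all 36298 25870 0.848 0.475 0.57 0.183
all 36298 25870 0.845 0.464 0.555 0.171
all 36298 25870 0.828 0.464 0.551 0.171
all 36298 25870 0.814 0.456 0.546 0.171
all 36298 25870 0.794 0.454 0.543 0.172
"""

# all <images> <instances> <P> <R> <mAP50> <mAP50-95>
ROW = re.compile(
    r"^\s*all\s+\d+\s+\d+\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s*$"
)


def parse_val_rows(text):
    """Return one dict per validation row, numbered from epoch 1."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            p, r, map50, map50_95 = (float(v) for v in match.groups())
            rows.append({"epoch": len(rows) + 1, "P": p, "R": r,
                         "mAP50": map50, "mAP50-95": map50_95})
    return rows


def best_epoch(rows, key):
    """Epoch with the highest value for the given metric."""
    return max(rows, key=lambda row: row[key])["epoch"]


rows = parse_val_rows(VAL_LOG)
print("best mAP50 at epoch", best_epoch(rows, "mAP50"))        # 6
print("best mAP50-95 at epoch", best_epoch(rows, "mAP50-95"))  # 8
```

On this log, mAP50 peaks at epoch 6 (0.679) and mAP50-95 at epoch 8 (0.294), while the training losses keep decreasing through epoch 23.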