What's wrong with my training process?

Issue: Model Performs Well During Training but Poorly on Evaluation

Problem Description

During training, the model shows good performance metrics in the logs, but on final evaluation it performs almost completely incorrectly, as if it had never been trained at all.

Training Logs (Last 3 Epochs):

| epoch | time | train/box_loss | train/cls_loss | train/dfl_loss | metrics/precision(B) | metrics/recall(B) | metrics/mAP50(B) | metrics/mAP50-95(B) | val/box_loss | val/cls_loss | val/dfl_loss | lr/pg0 | lr/pg1 | lr/pg2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 198 | 49616.6 | 0.69604 | 0.47192 | 0.88007 | 0.83006 | 0.72811 | 0.81305 | 0.6184 | 0.70753 | 0.48988 | 0.89207 | 0.000105495 | 0.000105495 | 0.000105495 |
| 199 | 49860.6 | 0.69621 | 0.47175 | 0.88079 | 0.83219 | 0.72686 | 0.81298 | 0.618 | 0.7076 | 0.48996 | 0.89208 | 0.000102443 | 0.000102443 | 0.000102443 |
| 200 | 50104.7 | 0.69451 | 0.47151 | 0.87952 | 0.83122 | 0.72626 | 0.81188 | 0.61713 | 0.70762 | 0.48988 | 0.89205 | 0.000100611 | 0.000100611 | 0.000100611 |

Training Code:


from datetime import datetime
from ultralytics import YOLO
model_yaml_path = "Custom_Model_cfg/yolo11_Modify.yaml"
data = "Custom_dataset_cfg/vehicle_orientation.yaml"
if __name__ == '__main__':

    model = YOLO(model_yaml_path)

    results = model.train(data=data,
                          epochs=200,
                          batch=32,
                          imgsz=640,
                          cos_lr=True,
                          close_mosaic=50,
                          save=True,
                          device="0",
                          name="yolo11_Modify"+datetime.now().strftime("%Y%m%d_%H_%M"))

This was the result when training finished.

Additional Testing

I suspected that the model might be performing well on the training set but poorly on the test set, so I created a new dataset by sampling 1000 images from the training set and evaluated it with the best trained weights (a sketch of the sampling step follows the results below). The results were equally poor.

| Class | Images | Instances | Box(P) | Box(R) | mAP50 | mAP50-95 |
|---|---|---|---|---|---|---|
| all | 600 | 2956 | 0.000716 | 0.00257 | 0.000304 | 7.43e-05 |
| car | 581 | 2431 | 0.00287 | 0.0103 | 0.000798 | 0.000211 |
| motorcycle | 23 | 27 | 0 | 0 | 0.000298 | 7.45e-05 |
| bus | 34 | 40 | 0 | 0 | 0 | 0 |
| truck | 275 | 458 | 0 | 0 | 0.00012 | 1.2e-05 |
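
For reference, the subset was built along the lines of this sketch (directory names are illustrative placeholders for my actual dataset layout, not the real paths):

import random
import shutil
from pathlib import Path

# Placeholder paths; adjust to the real dataset layout
src_img = Path("datasets/vehicle_orientation/images/train")
src_lbl = Path("datasets/vehicle_orientation/labels/train")
dst = Path("datasets/vehicle_orientation_subset")

for sub in ("images/val", "labels/val"):
    (dst / sub).mkdir(parents=True, exist_ok=True)

random.seed(0)  # reproducible sample
images = sorted(src_img.glob("*.jpg"))
for img in random.sample(images, k=min(1000, len(images))):
    shutil.copy(img, dst / "images/val" / img.name)
    lbl = src_lbl / f"{img.stem}.txt"
    if lbl.exists():  # background images have no label file
        shutil.copy(lbl, dst / "labels/val" / lbl.name)

The test.yaml used in the validation code below points its val split at this subset directory.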

Validation Code:

from ultralytics import YOLO

# Load the trained model and point at the test split
model_path = r"C:\Users\Hunger\Desktop\ultralytics\runs\detect\yolo11_Modify\weights\last.pt"
data = r"Custom_dataset_cfg/test.yaml"

if __name__ == '__main__':
    model = YOLO(model_path)
    # Validate the model
    metrics = model.val(data=data)
    print(metrics.box.map)    # mAP50-95
    print(metrics.box.map50)  # mAP50
    print(metrics.box.map75)  # mAP75
    print(metrics.box.maps)   # per-class mAP50-95

Probably something is wrong with your modification. Did you train with amp=False?
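
For reference, that's just the amp flag on train(); a short run like the sketch below would rule out mixed-precision issues (the reduced epoch count is only for the quick check):

from ultralytics import YOLO

model = YOLO("Custom_Model_cfg/yolo11_Modify.yaml")
# Short sanity run with AMP disabled
results = model.train(data="Custom_dataset_cfg/vehicle_orientation.yaml",
                      epochs=10, batch=32, imgsz=640, amp=False)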


Thank you for the reply :heart_hands:, but AMP is at its default value `True`.
Here are the full training arguments:

task: detect
mode: train
model: Custom_Model_cfg/yolo11_Modify.yaml
data: Custom_dataset_cfg/vehicle_orientation.yaml
epochs: 200
time: null
patience: 100
batch: 32
imgsz: 640
save: true
save_period: 1
cache: false
device: '0'
workers: 8
project: null
name: yolo11_Modify20251101_18_21
exist_ok: false
pretrained: true
optimizer: auto
verbose: true
seed: 0
deterministic: true
single_cls: false
rect: false
cos_lr: true
close_mosaic: 50
resume: false
amp: true
fraction: 1.0
profile: false
freeze: null
multi_scale: false
compile: false
overlap_mask: true
mask_ratio: 4
dropout: 0.0
val: true
split: val
save_json: false
conf: null
iou: 0.7
max_det: 300
half: false
dnn: false
plots: true
source: null
vid_stride: 1
stream_buffer: false
visualize: false
augment: false
agnostic_nms: false
classes: null
retina_masks: false
embed: null
show: false
save_frames: false
save_txt: false
save_conf: false
save_crop: false
show_labels: true
show_conf: true
show_boxes: true
line_width: null
format: torchscript
keras: false
optimize: false
int8: false
dynamic: false
simplify: true
opset: null
workspace: null
nms: false
lr0: 0.01
lrf: 0.01
momentum: 0.937
weight_decay: 0.0005
warmup_epochs: 3.0
warmup_momentum: 0.8
warmup_bias_lr: 0.1
box: 7.5
cls: 0.5
dfl: 1.5
pose: 12.0
kobj: 1.0
nbs: 64
hsv_h: 0.015
hsv_s: 0.7
hsv_v: 0.4
degrees: 0.0
translate: 0.1
scale: 0.5
shear: 0.0
perspective: 0.0
flipud: 0.0
fliplr: 0.5
bgr: 0.0
mosaic: 1.0
mixup: 0.0
cutmix: 0.0
copy_paste: 0.0
copy_paste_mode: flip
auto_augment: randaugment
erasing: 0.4
cfg: null
tracker: botsort.yaml
save_dir: C:\Users\Hunger\Desktop\ultralytics\runs\detect\yolo11_Modify

You can try disabling fuse before running validation:

model.model.is_fused = lambda: True
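
In context, applied to the validation script from above (same paths):

from ultralytics import YOLO

model = YOLO(r"C:\Users\Hunger\Desktop\ultralytics\runs\detect\yolo11_Modify\weights\last.pt")

# Report the model as already fused so the (possibly broken) fuse step
# is skipped when val() prepares the model
model.model.is_fused = lambda: True

metrics = model.val(data=r"Custom_dataset_cfg/test.yaml")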

You’re right! Thanks a lot for pointing that out. :heart_hands:
I just realized one of my custom modules had a wrong fuse() implementation.

Great catch, and thanks for closing the loop! A broken fuse() will tank eval because YOLO11 fuses layers before validation. When you fix your custom block, mirror the pattern used in core modules: fold BN into Conv, delete the BN, and switch the forward to the fused path. You can see how we do it in the model-level fuse flow in the Model.fuse reference and the BaseModel.fuse reference, plus a concrete example in the RepVGGDW.fuse example.
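
Here's a minimal sketch of that pattern for a hypothetical Conv + BN + activation block (the block and its names are illustrative; fuse_conv_and_bn is the same helper the core modules use):

import torch
import torch.nn as nn
from ultralytics.utils.torch_utils import fuse_conv_and_bn


class MyBlock(nn.Module):
    """Hypothetical custom block: Conv2d -> BatchNorm2d -> SiLU."""

    def __init__(self, c1, c2, k=3):
        super().__init__()
        self.conv = nn.Conv2d(c1, c2, k, padding=k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

    def forward_fuse(self, x):
        # Fused path: BN has already been folded into the conv weights
        return self.act(self.conv(x))

    def fuse(self):
        # Fold BN into the conv, drop the BN module, switch to the fused path
        self.conv = fuse_conv_and_bn(self.conv, self.bn)
        delattr(self, "bn")
        self.forward = self.forward_fuse
        return self


# Fused and unfused outputs should match in eval mode
m = MyBlock(3, 16).eval()
x = torch.randn(1, 3, 64, 64)
y0 = m(x)
y1 = m.fuse()(x)
print((y0 - y1).abs().max())  # expect ~1e-6 or smaller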

Quick sanity check you can keep in your tests: predictions should be numerically close before vs after fuse.

from ultralytics import YOLO
import numpy as np

m = YOLO("path/to/your.pt")
img = "ultralytics/assets/bus.jpg"

r0 = m.predict(img, verbose=False)
m.model.fuse()  # fused inference path
r1 = m.predict(img, verbose=False)

b0 = r0[0].boxes.xyxy.cpu().numpy()
b1 = r1[0].boxes.xyxy.cpu().numpy()
n = min(len(b0), len(b1))  # guard: detection counts can differ if fuse is broken
d = np.abs(b0[:n] - b1[:n]).mean()
print(f"Mean abs box diff (unfused vs fused): {d:.4f}")  # should be very small

Until your fuse() is fixed, your workaround of setting `model.model.is_fused = lambda: True` before `val()` is fine. If anything else pops up, update to the latest Ultralytics package and share a minimal repro; happy to take a look.

Thank you so much for your detailed guidance and invaluable suggestions! :heart_hands: