Hi, I want to do object detection using YOLO. I have some confusion regarding the usage of the YAML and .pt files. Kindly answer the following questions:
I have changed the architecture of YOLO11 and I want to compare it with the unmodified version. I know I have to build the modified model from its YAML file, but for the comparison should I also build the unmodified model from its YAML file, or use the .pt file instead?
If I use the .pt file and compare it with the unmodified model built from the YAML file, the results are drastically different. Which is the correct way to gauge the architecture change?
You can use the YAML file for comparison when you're training on the same data from scratch.
Really the same kind of question as above.
You will likely need to train your custom model from scratch since you're using a custom YAML file. This means you can just load the default YOLO11 model using the YAML config as well.
from ultralytics import YOLO
custom = YOLO("custom_model.yaml", task="detect")
standard = YOLO("yolo11.yaml")
custom_result = custom.train(data="custom_dataset.yaml")
standard_result = standard.train(data="custom_dataset.yaml")
So for the comparison, I should also build the baseline model from the YAML file.
But theoretically speaking, for a custom dataset, shouldn't building a model from the YAML file and using the .pt file be the same if there is no modification in the architecture? If that's true, why am I getting drastically different results?
A custom model would be the same after training the model for either the .pt file or the YAML file. When you build your custom model from the YAML, the weights are randomly initialized and the model is not trained. When you build it from the .pt file (assuming it was trained), it will have weights that were already updated on the training data. The same goes for the standard YOLO11 models.
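As a rough illustration of the point above, the sketch below plans a comparison in which both models are built from YAML, so both start from random weights. The `RunSpec` class, the helper names, and the file names `custom_model.yaml` and `yolo11m.yaml` are assumptions made for this example; only `YOLO(...)` and `model.train(...)` are Ultralytics calls, and they are imported lazily inside `train_all`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunSpec:
    """One training run: a label plus the file passed to YOLO(...)."""
    name: str
    model: str  # a .yaml config (random weights) or a .pt checkpoint (trained weights)


def starts_from_scratch(model: str) -> bool:
    """A model built from a YAML config has untrained, randomly initialized weights."""
    return model.lower().endswith((".yaml", ".yml"))


def same_starting_point(runs: list[RunSpec]) -> bool:
    """True when every run starts the same way (all from YAML or all from .pt)."""
    return len({starts_from_scratch(run.model) for run in runs}) <= 1


def comparison_runs(custom_yaml: str, baseline_yaml: str = "yolo11m.yaml") -> list[RunSpec]:
    """Pair the modified architecture with a stock baseline, both built from YAML."""
    return [RunSpec("custom", custom_yaml), RunSpec("baseline", baseline_yaml)]


def train_all(runs: list[RunSpec], data: str, epochs: int = 100) -> dict:
    """Train every run with identical settings (requires ultralytics to be installed)."""
    from ultralytics import YOLO  # imported lazily so the planning helpers run without it

    if not same_starting_point(runs):
        raise ValueError("mixing YAML-built and .pt-loaded models is not a like-for-like comparison")
    return {run.name: YOLO(run.model).train(data=data, epochs=epochs, name=run.name) for run in runs}
```

A call such as `train_all(comparison_runs("custom_model.yaml"), data="custom_dataset.yaml")` would then train both models under the same conditions, whereas pairing a YAML-built model with a `.pt` checkpoint is rejected before any training starts.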
As per your statement above that a custom model "would be the same after training the model for either the .pt file or the YAML file", they should have the same results, but here is what I am getting. These are validation results.
The run with the YAML file has the following results:
YOLO11 summary (fused): 238 layers, 2,582,737 parameters, 0 gradients, 6.3 GFLOPs
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 1/1 [00:00<00:00, 1.50it/s]
all 20 65 0.801 0.415 0.457 0.257
head 3 18 0.732 0.611 0.687 0.41
helmet 17 45 0.671 0.635 0.649 0.345
person 1 2 1 0 0.037 0.0148
Also the auto-optimizer settings are as follows.
yaml run:
optimizer: AdamW(lr=0.001429, momentum=0.9) with parameter groups 81 weight(decay=0.0), 88 weight(decay=0.0005), 87 bias(decay=0.0)
The run with the .pt file has the following results:
YOLO11m summary (fused): 303 layers, 20,032,345 parameters, 0 gradients, 67.7 GFLOPs
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 1/1 [00:09<00:00, 9.38s/it]
all 20 65 0.898 0.51 0.805 0.476
head 3 18 0.761 0.709 0.788 0.445
helmet 17 45 0.933 0.822 0.879 0.533
person 1 2 1 0 0.75 0.45
And the auto-optimizer settings for the .pt run are as follows:
optimizer: AdamW(lr=0.001429, momentum=0.9) with parameter groups 106 weight(decay=0.0), 113 weight(decay=0.0005), 112 bias(decay=0.0)
Can you please explain why there is this difference?
Also, which is the best way to run?
With which method shall I compare the architecture modification?
Posting screenshots of code instead of pasting the code makes everything hard to read.
Just post the code directly with proper formatting.
The reason for the performance discrepancy is that you're not specifying the scale when loading the YAML. It should be yolo11m.yaml. There's also a warning telling you that no scale was passed and that it is defaulting to n.
Also, you don't need to pass the full path to the YOLO11 YAML. Just pass yolo11m.yaml. Ultralytics will automatically use the stock YAML.
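To make the scale point concrete, here is a minimal sketch of how a scale letter can be read from a model file name, with a fallback to n when no letter is present. The helper `infer_scale` and its regular expression are hypothetical and written only for this illustration; they are not the Ultralytics implementation, which may behave differently for other naming schemes.

```python
import re

# Hypothetical pattern for illustration only; this is not the Ultralytics source.
# It looks for names such as "yolo11m.yaml" or "yolo11.yaml" at the end of a path.
_NAME_RE = re.compile(r"yolov?\d+([nslmx])?\.(?:yaml|pt)$")


def infer_scale(model_name: str, default: str = "n") -> str:
    """Return the scale letter in the file name, or `default` when none is given."""
    match = _NAME_RE.search(model_name.lower())
    if match is None or match.group(1) is None:
        return default  # assumed fallback, mirroring the "defaulting to n" warning
    return match.group(1)


print(infer_scale("yolo11m.yaml"))  # m
print(infer_scale("yolo11.yaml"))   # n
print(infer_scale("yolo11m.pt"))    # m
```

Under this reading, `yolo11.yaml` and `yolo11m.pt` describe models of different sizes, which is why the two runs in this thread are not directly comparable.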
Noted with thanks. Here is the code.
The first script uses the YAML file.
import os

if __name__ == "__main__":
    # 1. Optional memory config for CUDA
    os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

    import torch
    from ultralytics import YOLO
    from ultralytics.nn.tasks import yaml_model_load

    # ----------------------------------------------------------------------
    # USER-CONFIGURABLE PATHS
    # ----------------------------------------------------------------------
    # Path to your custom dataset YAML
    data_yaml_path = r"D:/ultralytics/codes/Hard Hat/data.yaml"
    # Path to your YOLOv11 YAML
    model_yaml_path = r"D:/Python New/Python312/Lib/site-packages/ultralytics/cfg/models/11/yolo11.yaml"
    # Optional .pt weights
    pretrained_weights = r"D:/Kai work/data_scene_flow/python files/yolo11m.pt"

    # ----------------------------------------------------------------------
    # CHECK GPU / CPU
    # ----------------------------------------------------------------------
    device = "cuda" if torch.cuda.is_available() else "cpu"
    print(f"Using device: {device}\n")

    # ----------------------------------------------------------------------
    # DEBUG: Print out the custom yolo11.yaml
    # ----------------------------------------------------------------------
    print("Loading the model YAML for debugging...")
    model_yaml = yaml_model_load(model_yaml_path)
    print(f"Loaded Model YAML:\n{model_yaml}")

    # ----------------------------------------------------------------------
    # BUILD MODEL FROM YAML + OPTIONALLY LOAD PRETRAINED WEIGHTS
    # ----------------------------------------------------------------------
    print("\nInitializing YOLO model from YAML...")
    model = YOLO(model_yaml_path)  # Build from your updated yolo11.yaml
    # Optionally load the pretrained weights into the architecture
    if pretrained_weights:
        print(f"Loading pretrained weights from {pretrained_weights}...")
        model.load(pretrained_weights)

    # ----------------------------------------------------------------------
    # CLEAR UNUSED VRAM
    # ----------------------------------------------------------------------
    torch.cuda.empty_cache()

    # ----------------------------------------------------------------------
    # TRAIN
    # ----------------------------------------------------------------------
    print("\nStarting training...")
    model.train(
        data=data_yaml_path,  # dataset config
        epochs=700,  # or however many you like
        batch=16,  # adjust to fit your GPU memory
        imgsz=640,  # image size
        device=device,
        workers=0,  # can increase for multi-CPU data loading
        cache=False,  # no caching for quick tests
        amp=True,  # automatic mixed precision
        name="custom_yolov11_trial_model",
    )
    print("Training complete.")
And below is the script that uses the .pt file:
import os
import torch
from ultralytics import YOLO
from ultralytics.nn.tasks import yaml_model_load

if __name__ == "__main__":
    os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

    # Paths
    data_yaml_path = r"D:/ultralytics/codes/Hard Hat/data.yaml"
    # For baseline, use the standard model configuration or pretrained checkpoint
    baseline_model_checkpoint = r"D:/Kai work/data_scene_flow/python files/yolo11m.pt"  # standard pretrained checkpoint

    device = "cuda" if torch.cuda.is_available() else "cpu"
    print(f"Using device: {device}")

    # Initialize baseline model directly from the checkpoint
    print("\nInitializing baseline YOLO model...")
    model = YOLO(baseline_model_checkpoint)
    torch.cuda.empty_cache()

    # Train the baseline model
    print("\nStarting training for baseline model...")
    model.train(
        data=data_yaml_path,
        epochs=700,  # same as custom experiment
        batch=16,
        imgsz=640,
        device=device,
        workers=0,
        cache=False,
        amp=True,
        name="baseline_yolov11_model",
    )
    print("Training complete for baseline model.")
As Toxite mentioned, without a scale parameter, the models are not comparable. This is apparent when looking at the difference in the parameter counts.
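The parameter-count check mentioned above can be done directly on the two summary lines posted earlier in this thread. The sketch below parses them and flags a large mismatch; the function names and the `max_ratio=2.0` threshold are assumptions chosen for illustration, since a modified architecture will legitimately differ somewhat from its baseline.

```python
import re

# Matches summary lines such as:
# "YOLO11 summary (fused): 238 layers, 2,582,737 parameters, 0 gradients, 6.3 GFLOPs"
SUMMARY_RE = re.compile(r"summary[^:]*:\s*(\d+) layers,\s*([\d,]+) parameters")


def parse_summary(line: str) -> tuple[int, int]:
    """Return (layers, parameters) parsed from a model summary line."""
    match = SUMMARY_RE.search(line)
    if match is None:
        raise ValueError(f"not a summary line: {line!r}")
    return int(match.group(1)), int(match.group(2).replace(",", ""))


def likely_same_scale(line_a: str, line_b: str, max_ratio: float = 2.0) -> bool:
    """Hypothetical heuristic: parameter counts within max_ratio of each other."""
    _, params_a = parse_summary(line_a)
    _, params_b = parse_summary(line_b)
    return max(params_a, params_b) / min(params_a, params_b) <= max_ratio


# Summary lines copied from the two runs posted in this thread.
yaml_run = "YOLO11 summary (fused): 238 layers, 2,582,737 parameters, 0 gradients, 6.3 GFLOPs"
pt_run = "YOLO11m summary (fused): 303 layers, 20,032,345 parameters, 0 gradients, 67.7 GFLOPs"

print(parse_summary(yaml_run))            # (238, 2582737)
print(parse_summary(pt_run))              # (303, 20032345)
print(likely_same_scale(yaml_run, pt_run))  # False
```

Here the .pt run has more than seven times as many parameters as the YAML run, so the gap in mAP reflects model size rather than the YAML-versus-.pt loading path.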
That said, I meant that the custom model YAML and your custom modified model weights are supposed to be the same. In general, one should not expect a custom model structure to match the performance of the Ultralytics pretrained models.
Thanks a lot Toxite and BurhanQ... I got the point.
I corrected it and got the same results now.
If I face any further issues I will ask you guys again.