WARNING ❌️ training failure for hyperparameter tuning

I am trying to use the YOLO model.tune method to tune hyperparameters using this command from the documentation:

model.tune(data="/path/to/data.yaml", epochs=30, iterations=300, optimizer="AdamW", plots=False, save=False, val=False)
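
For completeness, the full cell is essentially the sketch below (the model checkpoint and the actual dataset path are the ones that appear in the failing command further down; everything else is exactly the documented call):

```python
# Rough sketch of the Colab cell (checkpoint and path taken from the error output below)
from ultralytics import YOLO

model = YOLO("yolo11n.pt")  # pretrained checkpoint
model.tune(
    data="/content/Brazil-Nut-100m-Seg-2/data.yaml",  # path copied from the Colab file tree
    epochs=30,
    iterations=300,
    optimizer="AdamW",
    plots=False,
    save=False,
    val=False,
)
```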

I am currently using Google Colab with the runtime connected to a T4 GPU. Running !nvidia-smi in the notebook confirms that the T4 is attached.

The path I am passing to data="path/to/data.yaml" is taken from the Copy path option when clicking the file in the Colab file tree.

The paths inside the data.yaml file are also taken with the Copy path option on the image folders, so they should be correct as well.

So I believe all of my paths are correct: the path to the data.yaml file, and the paths inside the yaml file that point to the image folders in the train, test, and validation folders of the dataset downloaded from Roboflow.
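
For reference, the data.yaml from the Roboflow download looks roughly like the sketch below after swapping in the copied Colab paths (the class name here is only illustrative, not the real label):

```yaml
# Illustrative layout only; actual class names differ
train: /content/Brazil-Nut-100m-Seg-2/train/images
val: /content/Brazil-Nut-100m-Seg-2/valid/images
test: /content/Brazil-Nut-100m-Seg-2/test/images

nc: 1
names: ['example-class']
```
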
The Tuner output shows a WARNING message on every iteration. In the second iteration, for example, it reports different hyperparameter values, as if the Tuner has mutated them for a new trial, yet the warning still appears. No matter how many epochs I choose, the hyperparameters in the resulting best_hyperparameters.yaml file are always identical to the default values that the tuning process starts with.

Here is the output from the second iteration. It shows some of the hyperparameter values that have been altered from the default starting values, followed by the WARNING message that appears on every tuning iteration:

Tuner: Starting iteration 2/300 with hyperparameters: 
{
    'lr0': 0.01039,
    'lrf': 0.01, 
    'momentum': 0.91248, 
    'weight_decay': 0.0005, 
    'warmup_epochs': 2.83485, 
    'warmup_momentum': 0.8, 
    'box': 7.42245, 
    'cls': 0.5046, 
    'dfl': 1.5, 
    'hsv_h': 0.015, 
    'hsv_s': 0.6979, 
    'hsv_v': 0.3853, 
    'degrees': 0.0, 
    'translate': 0.09441, 
    'scale': 0.48414, 
    'shear': 0.0, 
    'perspective': 0.0, 
    'flipud': 0.0, 
    'fliplr': 0.5, 
    'bgr': 0.0, 
    'mosaic': 1.0, 
    'mixup': 0.0, 
    'copy_paste': 0.0
}

WARNING ❌️ training failure for hyperparameter tuning iteration 2

Command '[
    'yolo', 
    'train', 
    'task=detect', 
    'mode=train', 
    'model=yolo11n.pt', 
    'data=/content/Brazil-Nut-100m-Seg-2/data.yaml', 
    'epochs=30', 
    'time=None', 
    'patience=100', 
    'batch=16', 
    'imgsz=640', 
    'save=False', 
    'save_period=-1', 
    'cache=False', 
    'device=None', 
    'workers=8', 
    'project=None', 
    'name=None', 
    'exist_ok=False', 
    'pretrained=True', 
    'optimizer=AdamW', 
    'verbose=True', 
    'seed=0', 
    'deterministic=True', 
    'single_cls=False', 
    'rect=False', 
    'cos_lr=False', 
    'close_mosaic=10', 
    'resume=False', 
    'amp=True', 
    'fraction=1.0', 
    'profile=False', 
    'freeze=None', 
    'multi_scale=False', 
    'overlap_mask=True', 
    'mask_ratio=4', 
    'dropout=0.0', 
    'val=False', 
    'split=val', 
    'save_json=False', 
    'save_hybrid=False', 
    'conf=None', 
    'iou=0.7', 
    'max_det=300', 
    'half=False', 
    'dnn=False', 
    'plots=False', 
    'source=None', 
    'vid_stride=1', 
    'stream_buffer=False', 
    'visualize=False', 
    'augment=False', 
    'agnostic_nms=False', 
    'classes=None', 
    'retina_masks=False', 
    'embed=None', 
    'show=False', 
    'save_frames=False', 
    'save_txt=False', 
    'save_conf=False', 
    'save_crop=False', 
    'show_labels=True', 
    'show_conf=True', 
    'show_boxes=True', 
    'line_width=None', 
    'format=torchscript', 
    'keras=False', 
    'optimize=False', 
    'int8=False', 
    'dynamic=False', 
    'simplify=True', 
    'opset=None', 
    'workspace=4', 
    'nms=False', 
    'lr0=0.01039', 
    'lrf=0.01', 
    'momentum=0.91248', 
    'weight_decay=0.0005', 
    'warmup_epochs=2.83485', 
    'warmup_momentum=0.8', 
    'warmup_bias_lr=0.1', 
    'box=7.42245', 
    'cls=0.5046', 
    'dfl=1.5', 
    'pose=12.0', 
    'kobj=1.0', 
    'label_smoothing=0.0', 
    'nbs=64', 
    'hsv_h=0.015', 
    'hsv_s=0.6979', 
    'hsv_v=0.3853', 
    'degrees=0.0', 
    'translate=0.09441', 
    'scale=0.48414', 
    'shear=0.0', 
    'perspective=0.0', 
    'flipud=0.0', 
    'fliplr=0.5', 
    'bgr=0.0', 
    'mosaic=1.0', 
    'mixup=0.0', 
    'copy_paste=0.0', 
    'copy_paste_mode=flip', 
    'auto_augment=randaugment', 
    'erasing=0.4', 
    'crop_fraction=1.0', 
    'cfg=None', 
    'tracker=botsort.yaml'
]' returned non-zero exit status 1.

Saved runs/detect/tune/tune_scatter_plots.png
Saved runs/detect/tune/tune_fitness.png

Here is a screenshot of the Google Colab notebook:

Can anyone see why tuning the hyperparameters for this data results in the best parameters being the starting default values? Is the problem related to the WARNING message? What do I need to look at, change, or do differently to properly tune the hyperparameters for my data in the Google Colab notebook?

Hi there! :blush:

It sounds like you’re encountering a tricky issue with hyperparameter tuning in YOLO. Let’s see if we can get this sorted out.

  1. Check the Logs: The warning message means the training run launched by the Tuner failed. Check the console output above it for the specific error message that explains what went wrong (see the snippet after this list for one way to surface it).

  2. Data Paths: Double-check that the paths in your data.yaml file are accessible from your Colab environment. Sometimes, paths can be tricky, especially with cloud-based setups.

  3. Resource Limits: Ensure that your Colab environment has enough resources allocated. Sometimes, memory or other resource constraints can cause training failures.

  4. Hyperparameter Values: If the tuner isn’t finding better hyperparameters, it might be due to a limited search space or constraints. Consider expanding the range of hyperparameters being tested.

  5. Colab Environment: Make sure your Colab environment is up-to-date. You can restart the runtime and reinstall the necessary packages to ensure everything is fresh.

  6. Debugging: Try running a smaller subset of your data to see if the issue persists. This can help isolate whether the problem is with the data or the tuning process.
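
To make items 1, 2, and 6 concrete: the Tuner launches each iteration as a subprocess, which is why you only see "returned non-zero exit status 1" rather than the real traceback. A minimal sketch, reusing the dataset path from your failing command, that checks the path and runs one short training pass directly so the actual error gets printed:

```python
import os
from ultralytics import YOLO

data_yaml = "/content/Brazil-Nut-100m-Seg-2/data.yaml"
print("data.yaml exists:", os.path.exists(data_yaml))  # quick path sanity check

# Run a single short training pass outside the Tuner so the full traceback
# is printed instead of the Tuner's generic subprocess failure message.
model = YOLO("yolo11n.pt")
model.train(data=data_yaml, epochs=1, imgsz=640, batch=16, optimizer="AdamW")
```

If this single run fails, the traceback it prints is the real cause of the tuning warnings; if it succeeds, the problem is specific to the tuning setup. Regarding item 4, I believe model.tune() also accepts a custom search space via a space argument (see the hyperparameter tuning docs for the exact format) if the default ranges turn out to be too narrow.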

For further assistance, you might find the YOLO Common Issues Guide helpful. It covers a range of troubleshooting tips that could be relevant.

Feel free to share any additional error messages or details, and we’ll do our best to help you out! :rocket:

Good luck, and happy tuning!

@GMOjoe let’s start with an important question: what are you trying to accomplish with hyperparameter tuning?

I’m going to guess that you’re trying to get the best performance out of your trained model. From your screenshot, it looks like you’re using this dataset, which only has ~178 original images. Even with the augmentations, that is far too small a dataset to worry about hyperparameter tuning. The time you’d spend getting tuning to work in your situation would be better spent collecting and annotating more data.

The number one thing, ALWAYS, for getting better model performance is to collect and annotate more data. Hyperparameter tuning is something I’ve only ever seen help in a handful of cases (I can literally count them on one hand), so my recommendation would be to not worry about tuning at all and instead collect more data to train with.

Hi Burhan,

Thank you for the reply. You are correct, I am trying to increase the performance of a trained model. That is an interesting insight about how much data is needed before hyperparameter tuning is worthwhile, and the advice to focus on collecting more data.

You were close on the dataset, but I am actually using this dataset under the same workspace name. It has 656 annotated images, and with augmentations it has 1,576 images.

I’m curious: at what point do you think hyperparameter tuning starts to make sense? How much data do you need? Given that I am segmenting a tree species from canopy images taken by a drone, and the target trees are similar in color, texture, etc. to the surrounding non-target trees, do you have a sense of how many images would be needed to train a really good detection/segmentation model? Also, my dataset includes images of the same target trees from different angles as a way to create more data, a kind of synthetic data perhaps. Do you feel this is a good way to get more data from a single tree? I ask because I could fly an automated drone mission for each tree I have identified in the park and generate many more images of the same tree from different angles. Do you think this would be helpful for scaling up the data?

One more question. I am trying to create a model that NGOs, entities that manage nature reserves, or landowners can use to find and identify these trees on their property. My thought was to collect and annotate data all at one altitude, and instruct the NGOs or landowners to fly their drone missions at that same altitude when capturing images for inference. Does that make sense to you? I know YOLO can detect objects at different distances, but I thought regulating the altitude (the distance to the target) could help performance, since the model would only need to be good at detecting the targets at one distance. It also seems to me that some of the augmentations don’t make sense in this case, like changing image colors or hues (for example, something like the sketch below). Given the goal of segmenting/detecting a single tree species from canopy drone images, what augmentations would you recommend to help scale up the training data? For this project, would you recommend never bothering with hyperparameter tuning?
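
This is the kind of thing I mean about the color augmentations, just a sketch of turning off the hue/saturation jitter during training (I have not tested whether it actually helps):

```python
from ultralytics import YOLO

# Sketch: train with the HSV color augmentations disabled, since the target
# trees will always be photographed at a similar altitude and in similar conditions.
model = YOLO("yolo11n.pt")
model.train(
    data="/content/Brazil-Nut-100m-Seg-2/data.yaml",
    epochs=100,
    imgsz=640,
    hsv_h=0.0,  # no hue shift
    hsv_s=0.0,  # no saturation shift
)
```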

You mentioned that you can count on one hand the cases where hyperparameter tuning was actually needed. That is really interesting. What were the situations where you realized that hyperparameter tuning would be helpful?

Again, Burhan, I appreciate your reply and any insights you can add to this project. I know there were a lot of questions in my response.

GMOjoe

Hi pderrenger,

Thank you for your reply. I have yet to figure out why tuning is producing that error.

Interestingly, though, when I took the same notebook and changed only the parts needed to make it work with YOLOv11 instead of YOLOv8, hyperparameter tuning seemed to work as expected. This leads me to believe that it’s not a problem with the paths, the Colab environment, or even resource limits, since the dataset is the same and I am using the same Google Colab runtime.

Given this, would you recommend just moving on and using YOLOv11 for the project? Is it best to use the latest release of YOLO anyway?

GMOjoe
