Optimize GPU utilization while training

Hello,

When I train a YOLO11 model on 2x A100 80GB (instance type 2A100.44V: 44 CPUs, 240 GB RAM, 160 GB GPU VRAM) with the following parameters: !yolo task=detect mode=train epochs=100 batch=64 plots=True model='runs/detect/train3/weights/last.pt' resume=True data=data.yaml imgsz=640 patience=50 device=0,1 workers=22 rect=True fraction=0.1 cache=True, I see the GPU utilization below (Screenshot 1).

Screenshot 1 Output:
|Device 0| Mem Free: 74956.75MB / 81920.00MB | gpu-util: 37.0% | gpu-mem: 6.0% |
|Device 1| Mem Free: 80010.75MB / 81920.00MB | gpu-util: 100.0% | gpu-mem: 0.0% |
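
(For context, a roughly equivalent way to watch these numbers directly on the instance, assuming nvidia-smi is available, is:)

# poll both GPUs once per second; the fields mirror the columns in the screenshot
nvidia-smi --query-gpu=index,utilization.gpu,memory.used,memory.free,memory.total --format=csv -l 1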

Is it normal or desirable to have this much "Mem Free"? Can it be optimized? Should the settings be changed to improve utilization? Would weaker (cheaper) or stronger (more expensive) hardware be a better fit?

I would be grateful for your opinions and experience.

Thank you very much

You can start training on a single GPU with batch=0.9. It will calculate the appropriate batch size and print that. Then double that for two GPUs.
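
For example, a sketch reusing the checkpoint and data.yaml paths from your command (values and the doubled batch number are placeholders, adjust to whatever AutoBatch actually prints):

# single-GPU run with a fractional batch: AutoBatch picks a batch size that fills ~90% of GPU memory and prints it
yolo task=detect mode=train model='runs/detect/train3/weights/last.pt' data=data.yaml imgsz=640 epochs=100 device=0 batch=0.9

# then relaunch on both GPUs with roughly double the printed value (640 here is only a placeholder)
yolo task=detect mode=train model='runs/detect/train3/weights/last.pt' data=data.yaml imgsz=640 epochs=100 device=0,1 batch=640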

Hi @Toxite

Thank you for your reply.

I tried it and it returned an auto batch size of 2386 (Screenshot). So I ran yolo task=detect mode=train epochs=100 batch=2386 …, but training stopped without an error.

I played around with batch sizes (always a multiple of 64: 448, 512, 576, up to 960), but the result was always the same: training stopped before the first epoch started. Sometimes it completed the first epoch but stopped afterwards.

My working setup is now a single GPU (same as in the original question) with a batch size of 448.
(Using 2x the same GPU with the same or a slightly higher batch size did not make an epoch noticeably faster, so I stuck with a single GPU.)

If there are any other setups/configurations to utilize the GPU(s) as much as possible, it would be great to know about them.

Thank you all

How many images do you have?

Around 220000 images

I would recommend trying the Docker container if you haven’t yet.
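
In case it helps, a minimal sketch of pulling and running it (assuming the standard ultralytics/ultralytics image on Docker Hub and the NVIDIA Container Toolkit installed on the host; the dataset path is a placeholder):

# pull the latest Ultralytics image
docker pull ultralytics/ultralytics:latest

# run it with access to both GPUs; --ipc=host gives the dataloader workers enough shared memory
docker run -it --ipc=host --gpus all -v /path/to/datasets:/datasets ultralytics/ultralytics:latest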

Additionally, there's a hard limit of 1024 on the upper bound of the batch size.