YOLO does what it learnt to do during training. There’s nothing fixed that it does. It all depends on what it learnt.
You should train the model on images that are closer to what it looks like during inference. If inference is on grayscale, you can train a single channel model. The pretrained weights would largely not be affected.