Train a detection YOLO with a fourth input "depth"

Hey guys,
Before I fork the code and modify it to add this feature:

Is there any option to add a fourth input, a depth value for each pixel/object, when training a YOLO detection model?

Has it ever been implemented before?

Example of the depth values of an image:

Thanks :)
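On the data side, a per-pixel fourth input usually means stacking an aligned depth map onto the RGB image as an extra channel. Below is a minimal NumPy sketch. It assumes the depth map has the same height and width as the image, is given in metres, and can be clipped to a 10 m range. All of these are assumptions, and `make_rgbd` is a hypothetical helper, not an Ultralytics function.

```python
import numpy as np


def make_rgbd(rgb: np.ndarray, depth: np.ndarray, max_depth: float = 10.0) -> np.ndarray:
    """Stack an RGB image (H, W, 3, uint8) and an aligned depth map (H, W)
    into one float32 array of shape (H, W, 4), all channels scaled to [0, 1].

    Hypothetical helper: max_depth is an assumed sensor range in metres.
    """
    if rgb.shape[:2] != depth.shape:
        raise ValueError(f"RGB size {rgb.shape[:2]} and depth size {depth.shape} differ")
    rgb_norm = rgb.astype(np.float32) / 255.0
    # Treat missing readings (NaN) as 0, clip to the assumed range,
    # then scale so depth shares the same [0, 1] range as the colour channels.
    depth_clean = np.nan_to_num(depth.astype(np.float32), nan=0.0)
    depth_norm = np.clip(depth_clean, 0.0, max_depth) / max_depth
    return np.concatenate([rgb_norm, depth_norm[..., None]], axis=-1)
```

For a PyTorch model the result would still need to be moved to channels-first layout, e.g. `make_rgbd(rgb, depth).transpose(2, 0, 1)`, giving shape (4, H, W).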

Ultralytics only supports 3-channel input by default, so you would have to modify it, probably to a significant degree (especially the augmentations), to get it to work with 4 channels.
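Two places a 4-channel change typically touches are the augmentations and the model's first convolution. The NumPy sketch below illustrates both under stated assumptions: inputs are (H, W, 4) RGB-D arrays in [0, 1], and labels are YOLO-style normalized (x_center, y_center, w, h) boxes. `augment_rgbd` and `widen_first_conv` are hypothetical names, not Ultralytics code. Geometric augmentations must move all four channels together so depth stays aligned with the pixels and boxes. Photometric ones (brightness, HSV-style jitter) are defined on colour and should skip the depth channel. For the model, pretrained first-layer weights of shape (out_channels, 3, kH, kW), which is PyTorch's `Conv2d` layout, can be widened to 4 input channels.

```python
import numpy as np


def augment_rgbd(x, boxes, rng, flip_p=0.5, gain_range=(0.7, 1.3)):
    """x: (H, W, 4) float array, channels R, G, B, D in [0, 1].
    boxes: (N, 4) normalized (x_center, y_center, w, h), YOLO-style.
    Returns augmented copies; inputs are not modified. Hypothetical helper.
    """
    x = x.copy()
    boxes = boxes.copy()
    # Geometric: flip ALL channels so depth stays pixel-aligned with RGB.
    if rng.random() < flip_p:
        x = np.ascontiguousarray(x[:, ::-1, :])
        boxes[:, 0] = 1.0 - boxes[:, 0]  # mirror the box centre horizontally
    # Photometric: brightness gain on RGB only; it has no meaning for depth.
    gain = rng.uniform(*gain_range)
    x[..., :3] = np.clip(x[..., :3] * gain, 0.0, 1.0)
    return x, boxes


def widen_first_conv(w_rgb):
    """w_rgb: pretrained weights (out_ch, 3, kH, kW).
    Returns (out_ch, 4, kH, kW): RGB filters kept as-is, the new depth filter
    initialised to the mean of the RGB filters (one common heuristic)."""
    depth_w = w_rgb.mean(axis=1, keepdims=True)
    return np.concatenate([w_rgb, depth_w], axis=1)
```

Initialising the depth filter to the RGB mean is only one heuristic. Zero-initialising it instead makes the widened layer reproduce the pretrained RGB-only outputs exactly at the start of training, at the cost of the depth channel starting with no influence.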


Thank you.
I'm willing to give it a try, since I'm keen to learn the framework and the technology under the hood by actually working with it.

If you're willing to guide me a bit on which parts of the pipeline need to be modified, I'll gladly check in from time to time about the steps I should take to make sure the modifications are complete.
In the meantime, I'll mostly chat with GPT to get a handle on what's going on there.