Breaking Reproducibility Due to In-place Data Patching

I’m training YOLO11x on a few thousand images (JPG). During each training run, I see warnings about some files being corrupt and then being fixed. The “corrupt” files open just fine so I am not sure what is being “fixed”. However, this in-place fixing breaks experiment reproducibility.


val: /mnt/devel/project/data/yolo/internal/1/images/val/I63273_I542557_039m40r3164.jpg: corrupt JPEG restored and saved

The “corrupt” files open just fine so I am not sure what is being “fixed”.

It would have replaced the original file with the fixed version, so it ought to open fine after that. The corruption it fixes is a truncated or partially downloaded JPEG file.
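If you want to see which files would be flagged without modifying anything, a read-only scan along these lines might help. This is only a sketch, and it assumes the check amounts to looking for the JPEG end-of-image marker (bytes FF D9) at the very end of the file; the function names are made up for illustration and are not part of any library. Under that assumption, a JPEG with extra bytes after the marker would also be flagged even though viewers open it without complaint.

```python
from pathlib import Path

# End Of Image marker that closes a complete JPEG stream.
JPEG_EOI = b"\xff\xd9"


def lacks_eoi_marker(path: Path) -> bool:
    """Return True if the file does not end with the JPEG EOI marker.

    Hypothetical helper: it only inspects the last two bytes and never
    writes to the file.
    """
    with path.open("rb") as f:
        f.seek(0, 2)  # move to the end to learn the file size
        if f.tell() < 2:
            return True  # too short to contain the marker
        f.seek(-2, 2)  # step back two bytes from the end
        return f.read(2) != JPEG_EOI


def find_suspect_jpegs(root: Path) -> list[Path]:
    """List .jpg/.jpeg files under root that do not end with the EOI marker."""
    extensions = {".jpg", ".jpeg"}
    return sorted(
        p
        for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in extensions and lacks_eoi_marker(p)
    )
```

Running find_suspect_jpegs on the images directory before training gives a list you can repair once and commit, so later runs have nothing left to rewrite.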

However, this in-place fixing breaks experiment reproducibility.

Are you sure that it’s due to this? It shouldn’t affect reproducibility because the fix occurs before the image is passed for training. Is your dataset being redownloaded every time? Or is it on a remote mounted drive? It shouldn’t be getting different images corrupted every training run, unless you’re redownloading it or it’s on a network drive.

So the dataset, being a dependency, is dvc versioned. dvc detects that the files are changed but can’t tell what has changed.

Are the same images appearing as corrupted every time?

I haven’t checked this specifically, but I’d assume that this is the case, since even after I commit the fixed images to dvc, I get the error later on.

What’s your training command?

To me, it seems unlikely that it’s the same image being repaired each time. The repair would fix the image and save it, so the same image isn’t going to appear corrupted again, unless dvc is restoring it to the old version and corrupting it again. Also, you didn’t answer my questions.

Are you sure that it’s due to this? It shouldn’t affect reproducibility because the fix occurs before the image is passed for training. Is your dataset being redownloaded every time? Or is it on a remote mounted drive? It shouldn’t be getting different images corrupted every training run, unless you’re redownloading it or it’s on a network drive.
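To find out whether the same files change on every run, one option is to hash the dataset before and after training and compare the two snapshots. The sketch below uses only the Python standard library; snapshot and changed_files are hypothetical names chosen for this example, and the approach assumes the dataset fits comfortably in a single pass of reading every file.

```python
import hashlib
from pathlib import Path


def snapshot(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    digests = {}
    for p in sorted(root.rglob("*")):
        if p.is_file():
            key = p.relative_to(root).as_posix()
            digests[key] = hashlib.sha256(p.read_bytes()).hexdigest()
    return digests


def changed_files(before: dict[str, str], after: dict[str, str]) -> list[str]:
    """Return paths whose digest differs, including added or removed files."""
    all_paths = before.keys() | after.keys()
    return sorted(p for p in all_paths if before.get(p) != after.get(p))
```

Take one snapshot right after checking out the dataset, another after a training run, and compare them with changed_files. If the list is the same on consecutive runs, something is restoring the old versions between runs; if it differs each time, the files are probably being damaged in transit or on the mounted drive.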