I just tried uploading some images that already exist in the dataset, and they were saved as a different version without any warnings. I anticipate this will cause data duplication.
Is it possible to add a duplicate check based on filename, with a user-selectable way to resolve each conflict, such as: Overwrite / Skip / Save as a different name / Cancel?
Also, it would be great if the duplication detection could be based on comparing image content or some more accurate method than just relying on filenames. However, I’m not sure about the feasibility of programming this functionality. Hope you guys can consider this.
Thanks, that’s a very reasonable feature request.
Right now, Ultralytics Platform doesn’t show an upload-time duplicate warning at the dataset UI level. It does already use hash-based, content-addressable storage for backend deduplication, as noted in the dataset docs, but that’s mainly for storage efficiency and doesn’t currently act like a “duplicate file conflict” check inside a dataset.
So yes, filename-only checks would help, but content-based duplicate detection is the better approach. Options like Overwrite, Skip, Save as a different name, and Cancel all make sense here. I’ll pass this along to the team.
For now, the best workaround is to review the dataset in Table view and remove the extra copies using the bulk delete flow.
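If you want to catch duplicates before uploading, one option is to check your local folder yourself. Below is a minimal sketch of content-based detection using SHA-256 hashes of the file bytes. It is not how Ultralytics Platform works internally and it doesn't call any Platform API; the function names (file_sha256, find_duplicate_images) and the extension list are just illustrative assumptions.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

# Assumed set of image extensions; adjust to match your dataset.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".webp"}


def file_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicate_images(folder):
    """Group image files under `folder` by content hash.

    Returns a list of groups (lists of Paths); each group holds two or more
    files whose bytes are identical, regardless of filename.
    """
    groups = defaultdict(list)
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS:
            groups[file_sha256(path)].append(path)
    return [paths for paths in groups.values() if len(paths) > 1]
```

Calling find_duplicate_images("path/to/images") gives you groups of files with identical contents, so you can keep one file per group and upload only that. Note this only catches byte-identical files. An image that was resized, re-encoded, or had its metadata changed will hash differently, and catching those would need something like perceptual hashing instead.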