DVC: A tool for versioning data
DVC (short for “Data Version Control”) is a nifty open-source tool for versioning data alongside an existing Git repository.
I found the setup to be straightforward, especially if you have small datasets that every developer of a repository will need for local development.
DVC creates metadata files for each tracked data file that are versioned in your git repository, although the data files themselves are added to a .gitignore
and backed up in a remote location.
I am using a GCS Bucket as the remote data store, and it was trivial to set up.
I looked at using Google Drive, but as with all things Google Drive it looks like that’s a little janky. Anyway, DVC is a great open-source tool; better then several of the in-house versioning solutions I’ve endured in the past. I even contributed back to the project by fixing three whole typos in the docs!