I have been using Git for years. For code versioning, it’s fantastic. But for data science, it can be tricky if you accidentally add your “big” dataset. I know that Git is not built for data versioning, but why?
Time to learn how Git works. Kabisa has written an excellent introduction for beginners. If you're also curious about Git's internals, this is a great place to start:
I even learned a cool new Git trick: octopus merges, where you merge multiple branches in a single commit. There are even monster octopus merges: https://github.com/torvalds/linux/commit/2cde51fbd0f310c8a2c5f977e665c0ac3945b46d (merging 66 branches)
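To see an octopus merge for yourself, here is a minimal sketch in Python that drives the `git` command line through `subprocess`. It assumes `git` is installed and on your `PATH`. It builds a throwaway repository, creates a few branches that each add their own file, and merges them all at once. The branch names, file names and demo identity are hypothetical and only there for illustration.

```python
import os
import subprocess
import tempfile
from pathlib import Path


def git(repo: Path, *args: str) -> str:
    """Run a git command inside `repo` and return its stripped stdout."""
    env = {
        **os.environ,
        # Hypothetical identity so commits work without a global git config.
        "GIT_AUTHOR_NAME": "Demo",
        "GIT_AUTHOR_EMAIL": "demo@example.com",
        "GIT_COMMITTER_NAME": "Demo",
        "GIT_COMMITTER_EMAIL": "demo@example.com",
    }
    result = subprocess.run(
        ["git", "-c", "commit.gpgsign=false", *args],
        cwd=repo, env=env, check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def commit_file(repo: Path, name: str) -> None:
    """Create a small file and commit it on the current branch."""
    (repo / name).write_text(f"{name}\n")
    git(repo, "add", name)
    git(repo, "commit", "-q", "-m", f"Add {name}")


def octopus_demo(branches: list[str]) -> int:
    """Merge several branches at once and return the merge commit's parent count."""
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        git(repo, "init", "-q")
        commit_file(repo, "base.txt")
        trunk = git(repo, "rev-parse", "--abbrev-ref", "HEAD")

        # Each branch adds its own file, so the merge has no conflicts.
        for branch in branches:
            git(repo, "checkout", "-q", "-b", branch, trunk)
            commit_file(repo, f"{branch}.txt")

        # Advance the trunk too, so it stays a parent of the merge commit.
        git(repo, "checkout", "-q", trunk)
        commit_file(repo, "trunk.txt")

        # Naming more than one branch makes git merge use the octopus strategy.
        git(repo, "merge", "-q", "-m", "Octopus merge", *branches)

        # `rev-list --parents` prints the commit hash followed by its parents.
        line = git(repo, "rev-list", "--parents", "-n", "1", "HEAD")
        return len(line.split()) - 1


if __name__ == "__main__":
    print(octopus_demo(["feature-a", "feature-b", "feature-c"]))
```

Running it should print `4`: one parent for the trunk plus one for each merged branch. The extra commit on the trunk is deliberate. Without it, the trunk is an ancestor of every branch being merged, and Git may fast-forward past it instead of recording it as a parent.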
Now that I know more about Git, and know that we can't really solve the problem of versioning data with Git, it's time to move on to Data Science Version Control. Let's see what AskAnna can do about this :)