How to Handle Big Repositories with Git?
Git is a free and open-source distributed version control system designed to handle everything from small to very large projects with speed and efficiency. It is built around distributed development: several developers can each hold a full copy of an application's source code, change it, and share those changes with one another. In this article, we will learn how to handle big repositories with Git. There are two types of big repositories:
- One with a large commit history
- Another with a large number of binary files
Handling repositories with a large commit history:
- Using a shallow clone
- Using git filter-branch
- Cloning a single branch
1. Using a shallow clone
This is a comparatively fast solution in which we pull down only the latest commits of the repo's history. Imagine a repository with 1 GB of data and more than 35,000 commits. A full clone of this repository will generally take a long time, but pulling only the latest n commits can cut that time considerably, because far less history is transferred. To perform a shallow clone, we add the --depth option to the clone command:
git clone --depth [n] [url]
Here:
- n specifies the number of most recent commits to fetch
- url specifies the remote URL of the repository
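To see the effect of --depth without needing a remote server, the following is a minimal sketch that builds a throwaway repository in a temporary directory and shallow-clones it. The directory names, the "Demo" identity, and the commit count of five are illustrative assumptions, not part of the original article.

```shell
set -eu
work=$(mktemp -d)   # scratch directory (hypothetical location)

# Build a small throwaway repository with five commits.
git init -q "$work/origin"
for i in 1 2 3 4 5; do
    echo "change $i" > "$work/origin/file.txt"
    git -C "$work/origin" add file.txt
    git -C "$work/origin" -c user.name="Demo" -c user.email="demo@example.com" \
        commit -q -m "commit $i"
done

# Shallow-clone only the two most recent commits.
# A file:// URL is used because git ignores --depth for plain local paths.
git clone -q --depth 2 "file://$work/origin" "$work/shallow"

full_count=$(git -C "$work/origin" rev-list --count HEAD)
shallow_count=$(git -C "$work/shallow" rev-list --count HEAD)
echo "full history: $full_count commits, shallow clone: $shallow_count commits"
```

The shallow clone holds only the requested number of commits, which is where the time and bandwidth savings come from on a real remote.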
2. Using git filter-branch
Here we can walk through the entire project history and modify, filter, or skip content as needed. This is generally used when the history contains a large number of binary files and we need only some of them. To remove an asset from every commit, we use the following command:
git filter-branch --tree-filter 'rm -rf [path-to-asset]'
Here:
- path-to-asset specifies the path to the binary asset in your repository
Although powerful, this approach has a drawback: rewriting history changes the IDs of the rewritten commits, so existing clones no longer match the repository and have to be re-cloned. Plan for that re-cloning before running git filter-branch on a shared repository.
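The rewrite and the resulting change of commit IDs can be observed on a scratch repository. This is a minimal sketch: the file names (assets/big.bin, main.txt) and the "Demo" identity are made up for illustration, and FILTER_BRANCH_SQUELCH_WARNING is set only to skip the warning pause that recent Git versions print before filter-branch runs.

```shell
set -eu
work=$(mktemp -d)   # scratch directory (hypothetical location)
git init -q "$work/repo"
cd "$work/repo"
git config user.name "Demo"
git config user.email "demo@example.com"

# Two commits: the first one adds a stand-in "binary" asset.
mkdir assets
printf 'binary payload' > assets/big.bin
echo 'source' > main.txt
git add .
git commit -q -m "add source and asset"
echo 'more source' >> main.txt
git commit -q -am "edit source"

old_head=$(git rev-parse HEAD)

# Rewrite every commit on every ref, deleting the asset from each tree.
FILTER_BRANCH_SQUELCH_WARNING=1 \
    git filter-branch --tree-filter 'rm -rf assets/big.bin' -- --all

new_head=$(git rev-parse HEAD)
echo "HEAD before rewrite: $old_head"
echo "HEAD after rewrite:  $new_head"

# Commits on the rewritten branch that still touch the asset (expected: none).
remaining=$(git log HEAD --oneline -- assets/big.bin)
echo "commits still touching the asset: ${remaining:-none}"
```

The two HEAD values differ, which is why older clones stop matching after such a rewrite. The Git documentation itself cautions about filter-branch and points to the separate git filter-repo tool as an alternative, so it is worth reading that warning before using this on real history.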
3. Cloning a single branch
This technique is useful when the repository has multiple branches but we want to work with only one of them. To clone a single branch, we can use the following command:
git clone [url] --branch [branch_name] --single-branch
Here:
- url specifies the remote URL of the repository
- branch_name specifies the name of the branch you want to clone
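As a rough illustration, the sketch below creates a scratch repository with two branches and clones only one of them. The branch name "feature", the directory names, and the "Demo" identity are assumptions chosen for the example.

```shell
set -eu
work=$(mktemp -d)   # scratch directory (hypothetical location)

# Helper: create an empty commit in the scratch origin repository.
commit() {
    git -C "$work/origin" -c user.name="Demo" -c user.email="demo@example.com" \
        commit -q --allow-empty -m "$1"
}

git init -q "$work/origin"
commit "initial commit"
# Remember whatever default branch name this Git installation uses.
default_branch=$(git -C "$work/origin" symbolic-ref --short HEAD)

git -C "$work/origin" checkout -q -b feature
commit "feature work"
git -C "$work/origin" checkout -q "$default_branch"

# Clone only the feature branch.
git clone -q --branch feature --single-branch "file://$work/origin" "$work/single"

checked_out=$(git -C "$work/single" symbolic-ref --short HEAD)
echo "checked-out branch: $checked_out"
echo "remote-tracking branches in the clone:"
git -C "$work/single" branch -r
```

Only the requested branch is fetched; the default branch of the origin repository is not present in the clone.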
Handling repositories with a large number of binary files:
- We can use submodules, i.e. a repository inside another repository. The inner repository holds all the binary files. This gives us modularity: the parent code is kept separate, and future changes made in the submodule do not touch the parent code repository.
- We can use third-party extensions like Git LFS, a Git extension for managing large and binary files. It keeps small pointer files in the Git repository and stores the actual file contents in separate storage.
- We can use garbage collection with git gc, which packs many loose objects into a single pack file.
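The packing done by git gc can be observed with git count-objects. This is a small sketch on a scratch repository; the file names, the number of commits, and the "Demo" identity are illustrative assumptions.

```shell
set -eu
work=$(mktemp -d)   # scratch directory (hypothetical location)
repo="$work/repo"
git init -q "$repo"
git -C "$repo" config user.name "Demo"
git -C "$repo" config user.email "demo@example.com"

# Each commit creates several loose objects (blob, tree, commit).
for i in 1 2 3 4 5; do
    echo "revision $i" > "$repo/file$i.txt"
    git -C "$repo" add "file$i.txt"
    git -C "$repo" commit -q -m "add file$i.txt"
done

# "count:" in the verbose output is the number of loose objects.
loose_before=$(git -C "$repo" count-objects -v | awk '/^count:/ {print $2}')

git -C "$repo" gc -q

loose_after=$(git -C "$repo" count-objects -v | awk '/^count:/ {print $2}')
packs_after=$(git -C "$repo" count-objects -v | awk '/^packs:/ {print $2}')
echo "loose objects before gc: $loose_before"
echo "loose objects after gc:  $loose_after (in $packs_after pack file(s))"
```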
Of these three options, using a third-party extension like Git LFS is the most commonly recommended.
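A typical Git LFS setup might look like the sketch below. It assumes the separate git-lfs program is installed; if it is not, the script writes by hand the same .gitattributes line that git lfs track would produce, so the resulting configuration can still be inspected. The "*.psd" pattern and the scratch directory are hypothetical choices for the example.

```shell
set -eu
work=$(mktemp -d)   # scratch directory (hypothetical location)
git init -q "$work/repo"
cd "$work/repo"

if git lfs version >/dev/null 2>&1; then
    # Install the LFS hooks for this repository only, then track a file pattern.
    git lfs install --local
    git lfs track "*.psd"
else
    # git-lfs is not installed: write the equivalent tracking rule manually.
    printf '%s\n' '*.psd filter=lfs diff=lfs merge=lfs -text' > .gitattributes
fi

# The tracking rule lives in .gitattributes and is committed like any other file.
git add .gitattributes
cat .gitattributes
```

Once the rule is committed, files matching the pattern are stored as small pointer files in the repository while their contents go to LFS storage.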