Programmers often reuse code. In fact, avoiding duplication is one of the core principles of any good codebase: Don’t Repeat Yourself (DRY). But what if you want to use a shared project inside multiple other repositories? The git subtree command can help manage that.
The Problem of Code Reuse
Embedding projects inside other projects presents a problem. Project 1 and Project 2 are each tracked in their own Git repository, but copying the shared subproject directly into both of them isn’t a good design choice. Doing so essentially forks the subproject in two places, which makes it very hard to maintain a single official version of the subproject.
There are a few solutions to this problem, each with its own drawbacks.
First, the most obvious solution is to make the subproject into a package, and distribute it through a package manager like npm or NuGet. This works very well for things that are not updated or maintained often, and can afford to be distributed to their consumers in discrete versions. However, if you’re changing this code on a regular basis, having to integrate, publish, and pull new versions of the project from a third-party source simply does not work as well as having the code directly accessible. It also introduces complications for local development.
The other solution is to use a monorepo, one giant repository for all your code. This isn’t as crazy as you might think, and works well if all your code is in the same domain; Google keeps most of its code in a monorepo, and Microsoft uses one for all .NET assemblies they maintain. This solves the problem, because if you modify code in the subproject, the change is picked up whenever you rebuild. In Visual Studio, this can be done easily with Project References.
However, there are many cases where you’d want the best of both worlds: maintaining the subproject centrally, as you would a package, while also allowing direct embedding and editing in multiple projects. For this, Git Subtree provides a solution.
The core concept is pretty simple: a smaller Git repository can be embedded in a subdirectory of another project while staying linked to its own upstream repository. All changes for Project 1, Project 2, and the Sub-Project are tracked in their own repositories.
Usually, Git is smart enough to handle pushing and merging automatically, depending on which changes came from which subtree. It’s good practice, though, not to mix subtree changes and main project changes in the same commit, as there are cases where you can run into a more complicated merge that requires you to use the underlying Git tools that git subtree wraps.
Setting Up Git Subtree
If you just set up an empty project, and are going to set up subtrees, you’ll need to make an initial commit—even if it’s empty—or else Git will throw an error about an ambiguous HEAD. You can make an empty commit with the following command:
git commit --allow-empty -n -m "Initial commit."
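To see why the empty commit matters, here is a minimal sketch you can run in a scratch directory. The temporary directory, the main-project name, and the throwaway author identity are assumptions made for illustration only; nothing here touches a real repository.

```shell
#!/bin/sh
set -eu

# Throwaway identity so the commit works on a machine without Git config.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

# Hypothetical scratch location for a brand-new, empty project.
workdir=$(mktemp -d)
git init -q "$workdir/main-project"
cd "$workdir/main-project"

# Before the first commit, HEAD does not resolve to any commit.
if git rev-parse --verify -q HEAD >/dev/null; then
  head_before="resolved"
else
  head_before="unresolved"
fi

# The empty commit gives HEAD something to point at.
git commit -q --allow-empty -m "Initial commit."

commit_count=$(git rev-list --count HEAD)
echo "HEAD before: $head_before, commits after: $commit_count"
```

Once HEAD resolves to a commit, the subtree commands in the next steps have a history to attach the subproject to.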
You’ll want to add the remote for the subproject, and give it a name. You’ll use this name to refer to it:
git remote add -f SubTreeName https://github.com/user/project.git
Then, you can add the subtree, at the given prefix. Use the --squash option so that the entire subproject history is not stored in the main project.
git subtree add --prefix .Path/To/SubTree SubTreeName master --squash
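If you’d like to try the setup steps without touching a real remote, the sketch below runs them end to end against a throwaway local repository that stands in for the hosted subproject. The scratch directory, the lib.txt file, and the author identity are assumptions for illustration; in practice the remote would be a URL like the one above.

```shell
#!/bin/sh
set -eu

# Throwaway identity so commits work on a machine without Git config.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

workdir=$(mktemp -d)

# Stand-in for the shared subproject (normally a hosted repository).
git init -q "$workdir/subproject"
cd "$workdir/subproject"
git symbolic-ref HEAD refs/heads/master   # make sure the branch is called master
echo "shared code" > lib.txt              # hypothetical file
git add lib.txt
git commit -q -m "Add shared library"

# The main project, starting from an empty initial commit.
git init -q "$workdir/main-project"
cd "$workdir/main-project"
git commit -q --allow-empty -m "Initial commit."

# Register the subproject as a named remote, then embed it at the prefix.
git remote add -f SubTreeName "$workdir/subproject"
git subtree add --prefix .Path/To/SubTree SubTreeName master --squash

subtree_file="$workdir/main-project/.Path/To/SubTree/lib.txt"
cat "$subtree_file"
```

After this runs, the subproject’s files live inside the main project as ordinary tracked files, and git log in the main project shows a single squashed commit for the subproject plus a merge commit rather than the subproject’s full history.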
Using Git Subtree
When you need to pull, Git will only update the main project automatically, so you’ll have to fetch the remote, and then use a subtree-specific pull command. It’s a little lengthy, as you need to pass in the prefix, but gets the job done:
git fetch SubTreeName master
git subtree pull --prefix .Path/To/SubTree SubTreeName master --squash
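Here is a rough, self-contained sketch of that pull workflow. As before, a local scratch repository plays the role of the upstream subproject, and the file name, contents, and author identity are made up for the example. GIT_MERGE_AUTOEDIT=no is set only so the merge doesn’t open an editor when the script runs unattended.

```shell
#!/bin/sh
set -eu

# Throwaway identity, and no interactive editor for merge commits.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"
export GIT_MERGE_AUTOEDIT=no

workdir=$(mktemp -d)

# Stand-in upstream subproject with one commit.
git init -q "$workdir/subproject"
cd "$workdir/subproject"
git symbolic-ref HEAD refs/heads/master
echo "v1" > lib.txt                       # hypothetical file
git add lib.txt
git commit -q -m "Add shared library"

# Main project with the subtree already added.
git init -q "$workdir/main-project"
cd "$workdir/main-project"
git commit -q --allow-empty -m "Initial commit."
git remote add -f SubTreeName "$workdir/subproject"
git subtree add --prefix .Path/To/SubTree SubTreeName master --squash

# Upstream moves on independently of the main project.
cd "$workdir/subproject"
echo "v2" > lib.txt
git commit -q -am "Update shared library"

# Back in the main project: fetch, then pull the change into the subtree.
cd "$workdir/main-project"
git fetch SubTreeName master
git subtree pull --prefix .Path/To/SubTree SubTreeName master --squash

pulled=$(cat .Path/To/SubTree/lib.txt)
echo "subtree now contains: $pulled"
```

Note that the working tree has to be clean before pulling, so commit or stash any local changes first.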
Note that you can fetch commits from the remote, but pulling them into the subtree, or pushing them to the remote, requires you to use git subtree specifically.
When it comes time to contribute back upstream, you’ll need to use git subtree push:
git subtree push --prefix=.Path/To/SubTree SubTreeName master
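The push direction can be sketched the same way. This example assumes a bare local repository as the stand-in upstream (a non-bare repository would refuse a push to its checked-out branch), seeded from a hypothetical seed clone; all paths, file names, and the author identity are illustrative.

```shell
#!/bin/sh
set -eu

# Throwaway identity so commits work on a machine without Git config.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

workdir=$(mktemp -d)

# Bare stand-in for the hosted subproject, seeded with one commit.
git init -q --bare "$workdir/subproject.git"
git -C "$workdir/subproject.git" symbolic-ref HEAD refs/heads/master
git init -q "$workdir/seed"
cd "$workdir/seed"
git symbolic-ref HEAD refs/heads/master
echo "v1" > lib.txt                       # hypothetical file
git add lib.txt
git commit -q -m "Add shared library"
git push -q "$workdir/subproject.git" master

# Main project with the subtree added.
git init -q "$workdir/main-project"
cd "$workdir/main-project"
git commit -q --allow-empty -m "Initial commit."
git remote add -f SubTreeName "$workdir/subproject.git"
git subtree add --prefix .Path/To/SubTree SubTreeName master --squash

# Edit the embedded code in a commit that touches only the subtree.
echo "v2" > .Path/To/SubTree/lib.txt
git commit -q -am "Improve shared library"

# Send the subtree's commits back to the subproject's own repository.
git subtree push --prefix=.Path/To/SubTree SubTreeName master

upstream_subject=$(git -C "$workdir/subproject.git" log -1 --format=%s master)
upstream_content=$(git -C "$workdir/subproject.git" show master:lib.txt)
echo "upstream tip: $upstream_subject ($upstream_content)"
```

Keeping the edit in a commit that only touches the subtree directory, as in this sketch, follows the earlier advice about not mixing subtree and main project changes.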