Merge a subdirectory of another repository with git

posted on 2013-08-23

There are some quick write-ups on this on the web, but it still took me a Stack Overflow question, which led me to a GitHub help page, to get it to work.

Here is me, doing yet another write-up of: merging a single directory of files from a remote repository into your current repository without losing the history of commits on those files (otherwise a git checkout would be enough).

Set up a test environment

Let's start with a simple test directory to allow us to play around:

rm -rf /tmp/merge
mkdir /tmp/merge
cd /tmp/merge
git clone https://github.com/bneijt/autotrash.git
git clone https://github.com/bneijt/ccbuild.git

Doing the merge

Now let's merge the ccbuild/src directory into the autotrash repository. First, let git know the other repo exists and what to name it:

cd /tmp/merge/autotrash
git remote add ccbuild ../ccbuild

Note that we use the local file system as the remote, because we already have the files here, which makes everything faster.

Now fetch the information of our remote into the local .git directory:

git fetch ccbuild

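Set up a merge point between the two repositories, but don't perform the commit. The ours strategy leaves our own tree untouched and --no-commit makes git stop just before committing, as if the merge had not finished, so that the next command can add only the files we want to the merge we started. Git 2.9 and later refuse to merge two histories without a common ancestor unless you also pass --allow-unrelated-histories: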

git merge -s ours --no-commit ccbuild/master

Now read the tree of src/ as it is in ccbuild/master and place its files under ccbuildsrc:

git read-tree --prefix=ccbuildsrc -u ccbuild/master:src

Your working tree should now have a ccbuildsrc directory. The merge commit has not been made yet, though: the changes have only been read into the index and the working tree. Now we have to finalize with a merge commit:

git commit

The default commit message should mention "Merge remote-tracking...". If it does not, you probably forgot to start the merge with the git merge command above.
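One way to tell whether a merge was really started, before running git commit, is to look for MERGE_HEAD. The following is only a sketch on two throwaway repositories: the names host and lib, the file names and the demo identity are made up, it merges FETCH_HEAD so that it does not depend on the default branch name, and it assumes Git 2.9 or later for --allow-unrelated-histories.

```shell
#!/bin/sh
set -e

# Placeholder identity so commits work on an unconfigured machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)

# Two throwaway repositories with one commit each (names are made up).
for name in host lib; do
    git init -q "$work/$name"
    echo "$name" > "$work/$name/$name.txt"
    git -C "$work/$name" add .
    git -C "$work/$name" commit -q -m "initial $name commit"
done

cd "$work/host"
# Fetch whatever the other repository has checked out into FETCH_HEAD.
git fetch -q ../lib HEAD

# Before the merge is started there is no MERGE_HEAD.
if git rev-parse -q --verify MERGE_HEAD >/dev/null; then before=yes; else before=no; fi

git merge -q -s ours --no-commit --allow-unrelated-histories FETCH_HEAD

# After "git merge --no-commit" MERGE_HEAD exists, so the next
# "git commit" records a merge commit instead of a plain one.
if git rev-parse -q --verify MERGE_HEAD >/dev/null; then after=yes; else after=no; fi

echo "merge in progress before: $before, after: $after"
```

If the second check says no in your own repository, run the git merge command again before committing.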

All seems well, but if you look in your git history, you will see the full commit log of the remote repository, including commits that have nothing to do with the merged files. This should not really be a problem, but as the other history can be quite large, let's see how git will allow us to solve this. Even though rewriting of history is almost always an optimization, it may be a nice thing to do here.
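To see this effect on a small scale, the whole merge can be replayed on two throwaway repositories. This is a sketch under assumptions, not the setup from this post: host stands in for autotrash and lib for ccbuild, the helper new_repo, the file names and the demo identity are made up, and Git 2.9 or later is assumed for --allow-unrelated-histories.

```shell
#!/bin/sh
set -e

# Placeholder identity so commits work on an unconfigured machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)

# Hypothetical helper: create an empty repository whose first branch is master.
new_repo() {
    git init -q "$1"
    git -C "$1" symbolic-ref HEAD refs/heads/master
}

# "lib" stands in for ccbuild: one commit inside src/, one outside it.
new_repo "$work/lib"
cd "$work/lib"
mkdir src
echo 'int main() {}' > src/main.cc
git add src
git commit -q -m 'add src/main.cc'
echo 'docs' > README
git add README
git commit -q -m 'add README'

# "host" stands in for autotrash.
new_repo "$work/host"
cd "$work/host"
echo 'host' > host.txt
git add host.txt
git commit -q -m 'initial host commit'

# The merge steps, with a trailing slash on the prefix to be safe.
git remote add lib ../lib
git fetch -q lib
git merge -q -s ours --no-commit --allow-unrelated-histories lib/master
git read-tree --prefix=libsrc/ -u lib/master:src
git commit -q -m 'Merge src/ of lib as libsrc/'

# A merge commit has two parents: its own hash plus two more fields.
fields=$(( $(git rev-list --parents -n 1 HEAD | wc -w) ))

# All of lib's history came along: 1 host + 2 lib + 1 merge commit.
total=$(git rev-list --count HEAD)

# The README commit never touched src/, yet it is in our log now.
unrelated=$(git log --format=%s | grep -c 'add README' || true)

echo "fields: $fields, commits: $total, unrelated commits: $unrelated"
```

Only src/ ends up in the working tree as libsrc/, but the commit that added README is still reachable from HEAD, which is the noise the next section removes.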

Filter before merging

I tried to get the git subtree command, which should make all of this look much nicer, but even installing the latest git from the git-core PPA with

sudo apt-add-repository ppa:git-core/ppa
sudo apt-get update
sudo apt-get install git

did not include the subtree command, because it still lives in contrib and is not part of the main git command set. So here is the old approach. If you want to use the new approach, create a branch with git subtree and use that branch instead of master in the above example.
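For readers whose git does ship the subtree command, creating such a branch could look roughly like the sketch below. It runs on a throwaway repository with made-up names (lib, srconly, the file names and the demo identity are all assumptions) and skips the split when git-subtree is not installed. The resulting branch has the contents of src/ at its top level, so it would presumably be read as a whole (ccbuild/srconly rather than ccbuild/master:src).

```shell
#!/bin/sh
set -e

# Placeholder identity so commits work on an unconfigured machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)

# "lib" stands in for ccbuild: one commit inside src/, one outside it.
git init -q "$work/lib"
cd "$work/lib"
mkdir src
echo 'int main() {}' > src/main.cc
git add src
git commit -q -m 'add src/main.cc'
echo 'docs' > README
git add README
git commit -q -m 'add README'

# git subtree is not installed everywhere, so check before using it.
if [ -x "$(git --exec-path)/git-subtree" ]; then
    # Create branch "srconly" holding only the history of src/,
    # with the contents of src/ moved to the top level.
    git subtree split --prefix=src -b srconly >/dev/null 2>&1
    split_commits=$(git rev-list --count srconly)
    split_files=$(git ls-tree --name-only srconly)
    echo "srconly has $split_commits commit(s), top level: $split_files"
else
    split_commits=unavailable
    split_files=unavailable
    echo "git subtree is not installed, nothing was split"
fi
```

Unlike the filter-branch approach below, this leaves master untouched and puts the rewritten history on a separate branch.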

If you are not sure what you did earlier and don't know how to reset it all, create a new test environment as above.

We start by destroying the master branch of our local ccbuild clone, removing everything but the files in the subdirectory and creating a new history for them:

cd /tmp/merge/ccbuild
git filter-branch --prune-empty --subdirectory-filter src --

Now all the files from the src directory are in the top level directory of the ccbuild repository. This means we no longer want to pull in the subdirectory, but the root of the repository, under the prefix:

cd /tmp/merge/autotrash
git remote add ccbuild ../ccbuild
git fetch ccbuild
git merge -s ours --no-commit ccbuild/master
git read-tree --prefix=ccbuildsrc -u ccbuild/master
git commit
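The filtered variant can be replayed on throwaway repositories as well, to check that the unrelated commits are really gone. Again this is a sketch with made-up names (host, lib, new_repo, the file names and the demo identity are assumptions), assuming Git 2.9 or later. FILTER_BRANCH_SQUELCH_WARNING only silences the warning that newer Git versions print before filter-branch runs.

```shell
#!/bin/sh
set -e

# Placeholder identity so commits work on an unconfigured machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# Silence the filter-branch warning of newer Git versions.
export FILTER_BRANCH_SQUELCH_WARNING=1

work=$(mktemp -d)

# Hypothetical helper: create an empty repository whose first branch is master.
new_repo() {
    git init -q "$1"
    git -C "$1" symbolic-ref HEAD refs/heads/master
}

# "lib" stands in for ccbuild: one commit inside src/, one outside it.
new_repo "$work/lib"
cd "$work/lib"
mkdir src
echo 'int main() {}' > src/main.cc
git add src
git commit -q -m 'add src/main.cc'
echo 'docs' > README
git add README
git commit -q -m 'add README'

# Rewrite lib so that only the history of src/ remains, at the top level.
git filter-branch --prune-empty --subdirectory-filter src -- >/dev/null

# "host" stands in for autotrash.
new_repo "$work/host"
cd "$work/host"
echo 'host' > host.txt
git add host.txt
git commit -q -m 'initial host commit'

# Same merge as before, but reading the root of the filtered branch.
git remote add lib ../lib
git fetch -q lib
git merge -q -s ours --no-commit --allow-unrelated-histories lib/master
git read-tree --prefix=libsrc/ -u lib/master
git commit -q -m 'Merge filtered lib as libsrc/'

# 1 host commit + 1 filtered lib commit + the merge commit.
total=$(git rev-list --count HEAD)
# The README commit did not touch src/ and is no longer part of the history.
unrelated=$(git log --format=%s | grep -c 'add README' || true)

echo "commits: $total, unrelated commits: $unrelated"
```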

That's it. Then share your glory with a git push.

Please note: never ever should you have to --force anywhere in this process.