Published: 2018-07-12

Extract a subfolder as a new repository

In some modular software architectures, it can be useful to extract one of the modules so that it becomes independent of the parent project.

Rather than blindly copying the source code into a new repository and losing its history, which is usually valuable, we can use git to keep that history while cleaning it up.

Here we will extract the subfolder1 directory of the project into a new repository.

Let’s start by copying the source project into a separate workspace for the new project.

cp -R /home/dev/myproject /home/dev/newmodule

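Move into the new module directory. Since this is a plain copy, its origin remote still points to the source project’s remote. We keep it, because the final push step reuses its remote-tracking branches, but we must take care never to push the rewritten history back to it.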

cd /home/dev/newmodule
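One way to keep origin while ruling out an accidental push to it is to make it fetch-only. A minimal sketch, assuming the remote is named origin: a hypothetical helper, make_fetch_only, sets a deliberately invalid push URL (DISABLED is an arbitrary placeholder). Unlike `git remote remove origin`, which also deletes all of origin’s remote-tracking branches, this leaves them in place.

```shell
# Hypothetical helper: keep a remote for fetching, but give it a push URL
# that cannot work, so an accidental "git push" to it fails.
make_fetch_only() {
  git remote set-url --push "$1" DISABLED
}
```

From /home/dev/newmodule, run `make_fetch_only origin`; `git remote -v` then lists DISABLED as the push URL of origin while the fetch URL is unchanged.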

Next, we delete the tags whose tree has no subfolder1 directory at its root, since they have nothing to do with the new module.

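# Purge tags that have no subfolder1 at their root
for T in $(git tag -l); do
  # cat-file -e fails (quietly, thanks to 2>/dev/null) if "<tag>:subfolder1" does not exist
  if ! git cat-file -e "${T}:subfolder1" 2>/dev/null; then
    echo "No subfolder1 in ${T}" && git tag -d "${T}"
  fi
done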
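Before deleting anything, it can help to preview which tags the purge will remove. A minimal sketch: a hypothetical helper, list_unrelated_tags, runs an equivalent check with `git cat-file -e` but only prints the tag names; the directory to look for is passed as an argument.

```shell
# Hypothetical helper: print, without deleting anything, the tags whose
# tree has no directory named $1 at its root.
list_unrelated_tags() {
  local dir=$1
  for T in $(git tag -l); do
    # cat-file -e exits non-zero when "<tag>:<dir>" does not resolve
    if ! git cat-file -e "${T}:${dir}" 2>/dev/null; then
      echo "${T}"
    fi
  done
}
```

Running `list_unrelated_tags subfolder1` in /home/dev/newmodule shows exactly the tags the deletion loop would drop.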

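We now rewrite the history of all branches and tags so that only the content of subfolder1 is kept, moved up to the repository root. --prune-empty drops commits that end up empty, and --tag-name-filter cat keeps the rewritten tags under their original names.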

git filter-branch --tag-name-filter cat --prune-empty --subdirectory-filter subfolder1 -- --all
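To try the rewrite on something disposable first, here is a throwaway sketch: it builds a two-commit repository in a temporary directory (other.txt and module.txt are made-up names) and runs the same filter-branch command. FILTER_BRANCH_SQUELCH_WARNING=1 only silences the deprecation notice, and the pause that comes with it, printed by recent git versions.

```shell
# Throwaway demo: a tiny repository where only the second commit touches subfolder1
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# Silence filter-branch's deprecation warning (and its pause) on recent git
export FILTER_BRANCH_SQUELCH_WARNING=1

demo=$(mktemp -d)
git init -q "$demo"
cd "$demo"
git symbolic-ref HEAD refs/heads/master

echo "other" > other.txt                       # unrelated to subfolder1
git add other.txt && git commit -qm "Add other.txt"
mkdir subfolder1 && echo "kept" > subfolder1/module.txt
git add subfolder1 && git commit -qm "Add subfolder1/module.txt"

git filter-branch --tag-name-filter cat --prune-empty \
  --subdirectory-filter subfolder1 -- --all

git ls-tree --name-only HEAD   # module.txt now sits at the root
git rev-list --count HEAD      # only the commit touching subfolder1 remains
```

Note that filter-branch leaves the old tips behind as backup refs under refs/original/, which is what the next cleanup step removes.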

filter-branch keeps a backup of every rewritten ref under refs/original/. We then delete these backups so that the old history is no longer referenced.

git for-each-ref --format="%(refname)" refs/original/ | xargs -n 1 git update-ref -d
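A quick way to confirm the deletion worked is to check that refs/original/ is now empty. A minimal sketch with a hypothetical helper, assert_no_backup_refs, which returns a non-zero status and lists the offending refs if any backup survived.

```shell
# Hypothetical helper: succeed only when no filter-branch backup ref is left.
assert_no_backup_refs() {
  leftover=$(git for-each-ref --format='%(refname)' refs/original/)
  if [ -n "${leftover}" ]; then
    echo "Backup refs still present:" >&2
    echo "${leftover}" >&2
    return 1
  fi
  echo "No refs/original/ backups left"
}
```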

Finally, we expire the reflog and garbage-collect the objects that are now unreachable:

git reflog expire --expire=now --all
git gc --aggressive --prune=now
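To see how much space the cleanup reclaimed, the packed object size can be compared before and after these two commands. A minimal sketch with a hypothetical helper, pack_size, built on `git count-objects -vH` (-v for the detailed breakdown, -H for human-readable units).

```shell
# Hypothetical helper: print the size of the packed objects in human-readable units.
pack_size() {
  git count-objects -vH | grep '^size-pack:'
}
```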

We can now push the result to the new remote repository.

git remote add newrepo git@gitlab.com:myself/newmodule.git
git push newrepo master
git push newrepo --tags
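# Push each origin branch as a branch of newrepo (origin/ prefix stripped),
# skipping the symbolic origin/HEAD and any tags/ refs
for R in $(git for-each-ref --format='%(refname)' refs/remotes/origin/); do
  B=${R#refs/remotes/origin/}
  case "${B}" in HEAD|tags/*) continue ;; esac
  git push newrepo "+${R}:refs/heads/${B}"
done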

The last loop is interesting because it pushes the rewritten branches coming from the origin remote to the newrepo remote without checking any of them out.
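To see this refspec trick in isolation, here is a throwaway sketch; all repositories live under a temporary directory, and src, work, dest and the feature branch are made-up names. The work clone only knows feature as the remote-tracking ref origin/feature, yet a single push publishes it as a real branch of dest, with no checkout and no local branch created.

```shell
# Throwaway demo: publish a remote-tracking ref as a branch of another remote
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

tmp=$(mktemp -d)
git init -q "$tmp/src"
git -C "$tmp/src" symbolic-ref HEAD refs/heads/master
echo one > "$tmp/src/file.txt"
git -C "$tmp/src" add file.txt
git -C "$tmp/src" commit -qm "First commit"
git -C "$tmp/src" branch feature           # second branch, never checked out below

git clone -q "$tmp/src" "$tmp/work"        # work gets origin/master and origin/feature
git init -q --bare "$tmp/dest"
git -C "$tmp/work" remote add newrepo "$tmp/dest"

# Source side: a remote-tracking ref; destination side: a branch on newrepo
git -C "$tmp/work" push -q newrepo +refs/remotes/origin/feature:refs/heads/feature

git -C "$tmp/work" branch --list feature   # prints nothing: no local branch was created
git ls-remote --heads "$tmp/dest"          # refs/heads/feature now exists on dest
```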