There are many motivations for wanting to rewrite Git history:
- Fixing up an old commit
- Reducing repository size by removing large files previously checked into the repo - they still take up disk space even if removed from later commits
- Removing sensitive data (passwords, credentials, etc.) that you wouldn't want to expose when sharing the repo, or even just when moving it to an external hosting provider
- Re-arranging or removing entire folders to reflect the splitting up of many sub-projects from one repo to many, or vice-versa
Git repositories store the whole of project history, and by design Git commits are immutable (their id changes completely if even a small part of their content or history is changed) - so how can you remove unwanted data stored deep in your repository’s past?
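The immutability described above can be observed directly. The following is a minimal sketch, assuming a throwaway repository in a temporary directory and a placeholder identity (Example User, user@example.com); it rewords the latest commit and prints the commit id before and after.

```shell
#!/bin/sh
set -e

# Work in a throwaway repository so nothing real is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"   # placeholder identity

echo "hello" > notes.txt
git add notes.txt
git commit -q -m "Add notse"               # typo in the message

before=$(git rev-parse HEAD)
git commit -q --amend -m "Add notes"       # change only the commit message
after=$(git rev-parse HEAD)

echo "before: $before"
echo "after:  $after"
```

The two printed ids differ even though notes.txt is untouched, because the commit message is part of the content the id is derived from.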
Tools for rewriting history
Changing a commit in a Git repository means rewriting every commit that descends from it, because each commit's id depends on its parents. There are several tools available to let you do this:
- git commit --amend - for fixing the most recent commit you just made.
- git rebase - to rebase a branch's history, replaying it to look as though it was all based on a different (often newer) point in the repository's history. Used with the -i flag, it can interactively re-order history.
- git filter-branch - an automated tool to rewrite many commits (on many branches) using one or more shell scripts to make the alterations, which gives it great flexibility.
- The BFG Repo-Cleaner - an alternative to git filter-branch which achieves greater speed & usability by restricting itself to common use-cases around the task of removing unwanted data.
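As an illustration of the automated approach, here is a hedged sketch of git filter-branch removing one file from every commit. The repository, the file name secrets.txt, and the identity are all made up for the example; a real clean-up would target your own paths and should be tried on a copy first.

```shell
#!/bin/sh
set -e

# Throwaway repository with a placeholder identity.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# Commit 1 accidentally includes a credentials file; commit 2 is unrelated.
echo "password=hunter2" > secrets.txt
echo "readme" > README
git add secrets.txt README
git commit -q -m "Initial import"
echo "more" >> README
git commit -q -am "Update README"

old_tip=$(git rev-parse HEAD)

# Silence the advisory that recent Git versions print for filter-branch.
export FILTER_BRANCH_SQUELCH_WARNING=1

# Drop secrets.txt from the index of every commit reachable from any ref.
git filter-branch -f --index-filter \
  'git rm --cached --ignore-unmatch -q secrets.txt' -- --all >/dev/null

new_tip=$(git rev-parse HEAD)
echo "tip before: $old_tip"
echo "tip after:  $new_tip"

# Lists the commits on the current branch that touch the file.
git log --oneline -- secrets.txt
```

Both commits get new ids, including the second one, which never touched secrets.txt, since its parent changed. Note that filter-branch keeps a backup of the old refs under refs/original, so the old objects stay on disk until those refs and the reflog entries are removed and the repository is garbage-collected.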
Sharing the rewritten history
If the repository was shared before the rewrite, push the rewritten version to the main server with the --mirror flag afterwards, and then ask other users to re-clone the repository, since their existing clones still contain the old history.
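The push step can be sketched as follows. This is only an illustration: a local bare repository (server.git) stands in for the main server, the remote name origin and the identity are placeholders, and a simple amend plays the role of the rewrite.

```shell
#!/bin/sh
set -e

work=$(mktemp -d)
cd "$work"

# A local bare repository stands in for the main server (hypothetical setup).
git init -q --bare server.git

git init -q local
cd local
git config user.name "Example User"
git config user.email "user@example.com"
git remote add origin ../server.git

echo "v1" > file.txt
git add file.txt
git commit -q -m "Frist commit"
git push -q --mirror origin               # the history as originally shared

git commit -q --amend -m "First commit"   # rewrite the already-shared commit
git push -q --mirror origin               # overwrite the refs on the server

local_tip=$(git rev-parse HEAD)
branch=$(git symbolic-ref --short HEAD)
server_tip=$(git --git-dir=../server.git rev-parse "refs/heads/$branch")
echo "local:  $local_tip"
echo "server: $server_tip"
```

Because --mirror force-updates every ref on the remote, and deletes remote refs that no longer exist locally, it is worth double-checking the remote URL before running it against a real server.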