Git Clarity: Building Meaningful Commits and Linear History

This blog post is also available in German.

Git is one of the staples of any software development workflow. Over the years, I have learned from experience how best to use Git to support the way my brain thinks about code. I want to emphasize that my workflow is not the only right workflow, and that it took me a (long) time to develop it and become comfortable with it. Git is an elegant solution to a complex problem (syncing and versioning code across multiple distributed systems). Git is not easy to learn, and it is easy to make mistakes and get frustrated!

I also want to mention that a lot of the steps in my workflow can also be done using a Git client with a GUI instead of the command line! I actually used a GUI for years because I found it so much easier to interactively add code to a specific commit in the GUI. It wasn’t until a helpful colleague taught me the right command (spoiler: git add -p) that I was able to move over and use the command line for everything I wanted to do.

Core Philosophy

Central to my workflow is the idea that each single commit should contain all of the changes necessary for a single task or feature AND that it should be easy for me to comprehend all of the changes at a given time. This is me being kind to my colleagues who will have to review my work – if I, having written the code, cannot comprehend it, it would be foolish of me to expect my colleagues to easily do so.

My Workflow: The Happy Path

git checkout -b ticket-nr_feature-name: begin a new feature branch for the work I want to do
Implement the feature and do all necessary work
git status: get an overview of the different files which have been modified
git add -p: interactively add changes by asking myself the question “does this change belong to the feature I am working on right now?”
git diff --staged: review the entirety of all changes one more time
git commit: bundle up the changes including a description and my rationale behind them. I will also use this same description for my merge request so that I only have to write it once.
git fetch: retrieves all changes from the remote repository
git rebase origin/main: takes the changes that I made in my branch (which contains that single, self-contained commit) and reapplies them on top of the latest code
git publish (an alias in my .gitconfig: publish = "!git push -u origin $(git branch-name)"): to publish my feature branch to the remote repository
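
The happy path above can be replayed end to end in a throwaway repository. What follows is a minimal sketch rather than my literal setup: the directory names, the file app.txt, the ticket number 123 and the local bare repository standing in for the remote are made up for illustration. Plain git add stands in for the interactive git add -p (which wants a terminal), the aliases are stored in the repository instead of in .gitconfig, and the definition of branch-name is only one plausible way to write the helper that publish depends on.

```shell
#!/bin/sh
set -eu

# Throwaway identity and directory, so nothing outside the sandbox is touched.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
sandbox=$(mktemp -d)
cd "$sandbox"

# A local bare repository plays the role of the remote (hypothetical setup).
git init -q --bare remote.git
git -C remote.git symbolic-ref HEAD refs/heads/main
git clone -q remote.git work 2>/dev/null
cd work
git symbolic-ref HEAD refs/heads/main
echo "hello" > app.txt
git add app.txt
git commit -q -m "Initial commit"
git push -q -u origin main

# Repo-local aliases. branch-name is an assumed helper definition;
# publish is the alias text quoted in the list above.
git config alias.branch-name "rev-parse --abbrev-ref HEAD"
git config alias.publish '!git push -u origin $(git branch-name)'

git checkout -q -b 123_greeting               # start the feature branch
echo "hello, world" > app.txt                 # implement the feature
git status --short                            # overview of modified files
git add app.txt                               # stand-in for: git add -p
git diff --staged --stat                      # review what is staged
git commit -q -m "Add friendlier greeting"    # one self-contained commit
git fetch -q                                  # retrieve remote changes
git rebase -q origin/main                     # replay the commit on the latest main
git publish                                   # push the branch and set its upstream
```

Because the script only touches a temporary directory, it can be run repeatedly to experiment with the individual steps.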

rebase vs. merge

Perhaps conspicuous in this list is the complete absence of any mention of pull or merge, which many would consider synonymous with a Git workflow. This is intentional.

I intentionally choose not to use pull (which is a combination of fetch and merge) because I prefer to retrieve changes (fetch) and integrate changes in two separate steps. For integrating my changes, I have two different options: merge or rebase.

# Starting point (both branches diverge from common ancestor)

      A---B---C  (feature)
     /
D---E---F---G  (main)

# After git merge (creates a merge commit)

      A---B---C
     /         \
D---E---F---G---H  (main) ← H is a merge commit that combines C and G

# After git rebase (replays feature changes on top of main)

                  A'--B'--C'  (feature) ← Commits are replayed with new hashes
                 /
D---E---F---G  (main)

The main difference between the methods is that a merge will create an extra commit combining the changes from the two different branches (see diagram). In my view the main (and possibly only) benefit of this approach is that if you are merging two branches which both contain multiple commits with many different changes, you will only have to fix any conflicts once (in the merge commit).

However, in MY workflow I focus on creating a single comprehensible commit in each of my feature branches. When this is the case, I greatly prefer rebase: with a single commit I also only have to fix any conflicts once, and I get a beautiful linear Git history that tells the story of all of the changes in my code without any unnecessary “Merge branch into main” or “WIP – fix later” commit messages in between. Keeping that commit small and comprehensible also reduces the likelihood of merge conflicts (I’ve probably not diverged that much from the main branch) and makes them easier to repair (because I can mentally comprehend all of the changes in my commit, it’s easier to reapply them in a different context).
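
To see the difference for yourself, the two options can be compared side by side in a scratch repository. This is only an illustrative sketch: the commit helper function, the file names and the extra main-merged branch (a copy of main, so that both results can coexist) are made up for the demo.

```shell
#!/bin/sh
set -eu

export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main

# Hypothetical helper: one new file per commit, so nothing ever conflicts.
commit() {
  echo "$1" > "$1.txt"
  git add "$1.txt"
  git commit -q -m "$1"
}

commit D
commit E
git checkout -q -b feature      # feature branches off after E
commit A
git checkout -q main
commit F                        # main moves on, the branches have diverged

# Option 1: merge. Done on a copy of main so both outcomes stay visible.
git checkout -q -b main-merged main
git merge -q --no-edit feature
# The merge commit lists its own hash plus two parents, so three words.
merge_words=$(git rev-list --parents -n 1 HEAD | wc -w)

# Option 2: rebase. The feature commit is replayed on top of main.
git checkout -q feature
git rebase -q main
merges_after_rebase=$(git rev-list --merges --count main..feature)

git log --graph --oneline --all
```

The final log shows both shapes at once: main-merged ends in a commit with two parents, while the rebased feature branch sits in a straight line on top of main.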

I’d like to note again that these are my personal preferences for my personal workflow. In a project, I am not dogmatic about enforcing well documented git commits and a linear git history. If the majority prefers to merge, I go with the flow (although I do personally always rebase my own branches onto main). But I have had the privilege of working in projects together with like-minded colleagues and must report that every single git log gave me actual joy.

The Not So Happy Path

The steps I listed before are the happy path of how I would develop a feature if everything goes exactly as planned. As you might imagine, in the real world this rarely happens.

What happens if my code has a merge conflict when I rebase it onto the code from main? As I’ve mentioned before, this is where we can benefit from the small commit size. Because the commit is small and its contents are comprehensible, it should be a lot easier to mentally figure out what needs to be done to repair the code and get it back in order.
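
For readers who have not been through a conflicted rebase yet, here is a small sketch of what the repair loop can look like. The file settings.txt and its contents are invented for the demo, and the "resolution" simply decides that the feature's value wins; in real code this is the step where you think. GIT_EDITOR=true is used so that the script does not open an editor for the commit message.

```shell
#!/bin/sh
set -eu

export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
echo "color = blue" > settings.txt
git add settings.txt
git commit -q -m "Add settings"

# Both branches change the same line, which guarantees a conflict.
git checkout -q -b feature
echo "color = green" > settings.txt
git commit -q -a -m "Switch color to green"
git checkout -q main
echo "color = red" > settings.txt
git commit -q -a -m "Switch color to red"

git checkout -q feature
if git rebase main >/dev/null 2>&1; then
  echo "unexpected: rebase finished without a conflict" >&2
  exit 1
fi
git status --short              # lists the conflicted file

# Repair the file by hand, mark it as resolved, and let the rebase continue.
echo "color = green" > settings.txt
git add settings.txt
GIT_EDITOR=true git rebase --continue >/dev/null
git log --oneline
```

With a single small commit there is exactly one stop like this; git rebase --abort is always available if the repair goes wrong.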

Updating my Local Branch and Pushing It

What happens if my branch has become out-of-date while waiting for a code review and I want to update the code so that it can be easily merged back into main? Here I will rebase my feature branch onto main and then use git push --force-with-lease (via my handy git please alias) to “force push” it onto the remote branch. I only force push branches that I created myself, because if I force push a branch someone else is working on, they will probably have to pull out Git power tools like git reset, git cherry-pick or possibly even git reflog to get their code synchronized with main. The --force-with-lease option is a safeguard which checks whether there are any changes to that branch on the remote server which I haven’t seen yet and which would be overwritten by force pushing.
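
The safeguard can be watched in action with two clones of the same scratch remote. Treat this as a sketch under assumptions: the directory names mine and colleague are invented, the please alias definition is my guess at the usual form of such an alias, and git commit --amend stands in for the rebase as the step that rewrites history.

```shell
#!/bin/sh
set -eu

export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
sandbox=$(mktemp -d)
cd "$sandbox"
git init -q --bare remote.git
git -C remote.git symbolic-ref HEAD refs/heads/main

# My clone: publish main and a feature branch.
git clone -q remote.git mine 2>/dev/null
cd mine
git symbolic-ref HEAD refs/heads/main
echo base > base.txt
git add base.txt
git commit -q -m "Base"
git push -q -u origin main
git checkout -q -b feature
echo mine > feature.txt
git add feature.txt
git commit -q -m "Add feature"
git push -q -u origin feature
git config alias.please "push --force-with-lease"   # assumed alias definition

# Meanwhile a colleague pushes a commit to my branch.
cd ..
git clone -q remote.git colleague
cd colleague
git checkout -q feature
echo theirs > colleague.txt
git add colleague.txt
git commit -q -m "Colleague's fix"
git push -q origin feature
cd ../mine

# I rewrite my history without fetching first, then try to force push.
git commit -q --amend -m "Add feature (reworded)"
if git please >/dev/null 2>&1; then
  lease_result=overwritten
else
  lease_result=rejected
fi
echo "force-with-lease: $lease_result"
```

The push is refused because the remote branch no longer matches what my clone last saw, so the colleague's commit survives. A plain git push --force in the same situation would have discarded it.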

Amending a Commit After the Fact

What happens if I have just packed all of my changes together into a beautifully documented commit, but I am still not happy with the result? In this case, I will just continue working! When I am finished tweaking and refining the code, I will add the new changes interactively and then use git commit --amend to fold them into the previous commit, tweaking the message as needed.
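
A tiny sketch of that amend step, using an invented file feature.txt and plain git add in place of the interactive git add -p:

```shell
#!/bin/sh
set -eu

export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main

echo "v1" > feature.txt
git add feature.txt
git commit -q -m "Add feature"
before=$(git rev-parse HEAD)

# Keep working, stage the refinements, and fold them into the same commit.
echo "v2, refined" > feature.txt
git add feature.txt
git commit -q --amend -m "Add feature, with refined wording"
after=$(git rev-parse HEAD)

git log --oneline               # still a single commit
echo "hash changed: $before -> $after"
```

The amended commit replaces the old one and gets a new hash, which is why a branch that was already published needs the force push described above.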

Saving Progress to Switch Tasks

What happens if I need to take a break from coding and do something else (code review, meeting, etc.)? In this case, I usually create a WIP commit (git commit -m "WIP") and come back later. In practice, this means that I often do have a branch with multiple commits that I then want to squash into a single commit when I am finished with the task. I will usually look in the git log to find the first commit in my branch (I use my alias git logf, which resolves to git log --graph --pretty --oneline --all, to print the history as a pretty graph resembling that of a GUI). Once I have found the hash of that commit, I use git rebase -i <hash> to interactively rebase my commits onto that first commit. In the interactive rebase, I most commonly select fixup (and occasionally reword for modifying the commit message) to squash everything into a single commit.

You may ask why I go to the trouble of finding the commit hash instead of just interactively rebasing onto origin/main. My reason is that I’ve gotten into the habit of squashing my feature branch before I rebase onto main: I prefer to squash first and then deal with any merge conflicts afterwards.
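
The squash itself can be sketched without any typing in an editor. Several things here are assumptions for the sake of a scripted demo: the file names are invented, git merge-base stands in for reading the hash out of git logf, and a sed one-liner passed through GIT_SEQUENCE_EDITOR plays the part of me changing pick to fixup by hand. Note that git rebase -i lists the commits after the one you name, so the sketch passes the commit just before the branch's first commit.

```shell
#!/bin/sh
set -eu

export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
echo base > base.txt
git add base.txt
git commit -q -m "Base"

# A feature branch with one real commit followed by two WIP commits.
git checkout -q -b feature
for step in 1 2 3; do
  echo "step $step" >> feature.txt
  git add feature.txt
  if [ "$step" -eq 1 ]; then
    git commit -q -m "Add feature"
  else
    git commit -q -m "WIP"
  fi
done

# The commit the branch started from (stand-in for looking it up in the log).
base=$(git merge-base main HEAD)

# Keep the first "pick" and turn every later one into "fixup".
GIT_SEQUENCE_EDITOR="sed -i.bak -e '2,\$ s/^pick/fixup/'" git rebase -q -i "$base"

git log --oneline
```

Because main has not been touched yet, this squash cannot produce conflicts with other people's work; those only show up in the later rebase onto main.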

Here’s a tip I haven’t used much myself: a colleague taught me that it is possible to use git commit --fixup <hash> directly instead of creating a WIP commit, because it is then possible to run git rebase -i --autosquash and not have to sort through the git commits retrospectively. I think it’s an awesome idea, but I haven’t needed it often enough to either memorize the syntax or create an alias for myself. Maybe someday.
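
For reference, this is roughly how the two commands fit together. It is a sketch in a scratch repository with invented file names; GIT_SEQUENCE_EDITOR=true simply accepts the todo list that --autosquash has already arranged, so no editor opens.

```shell
#!/bin/sh
set -eu

export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
echo base > base.txt
git add base.txt
git commit -q -m "Base"

git checkout -q -b feature
echo "first draft" > feature.txt
git add feature.txt
git commit -q -m "Add feature"
target=$(git rev-parse HEAD)

# Instead of a WIP commit: a commit that already knows where it belongs.
echo "second draft" > feature.txt
git add feature.txt
git commit -q --fixup "$target"       # subject becomes "fixup! Add feature"

# --autosquash moves the fixup commit under its target and marks it as fixup.
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash main

git log --oneline
```

The branch ends up with the single "Add feature" commit containing the second draft.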

A Pileup of Branches

One consequence of slicing my commits small enough that they can fit in my brain is that merge requests can be created more frequently. As I’ve mentioned before, I consider this consequence to be overwhelmingly positive. One issue that may arise, though, is that you want to start on a new feature that requires the code from your first branch before anyone has been able to perform a proper code review of that first branch. What can you do then?

When this happens, I usually make a new branch in my git repository based off of my feature branch instead of main. In that new feature branch, I again implement the new feature in a single commit. When I create my merge request in GitLab, I target my first feature branch instead of main so that the review only covers the new code created for the second feature. If the first feature branch is merged, the merge request will automatically switch to merging the second feature branch into main. This is a more complicated situation, and there is a greater risk that the branches will be difficult to merge. If the first branch needs changes or needs to be rebased, what do I do with the second? If rebasing is too difficult because the git history has changed too drastically, I can fall back on the fact that the second branch also consists of a single commit: I use git cherry-pick to pick that single commit and add it to a new branch based on main. This is definitely more advanced git gymnastics, but here we benefit yet again from the small git branch size.
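
The cherry-pick rescue can be sketched like this. The branch names feature-1, feature-2 and feature-2-rebuilt as well as the files are invented, and amending plus a fast-forward merge is just one way to simulate "the first branch changed during review and then landed on main".

```shell
#!/bin/sh
set -eu

export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
echo base > base.txt
git add base.txt
git commit -q -m "Base"

git checkout -q -b feature-1
echo one > one.txt
git add one.txt
git commit -q -m "Feature 1"

git checkout -q -b feature-2          # stacked on top of feature-1
echo two > two.txt
git add two.txt
git commit -q -m "Feature 2"
second=$(git rev-parse HEAD)

# Feature 1 is reworked during review (new hash) and then lands on main.
git checkout -q feature-1
echo "one, reviewed" > one.txt
git commit -q -a --amend -m "Feature 1 (after review)"
git checkout -q main
git merge -q --ff-only feature-1

# Rescue: start fresh from main and pick the single commit of feature-2.
git checkout -q -b feature-2-rebuilt main
git cherry-pick "$second" >/dev/null

git log --oneline
```

The rebuilt branch contains the reviewed version of the first feature plus exactly one commit for the second, ready for a merge request against main.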

The Power of Forgetting

One possible critique of this approach is that by postponing committing our changes until we have a piece of work that we are happy with, we will lose our work history and not be able to easily revert back to an earlier state. While I agree that such a detailed history can be useful, in practice I find that there is a non-negligible mental overhead to saving every change in its own separate commit. If I do this, I then need to remember what my intention behind each change was as well as the reason why I decided to change it in a later version.

In reality I find it very powerful to intentionally forget everything outside of the actual task I am working on at that given moment. This gives me more mental bandwidth, and I find that if I do end up traversing a different path than originally planned and regretting the outcome, I then remember what the original approach was and can reimplement it. And I find the second time around the implementation is better: my brain has an extraordinary capacity to forget the parts of the implementation that weren’t so optimal and remember the good parts.

Conclusion

As I mentioned before, this is my approach to managing my personal workflow using Git. When it comes to actual implementation, I don’t think it is necessary for everybody to use a tool in exactly the same way: the important thing is that you find a way to optimize your tool use to work in accordance with how your brain works. Using rebase over merge can result in a lovely linear git history which is easier to read. Being willing to sacrifice the documentation of every single minute change can free up mental bandwidth to focus on the task at hand. And focusing on implementing small changes that can fit in a single commit (and in your brain) can make software development work much easier and more effective.
