git pull gotchas

The git pull command seems convenient, but it actually does a bit much at once. It fetches from remote, attempts to merge into your local stuff, and, if successful, commits. If you’ve committed your work locally prior to the git pull, as recommended by the git documentation, the resulting commits are not as simple and digestible as one would hope. This piece, addressing novice and intermediate level git users, discusses the situation in some detail and recommends a combination of git fetch and git rebase.

Let’s assume you are an industrious developer who has just prepared a contribution to some project. You’re done, all unit tests are happy, and now you want your work to become available on some branch on origin, say master, so other people can see it.

Now what?

Other members of your team may have changed master. At this point in time, you don’t yet know. To find out, you can, of course, run git pull. If your teammates have not been working on master, or done only work that’s easily merge-able, you’re fine.

In the present scenario, we assume you have completed your changes locally, but have not committed them yet. Which means: You have no backup of your own work yet.

Git is afraid that it and you with combined forces might mess up your workspace during conflict resolution. So, if there is a conflict with uncommitted work, the “pre-merge checks” of the git merge part of git pull fail and nothing happens. No harm has been done, but no progress has been made, either.

Let’s take a step back and survey the situation. What’s up?

You certainly don’t want to risk a conflict resolution without a backup. So you want a commit first. This is also the approach recommended by the git merge documentation: It discourages merging on top of uncommitted changes.

So, the route to take is: Commit your local stuff first. After all, this is a version system. We should be using it to our advantage.

I see no reason not to heed that advice always, as a habit. You’re done with a piece of work? Commit locally first thing, before even looking at what the other people have done in the meantime. I do it that way, and, in my experience and judgment, this is a good habit to have.

So, with that in our mind, let’s start our little story one more time. Again, you’ve completed a job and the tests are happy.

This time, you commit your work first thing, on top of whatever dated material your local master happens to hold. You first want to safely tuck your own stuff away. If you do that, you have a commit, and that commit is there to stay. You can always come back to it, no matter what. From this point on, your work is safe. After that’s accomplished, you have a sound basis for facing the merge work that’s ahead of you.

So, how do you go about that merge work? Maybe a git pull now?

Yes, you can use git pull now. But I argue that plain git pull is not what you want, once you’ve adopted the recommended “check in first” habit.

To see why, let’s assume independent changes come in from other members of your team. These are diligently merged by git pull. In many cases, the merge is done fully automatically. All is fine. – Or is it?

That merge which git pull has produced for you is – well, a merge. It’s a commit with two parents.

There is nothing inherently wrong with merge commits, commits with two parents (or, though rarely seen, even more than two). But those do add a cognitive burden. For yourself and your fellow workers, it just isn’t as easy to see what your commit is up to.

When you look at it via a git UI tool, that tool has two diffs it could show you. Depending on the tool, it might decide to show you none. Again: Unless you know how to specifically ask and use that knowledge[1], your git UI is likely to not show you anything about what happened in that merge commit.
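To make the two parents and the two diffs tangible, here is a minimal sketch in a throwaway sandbox. The directory names me and mate, the file names, and the Demo identity are made up for illustration. I pass --no-rebase explicitly because recent Git versions may refuse a bare git pull on diverged branches until told how to reconcile them.

```shell
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Throwaway sandbox: a bare "origin" and two clones, "me" and "mate".
cd "$(mktemp -d)"
git init -q --bare origin.git
git --git-dir=origin.git symbolic-ref HEAD refs/heads/master
git clone -q origin.git me 2>/dev/null
(cd me && git symbolic-ref HEAD refs/heads/master &&
  echo base > notes.txt && git add notes.txt && git commit -q -m "base" &&
  git push -q -u origin master)
git clone -q origin.git mate

# A teammate publishes a change while I commit my own work locally.
(cd mate && echo theirs > theirs.txt && git add theirs.txt &&
  git commit -q -m "teammate work" && git push -q origin master)
cd me
echo mine > mine.txt
git add mine.txt
git commit -q -m "my work"

# Pull in merge mode: fetch, merge, and commit the merge.
git pull -q --no-rebase --no-edit

# The graph forks and rejoins; rev-list prints HEAD followed by its parents.
git log --graph --oneline
parent_count=$(( $(git rev-list --parents -n 1 HEAD | wc -w) - 1 ))
echo "parents of HEAD: $parent_count"

# One merge commit, two possible diffs, depending on the parent chosen.
git diff --stat HEAD^1 HEAD   # against my commit: what the teammate added
git diff --stat HEAD^2 HEAD   # against the teammate's commit: what I added
```

Neither diff is "the" change of the merge commit, which is why a UI tool has to pick one or show none.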

In my experience, you do yourself and your team no favor if you use merge commits for trivial cases like this one.

Concrete example: This blog post was drafted the night after we cleaned up a botched merge, which needn’t have been a merge in the first place. Incidentally, should you have experienced problems with this innoQ homepage some time between Jan 25 in the evening through Jan 26, 2016, that botched merge commit most likely was the underlying cause. Initial repair attempts didn’t cut through to the root of the problem. In the end, some five innoQ developers teamed up to fully rectify the situation. Our repair work would have started earlier and progressed faster, had the botched two-parent merge commit been a plain normal single-parent commit instead.

Assuming you happen to be the kind of person wanting to learn from other people’s troubles: Use those merge commits only for more serious branch work.[2]

So, if not git pull, what else? Let me come back to our original scenario one last time. Work is done, tests are happy, now: What would I do? What do I actually do, in such a situation?

I initially add a commit with my work to whatever dated version of master I’ve been working on. This is a version system, I want to take advantage of that and make sure my work is safe. So far, so obvious (by now).

Next thing, I might simply try git push. If I’m lucky, nobody else has touched master and I’m finished. If I’m not, no harm is done, either.

Nice try, but most of the time, that push doesn’t work. In that case, I’d now run git fetch.[3]

That done, I now have my own commit on my personal master, as well as my team-mates' results in origin/master.

Now, I want to do merge work, but without actually producing a merge commit: git rebase origin/master does that trick for me.

What does that do? It grafts a copy of my commit on top of the new stuff on origin/master. That new commit copy has only one single parent, namely, the previous latest commit of origin/master. Locally, this commit also becomes the new checked-out HEAD of my local master which I’m on, with all the work from the other team members integrated in its history.
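The push, fetch, and rebase steps can be sketched in a throwaway sandbox. Again, the names me, mate, and the file names are assumptions made up for the demonstration; only the git commands themselves matter.

```shell
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Throwaway sandbox: a bare "origin" and two clones, "me" and "mate".
cd "$(mktemp -d)"
git init -q --bare origin.git
git --git-dir=origin.git symbolic-ref HEAD refs/heads/master
git clone -q origin.git me 2>/dev/null
(cd me && git symbolic-ref HEAD refs/heads/master &&
  echo base > notes.txt && git add notes.txt && git commit -q -m "base" &&
  git push -q -u origin master)
git clone -q origin.git mate
(cd mate && echo theirs > theirs.txt && git add theirs.txt &&
  git commit -q -m "teammate work" && git push -q origin master)

# My own work, committed locally on a master that is now out of date.
cd me
echo mine > mine.txt
git add mine.txt
git commit -q -m "my work"

# Optimistic push first; it is rejected because origin has moved on.
if ! git push -q origin master 2>/dev/null; then
  echo "push rejected, fetching instead"
fi

# Fetch only updates origin/master; my local master is left untouched.
git fetch -q origin
new_base=$(git rev-parse origin/master)

# Replay my commit on top of origin/master instead of merging.
git rebase -q origin/master
my_parent=$(git rev-parse HEAD^)
parent_count=$(( $(git rev-list --parents -n 1 HEAD | wc -w) - 1 ))
git log --graph --oneline

# Retest at this point, then publish; this push is a plain fast-forward.
git push -q origin master
```

The log comes out as a straight line: base, teammate work, my work.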

Admittedly, this emits a certain odor, as git rebase entails some amount of “rewriting of history”. The new commit pretends I’ve started my work on the basis of that previous latest commit from origin/master, while in fact I’ve started based on earlier commits.

But this is only a minor amount of “rewriting of history”. As presented here, I only manipulate my own local commits, which I have not yet shared with anyone. Such limited “rewriting of history” I consider quite tolerable.[4]

In the trivial situation, when the merge work can be done automatically, git rebase will leave me with a version of master that I can test one more time and then git push.

Should I face a merge conflict, I’ll have to resolve it manually and let the rebase finish (git add, then git rebase --continue). If I manage to do that, fine: a final test, git push again, and all is well.
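For the conflict case, a sketch along the same lines, assuming my teammate and I both edited a hypothetical notes.txt. The "resolution" here simply overwrites the conflict markers; in real life that is the part requiring thought.

```shell
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Throwaway sandbox: a bare "origin" and two clones, "me" and "mate".
cd "$(mktemp -d)"
git init -q --bare origin.git
git --git-dir=origin.git symbolic-ref HEAD refs/heads/master
git clone -q origin.git me 2>/dev/null
(cd me && git symbolic-ref HEAD refs/heads/master &&
  echo base > notes.txt && git add notes.txt && git commit -q -m "base" &&
  git push -q -u origin master)
git clone -q origin.git mate

# Both of us change the same line of the same file.
(cd mate && echo theirs > notes.txt && git commit -q -am "teammate edit" &&
  git push -q origin master)
cd me
echo mine > notes.txt
git commit -q -am "my edit"
git fetch -q origin

# The rebase stops at my commit because both sides changed notes.txt.
git rebase origin/master >/dev/null 2>&1 || echo "rebase stopped on a conflict"
git status --short            # "UU notes.txt" marks the unmerged file

# Resolve by hand, then stage the result.
printf 'theirs\nmine\n' > notes.txt
git add notes.txt

# Let the rebase finish; GIT_EDITOR=true accepts the commit message as is.
GIT_EDITOR=true git rebase --continue >/dev/null

# To give up instead and return to the pre-rebase state: git rebase --abort
parent_count=$(( $(git rev-list --parents -n 1 HEAD | wc -w) - 1 ))
git log --graph --oneline
```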

Should I get the merge work wrong on first try, I can back out and try again. My original commit is still patiently waiting to see whether it’s still needed. I just have to dig up its SHA. To do so, I’ll just scroll up my terminal window, or else use gitk’s “view all refs”.

For my second attempt at the merge work, I want my local master to point to my original commit again. A straightforward way is to delete it and create it anew:

git checkout SHA-OF-MY-ORIGINAL-COMMIT
git branch -D master
git checkout -b master

Admittedly, that’s more robust than elegant. As a consequence of the re-creation, git does not yet know where to push the new local master, so I shall need the explicit git push -u origin master. Fortunately, this problem is self-healing: The connection to the remote origin is reestablished by this explicit push command.
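As an alternative sketch, and not the approach described above: the branch reflog usually still knows the pre-rebase commit, and git reset --hard moves the existing branch back to it instead of re-creating it, so the upstream setting should survive. The sandbox names are made up, and the master@{1} shortcut assumes nothing else has moved master since the rebase.

```shell
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Throwaway sandbox: a bare "origin" and two clones, "me" and "mate".
cd "$(mktemp -d)"
git init -q --bare origin.git
git --git-dir=origin.git symbolic-ref HEAD refs/heads/master
git clone -q origin.git me 2>/dev/null
(cd me && git symbolic-ref HEAD refs/heads/master &&
  echo base > notes.txt && git add notes.txt && git commit -q -m "base" &&
  git push -q -u origin master)
git clone -q origin.git mate
(cd mate && echo theirs > theirs.txt && git add theirs.txt &&
  git commit -q -m "teammate work" && git push -q origin master)

cd me
echo mine > mine.txt
git add mine.txt
git commit -q -m "my work"
original=$(git rev-parse HEAD)   # the commit I want to be able to return to

git fetch -q origin
git rebase -q origin/master      # first attempt, assumed to have gone wrong

# The branch reflog still lists the pre-rebase tip of master.
git reflog show master
found=$(git rev-parse 'master@{1}')

# Move the existing branch back to it; the rebased copy is simply abandoned.
git reset -q --hard "$found"
upstream=$(git rev-parse --abbrev-ref 'master@{upstream}')
echo "master is back at $(git rev-parse --short HEAD), tracking $upstream"
```

Be aware that git reset --hard also discards uncommitted changes in the working tree, so it is only safe once everything I care about is committed.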

But elegant or not, it works. I can retry my git rebase origin/master merge work as often as I require to get it right.

After the push, my colleagues will get to see a sole nice commit with a single parent, easily comprehensible. No uncalled-for cognitive burden here.

In conclusion, I want to emphasize that git is a version system which I, the developer, feel free to use locally as I please. I can produce as many commits on my local master or any other local branches as is convenient for me. I might use frowned-upon zero-information check-in comments such as “work in progress”. If I so desire, I might even do a commit just to keep (via the commit timestamp) a record of my departure time for lunch break.[5]

None of these commits need (or should) ever become visible via origin. When a particular piece of work is complete, I squash all related commits into one, using git rebase -i origin/master (either before or after git fetch, as I please). During that process, I also come up with an informative check-in comment for the whole thing. The end result is another sole shiny well-commented single-parent commit.
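The squashing step is normally done by hand in an editor. To keep the following sketch runnable without one, I script the editor with GIT_SEQUENCE_EDITOR and sed; that scripting, the sandbox, and the commit messages are assumptions for the demo, not part of my daily routine.

```shell
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Throwaway sandbox: a bare "origin" and one clone, "me".
cd "$(mktemp -d)"
git init -q --bare origin.git
git --git-dir=origin.git symbolic-ref HEAD refs/heads/master
git clone -q origin.git me 2>/dev/null
cd me
git symbolic-ref HEAD refs/heads/master
echo base > notes.txt
git add notes.txt
git commit -q -m "base"
git push -q -u origin master

# Three private work-in-progress commits on local master.
for step in 1 2 3; do
  echo "step $step" >> notes.txt
  git commit -q -am "work in progress"
done

# git rebase -i opens the todo list in an editor, where every "pick" after
# the first is changed to "squash". Here sed does that edit for me, and
# GIT_EDITOR=true accepts the combined commit message unchanged.
GIT_SEQUENCE_EDITOR='sed -i.bak -e "2,\$s/^pick/squash/"' GIT_EDITOR=true \
  git rebase -q -i origin/master

# Replace the concatenated WIP messages with one informative comment.
git commit -q --amend -m "Add three steps to notes.txt"
commit_count=$(git rev-list --count origin/master..HEAD)
git log --oneline origin/master..HEAD
```

Only the single squashed commit is ahead of origin/master afterwards; the three work-in-progress commits never leave my machine.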

So, now you have my reasons for my reservations regarding git pull. In contrast, I consistently find git rebase to be my friend.

If you’d rather not resist the convenient lure of git pull, consider taming it with one of its --rebase options.
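A brief sketch of that taming, once more in a made-up sandbox: git pull --rebase does the fetch and then rebases instead of merging, and the pull.rebase setting makes that the default for the repository.

```shell
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Throwaway sandbox: a bare "origin" and two clones, "me" and "mate".
cd "$(mktemp -d)"
git init -q --bare origin.git
git --git-dir=origin.git symbolic-ref HEAD refs/heads/master
git clone -q origin.git me 2>/dev/null
(cd me && git symbolic-ref HEAD refs/heads/master &&
  echo base > notes.txt && git add notes.txt && git commit -q -m "base" &&
  git push -q -u origin master)
git clone -q origin.git mate
(cd mate && echo theirs > theirs.txt && git add theirs.txt &&
  git commit -q -m "teammate work" && git push -q origin master)

cd me
echo mine > mine.txt
git add mine.txt
git commit -q -m "my work"

# One-off: fetch, then rebase my commit onto the fetched origin/master.
git pull -q --rebase
parent_count=$(( $(git rev-list --parents -n 1 HEAD | wc -w) - 1 ))
git log --graph --oneline

# Make rebasing the default for every later git pull in this repository.
git config pull.rebase true
```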

[1] The basic command-line git diff SHA1 SHA2 works well in this situation, and I highly recommend you have it in your toolbox, including the --summary option.
[2] When accepting a pull request, I like to document my review work by forcing a two-parent commit, even where a fast-forward would be possible.
[3] In case you care, my actual habit is to use git fetch -p. But the -p is irrelevant in the present context.
[4] If either you or your project disagrees, you’ll probably have to live with double-parent merge commits and suffer the consequences. You may be able to reduce the number of such commits by taking the “feature branch” route. This helps somewhat, as long as only one person actively commits into each feature branch.
[5] No, even I haven’t actually done that. But you see my point.
