Get good with git: bisect

Have you ever needed to find out which commit in the history of a git repository was responsible for some change in behaviour? Tracking down such a commit is a fairly common debugging task, especially in larger, older codebases. If you know which code might have introduced the change, you may be able to use commands like git blame or git log to search through history for the offending commit. Often you won't know, though, and even when you do there is usually an easier approach. Enter git bisect, a git command that performs a binary search of history to identify the commit that introduced a change.

Do you learn better with a more hands-on approach?

If a face-to-face approach to learning works better for you or your team, orangejellyfish run a popular git workshop which can take place remotely or on-site with you, giving you an opportunity to go deeper into some more advanced git concepts.

If the phrase "binary search" alone is enough to put you off reading further, don't despair: this section explains what it means. If you've already got that concept down, feel free to jump ahead.

"Binary search" is an algorithm designed to find a target value in a sorted list of values. It works by comparing the target value to the middle element of the list. If they do not match, half of the list has been eliminated (because the list was already sorted) and the process repeats by comparing the target value to the new middle element, and so on, until the target value is found.

A visual representation might make it easier to follow. Here's our sorted list, or array, of numbers. Since this array contains numbers we can see that it is in a sorted state because each element holds a value greater than the previous. Let's say that our target value is 42.

We start by comparing our target value to the middle element. We can see that the middle element is greater than our target:

Comparing to the target value

Because the middle element is greater than the target we can discard the second half of the array and perform the test again. This time we can see that the middle element is smaller than our target:

Discarding half of the array

Because the middle element is smaller than the target we can discard the first half of the array (bear in mind that we are now working with a new array, having already discarded half of the original) and perform the test one more time. Note that this time, since we have an even number of elements in the array, it is up to the algorithm whether we round up or down to determine the mid-point. In this example we have rounded up for the sake of brevity. This time we can see that the middle element matches the target:

Discarding another half of the array

Because the middle element matches the target the binary search is complete and the position of the target in the array is the result. In our case the target is at position 4. But how does any of this apply to git?
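
The walkthrough above can be sketched in a few lines of shell. This is only an illustration and assumes bash, for its arrays. The function name binary_search and the eleven sample values are made up, since the values from the original illustration aren't reproduced in the text. Like the walkthrough, it rounds the mid-point up when the remaining range has an even number of elements.

```shell
#!/usr/bin/env bash
# binary_search TARGET VALUE...
# Prints the zero-based position of TARGET among the sorted VALUEs.
# Returns 1 (and prints nothing) if TARGET is not present.
binary_search() {
  local target=$1; shift
  local -a values=("$@")
  local lo=0 hi=$(( ${#values[@]} - 1 )) mid

  while (( lo <= hi )); do
    mid=$(( (lo + hi + 1) / 2 ))     # round up on an even-length range
    if (( values[mid] == target )); then
      echo "$mid"
      return 0
    elif (( values[mid] > target )); then
      hi=$(( mid - 1 ))              # middle is too big: discard the second half
    else
      lo=$(( mid + 1 ))              # middle is too small: discard the first half
    fi
  done
  return 1
}

# Hypothetical sorted array with the target value 42 somewhere inside it.
binary_search 42 3 8 15 23 42 57 61 70 84 91 99   # prints 4
```

With these sample values the search visits three mid-points (57, then 15, then 42), just as in the walkthrough, and reports position 4, counting from zero.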

git bisect basics

The git bisect command allows us to perform a binary search on a range of commits in a repository. Try to think of a range of commits as an array where each element is an individual commit. This array is sorted in chronological (time of commit) order, rather than numerical order as in the example above, but since it is sorted we are able to perform a binary search on it. The target of the search is the commit that introduced a given change.

In order to determine the relevant range of commits we need to know a start point at which the change we are trying to find had not yet been made and an end point at which it had been. The end point will often be the HEAD of the current branch. The start point can be more difficult to determine but a good strategy might be to check out an old but likely stable tag and go from there.

Let's assume that we know a bug has been introduced to our codebase sometime between the v1.2.0 tag and now (the HEAD of the master branch). Armed with that information we can start the bisect and tell it which range we're working with:

$ git bisect start          # Start the binary search process
$ git bisect bad            # Identify the end of the range
$ git bisect good v1.2.0    # Identify the start of the range

Git now identifies the middle point of the range of commits, checks it out and pauses to give you the opportunity to work out whether or not the change you are trying to locate had been introduced at that point in time. You might do so by running an automated test suite, examining a file, or running your app and manually testing the relevant feature. Git tells you what has happened by printing something like this:

Bisecting: 64 revisions left to test after this (roughly 6 steps)
[bf96dbd8768d94e935e0bb874b319ae9b65a9aaf] The mid-point commit message

Once you have determined whether or not the mid-point contains the change you are searching for, tell git bisect the result and it will continue the process by discarding one half of the range, checking out the new mid-point and pausing again. Let's assume that the change we wanted to find is already present at this point in time, meaning the current mid-point is "bad":

$ git bisect bad    # Mark the current mid-point as bad, discard later commits
Bisecting: 32 revisions left to test after this (roughly 5 steps)
[e68d1eed5051744ad21bf7391b582939b5c996fc] New mid-point commit message

Git has paused again at the new mid-point so we get the opportunity to repeat our test. Let's assume this time that the change has not been introduced at this point in time, meaning this mid-point is "good":

$ git bisect good    # Mark the current mid-point as good, discard earlier commits
Bisecting: 12 revisions left to test after this (roughly 4 steps)
[1ee2cc616963f57900d3bc2b635d9efbd1cd168d] New mid-point commit message
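
The step count in the output drops by one each time because every answer halves the remaining range. As a rough sketch of that arithmetic, the hypothetical helper below counts how many times a number of candidates can be halved before only one remains. It is not git's own estimate, which rounds slightly differently and also depends on the shape of the history, so the two can disagree by a step.

```shell
#!/bin/sh
# steps_needed N
# Prints how many times N candidates can be halved before one remains.
steps_needed() {
  n=$1
  steps=0
  while [ "$n" -gt 1 ]; do
    n=$((n / 2))
    steps=$((steps + 1))
  done
  echo "$steps"
}

steps_needed 64     # prints 6
steps_needed 32     # prints 5
steps_needed 1000   # prints 9
```

The last line is the reason bisecting scales so well: even a range of a thousand commits needs only around ten tests.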

Continue this process, running your test at each step and marking the mid-point as either "good" or "bad" until git is able to definitively determine the commit that introduced the change. At that point it will helpfully tell you:

ebd6cd99667703a856b20c87c2307427006fcd2f is the first bad commit
commit ebd6cd99667703a856b20c87c2307427006fcd2f
Author: First Last <first.last@example.com>
Date:   Thu Aug 23 13:58:20 2018 +0000

    This is the commit that introduced the issue

All that remains now is to get back to the normal state by telling git that you are done:

$ git bisect reset

Automating git bisect

What we've covered so far is a powerful addition to your git debugging toolbox in its own right, but we can take it a step further to cut down some of the manual work.

The part of the previous example that would slow us down in reality is the test to determine whether each mid-point commit exhibits the behaviour we are trying to locate or not. If it's possible to script that test then git is more than happy to take that burden off us. What we need is a script that exits with code 0 if the current commit is "good" and with any code from 1 to 127, other than the special value 125, if it's "bad"; 1 (the Unix catch-all error code) is the usual choice. Most automated test runners already behave this way, so in many cases you'll be able to use an existing script rather than writing a new one.

Let's assume you're trying to track down a bug in a codebase that uses Jest as a unit test runner. The Jest CLI tool exits with helpful codes so we're able to use it to automate a git bisect:

$ git bisect start          # Start the binary search process
$ git bisect bad            # Identify the end of the range
$ git bisect good v1.2.0    # Identify the start of the range
$ git bisect run jest       # Run unit tests against each mid-point
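
To watch an automated bisect run from start to finish without touching a real project, the sketch below builds a throwaway repository and lets git find a deliberately planted change. Everything in it is invented for the demonstration (the file name demo.txt, the BUG marker, the commit messages), and a plain grep check stands in for a real test suite. It also passes both ends of the range straight to git bisect start, which is equivalent to the separate bad and good commands above.

```shell
#!/bin/sh
# Throwaway demo: plant a "bug" in commit 6 of 10, then let git bisect find it.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

# Create ten commits; the sixth adds the marker that we treat as the bug.
i=1
while [ "$i" -le 10 ]; do
  if [ "$i" -eq 6 ]; then
    echo "BUG" >> demo.txt
  else
    echo "line $i" >> demo.txt
  fi
  git add demo.txt
  git commit -q -m "commit $i"
  i=$((i + 1))
done

first=$(git rev-list --max-parents=0 HEAD)   # the root commit, known to be good
git bisect start HEAD "$first" >/dev/null     # bad end first, then the good end

# The check exits 0 (good) while the marker is absent and 1 (bad) once it appears.
git bisect run sh -c '! grep -q BUG demo.txt' >/dev/null

# When the search finishes, refs/bisect/bad points at the first bad commit.
culprit=$(git log -1 --format=%s refs/bisect/bad)
git bisect reset >/dev/null 2>&1
echo "$culprit"   # prints: commit 6
```

In a real project the grep command would be replaced by whatever runs the relevant tests, as long as it follows the same exit code convention.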

There are some additional tricks you can use in your test script. For example, exiting the script with exit code 125 will tell git bisect that the commit cannot be tested and should be skipped. In that situation the command will choose another commit for you and continue automatically.
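
A test script that uses the skip code might be structured like the sketch below. The helper name bisect_check is hypothetical, as are the example commands in the usage comment; the idea is simply that a commit which cannot even be built is reported as untestable, while any test failure is normalised to exit code 1.

```shell
#!/bin/sh
# bisect_check BUILD_CMD TEST_CMD
# Returns 125 (skip) if the build fails, 0 (good) if the tests pass,
# and 1 (bad) for any test failure.
bisect_check() {
  sh -c "$1" || return 125   # cannot build, so this commit cannot be tested
  sh -c "$2" && return 0     # tests pass: the commit is good
  return 1                   # tests fail: the commit is bad
}

# Hypothetical usage at the end of a script handed to git bisect run,
# assuming an npm project tested with Jest:
#   bisect_check "npm ci" "npx jest"
#   exit $?
```

Saved as an executable file, say bisect-test.sh (again a made-up name), it would be run with git bisect run ./bisect-test.sh. Mapping every test failure to 1 avoids the case where a test runner happens to exit with 125 or a code above 127, which git would not interpret as "bad".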

What next?

At this point you should have gained a solid understanding of git bisect and be in a position to use it either manually or automatically to discover problems that were introduced by an unknown commit. If you'd like to take it further a good place to start is the official git documentation.

