Have you ever been in a situation where you needed to find out which commit in
the history of a git repository was responsible for causing some change in
behaviour? Tracking down that commit is a fairly common debugging task,
especially in larger, older codebases. If you know where to find the code that
might have introduced the change you may be able to use commands like git blame
or git log to search through history for the offending commit, but often that
is not the case, and even when it is, there is often an easier approach. Enter
git bisect, a useful git command that performs a binary search of history to
identify the commit that introduced a change.
Do you learn better with a more hands-on approach?
If a face-to-face approach to learning works better for you or your team, orangejellyfish run a popular git workshop which can take place remotely or on-site with you, giving you an opportunity to go deeper into some more advanced git concepts.
If that description alone is enough to put you off reading further, don't despair: this section explains what "binary search" is. If you've already got that concept down, feel free to jump ahead.
"Binary search" is an algorithm designed to find a target value in a sorted list of values. It works by comparing the target value to the middle element of the list. If they do not match, half of the list has been eliminated (because the list was already sorted) and the process repeats by comparing the target value to the new middle element, and so on, until the target value is found.
A visual representation might make it easier to follow. Here's our sorted list, or array, of numbers. Since this array contains numbers we can see that it is in a sorted state because each element holds a value greater than the previous. Let's say that our target value is 42.
We start by comparing our target value to the middle element. We can see that the middle element is greater than our target:
Because the middle element is greater than the target we can discard the second half of the array and perform the test again. This time we can see that the middle element is smaller than our target:
Because the middle element is smaller than the target we can discard the first half of the array (bear in mind that we are now working with a new array, having already discarded half of the original) and perform the test one more time. Note that this time, since we have an even number of elements in the array, it is up to the algorithm whether we round up or down to determine the mid-point. In this example we have rounded up for the sake of brevity. This time we can see that the middle element matches the target:
Because the middle element matches the target the binary search is complete and the position of the target in the array is the result. In our case the target is at position 4. But how does any of this apply to git?
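The walkthrough above can be sketched in a few lines of bash. This is only an illustration: the array values are assumptions chosen so that the search takes the same three steps described above (middle element too large, then too small, then a match at position 4 counting from zero), and the function name binary_search is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sorted array; the values are illustrative assumptions, chosen
# so the search follows the three steps described in the text.
values=(3 8 15 23 42 56 61 70 84 91 99)

# Print the zero-based position of the target, or return 1 if it is absent.
binary_search() {
  local target=$1
  local low=0
  local high=$(( ${#values[@]} - 1 ))
  local mid

  while (( low <= high )); do
    # Round up when the remaining range has an even number of elements.
    mid=$(( (low + high + 1) / 2 ))
    if (( values[mid] == target )); then
      echo "$mid"
      return 0
    elif (( values[mid] > target )); then
      high=$(( mid - 1 ))   # Middle is too large: discard the second half.
    else
      low=$(( mid + 1 ))    # Middle is too small: discard the first half.
    fi
  done
  return 1
}

binary_search 42   # prints 4
```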
The git bisect command allows us to perform a binary search on a range of
commits in a repository. Try to think of a range of commits as an array where
each element is an individual commit. This array is sorted in chronological (time
of commit) order, rather than numerical order as in the example above, but since
it is sorted we are able to perform a binary search on it. The target of the
search is the commit that introduced a given change.
In order to determine the relevant range of commits we need to know of a start
point at which the change we are trying to find had not yet been made and an end
point at which it had been. The end point will often be the
HEAD of the
current branch. The start point can be more difficult to determine but a good
strategy might be to check out an old but likely stable tag and go from there.
Let's assume that we know a bug has been introduced to our codebase sometime
between the v1.2.0 tag and now (the
HEAD of the master branch). Armed with
that information we can start the bisect and tell it which range we're working
with:
$ git bisect start        # Start the binary search process
$ git bisect bad          # Identify the end of the range
$ git bisect good v1.2.0  # Identify the start of the range
Git now identifies the middle point of the range of commits, checks it out and pauses to give you the opportunity to work out whether or not the change you are trying to locate had been introduced at that point in time. You might do so by running an automated test suite, examining a file, or running your app and manually testing the relevant feature. Git tells you what has happened by printing something like this:
Bisecting: 64 revisions left to test after this (roughly 6 steps)
[bf96dbd8768d94e935e0bb874b319ae9b65a9aaf] The mid-point commit message
Having determined whether or not the mid-point contains the change you are
searching for you can tell
git bisect and it will continue the process by
discarding one half of the range, checking out the new mid-point and pausing
again. Let's assume that the change we wanted to find is already present at this
point in time, meaning the current mid-point is "bad":
$ git bisect bad  # Mark the current mid-point as bad, discard later commits
Bisecting: 32 revisions left to test after this (roughly 5 steps)
[e68d1eed5051744ad21bf7391b582939b5c996fc] New mid-point commit message
Git has paused again at the new mid-point so we get the opportunity to repeat our test. Let's assume this time that the change has not been introduced at this point in time, meaning this mid-point is "good":
$ git bisect good  # Mark the current mid-point as good, discard earlier commits
Bisecting: 12 revisions left to test after this (roughly 4 steps)
[1ee2cc616963f57900d3bc2b635d9efbd1cd168d] New mid-point commit message
Continue this process, running your test at each step and marking the mid-point as either "good" or "bad" until git is able to definitively determine the commit that introduced the change. At that point it will helpfully tell you:
ebd6cd99667703a856b20c87c2307427006fcd2f is the first bad commit
commit ebd6cd99667703a856b20c87c2307427006fcd2f
Author: First Last <email@example.com>
Date:   Thu Aug 23 13:58:20 2018 +0000

    This is the commit that introduced the issue
All that remains now is to get back to the normal state by telling git that you are done:
$ git bisect reset
What we've covered so far is a powerful addition to your git debugging toolbox in its own right, but we can take it a step further to cut down some of the manual work.
The part of the previous example that would slow us down in reality is the test
to determine whether each mid-point commit exhibits the behaviour we are trying
to locate or not. If it's possible to script that test then git is more than
happy to take that burden off us. What we need is a script that exits with code
0 if the current commit is "good" and with 1 (the Unix catch-all error code) if
it's "bad"; git treats any exit code from 1 to 127, apart from the special code
125, as "bad". Most automated test runners will already do this so in many cases
you'll be able to use an existing script rather than writing a new one.
Let's assume you're trying to track down a bug in a codebase that uses Jest as a unit test runner. The Jest CLI tool exits with a non-zero code when any test fails, so we're able to use it to automate a git bisect:
$ git bisect start        # Start the binary search process
$ git bisect bad          # Identify the end of the range
$ git bisect good v1.2.0  # Identify the start of the range
$ git bisect run jest     # Run unit tests against each mid-point
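If you'd like to watch the automated flow from start to finish without touching a real project, the following sketch builds a throwaway repository and bisects it. Everything about the repository is an assumption made for illustration: ten commits, a status.txt file that flips from "pass" to "fail" in the seventh commit, and a v1.2.0 tag on the first commit. A plain grep stands in for the test runner, since it exits with 0 when it finds a match and 1 when it doesn't.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Create a throwaway repository in a temporary directory (illustrative only).
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git config user.name "Example"
git config user.email "example@example.com"
git config commit.gpgsign false

# Ten commits; the hypothetical "bug" appears in commit 7.
for i in {1..10}; do
  if (( i >= 7 )); then
    echo "fail" > status.txt
  else
    echo "pass" > status.txt
  fi
  echo "$i" > counter.txt
  git add .
  git commit --quiet -m "commit $i"
  if (( i == 1 )); then
    git tag v1.2.0   # The known-good starting point.
  fi
done

git bisect start
git bisect bad               # HEAD (commit 10) is known to be bad.
git bisect good v1.2.0       # Commit 1 is known to be good.
git bisect run grep -q pass status.txt

# When the bisect finishes, refs/bisect/bad points at the first bad commit.
first_bad=$(git rev-parse refs/bisect/bad)
git log -1 --format=%s "$first_bad"   # prints "commit 7"

git bisect reset
```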
There are some additional tricks you can use in your test script. For example,
exiting the script with exit code
125 will tell
git bisect that the commit
cannot be tested and should be skipped. In that situation the command will
choose another commit for you and continue automatically.
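As a rough sketch of how a test script might use those exit codes, the helper below treats a failed build as "cannot be tested" and only reports "good" or "bad" once the build succeeds. The function name classify_commit and the commands in the closing comment are hypothetical; substitute whatever your project actually uses.

```shell
#!/usr/bin/env bash
# Hypothetical helper for use with git bisect run. The build and test commands
# are passed in as strings because they differ from project to project.
classify_commit() {
  local build_cmd=$1
  local test_cmd=$2

  # If the commit cannot even be built it is untestable: ask git to skip it.
  if ! eval "$build_cmd"; then
    return 125
  fi

  # A passing test run means "good" (0); anything else is reported as "bad" (1).
  if eval "$test_cmd"; then
    return 0
  fi
  return 1
}

# Example wiring (assumed commands), as the last lines of a script that is
# then passed to git bisect run:
#   classify_commit "npm run build" "npx jest"
#   exit $?
```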
At this point you should have gained a solid understanding of
git bisect and
be in a position to use it either manually or automatically to discover problems
that were introduced by an unknown commit. If you'd like to take it further a
good place to start is the official git documentation.