Bisecting like a boss

In a previous blog post I said, "If you have encountered a regression and you are building a driver from source, please provide the results of git-bisect." There is some feeling that performing a bisect is hard, time consuming, or both. Back in the bad-old-days, that was true... but git bisect run changed all that.

Most of performing a bisect is mechanical and repetitious:

1. Build the project.
2. If the build fails, run git bisect skip.
3. Run the test.
4. Inspect the results.
5. Run git bisect good or git bisect bad depending on the test result.
6. While there are more steps to bisect, repeat from step 1.
7. Run git bisect reset to restore the tree to its original state.
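Written out as a script, those steps look something like the sketch below, which is roughly the chore that git bisect run takes over. It is only an illustration. make and ./test_case are placeholders for your own build and test commands, and bisect_verdict and manual_bisect_loop are hypothetical names. The test's exit status stands in for step 4, the human inspection that no script can fully replace.

```bash
#!/bin/bash
# Sketch: the manual bisect steps as a script.  "make" and "./test_case"
# are placeholders; bisect_verdict and manual_bisect_loop are hypothetical.

# Steps 2 and 5: pick the git bisect subcommand for the current commit.
#   $1 = exit status of the build, $2 = exit status of the test
bisect_verdict() {
    if [ "$1" -ne 0 ]; then
        echo skip
    elif [ "$2" -eq 0 ]; then
        echo good
    else
        echo bad
    fi
}

# Steps 1-7 in a loop.  Run it after "git bisect start bad-commit good-commit".
manual_bisect_loop() {
    local build_rc test_rc verdict out rc
    while :; do
        make; build_rc=$?                       # step 1: build
        test_rc=1
        if [ "$build_rc" -eq 0 ]; then
            ./test_case; test_rc=$?             # steps 3-4: run the test
        fi
        verdict=$(bisect_verdict "$build_rc" "$test_rc")
        out=$(git bisect "$verdict"); rc=$?     # step 2 or step 5
        echo "$out"
        [ "$rc" -eq 0 ] || break
        # Step 6: stop once bisect has nothing left to test.
        case "$out" in
            *"is the first bad commit"* | *"only 'skip'ped commits left"*) break ;;
        esac
    done
    git bisect reset                            # step 7: restore the tree
}
```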

Some years ago, someone noticed that this seems like a task a computer could do. At least as early as git 1.6.0 (released in 2008), this has been possible using git bisect run. Once you get the hang of it, it's surprisingly easy to use.

A Word of Caution

Before actually discussing automated bisects, I want to offer a word of caution. Bisecting, automated or otherwise, is a process that still requires the application of common sense. A critical step at the end of bisecting is manually testing the guilty commit and the commit immediately before the guilty commit. You should also look at the commit that git-bisect claims is guilty. Over the years I have seen many bug reports for Mesa that either point at commits that only change comments in the code or only change a driver other than the driver on which the bug was observed.

I have observed these failures to have two causes, although other causes may be possible. With a manual bisect, it is really easy to accidentally git bisect good when you meant git bisect bad. I have done this by using up-arrow to go back through the shell command history when I'm too lazy to type the command again. It's really easy to get the wrong one doing that.

The other cause of failure I have observed occurs when multiple problems exist between the known-good commit and the known-bad commit. In this case the bisect can report a commit that was already known to be bad and was already fixed. This false information can waste a lot of time for the developer who is trying to fix the bug. They will spend time trying to determine why the commit reported by the bisect still causes problems when some other commit is the actual cause. The remedy is proper application of common sense while performing the bisect. It's often not enough to just see that the test case fails. The mode of the failure must also be considered.
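The re-test of the guilty commit and its parent described at the start of this section can be scripted too. The sketch below is a hypothetical helper, verify_culprit, not part of git. It assumes your bisect script follows the exit-code conventions described in the next section, and it should only be run after git bisect reset, because checking out commits in the middle of a bisect confuses everything. It also prints the commit's summary and changed files, so a commit that only touches comments or a different driver stands out.

```bash
#!/bin/bash
# Hypothetical helper: re-test the commit that git bisect blamed, and its
# parent, with the same script used for the bisect.  Run it only after
# "git bisect reset".
#   $1 = SHA of the reported commit, $2 = path to the bisect script
verify_culprit() {
    local sha=$1 script=$2 orig parent_rc culprit_rc

    # Remember the current branch (or the SHA, if HEAD is detached).
    orig=$(git symbolic-ref --quiet --short HEAD || git rev-parse HEAD)

    git checkout --quiet "$sha^" || return 2
    "$script"; parent_rc=$?
    git checkout --quiet "$sha" || return 2
    "$script"; culprit_rc=$?
    git checkout --quiet "$orig"

    # Show what the commit actually changes: comment-only changes or a
    # different driver are red flags.
    git --no-pager show --stat --oneline "$sha"
    echo "parent: exit $parent_rc, reported commit: exit $culprit_rc"

    # Plausible only if the parent is good (0) and the commit is bad
    # (1-127, excluding 125).
    [ "$parent_rc" -eq 0 ] && [ "$culprit_rc" -ge 1 ] &&
        [ "$culprit_rc" -le 127 ] && [ "$culprit_rc" -ne 125 ]
}
```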

Automated Bisect Basics

All of the magic in an automated bisect revolves around a single script that you supply. This script analyzes the commit and tells bisect what to do next. There are four things that the script can tell bisect, and each of them is communicated using the script's return code.

Skip this commit because it cannot be analyzed. This is identical to manually running git bisect skip. This can be used, for example, if the commit does not build. A script might contain something like:
```
if ! make ; then
    exit 125
fi
```
As you might infer from the code example, a return code of 125 instructs bisect to skip the commit.
Accept this commit as good. This is identical to git bisect good. A return code of 0 instructs bisect to accept the commit as good.
Reject this commit as bad. This is identical to git bisect bad. All tests in the piglit test suite print a string in a specific format when a test passes or fails. This can be used by the script to generate the exit code. For example:
```
bin/arb_clear_buffer_object-null-data -auto > /tmp/result.$$
if grep -q 'PIGLIT: {"result": "pass" }' /tmp/result.$$; then
    rm /tmp/result.$$
    exit 0
else
    cat /tmp/result.$$
    rm /tmp/result.$$
    exit 1
fi
```
In this bit of script, the output of the test is printed in the "bad" case. This can be very useful. Bisects of long ranges of commits may encounter failures unrelated to the failure you are trying to bisect. Seeing the output from the test may alert you to failures with unrelated causes.

Looking for simple "pass" or "fail" output from the test may not be good enough. It may be better to look for specific failure messages from the test. As mentioned above, it is important to only report a commit as bad if the test fails due to the problem you are trying to bisect.

Imagine a case where a failure in the arb_clear_buffer_object-null-data test on the master branch is being bisected. The particular failure is an incorrect error being generated, and the known-good commit is HEAD~200, where the last stable release occurred (on a different branch with a common root). However, the commits from HEAD~110 through HEAD~90 contain an unrelated rendering error that was fixed in HEAD~89. Since git-bisect performs a binary search, it will test HEAD~100 first and see the wrong failure. A script that simply looks for any test failure would mark HEAD~100 as bad, and the bisect would then incorrectly identify HEAD~110 as the problem commit. If the script instead checked for the specific incorrect error message, the correct guilty commit is more likely to be found.

A return code with any value of 1 through 127, excluding 125, instructs bisect to reject the commit as bad.
Stop at this commit and wait for human interaction. This can be used when something really catastrophic happens that requires human attention. Imagine a scenario where the bisect is being performed on one system but tests are run on another. This could be used if the bisect system is unable to communicate with the test system. A return code with any value of 128 through 255 will halt the bisect.
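Putting those return codes together with the advice about failure modes, a test step can report only the failure you are hunting for as bad and skip everything else, in line with the advice later in this post to skip unrelated failures. The sketch below assumes the piglit output format shown above. classify_piglit_output is a hypothetical helper, not part of piglit, and the expected message is whatever text identifies your particular bug.

```bash
#!/bin/bash
# Sketch: turn piglit output into a "git bisect run" exit code.
# classify_piglit_output is a hypothetical helper, not part of piglit.
#   $1    = text that only appears when the bug being bisected occurs
#   stdin = the test's output
# Returns 0 (good) on a pass, 1 (bad) on the expected failure, and
# 125 (skip) for anything else, such as an unrelated failure or no result.
classify_piglit_output() {
    local expected=$1 output
    output=$(cat)
    if grep -qF 'PIGLIT: {"result": "pass" }' <<< "$output"; then
        return 0
    fi
    # Print the output so unrelated failures show up in the bisect log.
    printf '%s\n' "$output"
    if [ -n "$expected" ] && grep -qF -- "$expected" <<< "$output"; then
        return 1
    fi
    return 125
}

# Possible use in auto_bisect.sh; the message is a hypothetical example:
#   bin/arb_clear_buffer_object-null-data -auto |
#       classify_piglit_output "GL_INVALID_VALUE"
#   exit $?
```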

All of this can be used to generate a variety of scripts for any sort of complex environment. To quote Alan Kay, "Simple things should be simple, complex things should be possible." For a simple make-based project and an appropriately written test case, an automated bisect script could be as simple as:

    #!/bin/bash
    if ! make ; then
        exit 125
    fi

    # Use the return code from the test case
    exec test_case

Since this is just a script that builds the project and runs a test, you can easily test the script. Testing the script is a very good idea if you plan to leave the computer when the bisect starts. It would be a shame to leave the computer for several hours only to find it stuck at the first commit due to a problem in the automated bisect script. Assuming the script is called auto_bisect.sh and has been made executable with chmod +x, testing the script can be as easy as:

    $ ./auto_bisect.sh ; echo $?
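That check is also a good time to look at what a failing test actually returns. When a process is killed by a signal, the shell reports 128 plus the signal number (139 for a segmentation fault), and git bisect run treats any value from 128 through 255 as a reason to halt. If a crash should count as an ordinary result instead, the script can fold those values into one before exiting. The sketch below uses a hypothetical helper, bisect_status, which maps them to 1 (bad) by default, or to 125 (skip) if crashes are unrelated to your bug. It requires running the test without exec, so the script is still around to inspect the status, and it should not be applied to a 255 that is meant to halt the bisect, such as ssh's in the remote setup described later.

```bash
#!/bin/bash
# Sketch: keep a crashing test from halting "git bisect run".
# A process killed by signal N is reported as 128+N, which bisect treats as
# "stop".  bisect_status (hypothetical) maps such values to $2 (default 1).
#   $1 = exit status of the test, $2 = status to use for crashes
bisect_status() {
    local rc=$1 crash_rc=${2:-1}
    if [ "$rc" -ge 128 ]; then
        echo "test died with status $rc (signal $((rc - 128)))" >&2
        echo "$crash_rc"
    else
        echo "$rc"
    fi
}

# Possible use at the end of auto_bisect.sh, replacing "exec test_case":
#   test_case; rc=$?
#   exit "$(bisect_status "$rc")"
```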

Now all of the human interaction for the entire bisect would be three commands:

    $ git bisect start bad-commit good-commit
    $ git bisect run ./auto_bisect.sh
    $ git bisect reset

If there are a lot of commits in good-commit..bad-commit, building the project takes a long time, or running the tests takes a long time, feel free to go have a sandwich while you wait. Or play Quake. Or do other work.

Broken Builds

The bane of most software developers' existence is a broken build. Few things are more irritating. With GIT, it is possible to have transient build failures that nobody notices. It's not unheard of for a 20-patch series to break at patch 9 and be fixed at patch 11. This commonly occurs either when people move patches around in a series during development or when reviewers suggest splitting large patches into smaller patches. In either case, patch 9 could, for example, add a call to a function that isn't added until patch 11. If nobody builds at patch 9, the break goes unnoticed.

The break goes unnoticed until a bisect lands on patch 9 or 10. If the problem being bisected and the build break are unrelated (and the build break is short lived), the normal skip process is sufficient. The commits in the range that doesn't build will be skipped. Assuming the commit just before that range and the commit just after it are both good or both bad, the guilty commit will still be found.

Sometimes things are not quite so rosy. You are bisecting because there was a problem, after all. Why have just one problem when you can have a whole mess of them? I believe that the glass is either empty or overflowing with steaming hot fail. The failing case might look something like:

    $ git bisect start HEAD HEAD~20
    Bisecting: 9 revisions left to test after this (roughly 3 steps)
    [2d712d35c57900fc0aa0f1455381de48cdda0073] gallium/radeon: move printing texture info into a separate function
    $ git bisect run ./auto_bisect.sh
    running ./auto_bisect.sh
        auto_bisect.sh says skip
    Bisecting: 9 revisions left to test after this (roughly 3 steps)
    [622186fbdf47e4c77aadba3e38567636ecbcccf5] mesa: errors: validate the length of null terminated string
    running ./auto_bisect.sh
        auto_bisect.sh says good
    Bisecting: 8 revisions left to test after this (roughly 3 steps)
    [19eaceb6edc6cd3a9ae878c89f9deb79afae4dd6] gallium/radeon: print more information about textures
    running ./auto_bisect.sh
        auto_bisect.sh says skip
    Bisecting: 8 revisions left to test after this (roughly 3 steps)
    [5294debfa4910e4259112ce3c6d5a8c1cd346ae9] automake: Fix typo in MSVC2008 compat flags.
    running ./auto_bisect.sh
        auto_bisect.sh says good
    Bisecting: 6 revisions left to test after this (roughly 3 steps)
    [1cca259d9942e2f453c65e8d7f9f79fe9dc5f0a7] gallium/radeon: print more info about CMASK
    running ./auto_bisect.sh
        auto_bisect.sh says skip
    Bisecting: 6 revisions left to test after this (roughly 3 steps)
    [c60d49161e3496b9e64b99ecbbc7ec9a02b15a17] gallium/radeon: remove unused r600_texture::pitch_override
    running ./auto_bisect.sh
        auto_bisect.sh says skip
    Bisecting: 6 revisions left to test after this (roughly 3 steps)
    [84fbb0aff98d6e90e4759bbe701c9484e569c869] gallium/radeon: rename fmask::pitch -> pitch_in_pixels
    running ./auto_bisect.sh
        auto_bisect.sh says skip
    Bisecting: 6 revisions left to test after this (roughly 3 steps)
    [bfc14796b077444011c81f544ceec5d8592c5c77] radeonsi: fix occlusion queries on Fiji
    running ./auto_bisect.sh
        auto_bisect.sh says bad
    Bisecting: 5 revisions left to test after this (roughly 3 steps)
    [a0bfb2798d243a4685d6ea32e9a7091fcec74700] gallium/radeon: print more info about HTILE
    running ./auto_bisect.sh
        auto_bisect.sh says skip
    Bisecting: 5 revisions left to test after this (roughly 3 steps)
    [75d64698f0b0c906d611e69d9f8b118c35026efa] gallium/radeon: remove DBG_TEXMIP
    running ./auto_bisect.sh
        auto_bisect.sh says skip
    Bisecting: 5 revisions left to test after this (roughly 3 steps)
    [3a6de8c86ee8a0a6d2f2fbc8cf2c461af0b9a007] radeonsi: print framebuffer info into ddebug logs
    running ./auto_bisect.sh
        auto_bisect.sh says bad
    Bisecting: 3 revisions left to test after this (roughly 2 steps)
    [a5055e2f86e698a35da850378cd2eaa128df978a] gallium/aux/util: Trivial, we already have format use it
    running ./auto_bisect.sh
        auto_bisect.sh says skip
    There are only 'skip'ped commits left to test.
    The first bad commit could be any of:
    19eaceb6edc6cd3a9ae878c89f9deb79afae4dd6
    2d712d35c57900fc0aa0f1455381de48cdda0073
    84fbb0aff98d6e90e4759bbe701c9484e569c869
    c60d49161e3496b9e64b99ecbbc7ec9a02b15a17
    1cca259d9942e2f453c65e8d7f9f79fe9dc5f0a7
    75d64698f0b0c906d611e69d9f8b118c35026efa
    a0bfb2798d243a4685d6ea32e9a7091fcec74700
    a5055e2f86e698a35da850378cd2eaa128df978a
    3a6de8c86ee8a0a6d2f2fbc8cf2c461af0b9a007
    We cannot bisect more!
    bisect run cannot continue any more

In more extreme cases, the range of broken commits can be even longer. Six or seven commits is about the most that I have personally experienced.

The problem doesn't have to be a broken build. It could be anything that prevents the test case from running. On Mesa I have experienced problems where a bug that prevents one driver from being able to load or create an OpenGL context persists for a few commits. Anything that prevents the test from running (e.g., keeps it from producing a pass or fail result) or causes additional, unrelated failures should be skipped.

Usually the problem is something really trivial. If the problem has been fixed, a patch for it may even already exist. Let's assume a patch exists in a file named fix-the-build.patch. We also know that the build broke at commit 75d6469, and it was fixed at commit 3a6de8c. This means that the commits in the range 75d6469^..3a6de8c^ need the patch applied. If you're not convinced that the ^ is necessary, observe the log output:

    $ git log --oneline 75d6469^..3a6de8c^
    a0bfb27 gallium/radeon: print more info about HTILE
    1cca259 gallium/radeon: print more info about CMASK
    84fbb0a gallium/radeon: rename fmask::pitch -> pitch_in_pixels
    19eaceb gallium/radeon: print more information about textures
    2d712d3 gallium/radeon: move printing texture info into a separate function
    c60d491 gallium/radeon: remove unused r600_texture::pitch_override
    75d6469 gallium/radeon: remove DBG_TEXMIP

Notice that the bottom commit in the list is the commit where the break first appears, and the top commit in the list is the last broken commit, not the one where the break is fixed.

Using this information is simple. The bisect script need only determine whether the current commit is in the list of commits that need the patch and, if so, apply the patch.

    # Get the short-form SHA of the current commit
    SHA=$(git log -1 --oneline HEAD | cut -f1 -d' ')

    # If the current commit is in the list of commits that need the patch
    # applied, do it.  If applying the patch fails, even partially, abort.
    if grep --silent "^$SHA " <(git log --oneline 75d6469^..3a6de8c^) ; then
        #                     ^^                                    ^
        #                     This bit runs git-log and hands its output to
        #                     grep through a pipe that grep can open like a
        #                     file (bash "process substitution").  Non-bash
        #                     shells will probably need to do this manually.
        if ! patch -p1 --forward --silent < fix-the-build.patch ; then
            exit 255
        fi
    fi

Before exiting, the script must return the tree to its original state. If it does not, checking out the next commit may fail, and applying the patch at the next step will certainly fail. git-reset can do most of the work. It just has to be applied everywhere this script might exit. I generally do this using a wrapper function. The simple bisect script from before might look like:

    #!/bin/bash
    function report()
    {
        git reset --hard HEAD
        exit "$1"
    }

    # Get the short-form SHA of the current commit
    SHA=$(git log -1 --oneline HEAD | cut -f1 -d' ')

    # If the current commit is in the list of commits that need the patch
    # applied, do it.  If applying the patch fails, even partially, abort.
    if grep --silent "^$SHA " <(git log --oneline 75d6469^..3a6de8c^) ; then
        if ! patch -p1 --forward --silent < fix-the-build.patch ; then
            # Just exit here... so that we can see what went wrong
            exit 255
        fi
    fi

    if ! make ; then
        report 125
    fi

    # Use the return code from the test case
    test_case
    report $?

This can be extended to any number of patches to fix any number of problems.
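One way to extend it is to keep a small table of broken ranges and their patches, and loop over it. In the sketch below, PATCH_TABLE and apply_needed_patches are hypothetical, and the single entry is the range from this example; each additional fix would be another line. It also swaps the abbreviated-SHA text match for a comparison against full SHAs from git rev-list through an ordinary pipe, which avoids both process substitution and any dependence on how git abbreviates SHAs.

```bash
#!/bin/bash
# Sketch: apply every fix-up patch whose broken range contains HEAD.
# PATCH_TABLE and apply_needed_patches are hypothetical.  One entry per line:
#   <first broken commit> <commit that fixed the break> <patch file>
PATCH_TABLE='75d6469 3a6de8c fix-the-build.patch'

# Returns 0 if every needed patch applied cleanly, 1 otherwise.
apply_needed_patches() {
    local head first fixed patchfile
    head=$(git rev-parse HEAD)          # full SHA of the commit under test
    while read -r first fixed patchfile; do
        [ -n "$first" ] || continue     # ignore blank lines
        # first^..fixed^ is exactly the set of commits that need this patch.
        if git rev-list "$first^..$fixed^" | grep -xF "$head" > /dev/null; then
            echo "applying $patchfile"
            patch -p1 --forward --silent < "$patchfile" || return 1
        fi
    done <<< "$PATCH_TABLE"
    return 0
}

# Possible use in auto_bisect.sh, before "make":
#   apply_needed_patches || exit 255
```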

There is one other tip here. If the first bisect attempt produced inconclusive results due to skipped commits, it may not have been wasted effort. Referring back to the previous output, there were two good commits found. These commits can be given to the next invocation of git bisect start. This helps reduce the search space from 9 to 6 in this case.

    $ git bisect start HEAD HEAD~20 622186fbdf47e4c77aadba3e38567636ecbcccf5 5294debfa4910e4259112ce3c6d5a8c1cd346ae9 
    Bisecting: 6 revisions left to test after this (roughly 3 steps)
    [1cca259d9942e2f453c65e8d7f9f79fe9dc5f0a7] gallium/radeon: print more info about CMASK

Using the last bad commit can reduce the search even further.

    $ git bisect start 3a6de8c86ee8a0a6d2f2fbc8cf2c461af0b9a007 HEAD~20 622186fbdf47e4c77aadba3e38567636ecbcccf5 5294debfa4910e4259112ce3c6d5a8c1cd346ae9 
    Bisecting: 4 revisions left to test after this (roughly 2 steps)
    [2d712d35c57900fc0aa0f1455381de48cdda0073] gallium/radeon: move printing texture info into a separate function
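Copying those SHAs out of the run output by hand is easy to get wrong. git bisect log records every decision as a line such as "git bisect good <sha>", so a small filter can collect them; save the log before git bisect reset, which discards it. The sketch below uses a hypothetical helper, bisect_log_commits, and assumes that line format. It prints every commit marked good, or the most recently marked bad commit, since each new bad commit narrows the range further.

```bash
#!/bin/bash
# Sketch: pull decisions out of "git bisect log" output (read on stdin).
# bisect_log_commits is a hypothetical helper.
#   bisect_log_commits good  -> every commit marked good, one per line
#   bisect_log_commits bad   -> the most recently marked bad commit
bisect_log_commits() {
    case "$1" in
        good) awk '$1 == "git" && $2 == "bisect" && $3 == "good" { print $4 }' ;;
        bad)  awk '$1 == "git" && $2 == "bisect" && $3 == "bad" { last = $4 }
                   END { if (last != "") print last }' ;;
        *)    return 2 ;;
    esac
}

# Possible use (save the log before resetting):
#   git bisect log > bisect.log
#   git bisect reset
#   bad=$(bisect_log_commits bad < bisect.log)
#   git bisect start "${bad:-HEAD}" HEAD~20 $(bisect_log_commits good < bisect.log)
```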

Note that git-bisect does not emit "good" or "bad" information. You have to author your bisect script to emit that information. The report function is a good place to do this.

    function report()
    {
        if [ "$1" -eq 0 ]; then
            echo "    auto_bisect.sh says good"
        elif [ "$1" -eq 125 ]; then
            echo "    auto_bisect.sh says skip"
        elif [ "$1" -ge 128 ]; then
            echo "    auto_bisect.sh says stop"
        else
            echo "    auto_bisect.sh says bad"
        fi

        git reset --hard HEAD
        exit "$1"
    }

Remote Test Systems

Running tests on remote systems poses additional challenges. At the very least, there are three additional steps: get the built project onto the remote system, start the test on the remote system, and retrieve the result.

For these extra steps, rsync and ssh are powerful tools. There are numerous blog posts and tutorials dedicated to using rsync and ssh in various environments, and duplicating that effort is well beyond the scope of this post. However, there is one nice feature relevant to automated bisects that is worth mentioning.

Recall that returning 255 from the script will cause the bisect to halt, waiting for human intervention. It just so happens that ssh returns 255 when an error occurs. Otherwise it returns the exit status of the remote command. To make use of this, split the work across two scripts instead of putting all of it in a single auto_bisect.sh script. A new local_bisect.sh contains all of the commands that run on the build / bisect system, and remote_bisect.sh contains all of the commands that run on the testing system.

remote_bisect.sh should (only) execute the test and exit with the same sort of return code as auto_bisect.sh would. local_bisect.sh should build the project, copy the build to the testing system, and start the test on the testing system. The return code from remote_bisect.sh should be directly returned from local_bisect.sh. A simple local_bisect.sh doesn't look too different from auto_bisect.sh:

    #!/bin/bash
    if ! make ; then
        exit 125
    fi

    # -a copies recursively; the trailing / copies the directory's contents
    if ! rsync -a build_results/ tester@test.system.com:build_results/; then
        exit 255
    fi

    # Use the return code from the test case
    exec ssh tester@test.system.com /path/to/test/remote_bisect.sh

Since remote_bisect.sh returns "normal" automated bisect return codes and ssh returns 255 on non-test failures, everything is taken care of.

Interactive Test Cases

Automated bisecting doesn't work out too well when the test itself cannot be automated. There is still some benefit to be had from automating the process. Optionally applying patches, building the project, sending files to remote systems, and starting the test can all still be automated, although "automated" then applies only very loosely. When the test is done, the script should exit with a return code of 255. This will halt the bisect. Run git bisect good or git bisect bad. Then, run git bisect run ./auto_bisect.sh again.

It's tempting to just run auto_bisect.sh by hand and skip git bisect run. The small advantage of the latter is that skipping build failures is still automated.
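A variation, if halting and restarting gets tedious, is to let the script ask you for the verdict instead of exiting with 255. This is a sketch, not the method described above. ask_verdict is a hypothetical helper, and it assumes git bisect run leaves the script's standard input connected to your terminal, which you should confirm on your setup before relying on it. Any answer other than good, bad, or skip returns 255, so the bisect still halts exactly as it would with the approach above.

```bash
#!/bin/bash
# Sketch: ask a human for the verdict once an interactive test is finished.
# ask_verdict is hypothetical; it reads one answer from stdin and returns
# a "git bisect run" exit code.
ask_verdict() {
    local answer
    printf 'Was this commit good, bad, or skip? ' >&2
    read -r answer || return 255       # no input at all: halt the bisect
    case "$answer" in
        g|good) return 0 ;;
        b|bad)  return 1 ;;
        s|skip) return 125 ;;
        *)      return 255 ;;          # anything else: halt for a human
    esac
}

# Possible use at the end of auto_bisect.sh, after the test has been run:
#   ask_verdict
#   exit $?
```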

Going further requires making an interactive test case non-interactive. For developers of OpenGL drivers, it is common to need to bisect rendering errors in games. This can be really, really painful and tedious. Most of the pain comes from the task not being automatable. Just loading the game and getting to the place where the error occurs can often take several minutes. These bugs are often reported by end-users who last tested with the previous release. From the 11.0 branch point to the 11.1 branch point on Mesa there were 2,308 commits.

    $ git bisect start 11.1-branchpoint 11.0-branchpoint 
    Bisecting: 1153 revisions left to test after this (roughly 10 steps)
    [bf5f931aee35e8448a6560545d86deb35f0639b3] nir: make nir_instrs_equal() static

When you realize that the bisect will take 10 steps, with at least 10 to 15 minutes per step, you may begin to feel your insides die. It's even worse if you accidentally type git bisect good when you meant git bisect bad along the way.

This is a common problem when testing interactive applications. A variety of tools exist to remove the interactivity from interactive applications. apitrace is one such tool. Using apitrace, the OpenGL commands from the application can be recorded. This step must be done manually. The resulting trace can then be replayed at a known-good commit, and an image can be captured from the portion of the trace that would exhibit the bug. This step must also be done manually, but the image capture is performed by a command line option to the trace replay command. Now a script can replay the trace, collect a new image, and compare the new image with the old image. If the images match, the commit is good. Otherwise, the commit is bad. This can be error prone, so it's a good idea to keep all the images from the bisect. A human can then examine all the images after the bisect to make sure the right choices were made at each commit tested.
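The comparison half of that is easy to script. The sketch below assumes the replay-and-capture step has already written a new image for the current commit (how to do that with apitrace is covered by its documentation), and compare_snapshot and its arguments are hypothetical. It uses an exact byte comparison with cmp, which only works if replays are pixel-for-pixel deterministic; a tolerance-based image diff may be needed otherwise. Every captured image is kept, named after the commit, for the human review afterwards.

```bash
#!/bin/bash
# Sketch: classify a commit by comparing a freshly captured image with a
# reference image captured at a known-good commit.  compare_snapshot is a
# hypothetical helper; the capture itself is assumed to happen elsewhere.
#   $1 = reference image   $2 = new image
#   $3 = directory for kept images   $4 = label for this commit (e.g. its SHA)
compare_snapshot() {
    local ref=$1 new=$2 keep_dir=$3 label=$4
    # No new image means the replay did not get far enough: skip.
    [ -f "$new" ] || return 125
    # Keep every image so a human can review each decision afterwards.
    mkdir -p "$keep_dir" && cp "$new" "$keep_dir/$label.png"
    # Exact byte comparison: 0 = identical (good), 1 = different (bad).
    cmp -s "$ref" "$new"
    case $? in
        0) return 0 ;;
        1) return 1 ;;
        *) return 255 ;;   # cmp failed, e.g. missing reference: halt
    esac
}

# Possible use in auto_bisect.sh after the replay:
#   compare_snapshot reference.png new.png "$HOME/bisect-images" \
#       "$(git rev-parse --short HEAD)"
#   exit $?
```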

A full apitrace tutorial is beyond the scope of this post. The apitrace documentation has some additional information about using apitrace with automated bisects.

What Now?

git-bisect is an amazingly powerful tool. Some of the more powerful aspects of GIT get a bad rap for being hard to use correctly: make one mistake, and you'll have to re-clone your tree. With even the more powerful aspects of git-bisect, such as automated bisects, it's actually hard to go wrong. There are two absolutely critical final tips. First, remember that you're bisecting. If you start performing other GIT commands in the middle of the bisect, both you and your tree will get confused. Second, remember to reset your tree using git bisect reset when you're done. Without this step, you'll still be bisecting, so see the first tip. git-bisect and automated bisects really make simple things simple and complex things possible.


thanks for this

I appreciate that you've taken the time to write up this walkthrough. I have written my own bisection routines for multi-system bisect on Mesa's CI, but I haven't taken the time to learn and use git-bisect directly.

I'm going to use this to set up a single-system bisection for the next long bisection that I have to investigate. The overhead of the CI is significant, and using git-bisect should speed up the process by an order of magnitude.

-Mark

Comment by Mark Janes — Tue Dec 8 14:21:22 2015