I finally started working on ud-chains for Mesa's GLSL compiler. I'm continually amazed that it works as well as it does without them. There are so many ugly hacks and so much ad hoc data-flow analysis that I just couldn't put it off any longer. If you want to cry, take a look at how reaching definitions are determined in loop_analysis.cpp. It's pretty horrible.

My primary goal is to get enough infrastructure in place to fix that code. In addition to the ugliness, it fails on cases like:

void main()
{
    int n = 3;
    int i = 0;

    if (b)
        some_function();

    while (i < n) {
        some_other_function();
    }
}

The loop analyzer will have no idea what the values of i or n are. There are similar issues with identifying values that are constant across all iterations of the loop. In this case, flow control inside the loop confuses the analyzer.

The first step to this project is to get real basic block tracking. We have a mechanism to iterate over instructions as basic blocks (see ir_basic_block.cpp, but nothing is stored about the blocks. This means we don't know which blocks can transfer control to which other blocks.

I now have some code that creates a the basic block structure of a function. The next step will be to process the instructions in the basic blocks to track the (value) definitions within the block.

In the mean time, I modified the stand-alone compiler to optionally dump a graph of the basic blocks in dot. The results, even for small shaders, are shockingly cool.

Here's the code for the piglit test glsl-fs-vec4-indexing-temp-dst-in-nested-loop-combined.shader_test:

uniform float one;
uniform int count1, count2;
uniform vec4 red;

void main()
{
    vec4 array[11];
    array[0] = vec4(0.1, 0,   0,   0) * one;
    array[1] = vec4(0,   0.1, 0,   0) * one;
    array[2] = vec4(0,   0,   0.1, 0) * one;
    array[3] = vec4(0,   0.1, 0,   0) * one;
    array[4] = vec4(0,   0  , 0.1, 0) * one;
    array[5] = vec4(0,   0,   0.1, 0) * one;
    array[6] = vec4(0.1, 0,   0,   0) * one;
    array[7] = vec4(0,   0.1, 0,   0) * one;
    array[8] = vec4(0,   0.1, 0.1, 0) * one;
    array[9] = vec4(0.1, 0.1, 0,   0) * one;
    array[10] = vec4(0.1, 0,   0.1, 0) * one;
    for (int i = 0; i < count1; i++)
        for (int j = 0; j < count2; j++)
            array[i*3+j] += red*float(j+1);
    gl_FragColor = array[0] + array[1] + array[2] + array[3] + array[4] +
        array[5] + array[6] + array[7] + array[8] + array[9] + array[10];
}

And here is the resulting basic block graph:

I guess I forgot to mention, all of this code is living in the ud-chains branch of my personal GIT repo: git://anongit.freedesktop.org/~idr/mesa.git
Comment by IanRomanick Tue 03 May 2011 05:22:15 PM PDT
Wasn't there some attempt to leverage LLVM for the GLSL compiler?
Comment by Anonymous Tue 03 May 2011 06:40:48 PM PDT
I really like the idea of dumping as dot file. This really helps me understanding the data flow
Comment by Christoph Brill Wed 04 May 2011 04:41:48 AM PDT
This stuff really looks interesting. Do you need any help on the GLSL compiler ? Have a TODO-list somewhere ?
Comment by Anonymous Wed 04 May 2011 10:37:49 AM PDT
UD chains? Isn't that kinda old? Shouldn't you be using SSA form instead?
Comment by Alexander Larsson Wed 04 May 2011 12:21:09 PM PDT
SSA and ud-chains solve different sets of problems. As far as I'm aware, "real" compilers tend to have both. I know that ICC has both. shrug
Comment by IanRomanick Wed 04 May 2011 02:22:52 PM PDT
We'd love some help, but I don't think we have a good todo list. I know that we want to have a simple intra-block CSE. For inter-block CSE, we'll eventually want SSA (as another poster pointed out). There is a high-level "missing feature" list, but I don't think that gives any guidance for where to start working on stuff.
Comment by IanRomanick Wed 04 May 2011 02:42:14 PM PDT
ssa

I think compilers only support both if they predate ssa and have old optimization steps that use the ud chains. ssa is not an optimization itself but rather a special form of the intermediate language where each variable is only assigned once. In such a form ud chains are trivial, def is at an assignment and use is any op that references that variable name.

My compiler theory is a bit rusty, but I'm fairly sure about this.

Comment by Alexander Larsson Thu 05 May 2011 12:03:49 AM PDT
ssa

I guess what I'm saying is, please don't add ud-chains and optimization passed based on it before implementing SSA, as then you'll have to rewrite those optimization passes when you switch to ssa form (and the implementation of the optimization passes are generally simpler when using ssa form).

Here is a book about SSA i found on google: http://ssabook.gforge.inria.fr/latest/book.pdf At least read some of it before deciding on using ud-chains.

Comment by Alexander Larsson Thu 05 May 2011 12:53:29 AM PDT
Just curious, how does your implementation of UD chains handle vector variables? Does access to any component(s) count as accessing the entire variable, or does it work per-component?
Comment by Bryan Cain Thu 05 May 2011 05:01:54 PM PDT
Vectors will definitely be per-component. Since matrices are (almost always) decomposed into vectors, this will cover them too. Initially structures and arrays will be for the whole variable. The ultimate goal is for this to be generic enough to be re-used with the lower-level IRs. This will allow us to fix some of the issues with the register allocator.
Comment by IanRomanick Fri 06 May 2011 01:23:44 PM PDT
people reimplementing LLVM from scratch.
Comment by robert Mon 16 May 2011 03:42:09 PM PDT