Welcome to my blog. Have a look at the most recent posts below, or browse the tag cloud on the right. An archive of all posts is also available.

RSS Atom Add a new post titled:

Yesterday I pushed an implementation of a new OpenGL extension GL_EXT_shader_samples_identical to Mesa. This extension will be in the Mesa 11.1 release in a few short weeks, and it will be enabled on various Intel platforms:

  • GEN7 (Ivy Bridge, Baytrail, Haswell): Only currently effective in the fragment shader. More details below.
  • GEN8 (Broadwell, Cherry Trail, Braswell): Only currently effective in the vertex shader and fragment shader. More details below.
  • GEN9 (Skylake): Only currently effective in the vertex shader and fragment shader. More details below.

The extension hasn't yet been published in the official OpenGL extension registry, but I will take care of that before Mesa 11.1 is released.


Multisample anti-aliasing (MSAA) is a well known technique for reducing aliasing effects ("jaggies") in rendered images. The core idea is that the expensive part of generating a single pixel color happens once. The cheaper part of determining where that color exists in the pixel happens multiple times. For 2x MSAA this happens twice, for 4x MSAA this happens four times, etc. The computation cost is not increased by much, but the storage and memory bandwidth costs are increased linearly.

Some time ago, a clever person noticed that in areas where the whole pixel is covered by the triangle, all of the samples have exactly the same value. Furthermore, since most triangles are (much) bigger than a pixel, this is really common. From there it is trivial to apply some sort of simple data compression to the sample data, and all modern GPUs do this in some form. In addition to the surface that stores the data, there is a multisample control surface (MCS) that describes the compression.

On Intel GPUs, sample data is stored in n separate planes. For 4x MSAA, there are four planes. The MCS has a small table for each pixel that maps a sample to a plane. If the entry for sample 2 in the MCS is 0, then the data for sample 2 is stored in plane 0. The GPU automatically uses this to reduce bandwidth usage. When writing a pixel on the interior of a polygon (where all the samples have the same value), the MCS gets all zeros written, and the sample value is written only to plane 0.

This does add some complexity to the shader compiler. When a shader executes the texelFetch function, several things happen behind the scenes. First, an instruction is issued to read the MCS. Then a second instruction is executed to read the sample data. This second instruction uses the sample index and the result of the MCS read as inputs.

A simple shader like

    #version 150
    uniform sampler2DMS tex;
    uniform int samplePos;

    in vec2 coord;
    out vec4 frag_color;

    void main()
       frag_color = texelFetch(tex, ivec2(coord), samplePos);

generates this assembly

    pln(16)         g8<1>F          g7<0,1,0>F      g2<8,8,1>F      { align1 1H compacted };
    pln(16)         g10<1>F         g7.4<0,1,0>F    g2<8,8,1>F      { align1 1H compacted };
    mov(16)         g12<1>F         g6<0,1,0>F                      { align1 1H compacted };
    mov(16)         g16<1>D         g8<8,8,1>F                      { align1 1H compacted };
    mov(16)         g18<1>D         g10<8,8,1>F                     { align1 1H compacted };
    send(16)        g2<1>UW         g16<8,8,1>F
                                sampler ld_mcs SIMD16 Surface = 1 Sampler = 0 mlen 4 rlen 8 { align1 1H };
    mov(16)         g14<1>F         g2<8,8,1>F                      { align1 1H compacted };
    send(16)        g120<1>UW       g12<8,8,1>F
                                sampler ld2dms SIMD16 Surface = 1 Sampler = 0 mlen 8 rlen 8 { align1 1H };
    sendc(16)       null<1>UW       g120<8,8,1>F
                                render RT write SIMD16 LastRT Surface = 0 mlen 8 rlen 0 { align1 1H EOT };

The ld_mcs instruction is the read from the MCS, and the ld2dms is the read from the multisample surface using the MCS data. If a shader reads multiple samples from the same location, the compiler will likely eliminate all but one of the ld_mcs instructions.

Modern GPUs also have an additional optimization. When an application clears a surface, some values are much more commonly used than others. Permutations of 0s and 1s are, by far, the most common. Bandwidth usage can further be reduced by taking advantage of this. With a single bit for each of red, green, blue, and alpha, only four bits are necessary to describe a clear color that contains only 0s and 1s. A special value could then be stored in the MCS for each sample that uses the fast-clear color. A clear operation that uses a fast-clear compatible color only has to modify the MCS.

All of this is well documented in the Programmer's Reference Manuals for Intel GPUs.

There's More

Information from the MCS can also help users of the multisample surface reduce memory bandwidth usage. Imagine a simple, straight forward shader that performs an MSAA resolve operation:

    #version 150
    uniform sampler2DMS tex;

    #define NUM_SAMPLES 4 // generate a different shader for each sample count

    in vec2 coord;
    out vec4 frag_color;

    void main()
        vec4 color = texelFetch(tex, ivec2(coord), 0);

        for (int i = 1; i < NUM_SAMPLES; i++)
            color += texelFetch(tex, ivec2(coord), i);

        frag_color = color / float(NUM_SAMPLES);

The problem should be obvious. On most pixels all of the samples will have the same color, but the shader still reads every sample. It's tempting to think the compiler should be able to fix this. In a very simple cases like this one, that may be possible, but such an optimization would be both challenging to implement and, likely, very easy to fool.

A better approach is to just make the data available to the shader, and that is where this extension comes in. A new function textureSamplesIdenticalEXT is added that allows the shader to detect the common case where all the samples have the same value. The new, optimized shader would be:

    #version 150
    #extension GL_EXT_shader_samples_identical: enable
    uniform sampler2DMS tex;

    #define NUM_SAMPLES 4 // generate a different shader for each sample count

    #if !defined GL_EXT_shader_samples_identical
    #define textureSamplesIdenticalEXT(t, c)  false

    in vec2 coord;
    out vec4 frag_color;

    void main()
        vec4 color = texelFetch(tex, ivec2(coord), 0);

        if (! textureSamplesIdenticalEXT(tex, ivec2(coord)) {
            for (int i = 1; i < NUM_SAMPLES; i++)
                color += texelFetch(tex, ivec2(coord), i);

            color /= float(NUM_SAMPLES);

        frag_color = color;

The intention is that this function be implemented by simply examining the MCS data. At least on Intel GPUs, if the MCS for a pixel is all 0s, then all the samples are the same. Since textureSamplesIdenticalEXT can reuse the MCS data read by the first texelFetch call, there are no extra reads from memory. There is just a single compare and conditional branch. These added instructions can be scheduled while waiting for the ld2dms instruction to read from memory (slow), so they are practically free.

It is also tempting to use textureSamplesIdenticalEXT in conjunction with anyInvocationsARB (from GL_ARB_shader_group_vote). Such a shader might look like:

    #version 430
    #extension GL_EXT_shader_samples_identical: require
    #extension GL_ARB_shader_group_vote: require
    uniform sampler2DMS tex;

    #define NUM_SAMPLES 4 // generate a different shader for each sample count

    in vec2 coord;
    out vec4 frag_color;

    void main()
        vec4 color = texelFetch(tex, ivec2(coord), 0);

        if (anyInvocationsARB(!textureSamplesIdenticalEXT(tex, ivec2(coord))) {
            for (int i = 1; i < NUM_SAMPLES; i++)
                color += texelFetch(tex, ivec2(coord), i);

            color /= float(NUM_SAMPLES);

        frag_color = color;

Whether or not using anyInvocationsARB improves performance is likely to be dependent on both the shader and the underlying GPU hardware. Currently Mesa does not support GL_ARB_shader_group_vote, so I don't have any data one way or the other.


The implementation of this extension that will ship with Mesa 11.1 has a three main caveats. Each of these will likely be resolved to some extent in future releases.

The extension is only effective on scalar shader units. This means on GEN7 it is effective in fragment shaders. On GEN8 and GEN9 it is only effective in vertex shaders and fragment shaders. It is supported in all shader stages, but in non-scalar stages textureSamplesIdenticalEXT always returns false. The implementation for the non-scalar stages is slightly different, and, on GEN9, the exact set of instructions depends on the number of samples. I didn't think it was likely that people would want to use this feature in a vertex shader or geometry shader, so I just didn't finish the implementation. This will almost certainly be resolved in Mesa 11.2.

The current implementation also returns a false negative for texels fully set to the fast-clear color. There are two problems with the fast-clear color. It uses a different value than the "all plane 0" case, and the size of the value depends on the number of samples. For 2x MSAA, the MCS read returns 0x000000ff, but for 8x MSAA it returns 0xffffffff.

The first problem means the compiler would needs to generate additional instructions to check for "all plane 0" or "all fast-clear color." This could hurt the performance of applications that either don't use a fast-clear color or, more likely, that later draw non-clear data to the entire surface. The second problem means the compiler would needs to do state-based recompiles when the number of samples changes.

In the end, we decided that "all plane 0" was by far the most common case, so we have ignored the "all fast-clear color" case for the time being. We are still collecting data from applications, and we're working on several uses of this functionality inside our driver. In future versions we may implement a heuristic to determine whether or not to check for the fast-clear color case.

As mentioned above, Mesa does not currently support GL_ARB_shader_group_vote. Applications that want to use textureSamplesIdenticalEXT on Mesa will need paths that do not use anyInvocationsARB for at least the time being.


As stated by issue #3, the extension still needs to gain SPIR-V support. This extension would be just as useful in Vulkan and OpenCL as it is in OpenGL.

At some point there is likely to be a follow-on extension that provides more MCS data to the shader in a more raw form. As stated in issue #2 and previously in this post, there are a few problems with providing raw MCS data. The biggest problem is how the data is returned. Each sample count needs a different amount of data. Current 8x MSAA surfaces have 32-bits (returned) per pixel. Current 16x MSAA MCS surfaces have 64-bits per pixel. Future 32x MSAA, should that ever exist, would need 192 bits. Additionally, there would need to be a set of new texelFetch functions that take both a sample index and the MCS data. This, again, has problems with variable data size.

Applications would also want to query things about the MCS values. How many times is plane 0 used? Which samples use plane 2? What is the highest plane used? There could be other useful queries. I can imagine that a high quality, high performance multisample resolve filter could want all of this information. Since the data changes based on the sample count and could change on future hardware, the future extension really should not directly expose the encoding of the MCS data. How should it provide the data? I'm expecting to write some demo applications and experiment with a bunch of different things. Obviously, this is an open area of research.

Posted Fri Nov 20 11:39:29 2015 Tags:

In the old days, there were not a lot of users of open-source 3D graphics drivers. Aside from a few ID Software games, tuxracer, and glxgears, there were basically no applications. As a result, we got almost no bug reports. It was a blessing and a curse.

Now days, thanks to Valve Software and Steam, there are more applications than you can shake a stick at. As a result, we get a lot of bug reports. It is a blessing and a curse.

There's quite a wide chasm between a good bug report and a bad bug report. Since we have a lot to do, the quality of the bug report can make a big difference in how quickly the bug gets fixed. A poorly written bug report may get ignored. A poorly written bug may require several rounds asking for more information or clarifications. A poorly written bug report may send a developer looking for the problem in the wrong place.

As someone who fields a lot of these reports for Mesa, here's what I look for...

  1. Submit against the right product and component. The component will determine the person (or mailing list) that initially receives notification about the bug. If you have a bug in the Radeon driver and only people working Nouveau get the bug e-mail, your bug probably won't get much attention.

    If you have experienced a bug in 3D rendering, you want to submit your bug against Mesa.

    Selecting the correct component can be tricky. It doesn't have to be 100% correct initially, but it should be close enough to get the attention of the right people. A good first approximation is to pick the component that matches the hardware you are running on. That means one of the following:

    If you're not sure what GPU is in your system, the command glxinfo | grep "OpenGL renderer string" will help.

  2. Correctly describe the software on your system. There is a field in bugzilla to select the version where the bug was encountered. This information is useful, but it is more important to get specific, detailed information in the bug report.

    • Provide the output of glxinfo | grep "OpenGL version string". There may be two lines of output. That's fine, and you should include both.

    • If you are using the drivers supplied by your distro, include the version of the distro package. On Fedora, this can be determined by rpm -q mesa-dri-drivers. Other distros use other commands. I don't know what they are, but Googling for "how do i determine the version of an installed package on " should help.

    • If you built the driver from source using a Mesa release tarball, include the full name of that tarball.

    • If you built the driver from source using the Mesa GIT repository, include SHA of the commit used.

    • Include the full version of the kernel using uname -r. This isn't always important, but it is sometimes helpful.

  3. Correctly describe the hardware in your system. There are a lot of variations of each GPU out there. Knowing exactly which variation exhibited bad behavior can help find the problem.

    • Provide the output of glxinfo | grep "OpenGL renderer string". As mentioned above, this is the hardware that Mesa thinks is in the system.

    • Provide the output of lspci -d ::0300 -nn. This provides the raw PCI information about the GPU. In some cases this maybe more detailed (or just plain different) than the output from glxinfo.

    • Provide the output of grep "model name" /proc/cpuinfo | head -1. It is sometimes useful to know the CPU in the system, and this provides information about the CPU model.

  4. Thoroughly describe the problem you observed. There are two goals. First, you want the person reading the bug report to know both what did happen and what you expected to happen. Second, you want the person reading the bug report to know how to reproduce the problem you observed. The person fielding the bug report may not have access to the application.

    • If the application was run from the command line, provide the full command line used to run the application.

      If the application is a test that was run from piglit, provide the command line used to run the test, not the command line used to run piglit. This information is in the piglit results.json result file.

    • If the application is game or other complex application, provide any necessary information needed to reproduce the problem. Did it happen at specific location in a level? Did it happen while performing a specific operation? Does it only happen when loading certain data files?

    • Provide output from the application. If the application is a test case, provide the text output from the test. If this output is, say, less than 5 lines, include it in the report. Otherwise provide it as an attachment. Output lines that just say "Fail" are not useful. It is usually possible to run tests in verbose mode that will provide helpful output.

      If the application is game or other complex application, try to provide a screenshot. If it is possible, providing a "good" and "bad" screenshot is even more useful. Bug #91617 is a good example of this.

      If the bug was a GPU or system hang, provide output from dmesg as an attachment to the bug. If the bug is not a GPU or system hang, this output is unlikely to be useful.

  5. Add annotations to the bug summary. Annotations are parts of the bug summary at the beginning enclosed in square brackets. The two most important ones are regression and bisected (see the next item for more information about bisected). If the bug being reported is a new failure after updating system software, it is (most likely) a regression. Adding [regression] to the beginning of the bug summary will help get it noticed! In this case you also must, at the very least, tell us what software version worked... with all the same details mentioned above.

    The other annotations used by the Intel team are short names for hardware versions. Unless you're part of Intel's driver development team or QA team, you don't really need to worry about these. The ones currently in use are:

  6. Bisect regressions. If you have encountered a regression and you are building a driver from source, please provide the results of git-bisect. We really just need the last part: the guilty commit. When this information is available, add bisected to the tag. This should look like [regression bisected].

    Note: If you're on a QA team somewhere, supplying the bisect is a practical requirement.

  7. Respond to the bug. There may be requests for more information. There may be requests to test patches. Either way, developers working on the bug need you to be responsive. I have closed a lot of bugs over the years because the original reporter didn't respond to a question for six months or more. Sometimes this is our fault because the question was asked months (or even years) after the bug was submitted.

    The general practice is to change the bug status to NEEDINFO when information or testing is requested. After providing the requested information, please change the status back to either NEW or ASSIGNED or whatever the previous state was.

    It is also general practice to leave bugs in the NEW state until a developer actually starts working on it. When someone starts work, they will change the "Assigned To" field and change the status to ASSIGNED. This is how we prevent multiple people from duplicating effort by working on the same bug.

    If the bug was encountered in a released version and the bug persists after updating to a new major version (e.g., from 10.6.2 to 11.0.1), it is probably worth adding a comment to the bug "Problem still occurs with ." That way we know you're alive and still care about the problem.

Here is a sample script that collects a lot of information request. It does not collect information about the driver version other than the glxinfo output. Since I don't know whether you're using the distro driver or building one from source, there really isn't any way that my script could do that. You'll want to customize it for your own needs.

echo "Software versions:"
echo -n "    "
uname -r
echo -n "    "
glxinfo 2>&1 | grep "OpenGL version string"

echo "GPU hardware:"
echo -n "    "
glxinfo 2>&1 | grep "OpenGL renderer string"
echo -n "    "
lspci -d ::0300 -nn

echo "CPU hardware:"
echo -n "    "
uname -m
echo -n "    "
grep "model name" /proc/cpuinfo | head -1 | sed 's/^[^:]*: //'

This generates nicely formatted output that you can copy-and-paste directly into a bug report. Here is the output from my system:

Software versions:
    OpenGL version string: 3.0 Mesa 11.1.0-devel

GPU hardware:
    OpenGL renderer string: Mesa DRI Intel(R) HD Graphics 5500 (Broadwell GT2) 
    00:02.0 VGA compatible controller [0300]: Intel Corporation Broadwell-U Integrated Graphics [8086:1616] (rev 09)

CPU hardware:
    Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz
Posted Wed Oct 14 09:18:47 2015 Tags:

I've been into retro-gaming for a couple years now, and, after the 30th anniversary of the Amiga, I've been getting into retro-computing too. To feed my dirty habbit, I've been doing quite a bit of garage sale hunting. Last weekend I went to a garage sale that advertised on craigslist as "HUGE HOARDER SALE". It did not disappoint. There were a bunch of Atari 7800 games in boxes, so that by itself was worth the trip.

I talked to the guy a bit. He told me there was more stuff inside, but he hadn't had a chance to go through it yet. He did want to, "sell an $80 game for $3." Fair enough. A little more chatting, and he offered to let me have a look. The games were almost all common stuff, but there was a Mega Man IV, in the box... I told him to look up a price for that one, and I still got a really good deal.

The rest of the house... OMG. It really was like you see on TV. Boxes covering almost every inch of the floor and stacked to the ceiling in places. The guy told me that he thought it would take about two years to get the place fully cleaned out. Ouch.

In the kitchen (obviously) there was a big, big pile of computers. I started looking through that... and there was some weird stuff there. The most interesting bits were an Osborne 1, a Mac SE/30, and a DaynaFile.

I first saw the DaynaFile from this view, and I thought, "An external SCSI chasis. I can use that."

I picked up, turned it around, and almost dropped it. First of all, that's a 5.25" floppy drive in a SCSI chasis. That earns some serious WTF marks alone. Look closer... 360k.

Now I had to have it. :) I added it to my pile. I figured that after amortizing the total price over all the things I got, I paid about $1 for it.

Upon getting it home, I dissected it. Inside is a perfectly ordinary 360k 5.25" floppy drive.

The whole thing is controlled by a little 8031 microcontroller that bridges the SCSI bus to the floppy bus.

As if the whole thing weren't crazy enough... there's the date on the ROM. It's hard to read in the picture, but it says, "DAYNAFILE REV. 3.1 @ 1989". Yes... 1989. Why? WHY?!? Why did someone need a 360k 5.25" SCSI floppy drive in 1989?!? By that time Macs, Amigas, Atari STs, and even most PCs had 720k 3.5" floppy drives standard in 1989. I understand wanting to read 1.2M 5.25" PC floppies or 1.44M/720k 3.5" PC floppies on a Mac, but 360k? For shame!

The bummer is that there's no powersupply for it. I found a user manual, which is filled with some serious lolz. '"Reading and writing files" is MS-DOS terminology.... Reading a file is the same as opening a document and writing a file is the same as closing a document and saving changes." Now I remember why I used to make fun of Macintosh users.

What the manual doesn't have anywhere in it's 122 pages is a pinout of the powersupply connector. The Internet says it uses an Elpac WM220... should be possible to rig something up.

Posted Thu Oct 8 18:27:18 2015

I just finished my talk at XDC 2014. The short version: UBO support in OpenGL drivers is terrible, and I have the test cases to prove it.

There are slides, a white paper, and, eventually, a video.

UPDATE: Fixed a typo in the white paper reported by Jonas Kulla on Twitter.

UPDATE: Direct link to video.

Posted Wed Oct 8 06:28:09 2014 Tags:

On Saturday I went to the Seattle Retro Gaming Expo. It was fun (as usual), and I spent too much money (also as usual). On Sunday I rolled up my sleeves, and I started working on a game for a retro system. I've been talking about doing this for at least a year, but it was hard to find time while I was still working on my masters degree. Now that I'm done with that, I have no excuses.

Since this is my first foray (back) into game programming on hella old hardware, I decided to stick with a system I know and a game I know. I'm doing Tetris for the TI-99/4a. Around 1997 I did a version of Tetris in 8051 assembly, and the TI is the first computer that I ever really programmed.

My stepdad used to work at Fred Meyer in the 80's, and he got the computer practically for free when TI discontinued them... and retailers heavily discounted their stock-on-hand. It was a few years until the "kids" were allowed to play with the computer. Once I got access, I spent basically all of the wee hours of the night hogging the TV and the computer. :) It was around that same time that my older stepsister swiped the tape drive, so I had no way to save any of my work. I would often leave it on for days while I was working on a program. As a result, I don't have any of those old programs. It's probably better this way... all of those programs were spaghetti BASIC garbage. :)

The cool thing (if that's the right phrase) about starting with the TI is that it uses the TMS9918 VDC. This same chip is used in the ColecoVision, MSX, and Sega SG-1000 (system before the system before the Sega Master System). All the tricks from the TI will directly apply to those other systems.

Fast forward to the present... This first attempt is also in BASIC, but I'm using TI Extended BASIC now. This has a few features that should make things less painful, but I'm pretty sure this will be the only thing I make using the actual TI as the development system... I'm basically writing code using ed (but worse), and I had repressed the memories of how terrible that keyboard is.

On Sunday, for the first time in 28 years, I saved and restored computer data on audio cassette.

Anyway... my plan is:

  1. Get Tetris working. Based on a couple hours hacking around on Sunday, I don't think BASIC is going to be fast enough for the game.
  2. Redo parts in assembly, which I can call from TI Extended BASIC, so that the game is playable.
  3. Maybe redo the whole thing in assembly... dunno.
  4. Move on to the next game.

I really want to do a version of Dragon Attack. The MSX has the same VDC, so it should be possible. Funny thing there... In 1990 I worked for HAL America (their office was on Cirrus Drive in Beaverton) as a game counselor. I mostly helped people with Adventures of Lolo and Adventures of Lolo 2. Hmm... maybe I should just port Lolo (which also started on the MSX) to the TI...

Posted Mon Jun 30 10:09:35 2014 Tags:

The slides and video of my talk from Steam Dev Days has been posted. It's basically the Haswell refresh (inside joke) of my SIGGRAPH talk from last year.

Posted Thu Feb 13 08:33:51 2014 Tags:

The slides from my FOSDEM talk are now available.

Posted Sat Feb 1 04:19:04 2014 Tags:

Multidimensional arrays are added to GLSL via either GL_ARB_arrays_of_arrays extension or GLSL 4.30. I've had a couple people tell me that the multidimensional array syntax is either wrong or just plain crazy. When viewed from the proper angle, it should actually be perfectly logical to any C / C++ programmer. I'd like to clear up a bit of the confusion.

Staring with the easy syntax, the following does what you expect:

    vec4 a[2][3][4];

If a is inside a uniform block, the memory layout would be the same as in C. cdecl would call this, "declare a as array 2 of array 3 of array 4 of vec4".

Using GLSL constructor syntax, the array can also be initialized:

    vec4 a[2][3][4] = vec4[][][](vec4[][](vec4[](vec4( 1), vec4( 2), vec4( 3), vec4( 4)),
                                          vec4[](vec4( 5), vec4( 6), vec4( 7), vec4( 8)),
                                          vec4[](vec4( 9), vec4(10), vec4(11), vec4(12))),
                                 vec4[][](vec4[](vec4(13), vec4(14), vec4(15), vec4(16)),
                                          vec4[](vec4(17), vec4(18), vec4(19), vec4(20)),
                                          vec4[](vec4(21), vec4(22), vec4(23), vec4(24))));

If that makes your eyes bleed, GL_ARB_shading_language_420pack and GLSL 4.20 add the ability to use C-style array and structure initializers. In that model, a can be initialized to the same values by:

    vec4 a[2][3][4] = {
            { vec4( 1), vec4( 2), vec4( 3), vec4( 4) },
            { vec4( 5), vec4( 6), vec4( 7), vec4( 8) },
            { vec4( 9), vec4(10), vec4(11), vec4(12) }
            { vec4(13), vec4(14), vec4(15), vec4(16) },
            { vec4(17), vec4(18), vec4(19), vec4(20) },
            { vec4(21), vec4(22), vec4(23), vec4(24) }

Functions can be declared that take multidimensional arrays as parameters. In the prototype, the name of the parameter can be present, or it can be omitted.

    void function_a(float a[4][5][6]);
    void function_b(float  [4][5][6]);

Other than the GLSL constructor syntax, there hasn't been any madness yet. However, recall that array sizes can be associated with the variable name or with the type. The prototype for function_a associates the size with the variable name, and the prototype for function_b associates the size with the type. Like GLSL constructor syntax, this has existed since GLSL 1.20.

Associating the array size with just the type, we can declare a (from above) as:

    vec4[2][3][4] a;

With multidimensional arrays, the sizes can be split among the two, and this is where it gets weird. We can also declare a as:

    vec4[3][4] a[2];

This declaration has the same layout as the previous two forms. This is usually where people say, "It's bigger on the inside!" Recall the cdecl description, "declare a as array 2 of array 3 of array 4 of vec4". If we add some parenthesis, "declare a as array 2 of (array 3 of array 4 of vec4)", and things seem a bit more clear.

GLSL ended up with this syntax for two reasons, and seeing those reasons should illuminate things. Without GL_ARB_arrays_of_arrays or GLSL 4.30, there are no multidimensional arrays, but the same affect can be achieved, very inconveniently, using structures containing arrays. In GLSL 4.20 and earlier, we could also declare a as:

    struct S1 {
        float a[4];

    struct S2 {
        S1 a[3];

    S2 a[2];

I'll spare you having to see GLSL constructor initializer for that mess. Note that we still end up with a[2] at the end.

Using typedef in C, we could also achieve the same result using:

    typedef float T[3][4];

    T a[2];

Again, we end up with a[2]. If cdecl could handle this (it doesn't grok typedef), it would say "declare a as array 2 of T", and "typedef T as array 3 of array 4 of float". We could substitue the description of T and, with parenthesis, get "declare a as array 2 of (array 3 of array 4 of float)".

Where this starts to present pain is that function_c has the same parameter type as function_a and function_b, but function_d does not.

    void function_c(float[5][6] a[4]);
    void function_d(float[5][6]  [4]);

However, the layout of parameter for function_e is the same as function_a and function_b, even though the actual type is different.

    struct S3 {
        float a[6];

    struct S4 {
        S3 a[5];

    void function_e(S4 [4]);

I think if we had it to do all over again, we may have disallowed the split syntax. That would remove the more annoying pitfalls and the confusion, but it would also remove some functionality. Most of the problems associated with the split are caught at compile-time, but some are not. The two obvious problems that remain are transposing array indices and incorrectly calculated uniform block layouts.

    layout(std140) uniform U {
        float [1][2][3] x;
        float y[1][2][3];
        float [1][2] z[3];

In this example x and y have the same memory organization, but z does not. I wouldn't want to try to debug that problem.

Posted Wed Jan 29 09:08:42 2014 Tags:

This previous weekend was the Portland Retro Gaming Expo, and it was awesome! I was there all day both days.

I got some really good deals. :) Gaiares (loose) for $5! Tatsujin (import version of Truxton) for $8! I got a lot of games. Just don't ask how much I spent altogether...

One of the highlights of the show where the performances by 8Bit Weapon. You'll notice in the second image that she's playing an Atari paddle controller. You can also see a C64 (donated by the local Commodore User Group) in the last image.

The GIANT arcade was... amazing! There were some games there that I haven't played in years. My uncle would have enjoyed Mappy. :) And I didn't win the tabletop Asteroids game. I spent a lot of time playing APB (Hey Ground Kontrol: put that in the arcade!!!). I was also pretty surprised that the only console version of APB is for the Lynx. Oof.

I love the Space Wars instructions... I think those are still the instructions for Windows... lol.

There was a cool presentation about making games for the Atari 2600. There were even a couple "celebrities" in the audience. On the right is Darrell Spice Jr. (programmer of Medieval Mayhem), and on the left is Rebecca Heineman (programmer London Blitz for the 2600 among other things). She's also responsible for the awesome disassembly of the game Freeway.

Joe Decuir was also in the audience, and I met him later at the Atari Age booth (I had to get Medieval Mayhem and Moon Cresta :) ). I can't believe Joe doesn't have a Wikipedia page. He was one of the original designers of the Atari 2600, programmer of Video Olympics and Combat (with Larry Wagner). He was also one of the original Amiga designers. He was having a conversation with one of the Atari Age guys about wanting to design a new console in the style of an old Atari 8-bit or an Amiga. I told him that there's a group trying to make an Amiga 2000 from scratch (using FPGAs for custom chips). So now I'm tasked with finding that project again and getting him in contact with them. I'm sure they'd appreciate any help he could offer!

The Tetris World Championships was so tense!!! I think I have to watch the movie now. After being down 2 games, Jonas came back to win it... for the fourth time in a row!

Posted Mon Oct 7 19:23:50 2013 Tags:

Now my Mesa: State of the Project talk is done too, and the slides are available.

I'm assuming that someone will send a long a link to the video soon... ;)

Posted Tue Sep 24 10:54:52 2013 Tags:

This wiki is powered by ikiwiki.