Welcome to my blog. Have a look at the most recent posts below, or browse the tag cloud on the right. An archive of all posts is also available.


Just for giggles, watch an episode of Grimm, then watch an episode of Portlandia. For more fun, take a shot of [beverage of choice] each time you see a location in one show that you saw in the other.

Posted Mon Jan 9 18:14:09 2012 Tags:

After planning my first LAN party in over a year, I discovered that the game I planned to play, Battlefield: Bad Company 2, intentionally does not include support for LAN-only servers. In over 15 years of hosting LAN parties, I have never encountered a game that does this.

EA made this choice, supposedly, to curb piracy. What they have done is guarantee that I will never buy one of their products again. Ever.

Strong work.

Posted Fri Dec 16 12:32:04 2011 Tags:

In the wake of Occupy Portland being shut down, it's good to reflect on why it failed: 1% ruined it for 99%.

Posted Mon Nov 14 10:07:51 2011 Tags:

GIT is a wonderful tool. It enables some really powerful, flexible working models. There is one model that is commonly used when developing a major feature (or large refactor):

  • Create a long series of patches in the order that it makes sense to develop them.

  • Rebase, reorder, and merge patches into the order that it makes sense to review them.

  • Post patches for review in small, easy to digest groups.

There is a danger in this process. While reordering patches it is very easy to create a tree that won't build at intermediate commits. Since the entire group of patches gets pushed at once, this may not seem like a problem. However, having intermediate commits not build makes it impossible to bisect when failures are found later.

Having every commit along a long patch series build is an absolute requirement.

However, it's kind of a pain in the ass to manually build the tree at every step. The emphasis in the previous sentence is on manually. In some sense the ideal tool is something like a scripted bisect. This doesn't work, of course, because both ends of the commit sequence, in bisect terminology, are good.

To get around this in Mesa, I wrote a script that builds origin/master and every commit in some range of SHA1s. If any build fails, the script halts. At the end, it logs the change in the compiler warnings from origin/master to each commit (as I write this, it occurs to me that the warning diffs should be between sequential commits). This is useful to prevent new warnings from creeping in.

My script is available. This particular script is very specific to Mesa, but it should be easy to adapt to other projects. The script is run as:

check_all_commits.sh /path/to/source /path/to/build firstSHA..lastSHA
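
The build-every-commit idea can be sketched in a few lines. This is not the Mesa script; it is a minimal Python sketch that assumes the tree builds with a plain make (an assumption) and it skips the warning diffs entirely. The helper names list_commits, build_commit, and first_failing_commit are made up for illustration. The only git commands used are rev-list and checkout.

```python
import subprocess


def list_commits(source_dir, commit_range):
    """Return the SHA1s in commit_range (e.g. "first..last"), oldest first."""
    out = subprocess.run(
        ["git", "rev-list", "--reverse", commit_range],
        cwd=source_dir, check=True, capture_output=True, text=True,
    ).stdout
    return out.split()


def build_commit(source_dir, sha):
    """Check out sha and build it. Running "make" is an assumption;
    substitute whatever builds the project."""
    subprocess.run(["git", "checkout", "--quiet", sha],
                   cwd=source_dir, check=True)
    return subprocess.run(["make"], cwd=source_dir).returncode == 0


def first_failing_commit(commits, build):
    """Build each commit in order and halt at the first failure.
    Returns the failing SHA1, or None if every commit builds."""
    for sha in commits:
        if not build(sha):
            return sha
    return None
```

Something like first_failing_commit(list_commits(src, "firstSHA..lastSHA"), lambda sha: build_commit(src, sha)) would then walk the whole series. Passing the build step in as a function keeps the loop itself easy to test without a real tree.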
Posted Wed Oct 26 15:35:41 2011 Tags:

(Better never than late... or something...)

I had high expectations for Hiding Complexity, but I was largely disappointed. I only stayed for the first two talks. As far as I could tell, Occlusion Culling in Alan Wake summarized the combination of a bunch of known techniques for occlusion. Their combination seems to have the strong disadvantage that it has a lot of temporal instability, and this can lead to frame rate jitter. Even the bit of the talk about optimizing shadow generation seemed like a subset of the optimizations from CC Shadow Volumes. I'll have to look at their I3D paper, Shadow Caster Culling for Efficient Shadow Mapping, to be sure.

Increasing Scene Complexity: Distributed Vectorized View Culling can be summarized by: we optimized a slow part of our code in all the obvious ways, and it got a lot faster. shock.

After that, I bailed and went to Compiler Techniques for Rendering (slides are also available).

I missed the first couple talks, but I did arrive in time for AnySL. They have an interesting project. It's essentially an N:M compiler. It seems like we should connect their OpenCL work with various open-source efforts that are underway. This is especially the case if their claim of beating Intel's (CPU) OpenCL compiler on almost all kernels turns out to be true.

Automatic Bounding of Shaders for Efficient Global Illumination was interesting, but I don't think it has anything directly applicable to our GLSL compiler work. However, it did give me some ideas to try for a real-time light integrator that I've been working on.

Compilation for GPU Accelerated Ray Tracing in OptiX looked mostly like their talk from SIGGRAPH last year. I didn't recall any mention of the previous RTSL work, and that paper was an interesting read. There are a couple built-in functions in that language that are useful. I've open-coded a couple of them in the past...

The final session of the conference was Real-Time Rendering Hardware.

Clipless Dual-Space Bounds for Faster Stochastic Rasterization and Decoupled Sampling for Graphics Pipelines were related pieces of work. Each was a different part of solving the automatic defocus and motion blur problem. I like the idea of having the hardware assist with these in a similar way that it helps with antialiasing. While it's trivial to expose MSAA to the developer (it's mostly transparent), it's not clear to me how to expose motion blur or, to a lesser degree, defocus blur. Given all the weird ways that people implement animation in real-time systems, how can the API directly expose a time dimension as a shader parameter?

Spark: Modular, Composable Shaders for Graphics Hardware echoes a lot of concerns that I've had for a few years. That people have to machine generate 10,000 shaders or use #define madness to specialize 10,000 variations of their shaders shows that we haven't given them a useful system. Even without that it's pretty much impossible to have separation of concerns in a shader stack. The way that OSL allows shaders to be composed solves some of this, and Spark takes a different approach.

Physically Based Real-Time Lens Flare Rendering goes in the "I should try to implement that" bin. Bastards! It was nice ending the conference with pretty pictures.

Posted Thu Aug 18 10:04:41 2011 Tags:

I went to Surfaces just to hear LR: Compact Connectivity Representation for Triangle Meshes. Their technique is pretty simple, and achieves really good results. Their primary insight was that a lot of mesh connectivity can be stored implicitly by the ordering of the vertices. If all of your vertex data is a (non-indexed) strip, you get a lot of information for free. Like most mesh data structures, they only handle manifold meshes. The big disadvantage is that LR doesn't "yet" handle updates, but that's an area of planned future research.

I also tried to get over to the last bits of Mixed Grill, but I only managed to catch High-Resolution Relightable Buildings From Photographs. Their technique is really a novel combination of several known techniques. Their results are pretty impressive.

The last session before lunch was Image Processing. I was primarily interested in Antialiasing Recovery, and it was a really well given talk. You can tell that a speaker knows their audience when he starts with, "I understand that there are people like me that like to take naps during talks, so if you want to get the gist of the paper, pay attention to these parts." Strong work.

They take an image before and after some filter operation (e.g., thresholding to black and white) and try to recover antialiasing (smooth edges, really) in the prefiltered image. This is accomplished by examining both images for hard edges and doing neighbor searches and blending in the areas where new hard edges appear. One current limitation is the assumption that only two colors from a pixel's 8 neighbors contribute to the original smooth edge. They also plan to have code available on their project page in a month or so.

Local Laplacian Filters: Edge-aware Image Processing with a Laplacian Pyramid was a clever technique. I'll probably try to play around with it a bit. There are a few areas in real-time rendering (e.g., SSAO, some soft-shadow algorithms) where bilateral filtering is used as an edge-aware filter. Most of these are low-frequency effects where some of the noise isn't as apparent, but there may still be some benefit.

The remaining papers, Domain Transform for Edge-Aware Image and Video Processing and Non-Rigid Dense Correspondence With Applications for Image Enhancement, produce really good results, but they're both way outside my area of expertise.

I spent the afternoon in the expo hall and at the Animation Theater "demoscene" session. Hurray for seeing CNCD / Fairlight on the big screen. Things have come a long way since my scene days...

After that, I went to the OpenGL BoF. Hurray for OpenGL 4.2!

Posted Fri Aug 12 10:09:18 2011 Tags:

I started the day with Out of Core, but I missed Google Body: 3D Human Anatomy in the Browser.

Over the last few years there have been a lot of papers about real-time global illumination. They always start with two caveats: no specular and one bounce. Interactive Indirect Illumination Using Voxel Cone Tracing breaks the mold and goes for two bounces! Hurray! They can also handle long-range indirect illumination better than LPVs, and it runs slightly faster. It is based on their 2009 GigaVoxels work.

Out-of-core Raytracing was really hard to listen to. A big chunk of his presentation was a video of animated slides. Bad presenter, no biscuit! Even more frustrating is that the bulk of the presentation focused on the performance of his very limited raytracer. The raytracer can handle a lot of polygons, but it can't do shaders, textures, etc. When asked, he said, "Right now shading is not implemented in this project." Performance comparisons with other (more capable) raytracers are just not interesting.

What was interesting is the paging structure that he developed to make it work. The fundamental problem with GPU raytracing is that interesting scenes are multiple gigabytes, and GPUs can't handle that much data in a single batch. To overcome this, he developed a novel way of breaking the scene data into pages and determining which rays needed which pages. That was the interesting part, but he only spent a few scant minutes talking about it.

After that, I headed over to Stochastic Rendering & Visibility.

High-Quality Spatio-Temporal Rendering using Semi-Analytical Visibility is obviously not targeted at real-time rendering, but it seems like some aspects could be adapted. One of my favorite things about this presentation was the rich list of areas for future research.

Both Frequency Analysis and Sheared Filtering for Shadow Light Fields of Complex Occluders and Temporal Light Field Reconstruction for Rendering Distribution Effects assumed a lot of background knowledge that I don't have. In particular, all the bits about the Fourier analysis of occluder depths (which was a pretty key element!) were completely lost on me. I did find the medium frequency artifacts in the former paper interesting. Who gets medium frequency noise?!? It's always either high frequency or low frequency. I also appreciated the background material at the start of the latter presentation. I'll probably harvest some of that for my VGP sequence.

The Area Perspective Transform: A Homogeneous Transform for Efficient In-Volume Queries was an interesting idea that seems too obvious to work. :) It takes a bunch of math operations to intersect a frustum with a bounding volume, but it's cheap to intersect an AABB with another bounding volume. Their solution? Project the points of the base (the big part) of the frustum to infinity so that the lines become parallel (in the limit). I still need to read the paper to get all the details.

There were only a couple of papers that I paid attention to in By-Example Image Synthesis.

Techniques like Image-Guided Weathering: A New Approach Applied to Flow Phenomena are increasingly important in games. They take some exemplar images to generate parameters for a particle simulation, then they use those parameters to generate new images. What's cool about this is that almost all of the computation is done to "reverse engineer" the original image. It ought to be possible to use that data (with some perturbation of the parameters) to generate new images each time the game is run. This is another one that goes on my "to implement" list.

I was really stoked about Discrete Element Textures. One of the authors, Li-Yi Wei, presented one of my favorite SIGGRAPH papers. This is "another" exemplar-based texture synthesis algorithm. Like some of the previous work they treat elements in the exemplar as units. This prevents the old artifacts of getting half of a pebble, for example, in your gravel texture. I haven't read all of the other previous work, but I thought it was pretty cool that their work also applies to 3D objects. The piles of carrots and plates of spaghetti were pretty cool.

Posted Wed Aug 10 08:07:17 2011 Tags:

I started the day with Advances in Real-Time Rendering in Games. I found it interesting that Halo Reach uses "state mirroring" to communicate updated world state from the simulation process to the rendering process. They're essentially treating the cores in the Xbox 360 as a distributed system rather than an SMP. This echoes what Mattson et al. found while developing the Single-Chip Cloud Computer.

The Cars 2 talk started the long train of "we do annoying stuff to work around the limitations of console GPUs." In particular, a large portion of their stereo rendering algorithm was implemented on the SPUs because the RSX cannot do scattered writes. It seems like at least the generation of the item buffer, which is the mapping of points in the reference image to one of the stereo pairs, could be done on the RSX. Of course, OpenGL 4.x class GPUs can do this. It would be interesting to recast their algorithm using GL_ARB_shader_image_load_store (Note: DX people will know this as unordered access views / UAVs).

The Little Big Planet 2 talk was pretty cute. Their voxel algorithm for doing AO in world space (instead of screen space) was clever and produces really good results. Generating one blob of expensive data and using it for a lot of things usually means you're winning. I'll also have to add mention to my VGP353 lectures that both LBP1 and LBP2 use variance shadow maps.

The CryEngine 3 talk moved really fast and spent a lot of time switching between speakers. It was hard to get much out of it. However, I did get a couple notes for things we need to make better in our driver.

They showed a bit of HLSL code:

float4 HPos = (vStoWBiasZ + (vStoWBiasX * Vpos.x) + (vStoWBiasY * Vpos.y)) *
    fSceneDepth;
HPos += vCamPos.xyzw;

The initial calculation of HPos should be implemented as a single DP2A instruction on architectures that can do that:

DP2A    HPos, Vpos, vStoWBias, vStoWBiasZ;
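
To see why the first expression has the shape of a two-component dot product plus an addend, here is a small Python sketch. The dp2a helper models the instruction as a.x*b.x + a.y*b.y + c, which is an assumption about its semantics made for illustration, and the function names are made up. Each output component pairs one component of vStoWBiasX with one of vStoWBiasY into a two-element operand, which is the packing problem described below.

```python
def dp2a(a, b, c):
    """Two-component dot product plus an addend: a.x*b.x + a.y*b.y + c.
    Assumed semantics, modeled on typical shader assembly."""
    return a[0] * b[0] + a[1] * b[1] + c


def hpos_direct(vpos, bias_x, bias_y, bias_z, scene_depth):
    """The expression as written in the HLSL, one component at a time."""
    return [
        (bias_z[i] + bias_x[i] * vpos[0] + bias_y[i] * vpos[1]) * scene_depth
        for i in range(4)
    ]


def hpos_dp2a(vpos, bias_x, bias_y, bias_z, scene_depth):
    """The same value with the multiply-adds folded into dp2a.
    The X and Y bias components are packed into a pair per output
    component; the scale by scene_depth stays a separate multiply."""
    return [
        dp2a(vpos, (bias_x[i], bias_y[i]), bias_z[i]) * scene_depth
        for i in range(4)
    ]
```

The two forms add their terms in a different order, so on real floating-point data the results can differ in the last bit.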

We can't do that with our current compiler, and it would take a lot of work to do so. One thing that we would have to do is notice that two independent float variables need to be packed into a single vec2. Doing that packing could have repercussions elsewhere in the code. This is where tree matching code generators that use dynamic programming and cost functions really shine.

The CryTek guys also complained about the high latency for frame read back (up to 4 frames) on desktop GPUs. They want to do CPU processing on the depth buffer from a previously rendered frame to do approximate visibility calculations for later frames. On consoles, there's only a single frame of lag.

They also mentioned that they have used cascaded shadow maps since Crysis 1.

I ducked out of Advances in Real-Time Rendering in Games part way through to go to Drawing, Painting, and Stylization. I caught just the tail end of A Programmable System for Artist Directed Volumetric Lighting, and I wish I had seen the rest of it.

I actually went to that session to see Coherent Noise for Non-Photorealistic Rendering, and I was quite pleased. There are two main problems with applying noise to NPR. Using 3D noise depends on depth, so the frequency of noise applied to an object (e.g., to make lines wavy) depends on how far away from the camera it is. Using 2D noise gives what the authors called the "shower door effect." That is also fail.

They really want three things:

  • cheap to compute
  • noise moves with objects
  • uniform spectrum in screen space

The algorithm that they developed accomplishes these goals. Throughout the presentation, they showed a bit of UP! with Russell hanging from the garden hoses below the house. With their NPR noise applied as the "texture," the still image just looked like white noise. As soon as the animation began, all of the structure of the image magically appeared. It was pretty cool. I'll need to implement it and make some piglit test cases.

The final session of the day was Tiles and Textures and Faces Oh My!

I enjoyed the Procedural Mosaic Arrangement In "Rio" talk. It felt like their system was perhaps a little overcomplicated, but that complexity may give it more flexibility for reuse. Not only was the technique used for the mosaic sidewalks in Copacabana but also for "plain" cobblestone streets. I would like to have heard more about that. This is something that everybody does in games, and everybody does it poorly.

There were some bits of Generating Displacement from Normal Maps for Use in 3D Games that I didn't fully grok. Their initial attempt was to reverse the derivative approximation used to generate the normals and propagate the new offsets around. It seems obvious that this wouldn't work. They moved on from that to a polar coordinate integration scheme that wasn't communicated very well by the presentation. I'll have to read the paper and think about it some more.

I first heard about Ptex at the "open-source in production" panel last year. At the time, I had thought that you could implement them in a fairly straightforward way on a GPU. Per-Face Texture Mapping for Real-Time Rendering follows pretty much what I was thinking, but there are some differences.

To maintain some batching, all surfaces with textures of the same aspect ratio are grouped. Textures of a particular aspect ratio are stored in a series of non-mipmapped texture arrays. Each texture array corresponds to a particular mipmap size for that aspect ratio. That is, all of the 512x512 mipmaps are in one texture array, all of the 256x256 mipmaps are in another, etc. When a texture is sampled, the texture is selected (texture array index), the LODs are calculated (the arrays), everything is sampled, and the intermipmap blending is done. This is all open-coded in the shader (via a ptex function).

Having this goofy organization means that a texture with a 64x64 base level doesn't waste a bunch of space when it's batched with a texture with a 2048x2048 base level. It's the classic time vs. space trade-off.
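
A rough Python sketch of that organization, restricted to square textures for simplicity. The function names are made up, and the padded comparison (every texture allocated with the largest base level's full mip chain) is just one assumed alternative, not something from the talk.

```python
def mip_sizes(base):
    """Edge lengths of a square texture's mip chain, from base down to 1."""
    sizes = []
    while base >= 1:
        sizes.append(base)
        base //= 2
    return sizes


def build_arrays(base_sizes):
    """Group every mip level of every texture by edge length.
    Returns {edge length: [ids of textures that have a level of that size]};
    each list stands in for one non-mipmapped texture array."""
    arrays = {}
    for tex_id, base in enumerate(base_sizes):
        for size in mip_sizes(base):
            arrays.setdefault(size, []).append(tex_id)
    return arrays


def texels_used(arrays):
    """Total texels stored across all of the per-size arrays."""
    return sum(size * size * len(ids) for size, ids in arrays.items())


def texels_if_padded(base_sizes):
    """Texels needed if every texture got the largest texture's full chain."""
    largest = max(base_sizes)
    return len(base_sizes) * sum(s * s for s in mip_sizes(largest))
```

For the 64x64 and 2048x2048 pair, only the larger texture lands in the 2048 array, while both share the arrays from 64 down. In this sketch that comes to 5,597,866 texels instead of 11,184,810 for the padded layout.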

Posted Tue Aug 9 14:45:01 2011 Tags:

After seeing her wonderful display of US history knowledge, I have decided to replace the words "retard" and "retarded" in my vocabulary with "Palin." Imagine:

  • "That is the most Palin thing I've ever heard anyone say."

  • "Did you see that guy's driving? Is he a Palin or something?"

  • "Hey you bully! Don't pick on those Palin kids!"

Well, maybe not the last one...

Posted Tue Jun 14 11:58:12 2011 Tags:

I finally started working on ud-chains for Mesa's GLSL compiler. I'm continually amazed that it works as well as it does without them. There are so many ugly hacks and so much ad hoc data-flow analysis that I just couldn't put it off any longer. If you want to cry, take a look at how reaching definitions are determined in loop_analysis.cpp. It's pretty horrible.

My primary goal is to get enough infrastructure in place to fix that code. In addition to the ugliness, it fails on cases like:

void main()
{
    int n = 3;
    int i = 0;

    if (b)
        some_function();

    while (i < n) {
        some_other_function();
    }
}

The loop analyzer will have no idea what the values of i or n are. There are similar issues with identifying values that are constant across all iterations of the loop. In this case, flow control inside the loop confuses the analyzer.

The first step to this project is to get real basic block tracking. We have a mechanism to iterate over instructions as basic blocks (see ir_basic_block.cpp), but nothing is stored about the blocks. This means we don't know which blocks can transfer control to which other blocks.

I now have some code that creates the basic block structure of a function. The next step will be to process the instructions in the basic blocks to track the (value) definitions within the block.

In the meantime, I modified the stand-alone compiler to optionally dump a graph of the basic blocks in dot. The results, even for small shaders, are shockingly cool.
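
For anyone who hasn't built one, here is roughly what basic block construction and a dot dump look like. This is a Python sketch over a made-up flat instruction list, not Mesa's tree-structured IR and not the code described above. The tuple format ("op",), ("branch", target), ("jump", target), ("ret",) is an assumption for illustration, and the list is assumed to be non-empty.

```python
def build_blocks(instrs):
    """Split a flat instruction list into basic blocks.
    Returns (blocks, succs): blocks is a list of (start, end) index ranges,
    and succs[n] lists the blocks that block n can transfer control to."""
    # A leader starts a block: the first instruction, any branch target,
    # and whatever follows a branch, jump, or return.
    leaders = {0}
    for i, ins in enumerate(instrs):
        if ins[0] in ("jump", "branch"):
            leaders.add(ins[1])
        if ins[0] in ("jump", "branch", "ret") and i + 1 < len(instrs):
            leaders.add(i + 1)

    starts = sorted(leaders)
    blocks = []
    for n, start in enumerate(starts):
        end = starts[n + 1] if n + 1 < len(starts) else len(instrs)
        blocks.append((start, end))
    block_of = {start: n for n, (start, _) in enumerate(blocks)}

    succs = []
    for start, end in blocks:
        last = instrs[end - 1]
        out = []
        if last[0] in ("jump", "branch"):
            out.append(block_of[last[1]])      # taken edge
        if last[0] not in ("jump", "ret") and end < len(instrs):
            out.append(block_of[end])          # fall-through edge
        succs.append(out)
    return blocks, succs


def to_dot(succs):
    """Render the successor lists as a Graphviz digraph."""
    lines = ["digraph cfg {"]
    for src, outs in enumerate(succs):
        lines.append(f"    bb{src};")
        for dst in outs:
            lines.append(f"    bb{src} -> bb{dst};")
    lines.append("}")
    return "\n".join(lines)
```

A simple while loop, written as [("op",), ("branch", 4), ("op",), ("jump", 1), ("ret",)], comes out as four blocks with a back edge from the loop body to the loop test. The successor lists are exactly the "which blocks can transfer control to which other blocks" information that is missing today.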

Here's the code for the piglit test glsl-fs-vec4-indexing-temp-dst-in-nested-loop-combined.shader_test:

uniform float one;
uniform int count1, count2;
uniform vec4 red;

void main()
{
    vec4 array[11];
    array[0] = vec4(0.1, 0,   0,   0) * one;
    array[1] = vec4(0,   0.1, 0,   0) * one;
    array[2] = vec4(0,   0,   0.1, 0) * one;
    array[3] = vec4(0,   0.1, 0,   0) * one;
    array[4] = vec4(0,   0  , 0.1, 0) * one;
    array[5] = vec4(0,   0,   0.1, 0) * one;
    array[6] = vec4(0.1, 0,   0,   0) * one;
    array[7] = vec4(0,   0.1, 0,   0) * one;
    array[8] = vec4(0,   0.1, 0.1, 0) * one;
    array[9] = vec4(0.1, 0.1, 0,   0) * one;
    array[10] = vec4(0.1, 0,   0.1, 0) * one;
    for (int i = 0; i < count1; i++)
        for (int j = 0; j < count2; j++)
            array[i*3+j] += red*float(j+1);
    gl_FragColor = array[0] + array[1] + array[2] + array[3] + array[4] +
        array[5] + array[6] + array[7] + array[8] + array[9] + array[10];
}

And here is the resulting basic block graph:

Posted Tue May 3 16:36:04 2011 Tags:
