I started the day with Advances in Real-Time Rendering in Games. I found it interesting that Halo Reach uses "state mirroring" to communicate updated world state from the simulation process to the rendering process. They're essentially treating the cores in the Xbox 360 as a distributed system rather than an SMP. This echoes what Mattson et al. found while developing the Single-Chip Cloud Computer.

The Cars 2 talk started the long train of "we do annoying stuff to work around the limitations of console GPUs." In particular, a large portion of their stereo rendering algorithm was implemented on the SPUs because the RSX cannot do scattered writes. It seems like at least the generation of the item buffer, which is the mapping of points in the reference image to one of the stereo pairs, could be done on the RSX. Of course, OpenGL 4.x class GPUs can do this. It would be interesting to recast their algorithm using GL_ARB_shader_image_load_store (Note: DX people will know this as unordered access views / UAVs).
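
Just as an illustration, here is a minimal GLSL sketch of a scattered write through GL_ARB_shader_image_load_store. This is not the Cars 2 algorithm; the image name and the reprojection are made-up placeholders, and it only shows the kind of write the RSX can't do:

#version 130
#extension GL_ARB_shader_image_load_store : require

// Hypothetical destination image for one eye of the stereo pair.
layout(rgba8) writeonly uniform image2D right_eye;

uniform sampler2D reference_image;
uniform int reprojection_offset;    // stand-in for the real per-pixel reprojection

in vec2 tex_coord;

void main()
{
    vec4 color = texture(reference_image, tex_coord);

    // Scattered write: the destination texel doesn't have to match gl_FragCoord.
    ivec2 dst = ivec2(gl_FragCoord.xy) + ivec2(reprojection_offset, 0);
    imageStore(right_eye, dst, color);
}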

The Little Big Planet 2 talk was pretty cute. Their voxel algorithm for doing AO in world space (instead of screen space) was clever and produces really good results. Generating one blob of expensive data and using it for a lot of things usually means you're winning. I'll also have to mention in my VGP353 lectures that both LBP1 and LBP2 use variance shadow maps.
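
Since variance shadow maps came up, here is the generic lookup I'd put in the lecture notes: a plain GLSL sketch of the Chebyshev bound, not anything taken from the LBP shaders.

// Generic variance shadow map lookup. The shadow map stores the first two
// moments of depth, (d, d * d), filtered like any other texture.
uniform sampler2D moments_map;

float vsm_visibility(vec2 shadow_uv, float receiver_depth)
{
    vec2 moments = texture2D(moments_map, shadow_uv).xy;

    // Fully lit when the receiver is in front of the mean occluder depth.
    if (receiver_depth <= moments.x)
        return 1.0;

    // Variance from the two moments, clamped to dodge divide-by-zero.
    float variance = max(moments.y - moments.x * moments.x, 0.00002);

    // Chebyshev's inequality gives an upper bound on the lit fraction.
    float d = receiver_depth - moments.x;
    return variance / (variance + d * d);
}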

The CryEngine 3 talk moved really fast and spent a lot of time switching between speakers. It was hard to get much out of it. However, I did get a couple of notes about things we need to make better in our driver.

They showed a bit of HLSL code:

float4 HPos = (vStoWBiasZ + (vStoWBiasX * Vpos.x) + (vStoWBiasY * Vpos.y)) *
    fSceneDepth;
HPos += vCamPos.xyzw;

The initial calculation of HPos should be implemented as a single DP2A instruction on architectures that can do that:

DP2A    HPos, Vpos, vStoWBias, vStoWBiasZ;

We can't do that with our current compiler, and it would take a lot of work to do so. One thing that we would have to do is notice that two independent float variables need to be packed into a single vec2. Doing that packing could have repercussions elsewhere in the code. This is where tree matching code generators that use dynamic programming and cost functions really shine.
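
As a toy example of the transformation (hypothetical names, nothing to do with the CryEngine source), the compiler would have to rewrite the first form into the second before a DP2A could ever be selected:

// Two independent scalars; nothing here looks like a 2-component dot product.
float unpacked_form(float bias, float scale_x, float scale_y, vec2 p)
{
    return bias + (scale_x * p.x) + (scale_y * p.y);
}

// Once the scalars live in one vec2, the backend can pattern-match
// dot(vec2, vec2) + scalar onto a single DP2A.
float packed_form(float bias, vec2 scale, vec2 p)
{
    return bias + dot(scale, p);
}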

The Crytek guys also complained about the high latency for frame readback (up to 4 frames) on desktop GPUs. They want to do CPU processing on the depth buffer from a previously rendered frame to do approximate visibility calculations for later frames. On consoles, there's only a single frame of lag.

They also mentioned that they have used cascaded shadow maps since Crysis 1.

I ducked out of Advances in Real-Time Rendering in Games part way through to go to Drawing, Painting, and Stylization. I caught just the tail end of A Programmable System for Artist Directed Volumetric Lighting, and I wish I had seen the rest of it.

I actually went to that session to see Coherent Noise for Non-Photorealistic Rendering, and I was quite pleased. There are two main problems with applying noise to NPR. Using 3D noise depends on depth, so the frequency of noise applied to an object (e.g., to make lines wavy) depends on how far away from the camera it is. Using 2D noise gives what the authors called the "shower door effect." That is also fail.

They really want three things:

  • cheap to compute
  • noise moves with objects
  • uniform spectrum in screen space

The algorithm that they developed accomplishes these goals. Throughout the presentation, they showed a bit of UP! with Russell hanging from the garden hoses below the house. With their NPR noise applied as the "texture," the still image just looked like white noise. As soon as the animation began, all of the structure of the image magically appeared. It was pretty cool. I'll need to implement it and make some piglit test cases.

The final session of the day was Tiles and Textures and Faces Oh My!

I enjoyed the Procedural Mosaic Arrangement In "Rio" talk. It felt like their system was perhaps a little overcomplicated, but that complexity may give it more flexibility for reuse. Not only was the technique used for the mosaic sidewalks in Copacabana but also for "plain" cobblestone streets. I would like to have heard more about that. This is something that everybody does in games, and everybody does it poorly.

There were some bits of Generating Displacement from Normal Maps for Use in 3D Games that I didn't fully grok. Their initial attempt was to reverse the derivative approximation used to generate the normals and propagate the new offsets around. It seems obvious that this wouldn't work. They moved on from that to a polar coordinate integration scheme that wasn't communicated very well by the presentation. I'll have to read the paper and think about it some more.

I first heard about Ptex at the "open-source in production" panel last year. At the time, I had thought that it could be implemented in a fairly straightforward way on a GPU. Per-Face Texture Mapping for Real-Time Rendering follows pretty much what I was thinking, but there are some differences.

To maintain some batching, all surfaces with textures of the same aspect ratio are grouped. Textures of a particular aspect ratio are stored in a series of non-mipmapped texture arrays. Each texture array corresponds to a particular mipmap size for that aspect ratio. That is, all of the 512x512 mipmaps are in one texture array, all of the 256x256 mipmaps are in another, etc. When a texture is sampled, the texture is selected (the texture array index), the LODs are calculated (which selects the arrays to sample), everything is sampled, and the inter-mipmap blending is done. This is all open-coded in the shader (via a ptex function).
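
A rough GLSL sketch of what that open-coded lookup might boil down to; the function signature, the names, and the way the two bracketing arrays get picked are my guesses, not the interface from the paper:

// Given the two non-mipmapped texture arrays that bracket the footprint,
// sample the face's layer in each and blend by hand.
vec4 ptex(sampler2DArray size_hi, sampler2DArray size_lo,
          vec2 face_uv, float face_layer, vec2 hi_texel_size)
{
    // Standard LOD estimate from screen-space derivatives, relative to
    // the larger of the two mip sizes.
    vec2 dx = dFdx(face_uv * hi_texel_size);
    vec2 dy = dFdy(face_uv * hi_texel_size);
    float lod = 0.5 * log2(max(dot(dx, dx), dot(dy, dy)));

    // Inter-mipmap blending done in the shader instead of by the hardware.
    vec4 hi = texture(size_hi, vec3(face_uv, face_layer));
    vec4 lo = texture(size_lo, vec3(face_uv, face_layer));
    return mix(hi, lo, clamp(lod, 0.0, 1.0));
}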

Having this goofy organization means that a texture with a 64x64 base level doesn't waste a bunch of space when it's batched with a texture with a 2048x2048 base level. It's the classic time vs. space trade-off.