This is my attempt to combine the Real-Time Reprojection Cache with Screen Space Ambient Occlusion. With such a caching scheme, the spatio-temporal coherence of the SSAO algorithm can be exploited. You can download the demo with shader source here.
The name "Accumulative SSAO" comes from the fact that the occlusion value is accumulated and averaged over a number of frames. The algorithm itself is quite independent of how the occlusion is calculated and here I will assume the reader is familiar with SSAO implementation such as those from Crysis and Startcraft II.
The pipeline
For every frame,
1. The scene is rendered using a deferred shading technique, producing the color, normal and depth buffers.
2. A number of random vectors are generated on the CPU (in a usual SSAO implementation these vectors are generated only once, at start-up).
3. The normal and depth buffers are then used to calculate the occlusion value in the SSAO pass.
4. Instead of being written to the final output, the occlusion value is combined with the previous frame's accumulated occlusion value and written to a second accumulation buffer.
5. A blur pass can optionally be applied to the most recently updated accumulation buffer.
6. The color buffer is then combined with the occlusion value to produce the final result, and the two accumulation buffers are swapped.
Re-projection
The re-projection happens in the SSAO pass when it tries to access the previous frame's occlusion value. Having the eye-space 3D position of each pixel, we can transform it into a texture coordinate of the previous frame by using a matrix (followed by a perspective division); let's call it the delta matrix. This matrix is calculated on the CPU from the previous and current camera matrices.
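One way to compose it, for example, is to go from the current frame's eye space back to world space, then into the previous frame's clip space, and finally bias into texture space:

delta = B · P_prev · V_prev · V_curr⁻¹

where V and P are the view and projection matrices of each frame, and B is the usual bias matrix that maps clip-space [-1, 1] to texture-space [0, 1].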
In simple words, for each pixel of the current frame we are trying to locate its corresponding pixel coordinate in the last frame. If there is no camera movement, the two coordinates are the same.
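As a rough sketch, the corresponding lookup in the SSAO fragment shader could look like this (the names deltaMatrix, accumBuffer and fetchPreviousAO are only illustrative, and the texture-space bias is assumed to be folded into the matrix):

uniform mat4 deltaMatrix;       // B * P_prev * V_prev * inverse(V_curr), uploaded every frame
uniform sampler2D accumBuffer;  // previous frame's accumulated AO (+ encoded depth)

// Re-project a current-frame eye-space position into the previous frame's
// texture space and fetch the cached value stored there.
vec4 fetchPreviousAO(const vec3 eyePos)
{
    vec4 prev = deltaMatrix * vec4(eyePos, 1.0);
    vec2 prevTexCoord = prev.xy / prev.w;   // perspective division
    return texture2D(accumBuffer, prevTexCoord);
}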
Accumulative AO
With the re-projection working, the current frame's occlusion value can be combined with the previous one using the following accumulation formula:
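AO_accum = (AO_current + k · AO_previous) / (k + 1)

Here k is the history weight. This particular weighted-average form is only one reasonable choice; the essential point is that the freshly computed value is blended with the accumulated one.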
In order for the above equation to do something interesting, the current occlusion value should not be the same as the previous one. Therefore, a new set of sampling positions should be generated for each frame; this can be done by re-generating the random unit-sphere samples or the dithering texture every frame. In a loose sense, it is doing a Monte Carlo integration over the time domain. To achieve better visual quality, more frames should be accumulated over time.
As each frame's AO value also depends on the last few frames, there will be some delay before the AO becomes up to date in a dynamic scene. However, by changing the numerator and denominator in the equation, the trade-off between quality and responsiveness can be adjusted.
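In shader code the accumulation itself is tiny; a minimal sketch, assuming the weighted-average form above with an illustrative weight of 7:

const float k = 7.0;   // history weight: larger values are smoother but respond more slowly

float accumulateAO(const float currentAO, const float previousAO)
{
    // Weighted running average: the cached value counts k times as much as
    // the occlusion freshly computed for this frame.
    return (currentAO + k * previousAO) / (k + 1.0);
}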
Cache-miss consideration
Up to now the cache-miss problem of the re-projection has not been addressed. A cache miss happens when some part of the scene that could not be seen before becomes visible now, due to camera or object movement. Such a cache miss can be detected by comparing the current pixel's depth value with its re-projected counterpart: if the two values differ by more than a certain threshold, a cache miss is detected. To do this, the last frame's depth value is needed. Instead of using a separate texture to store the last frame's depth, the depth can be encoded and stored together with the accumulated AO value in a 32-bit texture.
// Encode a float value into 3 bytes
// The input value should be in the range [0, 1)
// Reference: http://www.ozone3d.net/blogs/lab/?p=113
vec3 packFloatToVec3i(const float value)
{
    const vec3 bitSh = vec3(256.0 * 256.0, 256.0, 1.0);
    const vec3 bitMsk = vec3(0.0, 1.0/256.0, 1.0/256.0);
    vec3 res = fract(value * bitSh);
    res -= res.xxy * bitMsk;
    return res;
}

// Decode the float value back from the 3 bytes
float unpackFloatFromVec3i(const vec3 value)
{
    const vec3 bitSh = vec3(1.0/(256.0*256.0), 1.0/256.0, 1.0);
    return dot(value, bitSh);
}
If there was a cache miss, the accumulated AO is discarded and the current frame's AO value is used instead. Of course, more samples can be taken in that frame to reduce the visual impact of the cache miss.
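Putting the pieces together, the end of the SSAO pass could then look roughly like this sketch (the channel layout with depth in .rgb and AO in .a, the threshold, and the variable names are all illustrative):

// currentDepth, currentAO and eyePos come from this frame's G-buffer / SSAO computation
vec4 cached = fetchPreviousAO(eyePos);                  // re-projected fetch, as sketched earlier
float cachedDepth = unpackFloatFromVec3i(cached.rgb);   // the previous frame's depth at that pixel
float cachedAO = cached.a;

// If the depths disagree, this pixel was not visible last frame: cache miss.
bool cacheMiss = abs(currentDepth - cachedDepth) > 0.005;

float accumulatedAO = cacheMiss
    ? currentAO                             // discard the history, keep this frame's AO only
    : accumulateAO(currentAO, cachedAO);    // normal accumulation

// Store the current depth alongside the accumulated AO for the next frame.
gl_FragColor = vec4(packFloatToVec3i(currentDepth), accumulatedAO);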
Discussion/improvements
Currently, a new independent set of random samples is generated every frame for the demo in the video above. Other ways of generating the samples over time may reduce the noise.
As some re-projection cache schemes suggest, a cached value should be cleared after a certain period of time to avoid instability and to respond better to a dynamic environment; here this is done implicitly by the accumulation formula.
To reduce cache misses due to object movement, each object's previous transformation matrix can also be incorporated into the algorithm.
The depth encoding scheme also makes the blur pass much more efficient, since a depth-aware blur can fetch the depth it needs from the same texture as the AO value.
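For example, a single tap of such an edge-preserving blur could read both values with one fetch (a sketch, again assuming depth in .rgb and AO in .a and an illustrative threshold):

// One tap of a depth-aware blur over the accumulated AO buffer.
float blurTap(sampler2D accumBuffer, vec2 uv, float centerDepth, inout float weightSum)
{
    vec4 texel = texture2D(accumBuffer, uv);
    float tapDepth = unpackFloatFromVec3i(texel.rgb);

    // Only average samples with a similar depth so that AO does not bleed across edges.
    float w = abs(tapDepth - centerDepth) < 0.005 ? 1.0 : 0.0;
    weightSum += w;
    return texel.a * w;
}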
Conclusion
The algorithm explained here provides a new way to improve the quality and efficiency of traditional SSAO by using the results from a number of frames instead of just one. It also opens up more parameters and sampling patterns to explore.