We recently added WebGL instancing to Cesium. Instancing is an optimization technique that enables you to render a model multiple times with a single draw call. The definitive use case for Cesium is rendering repeating elements such as trees, vehicles, and satellites in an efficient manner. Since many others have already covered this topic (TojiCode), I’m going to focus on how we use instancing for billboards.
A billboard is a viewer-facing textured quad. Every icon on the map or letter in a label is a billboard. Often we are rendering thousands of these at a time, such as in the KML demo.
In order to reduce the number of draw calls, Cesium batches billboards together. Instead of rendering each billboard individually, we place them all in a shared vertex buffer. A billboard consists of four vertices, each of which encodes the billboard’s position and visual properties as vertex attributes. For more info about how we compress vertex attributes, check out this blog post.
We then create a large index buffer in which each successive group of six indices defines the two triangles of one billboard. The index buffer is repetitive and looks like this:
[0 1 2 0 2 3 | 4 5 6 4 6 7 | 8 9 10 8 10 11 | ... ]
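As a minimal sketch of how such an index buffer could be generated, the function below fills in the repeating pattern for a given number of billboards. The name `buildBatchedIndices` and the choice of 16-bit indices are assumptions made for illustration, not Cesium's actual code.

```javascript
// Hypothetical helper (not part of Cesium): builds the repeating index
// pattern for `billboardCount` quads stored back to back in one vertex buffer.
function buildBatchedIndices(billboardCount) {
  // 16-bit indices can address at most 65536 vertices, i.e. 16384 quads.
  if (billboardCount > 16384) {
    throw new RangeError('too many billboards for 16-bit indices');
  }
  const indices = new Uint16Array(billboardCount * 6);
  for (let i = 0; i < billboardCount; ++i) {
    const v = i * 4; // first vertex of billboard i
    const o = i * 6; // first index slot of billboard i
    // First triangle of the quad: corners 0, 1, 2
    indices[o] = v;
    indices[o + 1] = v + 1;
    indices[o + 2] = v + 2;
    // Second triangle of the quad: corners 0, 2, 3
    indices[o + 3] = v;
    indices[o + 4] = v + 2;
    indices[o + 5] = v + 3;
  }
  return indices;
}
```

The resulting array grows linearly with the number of billboards, which is the cost that instancing later removes.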
Once the buffers are prepared, we can render the entire billboard collection with a single drawElements call. The vertex shader will correctly position the billboard, and the fragment shader will sample the designated region of the texture atlas. The system is designed so all the per-billboard information is encoded in vertex attributes rather than uniforms.
Instancing follows a similar philosophy to batching. The goal is to place all per-instance information inside vertex attributes rather than uniforms. However, the key difference is you can specify a vertex attribute to be instanced, meaning it advances once per instance instead of the usual once per vertex.
The main benefit here is memory savings. You can separate your instanced attributes from your per-vertex attributes. For example, if you are rendering 1000 trees you might have one vertex buffer that stores the usual position, normal, and texture coordinates of the model, and another vertex buffer that stores the translations for each instance. Then you would set the divisor of the translation attribute to 1, which specifies that the attribute advances once per instance.
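The tree example might look roughly like the following in WebGL 1, where instancing is exposed through the `ANGLE_instanced_arrays` extension. This is a sketch under assumptions: the function name and the parameters `translationBuffer` and `translationLocation` are hypothetical, and the translation is assumed to be stored as three floats per instance.

```javascript
// Sketch only: `gl` is a WebGL 1 context, `translationBuffer` is a buffer
// holding one vec3 per tree, and `translationLocation` is the location of
// the translation attribute. All of these names are illustrative.
function bindInstancedTranslations(gl, translationBuffer, translationLocation) {
  const ext = gl.getExtension('ANGLE_instanced_arrays');
  if (!ext) {
    return null; // instancing unavailable; the caller should fall back
  }
  gl.bindBuffer(gl.ARRAY_BUFFER, translationBuffer);
  gl.enableVertexAttribArray(translationLocation);
  // Three tightly packed floats per instance, starting at offset 0.
  gl.vertexAttribPointer(translationLocation, 3, gl.FLOAT, false, 0, 0);
  // Divisor 1: advance this attribute once per instance, not once per vertex.
  ext.vertexAttribDivisorANGLE(translationLocation, 1);
  return ext;
}
```

The per-vertex attributes (position, normal, texture coordinates) keep the default divisor of 0, so they still advance once per vertex. In WebGL 2 the same call is available directly on the context as `gl.vertexAttribDivisor`.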
This applies well to Cesium’s billboards, where most of the vertex attributes are identical across all four vertices. The only vertex attribute that cannot be instanced is the ‘direction’, which specifies whether the vertex is located at (0, 0), (1, 0), (1, 1), or (0, 1). The direction is used to compute the vertex’s position and texture coordinate.
So now we have two vertex buffers: one buffer stores the directions and another buffer stores the instanced attributes (everything else). The first buffer is very small and can be shared among all the billboards:
[0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0]
The instanced buffer is larger, but there is no repeated data as in the batching approach. Ultimately we see roughly a 75% decrease in vertex buffer size because we have gone from 4 vertices per billboard to 1 instanced vertex per billboard.
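To make the arithmetic concrete, here is a small back-of-the-envelope sketch comparing buffer sizes for the two approaches. The byte counts passed in are illustrative assumptions, not measurements of Cesium's actual vertex layout, and `estimateBufferSizes` is a hypothetical helper.

```javascript
// Hypothetical estimate, not Cesium code.
// perBillboardBytes: bytes of attributes shared by all four corners.
// directionBytes:    bytes used to store one corner's direction.
function estimateBufferSizes(billboardCount, perBillboardBytes, directionBytes) {
  const indexBytes = 2; // assuming 16-bit indices
  return {
    // Batched: every corner repeats the shared attributes plus its direction.
    batchedVertexBytes: billboardCount * 4 * (perBillboardBytes + directionBytes),
    // Instanced: shared attributes stored once per billboard, plus a single
    // four-corner direction buffer reused by every instance.
    instancedVertexBytes: billboardCount * perBillboardBytes + 4 * directionBytes,
    // Batched: six indices per billboard. Instanced: six indices in total.
    batchedIndexBytes: billboardCount * 6 * indexBytes,
    instancedIndexBytes: 6 * indexBytes
  };
}

// Example with made-up sizes: 64 bytes of shared attributes, 8-byte direction.
const sizes = estimateBufferSizes(10000, 64, 8);
const vertexSaving = 1 - sizes.instancedVertexBytes / sizes.batchedVertexBytes;
```

With these assumed sizes the vertex data shrinks by a little more than three quarters, in line with the rough 75% figure; the exact saving depends on how large the direction is relative to the other attributes.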
We also see a large memory reduction in our index buffer. Before, we had to repeat the same six-index pattern for every billboard in the batch. Now the index buffer is simply:
[0, 1, 2, 0, 2, 3]
The final step is calling drawElementsInstanced with the number of billboards as the primcount.
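In WebGL 1 this call lives on the `ANGLE_instanced_arrays` extension object as `drawElementsInstancedANGLE`. The sketch below assumes the six-element index buffer and both vertex buffers are already bound, and that the indices are 16-bit; the function name `drawBillboards` and its parameters are hypothetical.

```javascript
// Sketch only: `ext` is the ANGLE_instanced_arrays extension object and
// `billboardCount` is the number of instances to draw. The bound index
// buffer is assumed to hold the six 16-bit indices [0, 1, 2, 0, 2, 3].
function drawBillboards(gl, ext, billboardCount) {
  ext.drawElementsInstancedANGLE(
    gl.TRIANGLES,      // two triangles per quad
    6,                 // number of indices per instance
    gl.UNSIGNED_SHORT, // index type
    0,                 // byte offset into the index buffer
    billboardCount     // primcount: how many instances to draw
  );
}
```

In WebGL 2 the equivalent is `gl.drawElementsInstanced` with the same argument order.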
In terms of performance, instancing and batching give roughly equal results for static billboards. However, we do see a slight improvement when animating the billboards. This is because we need to update the vertex buffer with new positions every frame, and the instanced buffer is one quarter the size. There is less CPU overhead in Cesium and the WebGL implementation (browser/driver) to fill the arrays, and less memory to transfer to the GPU.
Testing with 10,000 billboards:
Instancing: Static: 266 fps, Animated: 138 fps
Batched: Static: 262 fps, Animated: 120 fps
Check out the code in BillboardCollection.js.