As part of adding streaming 3D buildings to Cesium, we implemented some interesting view frustum culling optimizations for bounding volume hierarchies (BVHs).
In particular, we implemented plane masking as described by Sýkora & Jelínek in Efficient View Frustum Culling (Section 2.5). In this article, we explain the algorithm, including pseudocode, and share performance results.
Generally, when frustum culling nodes in a BVH, each node is checked against the six clipping planes of the frustum to determine whether it is potentially visible, i.e., inside the frustum. For each plane (which divides 3D space in half), the node can be either completely inside (on the camera-visible side), completely outside (invisible to the camera), or intersecting (crossing the plane).
BVHs, and other spatial data structures such as octrees and k-d trees, are spatially coherent, meaning child nodes are fully contained by their parents. A common optimization for hierarchical view frustum culling with a spatial data structure is to exploit spatial coherence.
If a node is completely outside of the frustum, so are its children, so traversal can stop. If a node is completely inside of the frustum, so are all of its children, so they can be traversed without frustum culling checks. Frustum checks for child nodes are only needed when the parent is intersecting the frustum.
We can improve performance even further by considering individual frustum planes instead of the entire frustum. Plane masking exploits spatial coherence for each culling plane that defines the frustum. For each node which has not been eliminated completely (i.e., the node is not outside), it must either be inside or intersecting each of the six culling planes. When checking frustum intersection of a node, the inside/intersecting state for each plane is returned to be passed down to the child nodes. The child nodes use this to skip checks against any planes that the parent node has already been determined to be inside. The elimination of these additional plane intersection tests provides the speedup.
The benefit of avoiding plane intersection checks is just a few operations, but they are repeated many times (6 bounding planes times n bounding boxes) and have a significant performance impact (see the Results section below). When testing frustum intersection with oriented bounding boxes, one plane-box intersection test requires four dot products and several comparisons.
These examples show the culling process for different camera locations in a simple scene with the following hierarchy:
In the following examples, all plane intersection checks in italics are those
which can be avoided because the result is predetermined. Those which are
struck through are irrelevant because the node is already known to be
outside. Intersections are evaluated in reading order in the table.
Parent node inside:
In this example, the parent node is completely inside the frustum. During hierarchical view frustum culling, the frustum planes are checked only against the parent node.
Parent node intersecting:
In this example, the parent node is intersecting the frustum at Plane 2. During hierarchical view frustum culling, the children check only this plane.
Parent node outside:
In this example, the parent node is completely outside the frustum so its children are not checked.
The following code examples show the difference between using spatial coherence for the entire frustum compared to each frustum plane using plane masking.
Basic Hierarchical Culling
Hierarchical Culling with Plane Masking
To see the performance gains of plane masking, we profiled our
selectTiles sees speedups of around 15% in Chrome and 20-34% in Firefox Nightly.
These results were captured at steady state in a 1920x1080 viewport, with the globe turned off to reduce profiling noise.
|Percentage of Total CPU Time within
||Chrome 43.0||Firefox Nightly 42.0a1|
|Without plane masking||8.70%||0.87%|
|With plane masking||7.68%||0.65%|
|Percentage of Total CPU Time within
||Chrome 43||Firefox Nightly 42.0a1|
|Without plane masking||6.55%||0.60%|
|With plane masking||5.83%||0.50%|
- Speedup is calculated in a way that eliminates factors outside of
selectTiles, such as the removal of the globe rendering: for two profile percentages, the speedup is
(1/x2 - 1) / (1/x1 - 1) - 1
- The difference in magnitude between Chrome and Firefox is because Firefox includes a "Graphics" component in its profiling results (~85%), while Chrome does not.
- Firefox Nightly was used for its newly improved profiling tools.