2018-09-21

Octrees: Motivation and Overview

Every 3D engine for games has to solve a few spatial queries. Specifically:

  1. Visibility Queries. Given a camera position and orientation, which scene geometry intersects with the camera frustum and should thus be rendered?
  2. Collision Queries. Prevent the camera from moving into scene geometry. Have simulated gravity pull entities to the ground.
  3. Range Queries. Return all light sources within a given radius.
  4. Ray Casting. What did the player just click on? Cast a ray from the camera position through the mouse cursor in the view plane into the scene geometry.
  5. Ray Tracing. Is this part of the scene geometry visible from this light? Does the enemy character have a clear line of sight to the player?
  6. Nearest Neighbor Queries. What is the closest enemy to the players' position?

What all of these have in common is that they quickly become inefficient if implemented naively directly on the scene geometry. Modern game environments are typically comprised of millions of individual triangles, so if every collision test would have to check every triangle of the player character against every triangle of the environment it would quickly become infeasible. Thus we need a smarter way of implementing these tests.

There are a few approaches for tackling this. One could sort objects spatially (for example along the coordinate axis or along a space filling curve) or one could bucket them into grids. The idea would be to stop looking for further collisions once you are too far away from the query object according to the sort order or grid cells. There are a few tricky edge cases because mapping three dimensional space to a one dimensional curve will not preserve distances perfectly. In the grid case one has to choose the cell size in relation to maximum and minimum object sizes very carefully. Despite these issues, one can make it work.

More commonly one chooses a different approach. The first idea is to coarsely approximate input geometry with simpler shapes that are easier to do calculations with. Commonly these will be spheres, oriented bounding boxes (OOB) or axis aligned bounding boxes (AABBs). These approximations must be conservative, that is, they must fully enclose the original geometry. This guarantees that if the proxy geometry isn't colliding, neither can the original geometry be. Note that the inverse is obviously not true: just because the proxy geometries intersect does not mean the original geometries are. So the bounding volumes should be conservative, yet at the same time they ought to be minimal. Otherwise the fit is imprecise and a lot of the efficiency gains are wasted.

The second idea is to organize these proxy geometries in some sort of hierarchy. Each successively higher level of the hierarchy fully contains all its lower levels. This allows a divide and conquer strategy, bisecting the problem and quickly discarding large chunks of it. If I know you are currently in the Southern hemisphere of the globe, I don't have to go looking for you in London. Bounding volume hierarchies (BVHs) follow the same idea, turning a linear search over all objects into a logarithmic tree search.

The most common strategies can roughly be summarized by whether they partition the input space statically (and thus the maximum extent of the space must be fixed and known in advance) or dynamically based on the exact input geometries (and thus it's hard to mutate or animate the input geometries). Incidentally this is also the order of increasing complexity of the data structures:

  • Grid: The space is tessellated into equally sized hyperrectangles (usually cubes).
  • Octree: The space is recursively subdivided into octants. Usually with axis aligned split planes through the center of the current cube, so that you end up with eight equally sized child cubes.
  • KD-Tree: The space is recursively subdivided with a hyperplane perpendicular to one of the coordinate axis. Which axis is chosen dynamically for every level of the tree based on the input data.
  • R-Tree: This is a BVH that encloses the input data in nested hyperrectangles like a very boxy Russian doll.
  • BSP-Tree: Generalized KD-Tree. A BSP tree allows arbitrarily oriented split planes.

I'm not 100% sure how the R-Tree compares, but the others are listed in order of increasing expressiveness. That is, a BSP-Tree can model everything a KD-Tree can, and that in turn can model everything an Octree can. Thus data structures lower in the list are strictly more powerful than those higher up.

There are a lot of trade-offs involved when choosing any space partitioning strategy. There's no single best solution fitting all requirements. Next post I'll list mine and the resulting choice (the title of this post might have given it away already;-)). The idea is to explore this data structure in a series of posts and see how far I can take it.