Tuesday, November 15, 2016

Tiled Terrain Update and Screenshots

The past few weeks I haven't been working on 3DWorld as much as usual, partly because I'm preparing to move to a new house and have been distracted with other things. The work I've been doing with 3DWorld is mostly related to minor improvements to visuals and optimizations for generating and rendering large outdoor scenes. I've made many small improvements to tiled terrain mode. Each individual change has only a minor impact on the scene, but together they do improve visual quality considerably. I'll discuss some of these changes and show new screenshots in this blog post.

Terrain Normal Maps

I've modified the terrain texturing system in 3DWorld to support separate detail normal maps for each terrain texture layer (sand, dirt, grass, rock, and snow). I found suitable normal map textures online for all layers except for grass, for which I couldn't find a texture that I liked. The grass texture layer is drawn with a fine field of triangle grass blades, which cover up the mesh so that the normal map isn't visible anyway. This adds many more texture lookups inside the terrain fragment shader, but my GeForce GTX 1070 shows almost no change in frame rate. I'm sure this feature hurts frame rates on older/slower/cheaper cards though.
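
Here is a rough C++ sketch of the per-layer detail normal blending that the terrain fragment shader performs, written as plain CPU code for clarity. The five-layer weighting and renormalization mirror the idea described above, but the names and structure are simplified stand-ins for the actual shader code.

    #include <cmath>

    struct vec3 { float x, y, z; };

    static vec3 normalize(vec3 v) {
        float len = std::sqrt(v.x*v.x + v.y*v.y + v.z*v.z);
        return {v.x/len, v.y/len, v.z/len};
    }

    // Blend one tangent-space detail normal per texture layer (sand, dirt, grass, rock, snow),
    // weighted by the same splat weights already used to blend the diffuse textures.
    vec3 blend_detail_normals(const vec3 layer_normals[5], const float layer_weights[5]) {
        vec3 n = {0.0f, 0.0f, 0.0f};
        for (int i = 0; i < 5; ++i) { // one extra texture lookup per layer in the real shader
            n.x += layer_weights[i]*layer_normals[i].x;
            n.y += layer_weights[i]*layer_normals[i].y;
            n.z += layer_weights[i]*layer_normals[i].z;
        }
        return normalize(n); // renormalize before using it for lighting
    }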

Here are some screenshots of steep normal mapped hills showing all of the texture layers. I have disabled trees and water so that the terrain is fully visible. Grass and plants are left enabled.

A different normal map is assigned to each of the five ground texture layers that make up this steep hillside.

Terrain normal maps viewed from a different angle.

Normal maps applied to rocky mountains, viewed from above from a distance. The terrain appears to have very high detail.

These new normal maps add a good deal of complexity and realism to the scene with little cost. They make the mesh appear to be highly detailed, almost for free.

Leafy Plants

After reviewing several other terrain generation and rendering systems written by other people, I decided that 3DWorld's terrain needs more detailed ground cover. The area on the ground under the trees was too bare, even with all of the grass. I had this as a todo list item for maybe a year before I finally got around to adding another plant type. Here they are. I call them "leafy plants".

View of a grassy beach with some new procedurally generated leafy plants.

There are four different types of leafy plants that appear at different elevations, including one type that is underwater. They're created by applying tree leaf textures to spherical sections that are scaled to an ellipsoid shape. This gives the leaves a smooth curve, rather than having them be boring flat polygons. However, they are also more expensive to create and draw than other types of plants that use single polygons (quads) for leaves. I had to use a sparse distribution of leafy plants to keep performance under control. These new plants seem to blend in well with the other plants, grass, and flowers. They're just one more component to the 3DWorld ground cover, meant to fill the gaps between the trees.
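
As a rough sketch of the geometry, here is how a single curved leaf could be built as a spherical section scaled by per-axis radii. The patch extents, tessellation counts, and names are illustrative only, not 3DWorld's actual plant generation code; the leaf texture would be mapped over the (i, j) grid.

    #include <cmath>
    #include <vector>

    struct vert { float x, y, z; };

    // Generate vertices for one curved leaf: a small section of a sphere, scaled by
    // per-axis radii (rx, ry, rz) so it becomes an ellipsoid patch rather than a flat quad.
    std::vector<vert> make_curved_leaf(float rx, float ry, float rz, int nu, int nv) {
        std::vector<vert> pts;
        for (int i = 0; i <= nu; ++i) {
            for (int j = 0; j <= nv; ++j) {
                // sweep a limited range of spherical angles => a curved section, not a full sphere
                float theta = 0.6f*(i/(float)nu - 0.5f); // azimuthal extent of the leaf
                float phi   = 0.4f*(j/(float)nv);        // elevation extent of the leaf
                pts.push_back({rx*std::cos(theta)*std::cos(phi),
                               ry*std::sin(theta)*std::cos(phi),
                               rz*std::sin(phi)});
            }
        }
        return pts;
    }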

Improved Pine Tree Branch Randomization

I made another pass at procedural generation of pine tree branch sizes, positions, and orientations. This new approach uses more random numbers to produce a less symmetric branch coverage, which removes some of the repeating patterns found in my previous pine tree forests. Each branch is rotated a small random amount around the vertical (z) axis of the tree, and shifted a random distance up or down in z as well. This makes the trees look more natural and realistic, especially when shadows are enabled. Keep in mind that every tree in this scene is unique. Instancing is only used to save memory when there are more than ~100K trees.
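
Here is a simplified sketch of that per-branch randomization. The rotation and shift ranges, and the use of std::mt19937, are placeholders for 3DWorld's own random number generator and tuned values.

    #include <random>

    struct branch_t { float angle_z, height_z; }; // rotation about the trunk and height along it

    // Add small random perturbations to each branch so the coverage is less symmetric.
    void randomize_branches(branch_t *branches, int num_branches, unsigned tree_seed) {
        std::mt19937 rng(tree_seed); // deterministic per tree: unique trees, but repeatable placement
        std::uniform_real_distribution<float> rot(-0.2f, 0.2f);     // radians about the vertical (z) axis
        std::uniform_real_distribution<float> shift(-0.05f, 0.05f); // fraction of tree height, up or down
        for (int i = 0; i < num_branches; ++i) {
            branches[i].angle_z  += rot(rng);   // break up the regular angular spacing
            branches[i].height_z += shift(rng); // break up the regular vertical spacing
        }
    }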

Pine tree forest on the mountain, with high detail shadows and more randomized tree branch placement.

Note that in this screenshot, and the images from the following sections, I've increased the high resolution shadow map distance. This improves shadow map quality, at the expense of increased GPU memory usage and slightly longer tile generation time. My current GPU has 8GB of memory, so spending a few hundred MB on improved shadows seems like a good decision. The shadow map smoothly transitions to a lower resolution baked mesh shadow at about half the max view distance. It's possible to push the shadow maps out to the full view distance, at the cost of a smooth frame rate.

Large Numbers of Trees

I decided to test the limits of how many trees I could get onscreen at once. I optimized some of the tree placement and pine tree leaf/branch generation code so that it scaled better to many trees. At this point the number of trees is probably limited by the time taken to send the vertex data from the CPU to the GPU. I'm not sure how to profile this, or optimize it further.

Here is a scene showing pine trees on a terrain with tall mountains that extends into the distance.

Tall mountains covered with pine trees. The terrain height has been scaled and zoomed out to produce sharp peaks.

This is a scene showing tens of thousands of deciduous trees drawn using 50 unique generated tree models. 3DWorld can create hundreds of unique tree models with an 8GB graphics card. However, this increases initial scene load time, and I prefer to keep loading under a few seconds. [Update: I can see duplicate trees in this screenshot, so I increased the number of unique trees to 100 and tweaked the type randomization code to remove them.]

Distant, foggy mountains with trees.

Here I have decreased tree size and increased tree density in the config text file. The active area of the scene contains 4-5M pine trees, and around 1M trees are visible in these next two screenshots. This scene renders at a consistent 78 FPS, mostly independent of which way the camera is facing.

Maximum tree density: Pine trees cover every dry patch of land. This scene contains at least 1M trees and renders at 78 FPS. The white spot in the distance is a patch of bare snow that is too steep for trees.

A million pine trees near sunset. Note the complex cloud lighting and the dark tree shadows.

Fall Leaves

Yes, it's that time of year again. The trees outside my house are turning bright colors (mostly red) and covering the sidewalk with leaves. Once again I'm showing off screenshots of colorful fall trees in 3DWorld. I have tree leaf color parameters tied into the editing UI this time, which allows me to change colors interactively using the arrow keys on the keyboard. Here are two screenshots showing fall deciduous trees in a variety of colors.

Trees with colorful fall leaves spread across the rocky ground.

More fall leaves with grass, flowers, and plants.

I'll continue to work on trees in tiled terrain mode. Maybe I can add winter trees that have no leaves and are covered with snow in time for Christmas. Of course, we don't get snow here in San Jose, CA.

Thursday, October 13, 2016

Fun with Stacking Spheres and Cubes

This is more of a fun post compared to some of my previous technical posts. I'll be sure to add a lot of YouTube videos below and a few static images at the end. This is my first interactive scene editor feature, but it's more for fun and amusement than real level editing. I don't imagine I could create any useful level by placing various one meter cubes and spheres one at a time in front of the player's camera.

The following videos show my progress on dynamic object placement. I've made huge improvements in my workflow and "artistic quality" over the past week. There are many additional improvements to be done and features to add. I believe that eventually this system can be made into a full level editor, but it has a long way to go before it's useful for this purpose.

None of these videos were recorded with sound. I still only have two options for video recording: Fraps with sound, limited to 30 seconds; and ffmpeg with unlimited length but no sound. I'm sure I'll figure it out eventually.

The first video shows a stack of 1 meter cubes of various materials that I slowly placed one on top of another. It was more fun to push the stack down than it was to build it. Note that I didn't have dynamic shadows working for the cubes at this point. I later added the ability to destroy cubes with weapons instead of only being able to push them around and knock them down.


After a while I grew tired of placing blocks one-by-one, so I made the placement delay another in-game, user-editable parameter. I found it was much easier to create large piles of cubes and spheres with the delay set to 0. In reality it's one object per frame, or a 16 ms delay at a frame rate of 60 FPS.

This next video shows how I can create large stacks of cubes. I still hadn't added shadows at this point. The cubes are created so that they stack but don't intersect each other. If I have collision detection turned on, I'll quickly wall myself into a corner. But if I disable collision detection, I can keep inserting cubes into one place and they'll form a stack: either each new cube is inserted at the bottom and pushes all the other cubes up, or the new cube is inserted and pops up to the top of the stack. I'm really not sure - the code is pretty complex, and it all happens within a single frame. At least the cubes don't intersect with anything, which is the most important property. Oh, and if I made the cubes small enough, I could probably walk up them like stairs. I really should try that and record a video if it works.


I had the material editing menu up for much of the video so that I could easily change the materials. It would be nice to hide the menu somehow in the future while still having a way to change the textures. The material itself can be selected with a hotkey from a user-created list of predefined materials that is read with the scene data, but it doesn't have the variety of textures I wanted to use for this video.

There is a small yellow number printed near the center of the screen that counts the number of objects placed. Here I've placed 1533 cubes and spheres. This was added to track how many objects can be created within reasonable performance constraints.

I finally enabled shadows for the cubes and spheres. I spent some time creating high stacks and then knocked them down in this next video. It was pretty fun! All of the weapons push the cubes around, but the rocket launcher has the most force per unit time. The seek-and-destroy has about twice the force, but fires too slowly. If I make the cubes small enough, around 0.4m on a side, I can push them out of the stack with a rocket or two. All the cubes above the one that was pushed out will fall in quick succession. The system is very stable, but this many cubes hurts the frame rate at this stage of development.


I tried to stack spheres next. At first it didn't work because the collision detection for this case wasn't fully implemented, and the spheres just sat there overlapping each other, forming a giant blob of spheres. My attempted fix had a surprising and unexpected effect, shown in the video below. Groups of spheres stuck together in a quivering, unstable ball, then floated off on a random path toward the sky. Sometimes the sphere clusters got stuck on static level geometry and pulsated there. What's going on here? I recorded the video, but I hadn't enabled a lower screen resolution, and the video compression couldn't keep up. The frame rate dropped, leading to a laggy recording where some parts ran at up to 2x realtime. Sorry. I've reduced the sphere drawing time by 5-10x since recording this, so it shouldn't be as much of a problem in the future.


What was causing this bug? I had the collision response vector sign backwards, and colliding spheres were being pulled together like magnets rather than pushed apart. They would overshoot and separate slightly, only to be pulled back together in the other direction. Some instability (floating-point error?) caused the clusters of attracted spheres to drift off in random directions with a random walk. Some sank into the floor/ground, some floated off into the sky, and some got stuck in the static level geometry such as the lamp posts and building walls. If I had come across this bug without just having rewritten the sphere intersection code, I would have never figured it out. The effect was pretty funny though. I might even add an option to enable it from within the material editor. Magnetic materials? Negative gravity materials? I'll have a hard time justifying this as anything resembling real physics!

I later got spheres working without much trouble. They stick to each other like masses of fish eggs. This actually reminds me of the little colored sticky foam balls that my daughter used for building sculptures and got spread all over the house. The user can disable player collision detection and float around creating sculptures of 1 meter spheres that fill the level. I'm not sure what use this is in real gameplay, but you can get a few thousand spheres scattered about before the framerate starts to drop.


Maybe it's unrealistic to stick brick spheres together like this. What should really happen when placing spheres this way in a sane world? I guess they would all fall down and roll around until they covered the ground - but that's no fun! Maybe I can make another option to control this later.

The final video of this post shows the exploding cube effect. I can mark objects as exploding in the material editor, and then fill the scene with them. Any hit from a weapon will detonate the object and also destroy the surrounding objects within the blast radius. One explosion takes out several cubes at a time, which allows me to destroy them all much more quickly. This clears space for placing even more stacks. I call it the "undo" feature. If you need more precision, there's also a "shatterable" material mode that will only destroy the object that was directly hit with weapon fire.


I spent some time stacking cubes and pushing them down, but it's not too much different from what I've shown in the previous videos. Here are two images of my "artwork". The first image shows a few thousand stacked cubes of various sizes and materials. This was earlier in the development when the framerate was too low to add more, and shadows still weren't enabled.

Thousands of stacked cubes of various materials and sizes litter the courtyard. The office workers are hiding in fear.

Imagine having this stack of cubes outside a real office building! Would anyone want to walk anywhere near it? In a real environment, the wind would probably push these stacks over, and one falling stack would bring all of the others down like dominoes. This stuff is fun to build due to the pure absurdity of the whole system. But, keep in mind, this is just a prototype of the level editor that in the future will be used to construct structures such as the office building itself. It's in no way meant to represent correct physics or real gameplay. I'm not creating another Minecraft clone here.

This next image shows a scene that took quite a while to create, even with one cube placed per frame. There are over 10,000 cubes here, in stacks that reach hundreds of feet into the sky. Some of them go above the cloud layer and beyond the far clipping plane so that the boxes at the top aren't even visible to the player. The stacks take up to 10s to fall if you remove a block near the bottom and watch the rest of them drop one by one. However, they do still completely fall. Shadows and indirect lighting from the lamps in the courtyard are enabled.

More than 10,000 cubes stack to the sky. This artwork took me a while to create and was a shame to lose when I realized I hadn't completed the "Save Level" feature yet.

It was a shame to create this wonderful bit of cube insanity and then throw it all away when I quit 3DWorld. See, I didn't implement a "Save Level" feature until after creating this scene. On the plus side, this problem encouraged me to finish the save feature in a hurry. Now the save system is complete, and I can save, load, and modify all of my cube and sphere creations as much as I want.

This post is short on technical details, so I should probably talk about why the performance was poor in the beginning, how I improved it, and how the physics works.

There are two types of user placeable objects in 3DWorld: dynamic objects and movable static objects. There are various other object types, but they can't be created by the player. The previous post on sphere and cube materials was all about dynamic objects. These include weapons, ammo, projectile effects, smiley body parts, etc. Anything that has free movement physics is a dynamic object. These objects support Newtonian physics with gravity, elastic and inelastic collisions, momentum, friction, air resistance, buoyancy, etc. Basically, all of the physical parameters that are normally modeled in games and physics engines, plus a few obscure ones that have little impact on the physics but were fun or challenging (= fun) to add.

There are three problems associated with building using dynamic objects:
  1. The physics really only works correctly for spheres (as implemented in 3DWorld).
  2. This system is too expensive for simulating thousands of interacting objects.
  3. There are problems with stability when creating stacks of objects.
The solution to these problems is to use a different physics system that solves for static constraints rather than realtime dynamics. I've extended my framework for movable (player pushable) static objects to work with user-placed material cubes and spheres. It works better with cubes since they have flat surfaces and the constraints are simpler, but it's acceptable for spheres. I could in theory add support for other 3DWorld shapes: cylinder, cone, capsule, torus, polygon, extruded polygon. However, that's a lot more work, and these other more complex shapes have more parameters that need to be set by the user. Cubes and spheres only have one parameter: size/radius.

When I say static objects, I really mean persistent objects that are created once and last forever, and will remain in place if no dynamic forces act on them. Static movable objects have a more limited set of physical parameters and a simpler physics model. There is no momentum, torque, friction, or elasticity. All collision responses other than gravity are resolved statically within a single frame. They can be pushed around by the player, stacked, and dropped - but that's about it. As I've shown in previous posts, buoyancy in water works, as do some other minor effects. Since there is no torque or friction, and materials are infinitely hard/rigid, they can be placed in stable stacks of unlimited height. As long as there is an object below supporting the objects above, everything stays in place. Remove one object (cube) from the bottom, and the cubes above will fall one-by-one until the entire stack has fallen down to the next supporting block (or the ground). Since static cubes don't rotate when stacked, any contact point on the bottom of the cube will support it, even if it's just a tiny corner of another cube. At least this simplifies the math.

Static objects are faster to simulate than dynamic objects, but they're still not free. My 10K blocks scene was dropping the frame rate all the way down to 40 FPS. Almost all the time (maybe 80%) was in querying the scene bounding volume hierarchy (BVH) for adjacent objects that were either supporting or resting on the current query object. The BVH was built incrementally while the objects were added, so it may not be optimal, in particular for these dense stacks.

The important observation here is that most of the time, an object is not moving at all. In fact, the majority of the game frames observe no block motion. The blocks only move when dropped, pushed, or shot at by the player, and there's a limit to how many blocks a player can get moving at the same time. The fix is to track which objects are moving vs. "sleeping" and not simulate the sleeping objects every frame. I used an active counter to track how many frames it's been since each object was last updated/moved. If an object goes for 8 consecutive frames without moving, it's marked as sleeping and only checked for collision once every 16 frames. This cuts the simulation time down by a factor of 16 in mostly static scenes. To avoid large delays with falling objects, every active object wakes up all other objects within a certain distance of it when it moves. If the player fires a rocket at a cube, the collision/explosion will wake up the cube, which will wake up the cubes above and below it as it falls. The chain effect will incrementally wake up the entire stack (but just that one stack), and make all of the blocks fall in quick succession.
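
Here is a simplified sketch of that bookkeeping. The object structure, the per-object physics call, and the wake radius are illustrative stand-ins for the actual 3DWorld code; run_collision_and_gravity() is a hypothetical helper representing the per-object physics step.

    #include <vector>

    struct movable_obj {
        float x = 0.0f, y = 0.0f, z = 0.0f;
        int  frames_still = 0;   // consecutive frames without motion
        bool sleeping     = false;
    };

    // Hypothetical per-object physics step (collision + gravity); returns true if the object moved.
    bool run_collision_and_gravity(movable_obj &obj);

    void update_objects(std::vector<movable_obj> &objs, int frame_counter, float wake_radius) {
        for (movable_obj &o : objs) {
            if (o.sleeping && (frame_counter & 15) != 0) continue; // sleeping: only check 1 frame in 16
            if (!run_collision_and_gravity(o)) { // didn't move this frame
                if (++o.frames_still >= 8) o.sleeping = true; // still for 8 frames => go to sleep
                continue;
            }
            o.frames_still = 0; // it moved: reset the counter and wake up nearby objects
            o.sleeping     = false;
            for (movable_obj &other : objs) {
                float dx = other.x - o.x, dy = other.y - o.y, dz = other.z - o.z;
                if (dx*dx + dy*dy + dz*dz < wake_radius*wake_radius) {
                    other.sleeping     = false;
                    other.frames_still = 0;
                }
            }
        }
    }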

This change improved the framerate from 40 FPS to 90 FPS. Good enough for now. I think the framerate is currently limited by actually drawing all of the cubes, or maybe by the occlusion culling. I should be able to create 15K+ cubes while still hitting a solid 60 FPS. Spheres are more expensive to draw, so I can only have about 2000 of them in the scene.

Monday, September 26, 2016

Physically Based Materials

I'm continuing to work on improving object reflections in 3DWorld. The past few weeks I've been trying to integrate reflective objects into the engine as physically-based materials. As a first step, I provided a way to create and throw spheres and cubes with user-defined material properties as dynamic objects in the scene. The properties are specified in a config text file in keyword/value format. There is also a simple UI for realtime editing of material parameters and creating new materials. The UI is an in-game text overlay with arrow key input similar to the onscreen display you would find in a computer monitor. It's very simple but usable. I would like to use the mouse to select menu items, but I think it would interfere with the user's ability to play the game and interact with the world while the menu system was active.

The material parameters supported are:
  • Material Name - User-defined text string identifier
  • Texture - Name/filename of texture to use; "none" to disable texturing
  • Normal Map - Name/filename of normal map texture to use; "none" to disable normal mapping
  • Shadows - Flag to enable cube map shadows for point light spheres
  • Emissive - Flag to mark as having an emissive color (no lighting)
  • Reflective - Flag to mark surface as reflective (using an environment cube map)
  • Destroyability - Tag to mark as destroyable, shatterable, exploding, static, etc.
  • Metalness - Value in [0,1] to represent dielectric vs. metal
  • Hardness - Value in [0,1] to set hardness for elastic collision physics
  • Density - Value of material density, used to compute mass and buoyancy in water
  • Specular Magnitude - Magnitude of specular light reflection in [0,1]
  • Specular Shininess - Shininess of specular light reflection, converted to surface roughness
  • Alpha - Value in [0,1] to specify alpha value of partially transparent objects such as glass
  • Light Attenuation - Factor for computing transparency and scattering within the material
  • Index of Refraction - Value for controlling reflection and refraction in glass, plastic, etc.
  • Light Radius - Radius of light emission for light source spheres
  • Diffuse Color - {R,G,B} diffuse, emissive, or light source color value
  • Specular Color - {R,G,B} specular color value ((1,1,1)=white for non-metals)
For example, the material "Gold" is specified as:
hardness 0.8
density 19.29
alpha 1.0
reflective 1
metalness 1.0
specular_mag 1.0
specular_exp 128.0
diffuse_color 0.0 0.0 0.0
specular_color 0.9 0.6 0.1
add_material Gold
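
For illustration, here is a minimal sketch of how a keyword/value format like this could be read into a material struct. It handles only a few of the parameters listed above, and it is not 3DWorld's actual config reader; the struct fields and defaults are assumptions.

    #include <fstream>
    #include <map>
    #include <string>

    struct material_t {
        std::string name, texture = "none", normal_map = "none";
        float hardness = 0.5f, density = 1.0f, alpha = 1.0f, metalness = 0.0f;
        float specular_mag = 0.0f, specular_exp = 1.0f;
        float diffuse[3]  = {1.0f, 1.0f, 1.0f};
        float specular[3] = {1.0f, 1.0f, 1.0f};
        bool reflective = false;
    };

    bool read_materials(const std::string &fn, std::map<std::string, material_t> &mats) {
        std::ifstream in(fn);
        if (!in.good()) return false;
        material_t cur;
        std::string tok;
        while (in >> tok) { // parameters accumulate until an add_material line names the material
            if      (tok == "hardness")       in >> cur.hardness;
            else if (tok == "density")        in >> cur.density;
            else if (tok == "alpha")          in >> cur.alpha;
            else if (tok == "metalness")      in >> cur.metalness;
            else if (tok == "specular_mag")   in >> cur.specular_mag;
            else if (tok == "specular_exp")   in >> cur.specular_exp;
            else if (tok == "reflective")     in >> cur.reflective;
            else if (tok == "diffuse_color")  in >> cur.diffuse[0]  >> cur.diffuse[1]  >> cur.diffuse[2];
            else if (tok == "specular_color") in >> cur.specular[0] >> cur.specular[1] >> cur.specular[2];
            else if (tok == "add_material")  {in >> cur.name; mats[cur.name] = cur; cur = material_t();}
            else {std::string rest; std::getline(in, rest);} // skip keywords not handled in this sketch
        }
        return true;
    }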


I recorded several videos showing how 3DWorld's dynamic, throw-able spheres and cubes work, including realtime editing of material parameters. I feel that videos are required to show these features. It's just too hard to tell what's going on in static images. I can only cover a small fraction of the materials, parameters, and features available in these short videos.

Sorry, none of these videos were recorded with sound. The only sounds I have enabled in these tests are for throwing and bouncing anyway. These videos are too long to record with the free version of Fraps (which has sound). The FFmpeg video recording wrapper support in 3DWorld can record unlimited length videos and compress them in realtime, but I haven't figured out how to record audio yet in Windows.

Here is a video of me throwing spheres of various materials around in the scene and editing the material parameters in realtime. Everything in the scene is reflected in mirror surfaces, including the placeholder smiley player model.


This is a video showing dynamic sphere point lights and cube mapped shadows in a dark room. Lighting, shadows, reflections, and various other parameters can be assigned to materials and edited in-game.


I later decided to add support for dynamic material cubes as well as spheres. Here is a video of me throwing some cubes around and changing their textures and normal maps. Cubes and spheres use partially elastic collision models and will propagate collision forces around when piled up on top of or against each other. They can be stacked, pushed around the scene, and the player can stand on them, though there are some issues with simulation/physics stability.


Density is one of the material parameters that can be modified in realtime through the material editor. The material's density affects the amount of resistance to pushing and its buoyancy in water. In this video, I edit the density of the brick cubes, which affects how high they float in the water or how quickly they sink. The player can stand on and stack objects on the cubes as well, and everything works correctly. Spheres can also be used.


This is a video of my incomplete puzzle/platformer scene. It uses a variety of different effects and materials. The marble floor and some of the glass surfaces are plane reflectors. I haven't finished all of the traps and obstacles, and the various sections aren't even fully connected. I had to use the "flight mode" cheat to get to the second section. I'll post more screenshots and videos of this map later when it nears completion.


 I'm continuing to work on dynamic objects and materials. I would like to add support for the other shape types supported by 3DWorld: polygon, extruded polygon, cylinder, cone, capsule, and torus. I'm also considering adding more physics properties to the editable materials list, for example parameters for friction, air resistance, deformation, elasticity, player damage, etc. Regular dynamic 3DWorld objects such as weapon projectiles and pickup items use fixed materials, which already have all of these properties. Finally, I would like to add a way to make these objects into efficient static scene objects so that this mode acts like an in-game scene/map editor. I'm curious to see what the performance is when there are thousands of placed objects of dozens of different materials in the scene.

Wednesday, September 14, 2016

Reflections and Roughness

This post continues my work on cube map reflections from where I left off in an earlier post on this topic. I had it working pretty well at the time. However, I was never able to get 100 reflective objects in the scene in realtime because I didn't have enough GPU memory on my 2GB card. I now have a GeForce GTX 1070 with 8GB of video memory, which should allow me to add as many as 300 reflective objects.

Another problem that I had with the earlier reflection framework was the lack of surface roughness support. Every object was a perfect mirror reflector. I did some experiments with mipmap biasing to try and get a proper rough surface (such as brushed metal), but I never got it working at a reasonable performance and quality point. I think I've finally solved this one, as I'll explain below.

3DWorld uses a Phong specular exponent (shininess factor) lighting model because of its simplicity. Physically based rendering incorporates more complex and accurate lighting models, which often include a factor for surface roughness. I'm converting shininess to surface roughness by mapping the specular exponent to a texture filter/mipmap level, which determines which power-of-two sampling window to use to compute each blurred output texel. I use an equation I found online for the conversion:
filter_level = log2(texture_size*sqrt(3)) - 0.5*log2(shininess + 1.0)
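
For reference, here is the same conversion written out as a small C++ helper. The clamping to the valid mipmap range and the rounding are my own additions; texture_size is assumed to be the reflection texture resolution in texels.

    #include <algorithm>
    #include <cmath>

    int shininess_to_filter_level(float shininess, int texture_size) {
        float max_level = std::log2((float)texture_size); // highest valid mipmap level
        float level = std::log2(texture_size*std::sqrt(3.0f)) - 0.5f*std::log2(shininess + 1.0f);
        level = std::floor(level + 0.5f);                  // round to the nearest integer level
        return (int)std::max(0.0f, std::min(max_level, level));
    }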

The problem with using lower mipmap levels to perform the down-sampling/blurring of the reflection texture is the poor quality of the filtering. Mipmaps use a recursive 2x2 pixel box filter, which produces blocky artifacts in the reflection as seen in the following screenshot. Here the filter_level is equal to 5, which means that each pixel is an average of 2^5 x 2^5 = 32*32 source texels. Click on the image to zoom in, and look closely at the reflection of the smiley in the closest sphere.

Rough reflection using mipmap level 5 (32x32 pixel box filter) with blocky artifacts.

The reflection would look much better with a higher order filter, such as a bi-cubic filter. Unfortunately, there is no GPU texture hardware support for higher order filtering. Only linear filtering is available. Adding bi-cubic texture filtering is possible through shaders, but is complex and would make the rendering time increase significantly.

An alternative approach is to do the filtering directly in the fragment shader when rendering the reflective surface, by performing many texture samples within a window. This is more of a brute force approach. Each sample is offset to access a square area around the target pixel. I use an NxN tap Gaussian weighted blur filter, where:
N = 2^(filter_level+1) - 1
A non-blurred perfect mirror reflection with filter_level=0 has a single sample computed as N = 2^(0+1)-1 = 1. [Technically, a single filter sample still linearly interpolates between 4 adjacent texels using the hardware interpolation unit.] A filter_level=5 Gaussian kernel has N= 2^(5+1)-1 = 63 samples in each dimension, for 3969 samples total. That's a lot of texture samples! It really kills performance, dropping the framerate from 220 FPS to only 19 FPS as shown in the screenshot below. Note the framerate in the lower left corner of the image. But the results look great!

Rough reflection using a 63x63 hardware texture filter kernel taking 3969 texture samples and running at only 19 FPS.

The takeaway is that mipmaps are fast but produce poor visual results, and shader texture filtering is slow but produces good visual results. So what do we do? I chose to combine the two approaches: select a middle mipmap level, and filter it using a small kernel. This has a fraction of the texture lookups/runtime cost, but produces results that are almost as high quality as the full filtering approach. For a filter_level of 5, I split this into a mipmap_filter_level of 2 and a shader_filter_level of 3. The mipmap filtering is applied first with a 2^2 x 2^2 = 4x4 pixel mipmap. Then the shader filtering is applied with a kernel size N= 2^(3+1)-1 = 15. The total number of texture samples is 15x15 = 225, which is nearly 18x fewer texture accesses. This gets the frame rate back up to around 220 FPS.
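
Here is a minimal sketch of that split, assuming the simple policy of giving roughly half of the filter level to the mipmap and the rest to the shader kernel; the actual split logic in 3DWorld may differ.

    struct filter_split { int mipmap_level, kernel_taps; };

    filter_split split_filter_level(int filter_level) {
        int mip_level    = filter_level/2;                 // e.g. filter_level 5 -> mipmap level 2
        int shader_level = filter_level - mip_level;       // e.g. filter_level 5 -> shader level 3
        int taps         = (1 << (shader_level + 1)) - 1;  // N = 2^(level+1) - 1 => 15 for shader level 3
        return {mip_level, taps};                          // kernel is taps x taps samples (15x15 = 225)
    }

With filter_level = 5 this yields mipmap level 2 and a 15x15 shader kernel, matching the 225-sample case described above.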

I'm not sure exactly why it's as fast as a 1x1 filter. The texture reads from the level 2 mipmap data are likely faster due to better GPU cache coherency between the threads. That would make sense if the filtering was texture memory bandwidth limited. I assume the frame rate is limited by something else for this scene + view, maybe by the CPU or other shader code.

Here is what the final image looks like. It's almost identical in quality to the 63x63 filter kernel image above. The amount of blur is slightly different due to the inexactness of the filter_level math (it's integer, not floating-point, so there are rounding issues). Other than that, the results are perfectly acceptable. Also, this image uses different blur values for the other spheres to the right, so concentrate on the closest sphere on the left for comparison with the previous two images.

Rough reflection using a combination of mipmap level 2 and a 15x15 texture filter kernel taking 225 texture samples.

Here is a view of 8 metal spheres of varying roughness, from matte (fully diffuse lighting) on the left to mirror reflective (fully specular lighting) on the right. Each sphere is one filter_level different from the one next to it; the specular shininess factor increases by 2x from left to right.

Reflective metal spheres of varying roughness with roughest on the left and mirror smooth on the right.

This screenshot shows a closer view of the rough sphere on the left, with the filter_level/specular exponent biased a bit differently to get a clearer reflection. There are no significant filtering artifacts even at this extreme blurring level.

Smiley reflection in rough metal sphere showing high quality blur.

I'm pretty happy with these results, and the solution is relatively simple. The next step is to make the materials editable by the user and to make the reflective shapes dynamic so that they can be moved around the level. In fact, I've already done this, but I'll have to show it in a later post.

Sunday, August 28, 2016

Procedural Universe Rendering

I recently installed a new version of Microsoft Visual Studio on my home machine where I develop 3DWorld. The upgrade from MSVS 2010 to MSVS 2015 had been delayed until I found a good deal on the latest version, which sells for hundreds of dollars. I managed to get a used copy on amazon.com for a fraction of the retail price, and it seems to work just fine.

Overall it only took me a few hours to get 3DWorld building and running with the new compiler. There were various minor fixes for syntax errors and warnings, and I had to rebuild some of the dependencies. However, the upgrade did require me to spend a lot of time setting up my universe mode scenes, for about the fourth time in the history of 3DWorld. My universe scenes were all invalidated and had to be reconstructed because the planets were different types and in different places. I had to re-place the ships and space stations and change various parameters.

The problem is that the built-in random number generator values changed again. It seems like every version of Visual Studio gives me different values from rand(). Normally I wouldn't use the system rand() because it's slow, poor quality, and varies across compilers/OSes. I have my own custom random number generator that solves these three issues, which I've been using since I switched to MSVS 2010 about 5 years ago. I thought that was the last time I would have to deal with the universe random seeds problem. I guess not.

Unfortunately, I missed a call to rand() that was used to precompute a table of Gaussian-distributed random numbers, added to avoid generating Gaussian values on the fly. My custom random number generator was still being used to select a random entry from the Gaussian table, but the entries were all different. This distribution was used to select the temperature and radius of each system's star. The star radius affected the planet orbits, and the star temperature affected the planet types and environments. All of the galaxy and solar system locations were the same, but within each solar system everything was different.
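
Here is a sketch of the general fix, assuming a simple custom generator. The LCG constants, table size, and Box-Muller details are illustrative rather than 3DWorld's exact code; the point is that a fixed-seed generator under my control produces the same table on every compiler.

    #include <algorithm>
    #include <cmath>
    #include <vector>

    class custom_rng { // simple LCG, for illustration; 3DWorld uses its own generator
        unsigned state;
    public:
        explicit custom_rng(unsigned seed) : state(seed) {}
        float rand_float() { // uniform in [0,1)
            state = 1664525u*state + 1013904223u;
            return (state >> 8)/16777216.0f;
        }
    };

    std::vector<float> build_gaussian_table(unsigned seed, int table_size) {
        custom_rng rng(seed); // fixed seed => identical table on every compiler and every run
        std::vector<float> table(table_size);
        for (int i = 0; i < table_size; ++i) { // Box-Muller transform, one Gaussian value per entry
            float u1 = std::max(rng.rand_float(), 1.0e-7f), u2 = rng.rand_float();
            table[i] = std::sqrt(-2.0f*std::log(u1))*std::cos(6.2831853f*u2);
        }
        return table;
    }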

I fixed the problem and added a random seed config file parameter. This made it easy to regenerate the current system until I found one I liked, rather than having to fly around the galaxy looking for a suitable starting system for the player. I was looking for a seed that would give me a yellow to white star, an asteroid belt, and at least one of each type of interesting planet (Terran/Earth-like inhabitable, gas giant, ice planet, volcanic planet, ringed planet, etc.) In the process I came across some interesting and beautiful planets such as the gas giant in the screenshot below that looks like Jupiter.

Closeup of a procedural gas giant that looks like Jupiter, including small elliptical "storms".

Shadows

I settled on a system that had some interesting shadow effects, so I thought I would take some screenshots of the different types of objects that cast and receive shadows. Here is an image of a planet with a moon that is in the middle of the system's asteroid belt. I don't know if this actually happens in real solar systems, but it certainly makes for interesting gameplay. It's fun to watch the ships fly around the planet trying to (or failing to) avoid colliding with the asteroids. In this screenshot, I've positioned my ship so that the star is behind me and I'm in the shadow of the moon, looking at the asteroid belt and the planet, which is right in the middle of the asteroids.

Moon and planet casting shadows on an asteroid belt.

The small asteroids in the near field are fully shadowed and black, and the asteroids further away show a dark cone of shadow extending toward the planet in the center of the image. Some of the shadowed asteroids are difficult to see because they blend in with the black universe background, but you can definitely see shadowed asteroids contrasted against the planet. The shadow cone eventually disappears as the moon occludes a decreasing amount of the light from the star as the distance from the asteroid to the moon increases. This is similar to how, on Earth, shadows from nearby objects are much sharper than shadows from distant objects. Also note that the moon doesn't actually cast a shadow on the planet in its current position.

Here is a nice blue ocean planet that has a ring of asteroids around it. The ring casts a thin shadow near the equator of the planet. This can be seen as a thin dark line a bit below the center of the planet. This shadow is ray traced through the procedural ring density function in the fragment shader on the GPU to determine the amount of light that is blocked. The sun is behind my ship and a bit to the right. You can also see that the planet shadows the asteroid belt on the back left side. I found another planet where the moon should cast shadows on the rings, so I'll have to implement that in the code next.

Beautiful blue planet with asteroid belt rings. The rings cast a faint shadow on the planet and the planet casts a soft shadow on the rings.

I was lucky enough to find a rare occurrence of a moon casting a shadow on a planet - a solar eclipse! However, the relative sizes and distances between the star, moon, and planet in 3DWorld aren't to scale with real distances, so it may not represent a physically correct eclipse. I don't see these very often, and the previous planet configuration (MSVS 2010) didn't have one of these in any nearby star systems. The moon slowly revolves around the planet with an orbital period of around an hour, and after a few minutes of time the shadow no longer intersects the planet.

Rare occurrence of a moon casting a soft analytical shadow on a planet. The planet also reflects light onto the moon.

Note that the shadow has a physically correct umbra and penumbra. This is computed in the fragment shader when rendering the planet. The amount of light reaching the planet is calculated as one minus the fraction of the sun disk that is occluded by the moon. The sun is modeled as a circular/disk light source and the moon is modeled as a sphere projecting into a circle along the light vector. You can find the math for such a calculation here.
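
As a rough illustration of that calculation, here is a C++ sketch of the disk-occlusion fraction using the standard circle-circle overlap area. The angular radii and separation would come from projecting the sun and moon along the light vector; this is not necessarily the exact formulation used in 3DWorld's shader.

    #include <algorithm>
    #include <cmath>

    // Rs = angular radius of the sun disk, Rm = angular radius of the moon's projected disk,
    // d = angular separation of their centers, all as seen from the shaded point on the planet.
    // Returns the fraction of sunlight that gets through: 1 = unshadowed, 0 = full umbra.
    float eclipse_light_fraction(float Rs, float Rm, float d) {
        const float PI = 3.14159265f;
        if (d >= Rs + Rm) return 1.0f;                 // disks don't overlap: fully lit
        float sun_area = PI*Rs*Rs;
        if (d <= std::fabs(Rs - Rm)) {                 // one disk entirely inside the other
            float covered = PI*std::min(Rs, Rm)*std::min(Rs, Rm);
            return 1.0f - covered/sun_area;            // 0 in the umbra when the moon covers the sun
        }
        // partial overlap (penumbra): lens-shaped intersection area of the two disks
        float a1 = Rs*Rs*std::acos((d*d + Rs*Rs - Rm*Rm)/(2.0f*d*Rs));
        float a2 = Rm*Rm*std::acos((d*d + Rm*Rm - Rs*Rs)/(2.0f*d*Rm));
        float a3 = 0.5f*std::sqrt((-d + Rs + Rm)*(d + Rs - Rm)*(d - Rs + Rm)*(d + Rs + Rm));
        return 1.0f - (a1 + a2 - a3)/sun_area;
    }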

Bonus video of asteroid bowling! Here is a video of a planet plowing through the asteroid field at 100x speed, with a moon trailing behind it. I fixed the asteroid belt placement after recording this video.



Nebulae

Nebula rendering is not new to 3DWorld. I've shown images of 3DWorld's nebulae in previous posts such as this one. I recently went back and reworked the shader code that determines the color and transparency of each pixel in the nebula. I made a total of three changes (a short sketch combining them follows the list):
  1. Added an octave of low frequency 3D Perlin noise to modulate the density/transparency of the nebula to give it a more random, nonuniform shape rather than looking like a large sphere.
  2. Increased the exponent of the noise from 2.0 to a per-nebula random value between 2.0 and 4.0 to produce stronger contrast between light and dark areas (wispy fingers).
  3. Switched to additive blending to model emissive gas rather than colored occluding material for high noise exponent nebulae to give them a brighter appearance.
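
Combining the three changes above, here is a rough C++ sketch of the per-pixel density shaping, written as CPU code for clarity. The noise inputs are assumed to be supplied by the caller, and the exact shaping used in 3DWorld's shader may differ.

    #include <algorithm>
    #include <cmath>

    float nebula_alpha(float detail_noise,   // high frequency 3D noise value in [0,1]
                       float low_freq_noise, // change 1: one octave of low frequency 3D noise in [0,1]
                       float noise_exponent, // change 2: per-nebula random exponent in [2.0, 4.0]
                       float radial_falloff) // 1 at the nebula center, 0 at its outer radius
    {
        float density = std::pow(std::max(detail_noise, 0.0f), noise_exponent); // higher exponent => wispier
        density *= low_freq_noise;       // modulate by the low frequency octave for a nonuniform shape
        density *= radial_falloff;       // fade toward the edges
        return std::min(density, 1.0f);  // change 3: drawn with additive blending as emissive gas
    }
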
Here are some nebula screenshots. They show the evolution of nebula rendering as I applied my changes to the algorithm. The first two show the original algorithm, the middle two show changes 1 and 2, and the last four images show the final code. The stars in these screenshots are in front of, inside, and behind the nebula.











Keep in mind that nebulae are volumetric objects computed using 3D noise, not just 2D images. They are drawn with 13 crossed billboards, allowing the player to fly in and around them with minimal rendering artifacts. I got the idea from this video.

That's it for nebulae. I'll add some more images if I change the algorithm again in the future. Sorry, I haven't created any nebula videos. The fine color gradients just look horrible after video compression, and it ruins the wispy, transparent effect.

Saturday, August 6, 2016

Indirect Lighting for Dynamic Objects

This is a followup post to my indirect lighting post of last year. I decided that I wanted moving objects such as doors to also influence indirect lighting in the scene. This is more difficult than handling light sources that can be switched on and off by the player. Moving objects have more than on/open and off/closed states - they have all the intermediate positions representing partially open states. Storing only two states isn't enough, and linearly interpolating between them doesn't work well for all cases. The light moves with the object. Consider a moving object that starts entirely to the left side of an opening through which light can pass, then moves entirely to the right. At both extremes it blocks no light, but at the midpoint of its path it blocks the entire opening, resulting in a dark room. This condition can't be achieved by interpolating between the end points, which would both be at the same lighting solution (fully lit).

These types of moving objects are called platforms in 3DWorld. They're named after the platforms used for doors and elevators in the Forge map editor for Marathon, a game I played in college long ago. 3DWorld platforms can move in any direction, and can be used for doors, elevators, crushers, machines, etc. Custom triggers can be attached to platforms to control them. These triggers can be activated by the player, or can be proximity sensors triggered by the player or smiley AIs. The example door shown in the images and video below is activated by four player-controlled switches placed on the walls next to the door. I even made the switches an emissive yellow color so that they can easily be seen in the dark.

Back to lighting. I briefly considered storing precomputed lighting values for several intermediate points along the platform's motion. There are some problems with this approach. One issue is that a small number of precomputed points doesn't provide a very accurate interpolation across the lighting values as the platform moves. A large number of points takes too much CPU time to compute and too much disk space to store. Also, the number of blocks of saved lighting data increases exponentially as multiple interacting platforms are added. For example, if the scene contains two adjacent doors A and B, they may interact with each other. Door A might block most of the light reaching door B. If they're both in series along the same hallway, light won't reach the end of the hallway unless both doors are open. This is difficult to automatically detect just by looking at the geometry of the doors and the hallway. We instead need to store a minimum of four lighting states: {A and B closed, A and B open, A open B closed, A closed B open}. If there are three doors, we need 8 states. It quickly gets out of control as the data scales exponentially with the number of doors/platforms.

This problem is similar to the one discussed at the end of this blog post for the game "The Witness". I remember reading about the exponential combination problem on their blog somewhere, but I can't seem to find it now. However, their indirect lighting system is entirely different from the one used in 3DWorld, so the trade-offs are also somewhat different.

My second idea was to cache the rays intersecting any possible position of each platform, and sort out which rays are blocked at runtime, based on the current door position(s). The platform is expanded to cover the union of its possible positions by extending it in a line between its start and end points. This proxy object is added to the bounding volume hierarchy prior to ray tracing. Then, when computing indirect lighting, any ray that could hit the platform in any of its possible positions will hit this proxy geometry. All rays intersecting the proxy are terminated (no longer propagate) and stored in a file on disk. This process is only done once, after which the file is loaded and its data reused. At the end, the proxy is removed and replaced with the actual platform in its initial position. All saved rays are re-cast, and any rays not intersecting the platform position add reflected indirect light to the scene. This additional light "L" represents the initial/nominal lighting of the scene, and is saved to the precomputed indirect lighting file for future use.

When the platform moves, the rays need to be re-evaluated to determine which ones are blocked by the platform in its updated position. The simplest approach is to remove the contribution of "L" from the scene and recompute it using the new platform position. While this works, and is simple, it's not a very good solution. Every ray would need to be re-cast every frame the platform is moving. This kills the frame rate, and makes the game unplayable. Clearly, an incremental approach is needed.

The key observation is that the platform moves slowly relative to each game frame and lighting changes incrementally. A door doesn't open or close in a single frame. If it takes one second to move across its path, and the game is running at 60 FPS (Frames Per Second), we can spread the lighting update across all 60 frames to get a nice smooth framerate. The trick is to determine which rays change state from blocked to unblocked between the previous and current frames. This can be done by testing each saved ray against the platform's bounding volume, which is very easy to parallelize across multiple threads. In most cases, the vast majority of rays are either blocked or unblocked in both frames. Only a small fraction of rays will change state, and only these rays need to be re-cast to update the lighting values.

[Note that I'm ignoring rays that intersect the platform at different points in the previous and current frames, even though the reflected lighting will change. In practice the error introduced by this is insignificant compared to the magnitude of the transmitted rays, especially if the platform is a dark, non-reflective color. I'm also ignoring rays that reflect off the same platform multiple times, as again their contribution to the full lighting solution should be negligible. Light rays lose their energy quickly when reflecting off multiple diffuse objects.]

Rays that were previously blocked but become unblocked this frame can be transmitted through the scene, and recursively reflected off other objects as they are in the precomputed ray tracing phase. If the same random seeds are used as in the precomputation phase, the rays will be exactly the same, and the lighting will look as if these rays were never blocked in the first place. Any rays that newly become blocked have their weights/colors negated so that they remove light from the scene during ray tracing. The platform is temporarily removed from the bounding volume hierarchy, and ray tracing proceeds as usual with the negative rays. This will cancel out the light that was added when these rays were included in the lighting solution earlier. When the platform moves back to its original position, everything happens in reverse, where all rays have weights negated from what they were in the forward motion of the platform. Therefore, the lighting solution will converge to the original/nominal value once the platform comes to rest. In reality there is a small amount of floating-point error, and maybe some non-determinism from using multiple threads without locking or atomic operations. But, after dozens of door open/close cycles, I can't see any visual difference in the lighting.
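
Here is a rough sketch of that per-frame update loop. The ray and bounding box types, the intersection test, and cast_ray() are hypothetical placeholders standing in for 3DWorld's actual ray tracing code.

    #include <vector>

    struct ray_t  { float origin[3], dir[3], color_weight[3]; };
    struct bbox_t { float lo[3], hi[3]; };

    // Hypothetical helpers standing in for 3DWorld's actual ray tracing code (not shown here).
    bool ray_hits_box(const ray_t &ray, const bbox_t &box);
    void cast_ray(const ray_t &ray, float weight_sign); // re-casts the ray, scaling its weight by weight_sign

    void update_platform_lighting(const std::vector<ray_t> &cached_rays,
                                  const bbox_t &platform_prev, const bbox_t &platform_cur)
    {
        for (const ray_t &r : cached_rays) { // easy to split across multiple threads
            bool was_blocked = ray_hits_box(r, platform_prev);
            bool is_blocked  = ray_hits_box(r, platform_cur);
            if (was_blocked == is_blocked) continue; // most rays: no state change, nothing to do
            if (was_blocked) cast_ray(r, +1.0f);     // newly unblocked: add its light back to the scene
            else             cast_ray(r, -1.0f);     // newly blocked: negated weight removes its light
        }
    }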

Okay, that's enough text. How about some images? I don't really have anything too exciting to show this time. Here is a screenshot of the basement, with the basement door open. The only light source is the sky and indirect sunlight coming in through the door. Sorry the image is so dark. The door is very small compared to the enormous room, so it doesn't get very bright in here. At least it's realistic lighting for such a room.

Basement with door open, letting the outside indirect light in.

And here is the same viewpoint with the basement door closed.

Basement with door closed, blocking most of the outside indirect light. A small amount of light is leaking from the door.

The basement should be completely black, except for the tiny emissive yellow door switches. The small amount of leaked light on the right side of the door is due to the way the 3D light volume texture is sampled in the fragment shader. Lighting is linearly interpolated across voxels (3D texture pixels), which produces a smooth transition from light to dark along thin objects such as the door. Since the walls are at least one light voxel in width, they properly block all of the light.

Here is a view from the outside looking into the basement, with the door in the process of closing. The basement is partially lit in this case, where the right side of the basement is slightly brighter than the left side because the door is open on the right.

Closeup of the basement door half-way closed, seen from the outside looking in.

It's easier to see the smooth transition in a video. Lighting is updated incrementally each frame the door is moving. As long as the door moves slowly enough, only a small number of rays need to be recomputed per frame. Lighting updates have a minimal impact on frame rate. This particular door has a total of 96K intersecting light rays and moves over the course of 1.6 seconds, taking an average of only 1.3ms of realtime with 8 threads across 4 CPU cores (0.9ms for ray tracing and 0.4ms for GPU texture update).




I'll hopefully add some more dynamic lighting platforms later, once I get the system properly tuned. This same solution should be general enough that it works for a wide variety of platforms.

The next step is to make this system work with fixed position static light sources such as room lights. It would be interesting to see a closet light that can be turned on and off, so that when the closet door is open and the light is on it indirectly lights the adjacent room. After that, I could try to make this work with dynamic point light sources, such as explosion effects. Of course, I haven't even gotten the regular static indirect lighting working in this case, so it could take significant effort.

Friday, July 15, 2016

Asteroid Belts and Planet Clouds

I added asteroid belts to 3DWorld a while back, maybe a year or so ago. I thought they looked pretty good at the time. Recently I watched an asteroid belt video from the Kickstarter campaign of the Infinity: Battlescape space game. Sorry, I can't seem to find the original video, but here is a similar video by the same team/company. I realized that my asteroid belts lacked the fine reflective particle clouds that add a sense of volume to the scene. I decided to reuse the same procedural volumetric fog/cloud framework that was used for nebulae, explosions, and clouds in 3DWorld.

That wasn't my first approach. I originally wanted to ray cast into the asteroid belt volume and perform ray marching through it, integrating a randomly generated density field along the way. This is similar to how volumetric fog was done in 3DWorld as shown in this previous post. This technique works well when your scene is a large cube, but unfortunately isn't so easy when the domain is a complex shape such as a circular asteroid belt.


Asteroid Belt Bounding Volume

Let me take a step back and explain how asteroid belts are created in 3DWorld, in particular how their shape and asteroid distribution is chosen. There are two types of 3DWorld asteroid belts: system asteroid belts and planetary ring asteroid belts. 3DWorld also has spherical asteroid fields, but those work differently and won't be discussed here. System asteroid belts orbit the star in a solar system, similar to the orbits of planets. Planetary asteroid belts surround a single planet and are much smaller in size (radius) and asteroid count.

Each asteroid belt is generated by placing asteroids in a Gaussian distribution around an elliptical path in the orbital plane of the system or planet. The entire set of asteroids is contained within a non-uniformly scaled torus volume. The orbital plane normal forms the z (center) axis of the torus, like an axle through a tire. The z-scale is typically set to around 25% of the x/y scales to produce a flattened shape that resembles a thin disk. The x and y scales can be different, producing a non-circular (elliptical) shape. There is also an inner radius and outer radius for the torus.

This is not a very nice shape to work with, since it is mathematically fairly complex and involves higher order trigonometric functions. It's much easier to perform a series of transforms to convert the asteroid belt shape into a unit (normalized) torus using the following steps:
  1. Translate the torus by the asteroid belt center to put the origin at (0,0,0)
  2. Rotate the torus so that its axis is oriented in the +z direction
  3. Scale the torus independently in x, y, and z to produce a circular shape with an inner radius of 1
It's fairly easy to determine if a point is inside of the asteroid belt volume by applying these transforms and checking for point-in-torus. Sphere intersection is more complex, but not too bad. However, computing the intersection points of a line with a torus is much more difficult because it requires solving for the roots of a quartic equation. There are up to four intersection points. I managed to find some existing source code to do this, but it's not something I would have wanted to derive, write, and debug myself. If you must know, parts of the source code can be found here and here. If you're really interested in the math, here is a fun paper that's guaranteed to keep you busy for a while or put you to sleep. If anyone knows of a simpler way to compute the intersection of a line with a torus, please let me know. Bonus points if it works both on the CPU and on the GPU.
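
For reference, here is a sketch of the point-in-torus test in that normalized space, where the ring radius is 1 and the tube radius is the scaled belt thickness. The translate/rotate/scale steps listed above are assumed to have already been applied to the point.

    #include <cmath>

    // Point-in-torus test in the normalized space: torus centered at the origin, axis along +z,
    // ring radius 1. tube_radius is the normalized belt thickness.
    bool point_in_unit_torus(float x, float y, float z, float tube_radius) {
        float ring_dist = std::sqrt(x*x + y*y) - 1.0f; // distance from the unit ring in the xy-plane
        return (ring_dist*ring_dist + z*z <= tube_radius*tube_radius);
    }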

Anyway, if you remember from the second paragraph, the volume ray marching approach requires computing all intersection points of a line with the asteroid belt bounding volume on the GPU. This would require porting the transform code, quartic solver code, and torus intersection code to GLSL. Now, I'm sure this would be possible, but it would be a huge time sink to debug, and there may be floating-point precision issues if it was all done on the shader in single precision. And it would probably be very slow. I decided to abandon that approach and go with something different and (hopefully) easier.


Asteroid Placement

Let me explain how asteroids are actually placed and rendered to produce a realistic volume consisting of millions of asteroids of various sizes. There are three types of asteroids drawn:
  1. Large asteroids drawn as procedurally generated 3D triangle meshes; Up to 10,000 instances of 100 uniquely generated asteroids; They dynamically move and rotate over time.
  2. Smaller asteroid point sprites that are rendered as spheres in the fragment shader; 1M generated, though only nearby asteroids are visible
  3. Smaller asteroids drawn as points to fill in the gaps when the player is near or within the belt; ~100K points per few degree arc slice of visible nearby torus (~1M max visible)
This set of asteroids fills in the space in the belt fairly well. Type 1 asteroids are highly detailed and also include normal mapping and procedural craters to make them look more convincing. The smaller points and spheres are thrown in as part of the background to trick the user into thinking there are millions of large, detailed asteroids out there. It works!
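
As a rough illustration of how the type 3 gap-filling points might be restricted to nearby arc slices, here is a hypothetical sketch that approximates the belt as a circle and keeps only the slices whose centers are within some draw distance of the camera. A view frustum test would presumably also apply; none of these names come from the actual code:

  #include <cmath>
  #include <vector>

  struct vec3 {float x, y, z;}; // minimal vector type for illustration

  // Return the indices of belt arc slices whose centers are close enough to the
  // camera to have their gap-filling points drawn. The belt is approximated as a
  // circle of radius orbit_radius centered at belt_center in the orbital plane.
  std::vector<int> visible_slices(vec3 camera, vec3 belt_center, float orbit_radius,
                                  int num_slices, float draw_dist)
  {
      std::vector<int> visible;
      for (int i = 0; i < num_slices; ++i) {
          float const theta = 6.2831853f*(i + 0.5f)/num_slices; // slice center angle
          vec3  const c = {belt_center.x + orbit_radius*std::cos(theta),
                           belt_center.y + orbit_radius*std::sin(theta), belt_center.z};
          float const dx = c.x-camera.x, dy = c.y-camera.y, dz = c.z-camera.z;
          if (dx*dx + dy*dy + dz*dz < draw_dist*draw_dist) {visible.push_back(i);}
      }
      return visible;
  }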


Asteroid Belt Screenshots

Here is a screenshot of what these three types of asteroids look like together. This is just a small section of the asteroid belt, maybe 1-2% of the total. How many asteroids does it look like this section contains?

Closeup view of an asteroid in the asteroid belt showing normal mapped craters and a nebula in the background.

Pretty good, but the density is not as high as that of the asteroids in the original Infinity video. Something needs to be added to fill the spaces between the meshes, spheres, and points. How about some procedural, reflective dust clouds?

Asteroid belt with procedural volumetric dust clouds reflecting the star's light, shaded with the star's color.

This looks much better. The asteroid belt has more volume and looks more interesting. The dust clouds properly occlude asteroids that lie behind them and produce a sort of fog in the distance. The dust is a yellowish color based on the star's color. Now it looks like there really are millions of asteroids visible, from huge ones to tiny bits of dust. Of course it's another trick, there are only a few thousand of them. Here is another view of this system asteroid belt, from outside looking in toward the star.

Asteroid belt and reflective dust clouds viewed from the outside facing the yellow star.

Note that the dust clouds extend further outside the torus envelope than the asteroids themselves. This seems to make sense physically: Larger, heavier asteroids are affected more by gravity, making them revolve around the star or planet faster, and forcing them into a thinner ring in the orbital plane. At least it may be correct to first order.

Here is an example of a cold, icy planetary asteroid belt, viewed from slightly above.

Cold planet with surrounding asteroid belt containing ice crystals and reflective dust.

The clouds seem to gently rise up out of the orbital plane with very slow animated motion. The star is behind and below the camera, causing the asteroid belt to cast ring-shaped shadows on the top part of the planet.

Here is a video of my ship flying into the system asteroid belt, bouncing off two large asteroids (collision detection is enabled), then flying to a ringed planet and crossing through its asteroid belt.



Rendering - How It's Done

There are 100 unique dust cloud models generated when the first asteroid belt becomes visible, and they're shared across multiple belts. The vertex data is stored in GPU memory for fast access during drawing. Limiting the number of unique clouds cuts down on CPU time and GPU memory. Each large type 1 asteroid has a dust cloud instance attached to it with a small random translational offset to make cloud placement look more random. Clouds are attached to asteroids so that they move with them, without having to independently compute orbital vectors for yet another type of object on the CPU. This way, there is no explicit physics update for dust clouds, even though they do need to move to track a planet that revolves around the star. The movement also adds more dynamic effects to the rendering, which makes it more interesting. As a bonus, cloud positions don't need to be generated within the asteroid belt bounding torus, since they inherit that property (approximately) from the asteroids they're attached to. This makes the code much simpler.
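
A minimal sketch of what this attachment could look like, with hypothetical names (the actual 3DWorld data structures will differ):

  #include <vector>

  struct vec3 {float x, y, z;}; // minimal vector type for illustration

  // Each cloud references its parent (type 1) asteroid plus a small fixed offset
  // chosen at creation time; its world position is derived on the fly, so clouds
  // follow the belt's motion with no physics update of their own.
  struct dust_cloud {
      unsigned asteroid_ix; // index of the parent large asteroid
      unsigned model_ix;    // which of the ~100 shared cloud models to draw
      vec3     offset;      // small random translation for a less regular look
  };

  vec3 cloud_pos(dust_cloud const &c, std::vector<vec3> const &asteroid_pos)
  {
      vec3 const &p = asteroid_pos[c.asteroid_ix];
      return {p.x + c.offset.x, p.y + c.offset.y, p.z + c.offset.z};
  }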

Each cloud model consists of 9 intersecting quad billboards that cover an approximately equally spaced set of normal vectors on the unit sphere. This is fewer than the 13 billboards used for nebulae and explosions, representing a different quality vs. performance trade-off. The various billboards are faded in and out by modifying their transparency (alpha) values based on view distance and view angle. Distant clouds are faded to transparent and skipped to improve rendering time, since they don't contribute much to the final image. Clouds very close to the player/camera are also faded out to reduce the amount of fragment shader overdraw and keep the worst case frame rate from dropping too low.
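
Here's a rough sketch of the kind of per-billboard alpha computation described above. The fade thresholds and the exact falloff shape are my own guesses, not the values used in 3DWorld:

  // 'facing' is |dot(billboard normal, view dir)|, so edge-on billboards fade out.
  // Distance fades handle both very close clouds (to limit overdraw) and very
  // distant clouds (which are skipped entirely).
  float billboard_alpha(float dist, float facing, float near_fade, float far_clip)
  {
      if (dist > far_clip || facing <= 0.0f) return 0.0f; // not drawn at all
      float alpha = facing;                            // fade out edge-on billboards
      if (dist < near_fade) {alpha *= dist/near_fade;} // fade clouds close to the camera
      alpha *= (1.0f - dist/far_clip);                 // fade distant clouds to transparent
      return alpha;
  }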

The GPU fragment shader computes per-pixel transparency by evaluating 4 octaves of 3D Perlin noise, where each octave is implemented as a lookup into a precomputed 3D noise texture. I used 4 octaves rather than the 5 octaves used for nebulae and explosions to improve performance. Since 9 billboards are used, a few of them are oriented toward the camera for every possible camera position, producing the illusion of a 3D volume with simulated parallax. Asteroid clouds have the most impact on performance when the camera is in the middle of the belt and the fragment shader must do significant work computing noise values. On average, enabling clouds reduces the frame rate in this case from 260 FPS to 140 FPS, which is reasonable.
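
The octave accumulation itself is standard fBm-style noise. Here's a CPU-side sketch of the idea; in the real version this runs per fragment in GLSL, with the noise function implemented as a 3D texture lookup:

  struct vec3 {float x, y, z;}; // minimal vector type for illustration

  // Accumulate 4 octaves of 3D noise; noise3d() stands in for the 3D texture lookup.
  float cloud_density(vec3 pos, float (*noise3d)(vec3))
  {
      float val = 0.0f, freq = 1.0f, amp = 1.0f;
      for (int octave = 0; octave < 4; ++octave) { // 4 octaves here, 5 for nebulae/explosions
          val  += amp*noise3d({pos.x*freq, pos.y*freq, pos.z*freq});
          freq *= 2.0f; // double the frequency each octave
          amp  *= 0.5f; // halve the amplitude each octave
      }
      return val;
  }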

I chose a fairly simple lighting model for asteroid belt clouds. I assume the clouds are composed of small particles that reflect the star's light in all directions like tiny, randomly oriented mirrors. No explicit light scattering is modeled (yet). I also assume that the occlusion of the particles themselves is negligible, so that unlit/shadowed particles are effectively invisible. This is similar to how dust that is normally invisible will shine when it's caught in a path of sunlight in a room. With this approach, the lighting is independent of the camera view direction and the star's light direction, which simplifies the math and makes the shader faster. The CPU can simply intersect each dust cloud's light ray with the nearby planets and moons to determine which are in shadow. Shadowed clouds are simply not drawn, since they contribute no light and are assumed to produce no significant occlusion. The end result is that non-shadowed clouds are lit using the star's color and an intensity based on the distance to the star with a quadratic falloff.
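
Putting that together, the per-cloud lighting reduces to something like the following sketch, where is_shadowed() stands in for the CPU-side ray vs. planet/moon intersection test and ref_dist sets the distance at which the light has full intensity. The names are mine, not 3DWorld's:

  struct vec3     {float x, y, z;}; // minimal types for illustration
  struct colorRGB {float r, g, b;};

  // Shadowed clouds are skipped entirely; the rest are tinted with the star's
  // color and attenuated quadratically with distance to the star.
  bool light_cloud(vec3 cloud, vec3 star, colorRGB star_color, float ref_dist,
                   bool (*is_shadowed)(vec3, vec3), colorRGB &color_out)
  {
      if (is_shadowed(cloud, star)) return false; // contributes no light; not drawn
      float const dx = cloud.x-star.x, dy = cloud.y-star.y, dz = cloud.z-star.z;
      float const atten = (ref_dist*ref_dist)/(dx*dx + dy*dy + dz*dz); // quadratic falloff
      color_out = {star_color.r*atten, star_color.g*atten, star_color.b*atten};
      return true;
  }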

Partially transparent surfaces are typically drawn in back-to-front depth order so that alpha blending works properly. However, sorting thousands of clouds by depth on the CPU is an expensive process. The sort would need to be performed every frame as both the camera and the clouds are moving. I decided to omit this sort, since it works well enough without it. Each cloud is around 95% transparent, so the depth sorting errors are barely noticeable, especially with alpha testing enabled.

Update: I added the sort, after filtering by distance and view frustum culling. It only seems to add around 1% additional render time. It makes very little difference in the final image, so it's probably not necessary. For reference, the culling, sorting, and the rest of the CPU side of rendering only takes around 0.3ms.
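
For the curious, the CPU-side pass amounts to something like this sketch; culling is shown only as a distance test, and the view frustum check would be analogous:

  #include <algorithm>
  #include <vector>

  struct vec3 {float x, y, z;}; // minimal vector type for illustration

  // Drop clouds that are too far away, then sort the survivors back-to-front so
  // that alpha blending composites correctly.
  void cull_and_sort_clouds(std::vector<vec3> &clouds, vec3 camera, float max_dist)
  {
      auto dist_sq = [&](vec3 const &p) {
          float const dx = p.x-camera.x, dy = p.y-camera.y, dz = p.z-camera.z;
          return (dx*dx + dy*dy + dz*dz);
      };
      clouds.erase(std::remove_if(clouds.begin(), clouds.end(),
          [&](vec3 const &p) {return (dist_sq(p) > max_dist*max_dist);}), clouds.end());
      std::sort(clouds.begin(), clouds.end(),
          [&](vec3 const &a, vec3 const &b) {return (dist_sq(a) > dist_sq(b));}); // farthest first
  }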


Planets

This post has so far been a wall of text without too many fancy pictures or videos. Here are some bonus screenshots of planets showing off planetary clouds and other effects.

View from moon orbit showing a highly detailed moon surface with normal maps and GPU tessellation, a Terran planet behind it, and an asteroid belt in the distance.

This first image shows a closeup of a moon's surface. The high resolution procedural normal map is generated from height differences in the fragment shader. The moon's horizon is also very detailed thanks to the tessellation shader that converts a low polygon sphere into a detailed, bumpy planet. A procedural Terran planet and some large procedural voxel asteroids are visible in the background. [The asteroids are probably unreasonably large.] Behind them is the system asteroid belt, and behind that is an ice and rock planet. You can even see a few space ships floating around near the moon and the nearby planet.
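
The normal-from-height-differences trick mentioned above is the standard central differences approach. Here is a small CPU-side sketch of the math; the real version runs in the fragment shader, and the names are mine for illustration:

  #include <cmath>

  struct vec3 {float x, y, z;}; // minimal vector type for illustration

  // Build a tangent-space normal from four neighboring height samples using
  // central differences; bump_scale controls how strong the bumps appear.
  vec3 normal_from_heights(float h_left, float h_right, float h_down, float h_up,
                           float sample_spacing, float bump_scale)
  {
      vec3 n = {bump_scale*(h_left - h_right), bump_scale*(h_down - h_up), 2.0f*sample_spacing};
      float const mag = std::sqrt(n.x*n.x + n.y*n.y + n.z*n.z);
      return {n.x/mag, n.y/mag, n.z/mag}; // normalized normal vector
  }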


Water/ocean planet and moon. The planet is covered with clouds that cast shadows on the water.

This image shows a cloudy ocean planet in a solar system with a yellow-orange sun. The clouds are procedurally animated in the fragment shader and cast shadows on the water under them. A moon is shown to the left, with a volumetric nebula behind it. This nebula uses a rendering approach similar to the asteroid belt clouds, but it uses three color channels rather than only an alpha channel. In addition, the nebula uses ridged noise and different noise constants.
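
For reference, "ridged" noise is just the standard trick of inverting the absolute value of signed noise, which turns smooth blobs into sharp creases. A one-line sketch:

  #include <cmath>

  // Ridged variant of a signed noise value in [-1,1].
  float ridged(float signed_noise) {return (1.0f - std::fabs(signed_noise));}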