Degine: Forward+ HDR PBR IBL Render Improvements
In the past week I’ve made some major breakthroughs with the performance of Degine, my custom OpenGL / WebGL 3D engine. I did a lot of reading on performance and optimization guides, such as those on the ARM and Nvidia websites, specifically for OpenGL ES 3 performance on mobile. I also integrated a new lighting system that supports up to 8 dynamic lights at once (1 of them with soft shadows). This is likely the limit for mobile, though a whole scene could have many more lights that are enabled and disabled as you move through the level.
Initially I started with a deferred renderer and did get it working well on desktop, but sadly the number of textures needed ended up being too large for mobile. It would have been nicer, as it could allow for potentially unlimited lights and also makes post-processing effects like ambient occlusion easier, but it was too much. So I ended up going with a Forward+ lighting engine. This is closer to a traditional forward renderer, but it segments the screen into small tiles so that lighting can be calculated locally, with each light only affecting the pixels in the tiles it reaches rather than every pixel on screen. This seems to be a much better choice for mobile, and I was able to place 7 dynamic point lights (on top of the 1 directional light) with minimal loss of performance.
Since the video is a bit blurry compared to how it looks rendering natively on the device, here is a full-resolution screen capture to more accurately judge the quality.
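As a rough illustration of the tiling idea behind Forward+ (a sketch only, not Degine's actual code), the snippet below assigns lights to screen tiles using a conservative 2D bounding box per light. The tile size, the `ScreenLight` type, and the `bin_lights` function are all assumptions made for this example. A real implementation would normally cull in view space against per-tile frusta and run on the GPU, and the post does not say how Degine does it.

```python
from dataclasses import dataclass

TILE_SIZE = 16  # pixels per tile side; an assumed value, not taken from Degine


@dataclass
class ScreenLight:
    """Hypothetical light already projected to screen space."""
    x: float       # centre in screen pixels
    y: float
    radius: float  # radius of influence in screen pixels


def bin_lights(lights, width, height, tile_size=TILE_SIZE):
    """Map (tile_x, tile_y) -> list of light indices that may touch the tile.

    Uses the bounding square of each light's circle, so a tile may list a
    light that does not quite reach it, but never misses one that does.
    """
    tiles_x = (width + tile_size - 1) // tile_size
    tiles_y = (height + tile_size - 1) // tile_size
    tiles = {}
    for index, light in enumerate(lights):
        # Clamp the light's tile range to the screen; off-screen lights
        # produce an empty range and are skipped.
        min_tx = max(0, int((light.x - light.radius) // tile_size))
        max_tx = min(tiles_x - 1, int((light.x + light.radius) // tile_size))
        min_ty = max(0, int((light.y - light.radius) // tile_size))
        max_ty = min(tiles_y - 1, int((light.y + light.radius) // tile_size))
        for ty in range(min_ty, max_ty + 1):
            for tx in range(min_tx, max_tx + 1):
                tiles.setdefault((tx, ty), []).append(index)
    return tiles


# Two example lights on a 256x256 target.
example_lights = [ScreenLight(8, 8, 4), ScreenLight(100, 100, 20)]
example_tiles = bin_lights(example_lights, 256, 256)
```

When shading a pixel, only the lights listed for its tile need to be evaluated, which is why adding a small local light costs far less than it would in a plain forward renderer that loops over every light for every pixel.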
One other thing I added was parallax mapping. This uses the original single-sample method, since the layered sampling needed for more advanced techniques like parallax occlusion mapping was too many texture reads for mobile. It’s enabled on the brick cubes and on the concrete floor, but it’s a subtle effect and hard to see in a 1080p video.
GPU texture compression is pretty important, so I finished that up too, including BC7, ASTC, and ETC2, with a fallback to JPG. This increases performance and speeds up load times (as the images are already in a format the GPU can accept), though with some slight loss in fidelity. It's not really noticeable, though, particularly on a small mobile screen. It does increase the package size, because the compressed textures are larger than JPG and I also have to include several formats (some work on desktop, some on mobile, or only on certain operating systems). I’ll need a way to load textures dynamically so that nothing unneeded gets loaded up front.
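A minimal sketch of how the format fallback could be chosen at runtime is shown below. It is not Degine's loader: the preference order, the `pick_texture_format` and `texture_path` functions, and the file suffixes are all assumptions for illustration. The extension strings are the names under which WebGL exposes ASTC, BC7 (BPTC), and ETC2 support.

```python
# (format label, WebGL extension name) pairs in an assumed order of preference.
FORMAT_EXTENSIONS = [
    ("ASTC", "WEBGL_compressed_texture_astc"),
    ("BC7", "EXT_texture_compression_bptc"),
    ("ETC2", "WEBGL_compressed_texture_etc"),
]

# Hypothetical file suffixes; the post does not describe its asset naming.
FORMAT_SUFFIX = {"ASTC": ".astc", "BC7": ".bc7", "ETC2": ".etc2", "JPG": ".jpg"}


def pick_texture_format(supported_extensions):
    """Return the first compressed format the device reports, else JPG."""
    supported = set(supported_extensions)
    for fmt, extension in FORMAT_EXTENSIONS:
        if extension in supported:
            return fmt
    return "JPG"  # uncompressed fallback, decoded on the CPU before upload


def texture_path(base_name, supported_extensions):
    """Build the path of the single variant that should be fetched."""
    return base_name + FORMAT_SUFFIX[pick_texture_format(supported_extensions)]


# A typical desktop GPU reports BPTC but neither ASTC nor ETC2.
desktop_choice = pick_texture_format(["EXT_texture_compression_bptc"])
```

Resolving the format once at startup and fetching only the matching file is one way to avoid downloading every variant, even though all of them still have to ship in the package.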
Besides that, I did a lot of smaller optimizations, like aggressively freeing memory after it is used, simplifying shaders, and lower-level code tricks (such as using pre-computed constants or multiplying rather than dividing). Overall, all the little things added up.
In the video it’s running on a Samsung Tab S8+ at a locked 60FPS (the slight skips were due to the recording). However, when rendering without screen capture, and by tweaking the settings, I can get to a solid 120FPS (the Samsung supports 120Hz, and it looks really smooth). This is a high-end tablet, but I tested some lower-end ones and 60FPS is still possible on lower settings. At this point I will do a bit more research to see if there is anything I missed, but I’m pretty happy with the performance right now. The scene in the video is also somewhat of a stress test: the girl alone is 100K polys, and 8 lights is pretty intensive. So in a real mobile game, the assets would likely be more modest and run better.
Next up, I’m going to build out a demo level with mobile-friendly art to get a better idea of how a game will play (the assets in the video were purchased). So expect to see some new hotness soon.