3D Game Engine Design

3D Game Engine Design: A Practical Approach to Real-Time Computer Graphics by David H. Eberly was quite a read, at slightly over 1,000 pages. However, a few chapters in I was already getting fatigued, and I really had to push my way to the end. That’s not to say that the book was bad. It was not. It was just nothing like what I expected. Let me explain.

Imagine you walk into a restaurant and sit down with a friend. After several minutes of thought, you decide to order a steak. The waiter comes, takes your order, and about 15 minutes later returns with a plate. Except when you go to eat, you realize he has brought you a grilled chicken instead. It’s not that grilled chicken tastes bad, it’s just not what you ordered. I feel the same way about this book. The title says: 3D Game Engine Design: A Practical Approach to Real-Time Computer Graphics, however there is very little to no design in the book, it’s not very practical, and there is not much coverage of computer graphics itself.

But at 1,000+ pages there must be some information in there, and indeed there is. However, it is almost 99% math. I don’t have a problem with math. What I do have a problem with is pages and pages of mathematical proofs, when an explanation of an algorithm would have sufficed. The math is just really heavy, and made even harder to follow due to formatting errors in the Kindle e-book. For example, some symbols were replaced by squares, making them almost impossible to decipher. In addition, most of the equations and formulas were images, but some were too small to read and difficult to click on, again making it hard to follow. For a technical book this is all but unacceptable, and makes it next to useless to base an implementation on. Many times I found I was 2 or 3 pages into a proof and had forgotten what the formula was even calculating. It’s not that I am slow. I have read other 3D math books and had a good time. The explanations here are just somewhat lacking and dry.

The main gripe I have is that the design of a game engine is nowhere to be found. You would expect overviews of class structures, game loops, how to communicate between objects, event systems, scene graphs, encapsulation of graphical APIs, input abstraction, etc. Nope, not here. It’s not until the end of the book, in chapter 18, that he even mentions OOP. Most of that chapter is general OOP concepts (that you would expect anyone who made it that far into a book like this to already know), and only in the final subsection does the author go into some topics that I would consider focused on game engine design.

Fine, but surely there is something to like. I will say that certain aspects of the core math of an engine were covered in depth. Specifically, bounding volumes, collision/intersection detection, and distance testing were given good coverage. Just looking at the table of contents is deceiving, because it appears that much more is covered. For example, there is a chapter on physics, yet it is only about 20 pages and is not very helpful at all. Only in the last chapter were any graphical concepts really covered and, again, it was brief and only scratched the surface.

I’m not sure what David Eberly, the author, is trying to do here. This is the second book of his I read, and I had the same complaints about that. The book was mislabeled and deceptive. Had he just titled it “Game Engine Mathematics” I would have been a lot happier. Granted, I may have purchased the book anyway, but at least I would have known what I was getting into. I wanted a design book, I purchased what I thought was a design book, and all I got were a bunch of mathematical proofs. Sorry, I am disappointed.

If you are looking to do research on 3D math then there are better, more approachable books out there. See 3D Math Primer for Graphics and Game Development, by Fletcher Dunn and Ian Parberry, or Mathematics for 3D Game Programming and Computer Graphics, by Eric Lengyel. If you want a game engine design book then Game Engine Architecture by Jason Gregory has a great overview and 3D Game Engine Programming by Stefan Zerbst is better for implementation. Honestly, there could be more books in this field. Unfortunately, 3D Game Engine Design doesn’t fill its own shoes.

Spent the last couple of days adding skybox support to the engine. Currently it’s a little hard-coded, but it does seem to be working well. I also bumped the field of view (FOV) up to 90 (from 45) so you can see more of the sky. I wanted to make sure I was only using my own artwork for this engine demo. Unfortunately, it would have been rather difficult to take panoramic photos myself. So I generated the sky texture using Terragen. Somewhat of a cheat, but I did technically “create” it myself, so I can live with that.

Ran into one issue that had me stumped for about a day. Basically, I was following a tutorial for implementing the skybox. It seemed like I followed everything well, but I was getting this wild fish-eye type of distortion on the sky. As it turned out, it was my custom matrix transpose function that was at fault. The tutorial used “XMMatrixTranspose()” but I re-implemented this myself. The trouble was, I was swapping around the values without creating a temporary copy first. Meaning, I was doing something like this:

m12 = m21;
m21 = m12;

Clearly that won’t work: the first line overwrites m12, so the second line just copies the same value back. The interesting part is that I got stuck on it one night and couldn’t figure out what was wrong. I knew it had something to do with the matrix math, but I wasn’t sure what. So I left it alone and went about my day. Then, while in the bathroom washing my face, I had a “eureka!” moment and the answer came to me. Not sure how that works, if it’s like a sub-process in my mind or something cracking away at problems. But I’m glad I finally caught the error.
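
For reference, here is a minimal sketch of a transpose that avoids the overwrite. The Mat4 struct is a hypothetical stand-in for my own matrix class (it is not the DirectXMath type), and it uses a plain 2D array rather than the m11...m44 members.

```cpp
#include <cassert>
#include <utility>

// Hypothetical stand-in for the engine's matrix class (row-major storage).
struct Mat4 {
    float m[4][4];
};

// Transposes in place by swapping each mirrored pair exactly once.
// std::swap copies through a temporary, so neither value is lost.
void transposeInPlace(Mat4& mat) {
    for (int row = 0; row < 4; ++row) {
        for (int col = row + 1; col < 4; ++col) {
            std::swap(mat.m[row][col], mat.m[col][row]);
        }
    }
}

// Usage: transposing twice should give back the original matrix.
void transposeExample() {
    Mat4 mat = {};
    mat.m[0][1] = 2.0f;  // the "m12" element
    transposeInPlace(mat);
    assert(mat.m[1][0] == 2.0f && mat.m[0][1] == 0.0f);
    transposeInPlace(mat);
    assert(mat.m[0][1] == 2.0f && mat.m[1][0] == 0.0f);
}
```

The inner loop starts at row + 1 on purpose. Looping over the full matrix would swap every pair twice and leave the matrix unchanged.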

There are a few things I’d like to try next. First off, the lighting model could be improved greatly. Right now it’s just a hard-coded ambient and directional light. I’d like to have a more flexible system to add any type of light into the scene graph and have the shader support it. Support for normal mapping would be cool, as well as specular and emissive textures. I’d also like to do some initial experiments in building a physics system, or at least learn more about compute shaders and how they integrate into the pipeline. Plus, better artwork. Stay tuned.

While getting models loaded was pretty exciting, I ended up dealing with major load times on the demo. Granted, my XML parsing code is probably slow as all hell, but I don’t think COLLADA is really designed for real-time engine use. With simple plane and cube shapes the loading wasn’t that bad, but with my soda can model (around 600 triangles) the loading was nearing 10 seconds (totally unacceptable). I can only imagine what would happen with a really complex model. Something had to be done.

So I decided to switch to a binary format with only exactly what I needed to pump into DirectX (the vertices, normals, UVs, and indices). I created a separate console application that would convert *.DAE files into my new binary format. Then I added engine support for loading the binary file instead of COLLADA. The gain was HUGE. Now when running the exe, there was no noticeable lag time at all. I guess I kind of knew I needed to do this at some point, but the wait times were too much to bear any longer. Glad to find a good solution.

Here are some snippets of code to show how to save variables as binary data:

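float someValue = 0.12345f;
ofstream outputFile;
// A narrow file name keeps this portable; the wide-string overload is a Visual C++ extension.
outputFile.open("output.bin", ios::out | ios::binary);
// Write the raw bytes of the float.
outputFile.write(reinterpret_cast<const char*>(&someValue), sizeof(float));
outputFile.close();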

And then you can read this value later by doing:

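float someValue = 0.0f;
ifstream inputFile;
inputFile.open("output.bin", ios::in | ios::binary);
// Read the same number of bytes back into the variable.
inputFile.read(reinterpret_cast<char*>(&someValue), sizeof(float));
inputFile.close();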

Actually not that difficult at all. The benefits are decreased loading time and also smaller file sizes. The cons are that you now have another step in the asset pipeline, and that the files are no longer human-readable. A fair trade I would say.
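
To give an idea of how the same trick scales up to a whole mesh, here is a rough sketch. The MeshData layout, the function names, and the "count followed by raw data" file layout are all assumptions made for illustration, not my actual engine format. It also assumes the file is read on the same kind of machine that wrote it (same endianness and float size).

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical mesh layout; the real engine format may differ.
struct MeshData {
    std::vector<float> vertices;         // packed position/normal/uv floats
    std::vector<std::uint32_t> indices;  // three per triangle
};

// Writes a 32-bit element count followed by the raw array bytes.
template <typename T>
static void writeArray(std::ofstream& out, const std::vector<T>& data) {
    assert(data.size() <= 0xFFFFFFFFu);
    std::uint32_t count = static_cast<std::uint32_t>(data.size());
    out.write(reinterpret_cast<const char*>(&count), sizeof(count));
    out.write(reinterpret_cast<const char*>(data.data()), count * sizeof(T));
}

// Reads back an array written by writeArray.
template <typename T>
static bool readArray(std::ifstream& in, std::vector<T>& data) {
    std::uint32_t count = 0;
    if (!in.read(reinterpret_cast<char*>(&count), sizeof(count))) return false;
    data.resize(count);
    in.read(reinterpret_cast<char*>(data.data()), count * sizeof(T));
    return static_cast<bool>(in);
}

bool saveMesh(const std::string& path, const MeshData& mesh) {
    std::ofstream out(path, std::ios::out | std::ios::binary);
    if (!out) return false;
    writeArray(out, mesh.vertices);
    writeArray(out, mesh.indices);
    out.flush();
    return out.good();
}

bool loadMesh(const std::string& path, MeshData& mesh) {
    std::ifstream in(path, std::ios::in | std::ios::binary);
    if (!in) return false;
    return readArray(in, mesh.vertices) && readArray(in, mesh.indices);
}
```

Storing the counts up front means the loader can size each buffer once and read it in a single call, which is a big part of why this is so much faster than walking an XML tree.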

engine zero coke can

What you see above is a custom model I made in 3ds Max, exported as a COLLADA *.dae file, and imported into my DirectX engine. I figured I’d start with something simple, like a soda can, and I plan to make a lot more models going forward. Although I hadn’t touched Max in years, I found it to be a comfortable experience and was able to put the model together in a few hours.

Now, actually getting that model into DirectX was a different story. First off, the COLLADA documentation is vast, but they fail to explain basic things about the format. The examples they show all make sense, but with a real model it becomes more complex. To make matters worse, their forum was a ghost town and I found lots of people with the same basic questions I had that posted a thread with no replies for months (or years). That said, I was able to eventually figure it out by a lot of testing and trial and error. It really goes to show that you can build the best system in the world, but if the documentation is lacking and the community is thin, then it’s not worth jack.

To make matters worse, there was a small bug in my XML parsing code that was messing up the attributes. So some of the simple models I tried (a plane and a cube) worked, but the soda can didn’t. It ended up taking a while to track down this problem since Visual Studio was hanging if I tried to debug. It’s really scary to get to this point where you *need* the debugger desperately and it’s not there. While I thought it was crashing, it was actually just caught up in my slow parser, and when I waited for about 5 – 10 minutes it finally came back to life (and thankfully I only needed to get to that one breakpoint to see what the issue was).

Next up, I ran into some issues with the model orientation and texturing. Since 3ds Max uses a Z-up coordinate system and DirectX is Y-up, this needed some special care. I would have thought the COLLADA exporter would handle this, but apparently not. The fix is to swap the Y and Z positions of each vertex. This will affect the winding order as well, so if you want your mesh to not be inside-out, you need to also change the order of the indices when you create the index buffer. For example, a triangle of “0, 1, 2” will become “0, 2, 1”. Finally, I had to negate the V parameter of the UV coordinates so that the texture looked proper.
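
Roughly, the three fixes can be sketched like this. The Vertex struct and the function name are made up for the example, and a real importer would presumably swap the Y and Z components of the normals in the same way.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical vertex layout used only for this example.
struct Vertex {
    float x, y, z;  // position
    float u, v;     // texture coordinates
};

// Converts Z-up mesh data to Y-up: swaps Y and Z, negates V, and
// reverses the winding of every triangle so faces are not inside-out.
void convertZUpToYUp(std::vector<Vertex>& vertices,
                     std::vector<std::uint32_t>& indices) {
    assert(indices.size() % 3 == 0);  // expects a plain triangle list

    for (Vertex& vert : vertices) {
        std::swap(vert.y, vert.z);
        vert.v = -vert.v;
    }

    // Swapping the last two indices turns "0, 1, 2" into "0, 2, 1".
    for (std::size_t i = 0; i + 2 < indices.size(); i += 3) {
        std::swap(indices[i + 1], indices[i + 2]);
    }
}
```

Swapping two axes mirrors the mesh, which is why the winding has to be reversed. Doing all of this once in the converter tool keeps the work out of the engine's load path.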

All in all, I am pretty happy considering I wrote the importer basically from scratch. I would like to try some more complex models, but I will have to figure out what I want to build next. Since I am doing all this work myself, I’d like to use the engine to showcase my own artwork. I would rather not just download assets from the internet. Maybe I will build a refrigerator to put the soda in, or some more common products.

If you like what you read, post a comment and let me know how I’m doing. Cheers.

engine zero xml

Programmer art is great and all, but I’d really like to see some complex models inside the engine. Unfortunately, DirectX 11 does not include a built-in way to load 3D models. As I’ve mentioned before, I am interested in using COLLADA as the import format. Since COLLADA is based on XML, I will need a way to load and parse XML files. While there are tons of XML parsing libraries out there, I decided to write my own. Why would I do that? A few reasons. First, I don’t want my engine to be encumbered with 3rd party licenses, forcing me to do things against my will. Secondly, I think it’s a great learning exercise to see how something like this is done. Lastly, it’s fun!

Sadly, I found very few resources on how you go about coding an XML parser from scratch. Of the code I did find (i.e. from open-source libraries), it was difficult for me to extract the algorithm from the code. There was one resource that did help somewhat, from ANTLR3, but it failed to provide the pseudo-code I was looking for. Even so, it was enough to get me started.

The basic procedure I followed was loading the XML file into a string, then iterating through the string and breaking the text up into tokens. Each of these tokens would hold the string representation and a label of what the token meant. Then, in a second pass, I would look through all the tokens and parse them into a tree structure. It actually ended up being easier than I initially thought, and I think I completed the whole thing in about 2 or 3 days just working a few hours in the evening. I’ll highlight some of the relevant code below.

I created a map with all the important XML syntax elements so I can break them up into tokens.

tokenMap["<?"] = XML_OPEN;
tokenMap["?>"] = XML_CLOSE;
tokenMap["</"] = TAG_END;
tokenMap["/>"] = TAG_EMPTY;
tokenMap["<"] = TAG_OPEN;
tokenMap[">"] = TAG_CLOSE;
tokenMap["=\""] = ATTRIB_EQUALS;
tokenMap["\""] = ATTRIB_QUOTE;
tokenMap[" "] = WHITE_SPACE;
tokenMap["\n"] = WHITE_LINE;
tokenMap["\t"] = WHITE_TAB;

First I load the XML file using the standard C++ stream libraries.

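ifstream inputFile;
inputFile.open(fileName, ifstream::in);

stringstream inputStream;

// Read one character at a time. Testing the result of get() stops the
// loop at end-of-file without appending a bogus EOF character.
char c;
while (inputFile.get(c)){
	inputStream << c;
}

string temp = inputStream.str();
char* data = const_cast<char*>(temp.c_str());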

Next, I loop through all the characters in the data stream and find the tokens.

TokenList tokenize(char* doc){
	size_t docLen = strlen(doc);

	TokenList tokens;
	TokenMap::iterator it;

	string buffer = "";

	unsigned int i;

	for (i = 0; i < docLen; i++){
		bool found = false;
		for (it = tokenMap.begin(); it != tokenMap.end(); ++it){
			int tokenLen = strlen(it->first);
			if (compare(&doc[i], it->first, tokenLen)){
				// Flush any plain text collected before this token.
				int textLen = strlen(buffer.c_str());
				if (textLen > 0){
					char* text = new char[textLen + 1]; // +1 for the terminator
					strncpy_s(text, textLen + 1, buffer.c_str(), textLen);
					TokenMap token = { { text, GENERIC_TEXT } };
					tokens.push_back(token);
					buffer = "";
				}

				// Store the syntax token itself.
				char* match = new char[tokenLen + 1];
				strncpy_s(match, tokenLen + 1, &doc[i], tokenLen);
				TokenMap token = { { match, it->second } };
				tokens.push_back(token);

				i += tokenLen - 1;
				found = true;
				// One token per position. This relies on two-character
				// tokens such as "</" being tested before "<".
				break;
			}
		}

		// Not part of any syntax token, so it belongs to the text buffer.
		if (!found)	buffer.append(&doc[i], 1);
	}

	return tokens;
}

Finally I iterate through the list I just created and parse that into a node-based tree. I admit this part is a little ugly, but it seems to work so I’m OK with that. The idea is that I set the function into different states, and then parse the elements in the list differently depending on the state. For example, if I see a “<” token, then I go into attribute parsing, and then when I see a “>” I set it back to the default state. The logic is fairly simple, but there are a lot of if statements to weed through if you are trying to implement this yourself.

// Note: the lines that attach a node to its parent assume XmlNode keeps
// its child nodes in a vector<XmlNode*> member named "children".
void parse(TokenList& list, XmlNode* parent){
	ParseType state = PARSE_ANY;
	XmlNode* node = new XmlNode();
	bool created = false;
	bool allWhite = true;
	int openTags = 0;
	string attribName = "";
	string attribValue = "";
	string valueBuffer = "";
	TokenList children;
	TokenList::iterator v;
	for (v = list.begin(); v != list.end(); ++v){
		TokenMap::iterator m;
		for (m = v->begin(); m != v->end(); ++m){
			if (state == PARSE_ANY){
				if (m->second == XML_OPEN){
					state = PARSE_XML_TYPE;
				} else if (m->second == TAG_OPEN){
					state = PARSE_TAG_NAME;
					created = true;
				} else if (m->second == ATTRIB_EQUALS){
					state = PARSE_ATTRIB_VALUE;
				} else if (m->second == GENERIC_TEXT || m->second == WHITE_SPACE || 
					m->second == WHITE_LINE || m->second == WHITE_TAB){
					valueBuffer.append(m->first, strlen(m->first));
					if (m->second == GENERIC_TEXT) allWhite = false;
				}
				// On the last token, store any real text as the parent's value.
				if (v + 1 == list.end()){
					if (!allWhite) parent->value = valueBuffer;
					valueBuffer = "";
				}
			} else if (state == PARSE_TAG_NAME){
				node->name = string(m->first);
				state = PARSE_ATTRIB_NAME;
			} else if (state == PARSE_ATTRIB_NAME){
				if (m->second == WHITE_SPACE) continue;
				if (m->second == ATTRIB_QUOTE) continue;
				if (m->second == TAG_EMPTY){
					// Self-closing tag: the node is complete, so store it
					// and start a fresh one for the next sibling.
					parent->children.push_back(node);
					node = new XmlNode();
					state = PARSE_ANY;
				} else if (m->second == TAG_CLOSE){
					state = PARSE_TAG_CLOSE;
					openTags = 1;
				} else {
					attribName = string(m->first);
					state = PARSE_ANY;
				}
			} else if (state == PARSE_ATTRIB_VALUE){
				if (m->second == ATTRIB_QUOTE){
					// Closing quote: the value is complete (quote excluded).
					node->attributes[attribName] = string(attribValue);
					state = PARSE_ATTRIB_NAME;
					attribValue = "";
				} else {
					attribValue.append(m->first, strlen(m->first));
				}
			} else if (state == PARSE_TAG_CLOSE){
				if (m->second == TAG_OPEN) openTags++;
				if (m->second == TAG_END) openTags--;
				if (m->second == TAG_EMPTY) openTags--;
				if (openTags > 0){
					// Still inside this element: keep the token for the
					// recursive call below.
					children.push_back(*v);
				} else {
					parse(children, node);
					children.clear();
					state = PARSE_TAG_END;
				}
			} else if (state == PARSE_TAG_END){
				if (m->second == TAG_CLOSE){
					// End tag finished: store the node and start a new one.
					parent->children.push_back(node);
					node = new XmlNode();
					state = PARSE_ANY;
				}
			} else if (state == PARSE_XML_TYPE){
				if (m->second == XML_CLOSE){
					state = PARSE_ANY;
				}
			}
		}
	}

	// Keep a node that was started but never stored; otherwise free it.
	if (created && node->name.length() > 0){
		parent->children.push_back(node);
	} else {
		delete node;
	}
}

All in all, not nearly as bad as I expected. Granted, the algorithm could be a little less hard-coded, but it’s a fairly straightforward implementation. I also loaded in a COLLADA *.DAE file, and I did not see any errors or problems. Within the next few days I hope to integrate this code into the engine and actually load up a 3D model. Surely there will be some hiccups, but I have faith this can be done soon.

OGRE 3D Instancing

After some more testing, it looks like OGRE is not the savior it seemed like yesterday. While the static geometry boosted frame-rates greatly, it’s only useful for, well, static objects. Meaning the models can’t move or animate. I did find another option, instancing, which initially looked promising. It allows rendering of large amounts of identical objects faster than just having them be individual. Sounds good.

The implementation seemed complex at first, but then I found the InstanceManager which simplified things a whole lot. However, after getting it working, I wasn’t as impressed with the performance. Just rendering the same 13k still cubes I was getting a little over 100 fps. Then when adding rotation animation to the cubes, the speed dropped down to around 33 fps. Certainly this is still better than the naive implementation, however still nowhere close to where I want.

To be completely upfront, my computer is not a power-house. I’m still running a Core 2 Duo @ 3GHz and GTX 470’s in SLI. Getting a little old, I know, but still can play modern games like Titanfall or whatever. Maybe I’m expecting too much, don’t know at this point. I think I will just go back to development on my engine and worry about performance optimization later. Even so, this was still an interesting investigation at least.

OGRE 3D Static Cubes

Looks like I spoke too soon. While OGRE was getting pretty slow with the naive implementation, I was able to find some code on what they call StaticGeometry, which is a system to batch together lots of similar meshes that don’t move (great for my cube example project). With this feature added, the frame rate has sky-rocketed to over 2,600 fps. Most impressive. Keep in mind a blank DirectX window on my machine will get around 3,600 fps. So getting around 2,600 with over 13,000 cubes is very nice. That still doesn’t help me with my physics simulation, since static objects won’t cut it. But it does at least give me a good benchmark as to what is possible on my development hardware.

OGRE 3D Cubes

Seeing as performance has been on my mind recently, I tweaked the core render loop a bit and saw some reasonable gains. The one thing I realized is that most of the objects in the scene are static, and don’t need their combined transformed matrices recalculated every frame. I expected to see wild improvements after caching the values. What I received was a decent 50% gain. Not monster, but certainly substantial. Now the average framerates are in the upper 200’s to lower 300’s. More acceptable but still maybe not where I want it to be.
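
The caching idea boils down to a "dirty flag". Here is a stripped-down sketch that only handles translation to keep it short. The SceneObject class, the Mat4 alias, and the rebuild counter are all hypothetical and exist just to show the pattern, not how my engine actually stores transforms.

```cpp
#include <array>
#include <cassert>

// Hypothetical 4x4 matrix, row-major with translation in the last row.
using Mat4 = std::array<float, 16>;

Mat4 makeTranslation(float x, float y, float z) {
    Mat4 m = {{1, 0, 0, 0,
               0, 1, 0, 0,
               0, 0, 1, 0,
               x, y, z, 1}};
    return m;
}

class SceneObject {
public:
    // Changing the transform only marks the cached matrix as stale.
    void setPosition(float x, float y, float z) {
        x_ = x; y_ = y; z_ = z;
        dirty_ = true;
    }

    // Rebuilds the matrix only if something changed since the last call.
    const Mat4& worldMatrix() {
        if (dirty_) {
            cached_ = makeTranslation(x_, y_, z_);
            dirty_ = false;
            ++rebuildCount_;
        }
        return cached_;
    }

    int rebuildCount() const { return rebuildCount_; }

private:
    float x_ = 0.0f, y_ = 0.0f, z_ = 0.0f;
    Mat4 cached_ = makeTranslation(0.0f, 0.0f, 0.0f);
    bool dirty_ = true;
    int rebuildCount_ = 0;
};

// Usage: a static object pays for the matrix once, however many
// frames ask for it.
void cachedTransformExample() {
    SceneObject cube;
    cube.setPosition(1.0f, 2.0f, 3.0f);
    for (int frame = 0; frame < 100; ++frame) {
        cube.worldMatrix();
    }
    assert(cube.rebuildCount() == 1);
}
```

In a full scene graph the flag would also have to be set whenever a parent moves, since the combined matrix depends on every ancestor.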

Just as a sanity check I decided to recreate the same exact 13k cube scene in OGRE, a popular open-source 3D engine. To my surprise, performance fell to the floor. In OGRE I was only getting around 30-80 fps, while my custom engine was getting over 5 times the frames per second. So this makes me feel a whole lot better about the situation. I’d also like to do the same test in Unity and some other engines and see how they compare. As a quick test, though, I’m quite satisfied.

All things considered, I’d still like the performance of my engine to be a lot better. The reason I am even working on this is because I have an idea in mind that doesn’t seem feasible with current middleware. The core aspect is a robust physics simulation, and I expect to have tens (or hundreds) of thousands of objects animating simultaneously. Maybe the reason no one has done what I want is because current PC hardware and software is not up to the task. Maybe no one has tried. Not sure, but I want to make it happen. We’ll see soon enough.

Stress Test

I don’t have much time, so I will be brief. Basically for the past few days I have been trying to optimize the engine. With the stress test you see above (around 13K cubes) I was only getting around 200 fps. Just slightly above my target of 120 fps, and with such a simple scene I was expecting more. So I got to hacking, fully realizing that early optimization is evil… yeah, yeah. In any case, I needed to know if the engine architecture was flawed in some way, and if I was going down the wrong path. Through some crude debugging I found that my matrix multiply operation was causing the huge sink in performance. My somewhat straight-forward implementation was as follows.

Matrix4x4 Matrix4x4::multiply(const Matrix4x4& rhs){
	Matrix4x4 ret;

	ret.m11 = m11 * rhs.m11 + m12 * rhs.m21 + m13 * rhs.m31 + m14 * rhs.m41;
	ret.m12 = m11 * rhs.m12 + m12 * rhs.m22 + m13 * rhs.m32 + m14 * rhs.m42;
	ret.m13 = m11 * rhs.m13 + m12 * rhs.m23 + m13 * rhs.m33 + m14 * rhs.m43;
	ret.m14 = m11 * rhs.m14 + m12 * rhs.m24 + m13 * rhs.m34 + m14 * rhs.m44;

	ret.m21 = m21 * rhs.m11 + m22 * rhs.m21 + m23 * rhs.m31 + m24 * rhs.m41;
	ret.m22 = m21 * rhs.m12 + m22 * rhs.m22 + m23 * rhs.m32 + m24 * rhs.m42;
	ret.m23 = m21 * rhs.m13 + m22 * rhs.m23 + m23 * rhs.m33 + m24 * rhs.m43;
	ret.m24 = m21 * rhs.m14 + m22 * rhs.m24 + m23 * rhs.m34 + m24 * rhs.m44;

	ret.m31 = m31 * rhs.m11 + m32 * rhs.m21 + m33 * rhs.m31 + m34 * rhs.m41;
	ret.m32 = m31 * rhs.m12 + m32 * rhs.m22 + m33 * rhs.m32 + m34 * rhs.m42;
	ret.m33 = m31 * rhs.m13 + m32 * rhs.m23 + m33 * rhs.m33 + m34 * rhs.m43;
	ret.m34 = m31 * rhs.m14 + m32 * rhs.m24 + m33 * rhs.m34 + m34 * rhs.m44;

	ret.m41 = m41 * rhs.m11 + m42 * rhs.m21 + m43 * rhs.m31 + m44 * rhs.m41;
	ret.m42 = m41 * rhs.m12 + m42 * rhs.m22 + m43 * rhs.m32 + m44 * rhs.m42;
	ret.m43 = m41 * rhs.m13 + m42 * rhs.m23 + m43 * rhs.m33 + m44 * rhs.m43;
	ret.m44 = m41 * rhs.m14 + m42 * rhs.m24 + m43 * rhs.m34 + m44 * rhs.m44;

	return ret;
}

Feeling that this could be improved, I found some code on StackOverflow to do the same operation using SSE instructions. I was initially considering coding it in assembly, but this looked like a cleaner solution and a little easier to understand (though, of course, nowhere near as cool as getting “pedal to the metal” and writing assembly code). I was told this should be as fast or faster than assembly anyhow. The new function is below.

void Matrix4x4::multiplySSE(float *lhs, float *rhs, float *out) {
	__m128 row1 = _mm_load_ps(&rhs[0]);
	__m128 row2 = _mm_load_ps(&rhs[4]);
	__m128 row3 = _mm_load_ps(&rhs[8]);
	__m128 row4 = _mm_load_ps(&rhs[12]);
	for (int i = 0; i < 4; i++) {
		__m128 brod1 = _mm_set1_ps(lhs[4 * i + 0]);
		__m128 brod2 = _mm_set1_ps(lhs[4 * i + 1]);
		__m128 brod3 = _mm_set1_ps(lhs[4 * i + 2]);
		__m128 brod4 = _mm_set1_ps(lhs[4 * i + 3]);
		// Sum the four scaled rows of rhs to produce row i of the result.
		__m128 row = _mm_add_ps(
			_mm_add_ps(
				_mm_mul_ps(brod1, row1),
				_mm_mul_ps(brod2, row2)),
			_mm_add_ps(
				_mm_mul_ps(brod3, row3),
				_mm_mul_ps(brod4, row4)));
		_mm_store_ps(&out[4 * i], row);
	}
}

To be honest, I was disappointed. There were some small gains, sure, but I was expecting a serious improvement. With the same 13K cube scene, I was now getting close to 225 fps. That’s a bit over a 10% improvement, which is something, but not what I wanted. So I got it into my head that I would try the DirectXMath library, and at least do some benchmarks to see how it compared. I mean, I did really want to stick with my custom math library, but not if it meant slow performance. I did a few quick test calculations, and the speed seemed nice. Sadly the compiler probably optimized them out (keep reading).
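
In hindsight, a quick benchmark needs to actually use its results, or the optimizer is free to throw the work away. A minimal sketch of a timing helper that does this is below. The function name and signature are invented for the example, and it only gives a rough wall-clock figure, not a rigorous measurement.

```cpp
#include <cassert>
#include <chrono>

// Calls work(i) `iterations` times and returns the elapsed milliseconds.
// Every result is accumulated into a volatile variable, so the compiler
// has to perform each store and cannot discard the calculations.
template <typename Work>
double timeMilliseconds(Work work, int iterations, float& resultOut) {
    assert(iterations > 0);
    volatile float sink = 0.0f;

    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        sink = sink + work(i);
    }
    auto stop = std::chrono::steady_clock::now();

    resultOut = sink;  // hand the total back so the caller can sanity-check it
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

You would call it with a lambda that does one unit of work per iteration (for example, a matrix multiply that returns one element of the result), and run the custom and DirectXMath versions with the same iteration count.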

So I spent the next few hours ripping out all of my math calls and adding in the DirectXMath classes and functions instead. Finally I got to a point of having some visuals on screen. What do I see? Slow frame rates. It was much worse than before. Barely even 100 fps. Unacceptable. I fixed up some more of the code and got it to an OK state. Even then, it was only running at around 200 fps, or about 10% worse than my own custom functions. How could this be? Well, in some sense I feel proud that my implementation fared well. But on the other hand, I just wasted an entire night for nothing. You live and you learn.


Today I have gotten the camera system to a decent place, and made a simple free look demo. Most of the code had already been implemented, inside the vector and matrix classes, I just had to piece it together into a camera object. I also added a grid of cubes, to better see the camera working. Sadly these extra 200 cubes slowed down the performance by a good chunk. Previously I was getting around 3,500 FPS (with 3 cubes), now I’m only getting around 2,000 FPS (with around 220 cubes).

Granted the performance is bound to drop as more objects are added, but I think I can improve this a lot. At the moment, I am not doing any sort of culling, and when I get that working I feel like it would give a good boost. However, it’s probably not the highest thing on the list since I’m still getting reasonable frame-rates.
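
The simplest form of culling I have in mind is testing each object's bounding sphere against the six planes of the view frustum. A rough sketch follows. The Plane and Sphere structs are hypothetical, the plane normals are assumed to point into the frustum, and building the planes from the camera matrices is left out.

```cpp
#include <array>
#include <cassert>

// Hypothetical plane: a point p is on the inside when n.p + d >= 0.
struct Plane {
    float nx, ny, nz, d;
};

// Hypothetical bounding sphere around an object.
struct Sphere {
    float cx, cy, cz, radius;
};

// Returns false only when the sphere lies completely behind one plane.
// Spheres that straddle a plane count as visible, so this test may keep
// a few objects that are actually off-screen but never drops a visible one.
bool isSphereVisible(const std::array<Plane, 6>& frustum, const Sphere& s) {
    assert(s.radius >= 0.0f);
    for (const Plane& p : frustum) {
        // Signed distance from the sphere center to the plane
        // (assuming unit-length normals).
        float distance = p.nx * s.cx + p.ny * s.cy + p.nz * s.cz + p.d;
        if (distance < -s.radius) {
            return false;
        }
    }
    return true;
}
```

Objects that fail the test are simply skipped in the render loop, which saves both the matrix work and the draw call.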

Coming up next I would like to fix the lighting system (currently it’s just a hacked on ambient/directional light) and I need to get a COLLADA model importer functioning. I’ll also have to pick up a 3D modelling program and make some better models to test with. I did try to learn Blender a bit, but I found it cumbersome. Gonna give 3DS Max a go again. Haven’t used it in many years, but I was at one point pretty comfortable with the app. I think my first model will be a soda can, as it’s something easy and recognizable. See ya next time!