OGRE 3D Cubes

Seeing as performance has been on my mind recently, I tweaked the core render loop a bit and saw some reasonable gains. The one thing I realized is that most of the objects in the scene are static, and don’t need their combined transform matrices recalculated every frame. I expected to see wild improvements after caching the values. What I received was a decent 50% gain. Not monster, but certainly substantial. Now the average frame rates are in the upper 200s to lower 300s. More acceptable, but still maybe not where I want it to be.
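
A minimal sketch of the caching idea, assuming a hypothetical `Node` class and a stand-in `Mat4` type (neither is the engine's actual code): each node keeps its combined transform and only rebuilds it when a dirty flag says the node or one of its ancestors has changed.

```cpp
#include <vector>

// Hypothetical row-major 4x4 matrix, identity by default.
struct Mat4 {
	float m[16] = {1, 0, 0, 0,  0, 1, 0, 0,  0, 0, 1, 0,  0, 0, 0, 1};
};

inline Mat4 multiply(const Mat4& a, const Mat4& b) {
	Mat4 r;
	for (int i = 0; i < 4; ++i)
		for (int j = 0; j < 4; ++j) {
			float sum = 0.0f;
			for (int k = 0; k < 4; ++k) sum += a.m[4 * i + k] * b.m[4 * k + j];
			r.m[4 * i + j] = sum;
		}
	return r;
}

// Hypothetical scene node that caches its combined (world) transform.
class Node {
public:
	explicit Node(Node* parent = nullptr) : parent_(parent) {
		if (parent_) parent_->children_.push_back(this);
	}

	void setLocal(const Mat4& local) {
		local_ = local;
		markDirty();
	}

	// Recomputes only if this node or an ancestor changed since the last call.
	const Mat4& world() {
		if (dirty_) {
			world_ = parent_ ? multiply(local_, parent_->world()) : local_;
			dirty_ = false;
			++recomputeCount;
		}
		return world_;
	}

	int recomputeCount = 0;  // exposed only to make the caching observable

private:
	void markDirty() {
		if (dirty_) return;  // the whole subtree is already flagged
		dirty_ = true;
		for (Node* child : children_) child->markDirty();
	}

	Node* parent_;
	std::vector<Node*> children_;
	Mat4 local_;
	Mat4 world_;
	bool dirty_ = true;
};
```

With this layout a static object pays for one matrix multiply the first time it is drawn and nothing after that, while moving a parent still invalidates everything beneath it.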

Just as a sanity check, I decided to recreate the exact same 13K cube scene in OGRE, a popular open-source 3D engine. To my surprise, performance fell through the floor. In OGRE I was only getting between 30 and 80 fps, while my custom engine was getting over 5 times the frames per second. So this makes me feel a whole lot better about the situation. I’d also like to do the same test in Unity and some other engines and see how they compare. As a quick test, though, I’m quite satisfied.

All things considered, I’d still like the performance of my engine to be a lot better. The reason I am even working on this is that I have an idea in mind that doesn’t seem feasible with current middleware. The core aspect is a robust physics simulation, and I expect to have tens (or hundreds) of thousands of objects animating simultaneously. Maybe the reason no one has done what I want is that current PC hardware and software are not up to the task. Maybe no one has tried. Not sure, but I want to make it happen. We’ll see soon enough.

Stress Test

I don’t have much time, so I will be brief. Basically, for the past few days I have been trying to optimize the engine. With the stress test you see above (around 13K cubes) I was only getting around 200 fps. That is above my target of 120 fps, but with such a simple scene I was expecting more. So I got to hacking, fully realizing that early optimization is evil… yeah, yeah. In any case, I needed to know if the engine architecture was flawed in some way, and if I was going down the wrong path. Through some crude debugging I found that my matrix multiply operation was causing the huge sink in performance. My somewhat straightforward implementation was as follows.

Matrix4x4 Matrix4x4::multiply(const Matrix4x4& rhs){
	Matrix4x4 ret;

	ret.m11 = m11 * rhs.m11 + m12 * rhs.m21 + m13 * rhs.m31 + m14 * rhs.m41;
	ret.m12 = m11 * rhs.m12 + m12 * rhs.m22 + m13 * rhs.m32 + m14 * rhs.m42;
	ret.m13 = m11 * rhs.m13 + m12 * rhs.m23 + m13 * rhs.m33 + m14 * rhs.m43;
	ret.m14 = m11 * rhs.m14 + m12 * rhs.m24 + m13 * rhs.m34 + m14 * rhs.m44;

	ret.m21 = m21 * rhs.m11 + m22 * rhs.m21 + m23 * rhs.m31 + m24 * rhs.m41;
	ret.m22 = m21 * rhs.m12 + m22 * rhs.m22 + m23 * rhs.m32 + m24 * rhs.m42;
	ret.m23 = m21 * rhs.m13 + m22 * rhs.m23 + m23 * rhs.m33 + m24 * rhs.m43;
	ret.m24 = m21 * rhs.m14 + m22 * rhs.m24 + m23 * rhs.m34 + m24 * rhs.m44;

	ret.m31 = m31 * rhs.m11 + m32 * rhs.m21 + m33 * rhs.m31 + m34 * rhs.m41;
	ret.m32 = m31 * rhs.m12 + m32 * rhs.m22 + m33 * rhs.m32 + m34 * rhs.m42;
	ret.m33 = m31 * rhs.m13 + m32 * rhs.m23 + m33 * rhs.m33 + m34 * rhs.m43;
	ret.m34 = m31 * rhs.m14 + m32 * rhs.m24 + m33 * rhs.m34 + m34 * rhs.m44;

	ret.m41 = m41 * rhs.m11 + m42 * rhs.m21 + m43 * rhs.m31 + m44 * rhs.m41;
	ret.m42 = m41 * rhs.m12 + m42 * rhs.m22 + m43 * rhs.m32 + m44 * rhs.m42;
	ret.m43 = m41 * rhs.m13 + m42 * rhs.m23 + m43 * rhs.m33 + m44 * rhs.m43;
	ret.m44 = m41 * rhs.m14 + m42 * rhs.m24 + m43 * rhs.m34 + m44 * rhs.m44;

	return ret;
}

Feeling that this could be improved, I found some code on Stack Overflow to do the same operation using SSE instructions. I was initially considering coding it in assembly, but this looked like a cleaner solution and a little easier to understand (though, of course, nowhere near as cool as getting “pedal to the metal” and writing assembly code). I was told this should be as fast as or faster than assembly anyhow. The new function is below.

void Matrix4x4::multiplySSE(float *lhs, float *rhs, float *out) {
	__m128 row1 = _mm_load_ps(&rhs[0]);
	__m128 row2 = _mm_load_ps(&rhs[4]);
	__m128 row3 = _mm_load_ps(&rhs[8]);
	__m128 row4 = _mm_load_ps(&rhs[12]);
	for (int i = 0; i < 4; i++) {
		__m128 brod1 = _mm_set1_ps(lhs[4 * i + 0]);
		__m128 brod2 = _mm_set1_ps(lhs[4 * i + 1]);
		__m128 brod3 = _mm_set1_ps(lhs[4 * i + 2]);
		__m128 brod4 = _mm_set1_ps(lhs[4 * i + 3]);
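		// Each output row is a weighted sum of the four rhs rows.
		__m128 row = _mm_add_ps(
			_mm_add_ps(
				_mm_mul_ps(brod1, row1),
				_mm_mul_ps(brod2, row2)),
			_mm_add_ps(
				_mm_mul_ps(brod3, row3),
				_mm_mul_ps(brod4, row4)));
		// _mm_load_ps/_mm_store_ps require 16-byte aligned pointers.
		_mm_store_ps(&out[4 * i], row);
	}
}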

To be honest, I was disappointed. There were some small gains, sure, but I was expecting a serious improvement. With the same 13K cube scene, I was now getting close to 225 fps. Over a 10% improvement is something, but not what I wanted. So I got it into my head that I would try the DirectXMath library, and at least do some benchmarks to see how it compared. I mean, I did really want to stick with my custom math library, but not if it meant slow performance. I did a few quick test calculations, and the speed seemed nice. Sadly the compiler probably optimized them out (keep reading).
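
When a quick benchmark never uses its results, an optimizing compiler is free to delete the whole computation, which would make any library look fast. Here is a hedged sketch of one way to guard against that, assuming a hypothetical `Mat4` stand-in and a made-up `benchmarkMultiply` helper: vary the input on every iteration and write an accumulated checksum to a `volatile` variable so the work cannot be proven dead.

```cpp
#include <chrono>
#include <cstddef>

// Hypothetical stand-in for the engine's matrix type (row-major, 16 floats).
struct Mat4 { float m[16]; };

Mat4 multiply(const Mat4& a, const Mat4& b) {
	Mat4 r{};
	for (int i = 0; i < 4; ++i)
		for (int j = 0; j < 4; ++j)
			for (int k = 0; k < 4; ++k)
				r.m[4 * i + j] += a.m[4 * i + k] * b.m[4 * k + j];
	return r;
}

// Written once after timing; a volatile store cannot be optimized away.
volatile float g_sink = 0.0f;

// Returns the average time per multiply in nanoseconds.
double benchmarkMultiply(std::size_t iterations) {
	if (iterations == 0) return 0.0;

	Mat4 a{}, b{};
	for (int i = 0; i < 16; ++i) {
		a.m[i] = 0.01f * static_cast<float>(i);
		b.m[i] = 1.0f - 0.01f * static_cast<float>(i);
	}

	float checksum = 0.0f;
	const auto start = std::chrono::steady_clock::now();
	for (std::size_t n = 0; n < iterations; ++n) {
		// Change the input each pass so the call cannot be hoisted out of the loop.
		a.m[0] = static_cast<float>(n & 7u);
		const Mat4 r = multiply(a, b);
		for (float v : r.m) checksum += v;
	}
	const auto stop = std::chrono::steady_clock::now();
	g_sink = checksum;

	const std::chrono::duration<double, std::nano> elapsed = stop - start;
	return elapsed.count() / static_cast<double>(iterations);
}
```

Numbers from a loop like this are still only a rough guide, since the compiler may inline and vectorize it differently than it would inside the real render loop. Measuring frame times in the actual scene remains the more trustworthy test.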

So I spent the next few hours ripping out all of my math calls and adding in the DirectXMath classes and functions instead. Finally I got to a point of having some visuals on screen. What do I see? Slow frame rates. It was much worse than before. Barely even 100 fps. Unacceptable. I fixed up some more of the code and got it to an OK state. Even then, it was only running at around 200 fps, or about 10% worse than my own custom functions. How could this be? Well, in some sense I feel proud that my implementation fared well. But on the other hand, I just wasted an entire night for nothing. You live and you learn.


Today I have gotten the camera system to a decent place, and made a simple free look demo. Most of the code had already been implemented, inside the vector and matrix classes, I just had to piece it together into a camera object. I also added a grid of cubes, to better see the camera working. Sadly these extra 200 cubes slowed down the performance by a good chunk. Previously I was getting around 3,500 FPS (with 3 cubes), now I’m only getting around 2,000 FPS (with around 220 cubes).

Granted the performance is bound to drop as more objects are added, but I think I can improve this a lot. At the moment, I am not doing any sort of culling, and when I get that working I feel like it would give a good boost. However, it’s probably not the highest thing on the list since I’m still getting reasonable frame-rates.
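
As a rough sketch of what culling could look like, assuming each object carries a bounding sphere and the camera frustum is available as six inward-facing planes (the `Plane`, `Sphere`, and `isVisible` names are hypothetical, not part of the engine): an object can be skipped as soon as its sphere lies entirely behind any one plane.

```cpp
#include <array>

struct Vec3 { float x, y, z; };

// Plane stored as dot(normal, p) + d = 0, with a unit normal pointing into the frustum.
struct Plane { Vec3 normal; float d; };

struct Sphere { Vec3 center; float radius; };

inline float signedDistance(const Plane& plane, const Vec3& p) {
	return plane.normal.x * p.x + plane.normal.y * p.y + plane.normal.z * p.z + plane.d;
}

// Conservative test: returns false only when the sphere is fully outside one plane.
inline bool isVisible(const std::array<Plane, 6>& frustum, const Sphere& s) {
	for (const Plane& plane : frustum) {
		if (signedDistance(plane, s.center) < -s.radius) return false;
	}
	return true;
}
```

The test can let through a few objects near the frustum corners that are not actually visible, but it never rejects a visible one, and it costs six dot products per object.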

Coming up next, I would like to fix the lighting system (currently it’s just a hacked-on ambient/directional light) and I need to get a COLLADA model importer functioning. I’ll also have to pick up a 3D modelling program and make some better models to test with. I did try to learn Blender a bit, but I found it cumbersome. Gonna give 3DS Max a go again. Haven’t used it in many years, but I was at one point pretty comfortable with the app. I think my first model will be a soda can, as it’s something easy and recognizable. See ya next time!

Though the above video might not seem like an overly impressive jump from the last, there’s actually a ton of work behind it. The new additions include a node-based scene graph hierarchy, more robust math libraries, and keyboard control using DirectInput. Plus, I’ve tried to abstract as much as I can into modular classes and remove the hard-coded hacks I had in there. Finally, I hid away the Windows stuff in its own class so clients just need to create a normal main() function and can launch the window from there (removing much of the nasty Win32-looking code from sight).

Here is an example of launching an empty window with the engine:

int main(){
	Engine& engine = Engine::get();
	engine.create(1280, 720);

	while (engine.loop()){
		if (engine.control.keyPressed(KEY_ESCAPE)) break;
	}

	return 0;
}

All in all a vast improvement even if the graphics aren’t too pretty yet (we’ll get there). Coming up next I want to build a 3rd person free camera to navigate around.
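
The `Engine::get()` call in the launch example suggests a singleton accessor. Below is a minimal sketch of how such an accessor is commonly written, not the engine's actual code: the `bool` return from `create`, the size getters, and all private members are assumptions for illustration, and the real class would create the Win32 window and Direct3D device where the comment indicates.

```cpp
// Hypothetical sketch of a singleton engine front end.
class Engine {
public:
	// Function-local static: constructed on first use and shared by every caller.
	static Engine& get() {
		static Engine instance;
		return instance;
	}

	Engine(const Engine&) = delete;
	Engine& operator=(const Engine&) = delete;

	// Assumed to report success; the source does not show a return type.
	bool create(int width, int height) {
		if (width <= 0 || height <= 0) return false;
		width_ = width;
		height_ = height;
		created_ = true;  // platform window and device creation would happen here
		return true;
	}

	int width() const { return width_; }
	int height() const { return height_; }
	bool isCreated() const { return created_; }

private:
	Engine() = default;

	int width_ = 0;
	int height_ = 0;
	bool created_ = false;
};
```

The C++11 standard makes the initialization of a function-local static thread-safe, though it is worth checking that the compiler in use actually implements that guarantee.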


The demo is starting to shape up now, with texture mapping and some simple lighting (ambient and directional). To be fair, I’m not sure if I would really call this an “engine” quite yet. It’s still very much a bare-bones, hard-coded demo done in DirectX 11. But once I can get the features working, then I can properly abstract and generalize the functionality. I have plans to start implementing keyboard and mouse controls (probably using DirectInput) and then a simple camera system so I can move around in the space. I’d also like to get a model importer up and running, but I expect this to take some more time. Stay tuned.


Today I have finally gotten a simple sort of animation working. It looks easy, but I made my life a lot harder by implementing my own math library. So far I have a Vector3D and a Matrix4x4 class almost finished. Well, the Vector class is pretty much done. The Matrix class still needs some fleshing out, but I got it working well enough to spin a triangle. I realize I could have used the D3DX library (which is deprecated), or XNAMath (also deprecated), or DirectXMath (safe for now), but I thought making my own math functions would be a good learning experience. I did also have to reimplement D3DXMatrixLookAtLH and D3DXMatrixPerspectiveFovLH, but Microsoft is nice enough to list the equations in their documentation (thanks!).
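
For reference, here is a sketch of those two functions written from the formulas in the D3DX documentation (left-handed, row-vector convention). The `Vec3` and `Mat4` types and the function names are hypothetical stand-ins, not the engine's Vector3D and Matrix4x4 classes.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

inline Vec3 sub(const Vec3& a, const Vec3& b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline float dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
inline Vec3 cross(const Vec3& a, const Vec3& b) {
	return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
inline Vec3 normalize(const Vec3& v) {
	const float len = std::sqrt(dot(v, v));
	return {v.x / len, v.y / len, v.z / len};
}

// Row-major 4x4 matrix used with row vectors (v * M).
struct Mat4 { float m[4][4]; };

// View matrix: builds the camera basis, then translates by the negated eye position.
Mat4 lookAtLH(const Vec3& eye, const Vec3& at, const Vec3& up) {
	const Vec3 zaxis = normalize(sub(at, eye));
	const Vec3 xaxis = normalize(cross(up, zaxis));
	const Vec3 yaxis = cross(zaxis, xaxis);
	return {{
		{xaxis.x, yaxis.x, zaxis.x, 0.0f},
		{xaxis.y, yaxis.y, zaxis.y, 0.0f},
		{xaxis.z, yaxis.z, zaxis.z, 0.0f},
		{-dot(xaxis, eye), -dot(yaxis, eye), -dot(zaxis, eye), 1.0f}
	}};
}

// Projection matrix: fovY in radians, depth mapped from [zn, zf] to [0, 1].
Mat4 perspectiveFovLH(float fovY, float aspect, float zn, float zf) {
	assert(zn > 0.0f && zf > zn);
	const float yScale = 1.0f / std::tan(fovY * 0.5f);
	const float xScale = yScale / aspect;
	return {{
		{xScale, 0.0f, 0.0f, 0.0f},
		{0.0f, yScale, 0.0f, 0.0f},
		{0.0f, 0.0f, zf / (zf - zn), 1.0f},
		{0.0f, 0.0f, -zn * zf / (zf - zn), 0.0f}
	}};
}
```

A quick way to check the projection is to push the near and far distances through the third column: after dividing by w, a point at zn should land at depth 0 and a point at zf at depth 1.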

In addition, I got a lot of help from this book, 3D Math Primer for Graphics and Game Development, which may be one of the best I’ve seen for 3D math. While there are other books, this text has very clear explanations, equations, and actual C++ implementations. I did try my best not to “cheat” and just copy the code, and instead based it on the equations listed. Unfortunately, I got cocky and wrote the length function without checking the reference, and left out the square root function call by accident (resulting in about an hour of debugging). Eventually I figured it out, but not before checking pretty much everything else with a fine-toothed comb. No worries, it’s working now.
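
To illustrate the bug, here is a minimal sketch of a vector length function. The member names are assumptions, not the actual Vector3D class. Without the `std::sqrt` call, `length()` silently returns the squared length, which gives plausible-looking but wrong results wherever vectors are normalized.

```cpp
#include <cmath>

struct Vector3D {
	float x, y, z;

	// Squared length: cheap, and enough for comparing distances.
	float lengthSquared() const { return x * x + y * y + z * z; }

	// Euclidean length; dropping the square root here would return lengthSquared().
	float length() const { return std::sqrt(lengthSquared()); }

	// Unit-length copy, or the zero vector if the length is zero.
	Vector3D normalized() const {
		const float len = length();
		return (len > 0.0f) ? Vector3D{x / len, y / len, z / len}
		                    : Vector3D{0.0f, 0.0f, 0.0f};
	}
};
```

A test with a 3-4-5 triangle catches the mistake immediately: the vector (3, 4, 0) should have length 5, not 25.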

Feels nice to see some movement on the screen. For this week I’d like to get a textured cube spinning. Also, I realized that DirectX 11 doesn’t come with a way to import models out of the box. So pretty soon I will probably have to write a model importer (I’m looking at COLLADA now). Sometimes I wonder if all this work is really worth it, especially with Unreal Engine 4 at $19/month at the moment. I guess I do realize I can likely never hope to compete with the big commercial engines, but I still think this is a great exercise of the mind. We’ll see how long I can keep my faith.

engine zero triangle

So I buckled down and spent the better half of the day actually getting something to show on the screen. Yes, it’s still just a triangle, but it feels pretty satisfying after struggling to get it to work all morning. While there are tons of triangle tutorials online, and it would seem like a 20 minute hack, I ended up having some difficulty for a few reasons.

One, I decided last night to upgrade to Visual Studio 2013 so I could take advantage of C++11 and the 64-bit compiler. Right off the bat, this resulted in a bunch of warnings. I eventually found the instructions to clean this up here, and it ended up not being a big deal.

The second issue was with compiling the HLSL shader (currently, it just passes along the position and color attributes without any fancy math). It turns out that D3DX11CompileFromFile is deprecated, as is the whole D3DX library. I guess I could have still used them anyway, but what fun is that? So I decided to remove the D3DX dependency and either move to newer functions, or reimplement certain classes completely.

Luckily there is a function called D3DCompileFromFile, which does the heavy lifting of compiling an HLSL shader file. However, for some reason shader files from tutorials I was following did not seem to work correctly with the new function. After tweaking it for a bit, and a lot of trial and error, I was able to get something built. I know it seems rather silly to have spent hours on this, but hopefully once I get this foundation built things will become easier. Here is the (dirt simple) shader file if anyone is having similar problems.

struct VS_OUTPUT {
   float4 pos : SV_POSITION;
   float4 color : COLOR0;
};

// Vertex shader: passes position and color through unchanged.
VS_OUTPUT VS(float4 pos : POSITION, float4 color : COLOR) {
   VS_OUTPUT output;
   output.pos = pos;
   output.color = color;
   return output;
}

// Pixel shader: outputs the interpolated vertex color.
float4 PS(VS_OUTPUT input) : SV_Target {
   return input.color;
}

Next up, I want to get in texture mapping, finish the Vector and Matrix classes, and make a spinning cube. Stay tuned.

Beginning D3D11

Beginning DirectX 11 Game Programming by Allen Sherrod is what I’d consider a great introduction to DirectX programming. Just to be clear, it’s really only an overview of the DirectX APIs (Direct3D, DirectInput, etc.) and not really a graphics or game programming book (despite the title). So there is very little in the way of actual gameplay programming, as you never really get to the point of having any sort of game demo. In that same respect, you don’t really deal too much with computer graphics theory, though there is some brief coverage of lighting models in regard to shader programming. That said, what is in the text is a good start to learning the DirectX 11 API and getting some foundational knowledge of the Windows platform.

The book covers basic Win32 window creation, initializing Direct3D, error handling, basic 2D graphics concepts, font rendering, input handling (with Win32, DirectInput, and XInput), fundamental 3D math (vectors, matrices, coordinate systems), cameras, and 3D models. Overall a good number of topics, and decent coverage of the building blocks for working with the DirectX 11 SDK. While I wouldn’t say the book is for “beginners” (as nothing involving DirectX or Win32 is really for novices), it doesn’t go into as much depth as something like the Frank Luna book (which covers more interesting topics like normal mapping and shadow maps). However, I did find the discussion at the end about loading the OBJ 3D file format into Direct3D to be unique, as most books do not go into this.

So do I think Beginning DirectX 11 Game Programming is worth reading? Certainly. It was an approachable read and the Kindle e-book was moderately priced at around $25. For sure, if you are working with the DirectX 11 SDK you will want all the help you can get. Granted, I think some of the other titles I’ve seen had more impressive demos, or deeper coverage, but I felt this was a fine introductory text. I’d even go as far as to say that you should read this book first, as it presents the basic knowledge in a way that is much more to the point and not as daunting as some other resources.

The one thing, which is both exciting and sad, is that I believe this was the last DirectX 11 book available on Amazon that I hadn’t read. I see there are a few newer books covering DirectX 11.1 or later, but I’d really like to stick to straight 11 due to Windows 7 compatibility. So, at this point, I think I may have gotten as far as the introductory books will take me and I will have to just start developing with it and learning as I go. Not a bad problem to have. Although I still have a few general game engine books in my backlog, I’m feeling more confident about getting into the trenches of development with my engine, and this book has definitely helped.

I know it has been quite some time since the last update, and I thought I would be much further along by now. Truth be told, I have been a little distracted the last few months with things totally tangential to graphics programming. In any case, I’ve had some motivation recently to continue on the 3D game engine and began doing further research and reading. However, research doesn’t get results on its own. 90% of the time, you just have to start doing whatever it is you want to do, whether you think you’re ready or not. So that brings us here, to this totally underwhelming screenshot.

Engine Zero Blank

Yes, it is a blank pink screen. You can calm down. What you are not seeing is some basic boiler-plate code to initialize DirectX 11, clear the back-buffer to a plain color, and the basic Win32 message pump. I have also started to give some structure to the application, with a namespace and class that encapsulates the Direct3D related code. Right now it is a bit hard-coded for a lot of stuff, but I wanted to get it working first and then I can refine and refactor as needed. I’m actually feeling really excited about the prospects of having my own 3D engine, and hope to power through a lot of this foundation stuff so I can get some interactive demos up and running. If everything goes well, I plan to publish updates in this series more frequently as I progress. Thanks for watching.

Game Engine Design

Alan Thorn’s Game Engine Design and Implementation was quite an interesting read. Overall I thought it was good, but the book struggles at times to find its audience. On one hand, it covers a lot of great topics and there are some good code snippets to be found. On the other hand, it seems to jump around between APIs and frameworks and never really culminates in a complete engine. Even so, engine development is no breeze and any help in this area is much appreciated.

The text begins with the basics: downloading Visual Studio or Code::Blocks and configuring a development environment. It shows you how to create and call a DLL. Some brief coverage of the STL. All useful stuff. Then it moves on to some basic engine features, like logging errors and handling exceptions. Again a great place to start. It continues with a resource manager based on XML. Then a 2D scene manager and renderer using SDL. Supporting sound and music with the BASS library. Processing input with OIS. Then a renderer with DirectX 10. Great stuff. Then in the next chapter it throws out everything you just learned and jumps to working with OGRE. Don’t get me wrong, OGRE is a great API. But it seems strange for a book titled “Game Engine Design and Implementation” to use an off-the-shelf library and not code the, erm, implementation themselves. The book follows up with coverage of Bullet physics and ends with a brief overview of DX Studio, which is an all-in-one game engine solution.

While each chapter alone is very interesting and informative, I feel like the book as a whole lost its focus somewhere, and the engine that you think you are creating at the beginning of the book never materializes. I almost feel bad; it’s like the author started with the premise of creating an engine from scratch and then gave up halfway. I even agree that using pre-built tools is a good idea in many cases, and most people don’t want to re-invent SDL or OGRE or whatever. But there are other books that focus on these engines and frameworks. People picking up a book like “Game Engine Design and Implementation” probably are more interested in rolling their own engine.

That said, I still feel like the book was a worthwhile read and I did learn a little bit about some stuff and found it useful. Going in I had read the reviews on Amazon, and I knew the author was going to jump around with different libraries. Had I not known this I may have been more upset. As is, Alan Thorn is a competent writer and clearly knows a thing or two about game engines. I guess I just wish there was more of a focus on creating something cohesive and original and not just a jumble of introductions into different APIs. However, if you are on a journey (like me) of creating a 3D game engine you will need as much ammo as possible and this book certainly has a place in the arsenal. Just not the first place.