The Science of Debugging

by Taylor Hadden | 17:37

Debugging is the worst part of programming. It is frustrating, demoralizing, and at the end, when you finally find the source of the bug, you more often than not come face to face with an example of your past stupidity. You can spend hours, even days, just trying to find the source of one issue. A bug so inscrutable that you know that you’re looking at the source of the problem, but for the life of you, you just don’t see it. Or worse, a bug that only happens intermittently, leaving you feeling like you are building with shifting sand.

Sound like you? Welcome to the club. A person who hasn’t smashed headlong into one of those problems is not yet a programmer. These are the issues we’ve cut our teeth on, pulling out plenty of hair along the way. In any case, I’d like to share the methodology I’ve developed for tackling debugging. Hopefully, it will help make the process a little less stressful. I’m going to be specifically referencing Unity 3D tools, but the broad methods will work with any environment.

Step 1 – Make it Repeatable

A bug that you can’t reproduce is of no use to you. If you don’t know how to cause the issue every time, you won’t be able to test it. If you can’t test it, you don’t know if you’ve fixed it. Sometimes, you’re going to have to set up a specific test case to make it happen reliably. Do it. A little prep work now will save you lots of time later. You can also consider using a testing framework to simulate the conditions where the bug occurs. Unity provides a good set of tools for this, though the specifics of using them are outside the scope of this post.
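Without getting into the framework itself, here’s a rough sketch of what a pinned-down test case can look like. Unity’s test tools are built on NUnit, so this uses NUnit’s [Test] attribute and Assert class. WaypointPath and NextIndex are hypothetical names invented for this example; swap in whatever code is actually misbehaving.

```csharp
using NUnit.Framework;

// Hypothetical class under test; it stands in for the code that shows the bug.
public class WaypointPath
{
    private readonly int count;

    public WaypointPath(int count)
    {
        this.count = count;
    }

    // Returns the index after `current`, wrapping back to 0 at the end.
    public int NextIndex(int current)
    {
        return (current + 1) % count;
    }
}

public class WaypointPathTests
{
    [Test]
    public void NextIndex_WrapsAtEndOfPath()
    {
        // Pin down the exact conditions where the bug was seen:
        // a path of three waypoints, standing on the last one.
        var path = new WaypointPath(3);

        Assert.AreEqual(0, path.NextIndex(2));
    }
}
```

The value here is that the test encodes the exact conditions, so you can rerun it after every attempted fix instead of playing through the game to get back to the same spot.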

Step 2 – Form a Hypothesis

If you’re very familiar with your codebase, you’ll likely already have an idea of what the problem is. A null reference exception is going to point you right to the problematic line. Otherwise, it’s time to go into research mode. You need to get your program to give you information about what’s going on.

Many people like to use debuggers to step through their code line-by-line, but I do not. My most troublesome bugs are often buried inside loops; the same block of code will run 500 times before it does something wrong. My preferred tool is simply writing text to the console. In Unity, this is the Debug.Log() function. When you write a console log, be specific. You need to know the context in which the error is occurring. Posting “It done broke!” isn’t going to help you much. Include any values of interest to you, or anything that can help identify, say, what instance is causing the problem.
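As a minimal sketch of what a specific log can look like, the script below only logs the loop iterations that look wrong, and includes the object name, the index, and the offending value. EnemySpawner and spawnDelays are made-up names for illustration. Passing `this` as the second argument to Debug.LogWarning() lets you click the message in the Console to highlight the object that logged it.

```csharp
using UnityEngine;

// Hypothetical component, used only to illustrate contextual logging.
public class EnemySpawner : MonoBehaviour
{
    // Hypothetical data; assume negative delays are never valid.
    public float[] spawnDelays = new float[0];

    void Start()
    {
        for (int i = 0; i < spawnDelays.Length; i++)
        {
            // Stay quiet for the iterations that behave, and speak up
            // only for the ones that violate the assumption.
            if (spawnDelays[i] < 0f)
            {
                Debug.LogWarning(
                    $"{name}: spawnDelays[{i}] is negative ({spawnDelays[i]})",
                    this);
            }
        }
    }
}
```

Guarding the log with a condition is what keeps a 500-iteration loop from burying the one message you care about.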

A window with messages from a specific object.

Often in game development, you’ll need to know a piece of information every frame. Lots of console logs would be impossible to sift through. The simplest way to display this information in Unity is by exposing a variable in the Inspector. Alternatively, you can use Unity’s GUI and GUILayout functions in a MonoBehaviour’s OnGUI() function to draw text to the screen. This can be a little tedious to set up and organize, so I created a custom logging window system to help me organize information on the screen.
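Here is a small sketch of both approaches in one script, assuming you want to watch an object’s speed. SpeedReadout is a hypothetical component name. The [SerializeField] attribute makes the private field show up in the Inspector, and GUILayout.Label() inside Unity’s OnGUI() callback draws the same value on screen.

```csharp
using UnityEngine;

// Hypothetical component, used only to illustrate per-frame readouts.
public class SpeedReadout : MonoBehaviour
{
    // Visible (and updated live) in the Inspector while the game runs.
    [SerializeField] private float speed;

    private Vector3 lastPosition;

    void Start()
    {
        lastPosition = transform.position;
    }

    void Update()
    {
        // Speed = distance moved this frame / time the frame took.
        if (Time.deltaTime > 0f)
        {
            speed = (transform.position - lastPosition).magnitude / Time.deltaTime;
        }
        lastPosition = transform.position;
    }

    void OnGUI()
    {
        // Draws one line of text in the corner of the Game view.
        GUILayout.Label($"{name} speed: {speed:F2}");
    }
}
```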

You can also create visual debugging aids. Debug.DrawLine() and Debug.DrawRay() are both invaluable tools when trying to figure out vector math, and can be called right inline like any other Debug call. If you need more detail, using the Gizmos functions in a MonoBehaviour’s OnDrawGizmos() function will let you draw simple colored primitives.
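A minimal sketch of these calls together, assuming an object that is supposed to be aiming at some target. AimDebug and its target field are hypothetical. By default, lines drawn with Debug.DrawLine() and Debug.DrawRay() last for a single frame, so calling them from Update() keeps them on screen.

```csharp
using UnityEngine;

// Hypothetical component, used only to illustrate visual debugging aids.
public class AimDebug : MonoBehaviour
{
    // Hypothetical reference, assigned in the Inspector.
    public Transform target;

    void Update()
    {
        if (target == null) return;

        // Green: where the target actually is, relative to us.
        Debug.DrawLine(transform.position, target.position, Color.green);

        // Red: the direction we are actually facing, two units long.
        Debug.DrawRay(transform.position, transform.forward * 2f, Color.red);
    }

    void OnDrawGizmos()
    {
        if (target == null) return;

        // Mark the target with a small yellow cube in the Scene view.
        Gizmos.color = Color.yellow;
        Gizmos.DrawCube(target.position, Vector3.one * 0.25f);
    }
}
```

If the red ray and the green line don’t overlap when you expect them to, the aiming math is where to look.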

Gizmos.DrawCube() used to display ledges and their connections.

Ultimately, all of these tools are there to serve as sanity checks. They confirm whether your code is doing what you think it’s doing or not. If you can see the bug, you will have a better idea of where to look for the cause. Hopefully, that will lead you to a line of code that you’re pretty sure is wrong.

Step 3 – Test the Hypothesis

So you think you’ve found and fixed the source of the bug. Test it! You’re probably wrong. That’s okay, go to the next step.

Step 4 – Question Your Assumptions

I notice that I am confused…

Your hypothesis was wrong because you are making an incorrect assumption about some underlying function or mechanism. Computers aren’t out to get you; they only do what they are told, and we idiots are the only ones telling them what to do. It’s time to go back to our tools from Step 2 and really buckle down. If an object is acting like it’s in a state that it shouldn’t be in, confirm that it actually isn’t in that state (it is) and find out what caused it to get there (that bastard). If a function is purporting to do something, don’t just trust its name or its documentation; check the results. Make sure that events that are order dependent are actually occurring in the right order.
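One way to check an ordering assumption, rather than just believing it, is to write it down as an assertion. The sketch below assumes a hypothetical WeaponController whose Initialize() is supposed to run before Fire(); both names are invented for the example. Debug.Assert() is Unity’s own call, and as far as I know it is only active in the editor and in development builds.

```csharp
using UnityEngine;

// Hypothetical component, used only to illustrate checking an assumption.
public class WeaponController : MonoBehaviour
{
    private bool initialized;
    private int ammo;

    // Hypothetical setup method that other code is expected to call first.
    public void Initialize(int startingAmmo)
    {
        ammo = startingAmmo;
        initialized = true;
    }

    public void Fire()
    {
        // Assumption under test: Initialize() always runs before Fire().
        // If that is ever false, the Console says so and names the object.
        Debug.Assert(initialized, $"{name}: Fire() called before Initialize()", this);

        if (ammo <= 0)
        {
            Debug.Log($"{name}: Fire() called with no ammo", this);
            return;
        }

        ammo--;
    }
}
```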

If your change from Step 3 didn’t do what you expected it to do, that’s another avenue for discovery. If you think the code is doing something spooky, like changing what it does based on the presence of a debug statement, it’s probably not. If you’re sure that an object is in a certain state when it enters a function, and then later on in that function it’s broken, something in the preceding lines has changed it. Break out the debug statements and figure out where. See if a function you’re calling is creating unintended side-effects somewhere else.
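To make that concrete, here’s a small sketch of bracketing a suspicious call with logs. PatrolRoute and GetNearest() are hypothetical; the helper is deliberately written with a hidden side effect (it sorts the shared list in place) so that the before and after logs can disagree.

```csharp
using System.Collections.Generic;
using UnityEngine;

// Hypothetical component, used only to illustrate hunting a side effect.
public class PatrolRoute : MonoBehaviour
{
    // Hypothetical state whose order is supposed to stay fixed.
    private readonly List<Vector3> waypoints = new List<Vector3>();

    void Start()
    {
        waypoints.Add(new Vector3(10f, 0f, 0f));
        waypoints.Add(Vector3.zero);

        // Log the state on both sides of the call you suspect.
        Debug.Log($"{name}: before GetNearest, first waypoint = {waypoints[0]}", this);
        Vector3 nearest = GetNearest(transform.position);
        Debug.Log($"{name}: after GetNearest, first waypoint = {waypoints[0]}, nearest = {nearest}", this);
    }

    // Looks like a harmless query, but sorting reorders the shared list.
    private Vector3 GetNearest(Vector3 from)
    {
        waypoints.Sort((a, b) =>
            (a - from).sqrMagnitude.CompareTo((b - from).sqrMagnitude));
        return waypoints[0];
    }
}
```

If the two logs show a different first waypoint, you’ve found the line that changed the state, and you can keep narrowing from there.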

Step 5 – Repeat, for Science!

For a nasty bug, you’re going to be stuck in this cycle for a while. The important part is ensuring that you’re always being productive. Once you’ve found something strange, figure out why. Be downright suspicious of your own code, and don’t trust your past self. Sometimes, the worst bugs are caused by writing “transform.position.z” when you meant to write “transform.position.x” and (like an assignment you’ve been working on for too long) the typo is all but invisible to you. Remember to not just flail about changing random lines to see if they help. Be deliberate, and make changes that will answer the questions you have.

Finally

There are three general types of bugs I’ve found. Straight-up typos like the one mentioned above might make you feel like an idiot, but they’re just (mostly) harmless mistakes. Other bugs reveal a hole in your system: a use-case or scenario that you hadn’t considered that now needs to be taken care of properly. The most malicious will reveal that your best-laid plans are for naught; despite your best intentions, something about your code simply doesn’t work on a basic level.

It’s always important to remember that just because there aren’t run-time errors does not mean that everything is fine. If a bug reveals a rotting hole in your roof, you should check the rest of the structure for similar water damage. Where else was the culprit function called? Are the error-causing side-effects actually required in those cases? Just ensuring that your code doesn’t throw any errors is tantamount to throwing some plywood over the hole and calling it a day. It is just going to make it harder to understand and fix the problem the next time something caves in. Ideally, the programs we architect are the examples of the best way to handle something. In reality, of course, that will almost never occur, but it is still an ideal to shoot for.