Friday, February 10, 2012

Caught in a web: Transitioning from native to web development

About four months ago, I joined the development team at Khan Academy. Working at Khan Academy has been amazing: every day I work with talented, motivated and intelligent people to bring innovative solutions to the nascent field of online education. However, that's not what I'm writing about today.

Before I started this job I worked in computer game development, and my programming language was C. Not C++, just plain old C. Now, I have programmed in many languages – assembly, C++, Perl, Java, you name it. But moving to full-time web development after five years of writing native code for the PC feels like a radical shift, even though in some ways it shouldn't be. I am setting out to document these differences to serve as a guide to others making the same transition. I suspect a lot of programmers will find themselves in the same situation over the next few years as the complexity (and profitability) of web applications continues to increase.

As a programming language, C is as simple and straightforward as it gets. Compared to C++ or Java there is less structure, and there are no extensive built-in libraries of data structures and utilities. Optimization can be done at a dizzyingly low level, with control over data structure size, function inlining and even the assembly itself if you want it. You end up writing your own dynamic arrays, allocators, and networking stacks. A seasoned programmer can save megabytes of memory by shaving a few bytes from a key structure, or speed up traversal of a tree by several orders of magnitude by making sure adjacent nodes are in the same cache line. Some programmers are so proud of their skills in these arts that they flat-out refuse to work in an interpreted, memory-managed language. Luckily, there will always be some demand for bare-metal optimization in embedded devices, real-time operating systems, etc.

However, I've found that in many cases the most straightforward way to solve a performance problem in any language is simply to do less work. That involves asking tough questions about what data you absolutely need and when, whether a calculation can be done in the background, deferred, or moved to a remote machine, and how aggressively to cache results. These skills transfer very well to web development, and I haven't yet seen a case where performance suffered and there was simply no remedy. It's just a different trade-off: rather than code being optimized until it's unmaintainable, data is heavily cached, increasing the penalty for code changes if caches must be rebuilt or migrated.
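As a rough illustration of the "do less work" idea, here is a minimal Python sketch of caching an expensive result. It is not code from any real project: `exercise_summary`, `call_count`, and the fake computation inside are hypothetical stand-ins for a slow query or calculation, and the standard library's `functools.lru_cache` is just one convenient way to cache.

```python
import functools

call_count = 0  # counts how often the expensive work actually runs


@functools.lru_cache(maxsize=128)
def exercise_summary(user_id: str) -> dict:
    """Hypothetical expensive computation standing in for a slow query."""
    global call_count
    call_count += 1
    return {"user": user_id, "score": sum(ord(c) for c in user_id)}


first = exercise_summary("alice")   # computed
second = exercise_summary("alice")  # served from the cache, not recomputed

# The trade-off: if the computation changes, stale entries must be thrown
# away and rebuilt on the next call.
exercise_summary.cache_clear()
third = exercise_summary("alice")   # computed again after the cache is cleared
```

The second call costs almost nothing, but every change to what the function computes means clearing or migrating whatever has already been cached, which is the penalty described above.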

One great upside of web development is the iteration cycle. A full build of a mature game can take anywhere from minutes to several hours, and most PC games take a minute or two to boot up (longer if they are built in debug mode). This means that the time between making a code change and seeing that change in a running game can be 15 minutes or more. (Anybody who points to MS Visual Studio's Edit-and-Continue feature is invited to try it on a million-optimized-file code base!) Even the most trivial change can take an hour to implement. In web development, the time between write and test is more or less the time it takes to hit Refresh in the browser. This has freed me from hours of compilation and startup time. I can't stress enough how much of a difference this has made for my productivity.

Now, when it comes to debugging, things are more of a mixed bag. The Visual Studio C debugger is very capable and has some really powerful features. I can't count all the times I set a data breakpoint and found someone misusing a variable. On the other hand, I can't count all the times a data breakpoint has helped me find a buffer overrun or someone writing to freed memory, things I never have to worry about now. In the case of JavaScript, each browser has its own debugger, all of which seem to have “borrowed” each other's features and all of which seem roughly equivalent. The availability of eval() in JavaScript is both a blessing and a curse: a blessing because I can do pretty much anything I want to the running code in the console; a curse because browsers don't handle debugging dynamically generated code very well. Then there is Internet Explorer and its own very peculiar bugs. And when it comes to debugging server-side code in PHP or Python, I have not progressed beyond spamming to the error log. (If you know of a solution for Python in Google App Engine, please let me know!)
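For readers who have not done log-based debugging before, the sketch below shows roughly what it looks like in Python using the standard `logging` module. Everything specific in it is an assumption made for illustration: the logger name `handler_debug`, the function `clamp_percent`, and the in-memory stream that stands in for a server's log.

```python
import io
import logging

# An in-memory stream stands in for the server log (assumption for the demo).
log_stream = io.StringIO()
handler = logging.StreamHandler(log_stream)
handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))

logger = logging.getLogger("handler_debug")  # hypothetical logger name
logger.setLevel(logging.DEBUG)
logger.addHandler(handler)
logger.propagate = False  # keep the demo output out of the root logger


def clamp_percent(value):
    """Hypothetical function under investigation."""
    logger.debug("clamp_percent(value=%r)", value)
    if not 0 <= value <= 100:
        logger.warning("value out of range: %r", value)
        value = max(0, min(100, value))
    logger.info("result=%r", value)
    return value


result = clamp_percent(150)
print(log_stream.getvalue(), end="")
```

Each log line records one step of the function, so the log gives a replay of what happened, but there is no way to pause, inspect, or change a variable mid-run as there would be in an interactive debugger.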

I'll go into more detail in future posts, but to sum up: even though these languages are difficult to get used to if you're coming from native C/C++, they do make some things easy and the development tools are constantly improving. While many PC games have updates spaced months or years apart, we are able to deploy code several times a day, sometimes several times an hour. It's a different software development mentality and one that's exciting and invigorating, because it's not about shipping perfect, elegant, bug-free code; what's important is the inspirational product we are delivering for free to students of all ages all over the world.


  1. > I've found that in many cases the most straightforward way to solve a performance problem in any language is simply to do less work.

    As the maxim goes, no code is faster than no code.

    > And if you're debugging server-side code in PHP or Python, I have not progressed beyond spamming to the error log. (If you know of a solution for Python in Google App Engine, please let me know!)

    In case you're not aware, you don't have to spam the error log; you can spam the debug/info/warning log instead because Desmond added log viewing to gae_mini_profiler:

    1. I didn't mean specifically the error log as opposed to other logs, but rather the lack of an interactive debugger that I can use to step through my code. My usual method is to step through complex code I've written even if it appears to work - many times I have found bugs in otherwise innocent-looking code when I stepped through it. Also, being able to modify a variable or kill the script when it's about to do something destructive to my local datastore would save time.

  2. What is your favorite programming language?

    1. This is a good way to start a flame war, but I'll answer anyway.

      I'm a really big fan of using the right language for the right task, so I don't believe in the "best" language for everything. The language I would say is best suited to a large range of tasks and a large programming team is Python. It has very light-weight modules, a strong set of libraries, and great platform support. I would prefer a bit more type-safety, but not quite as much as Java has. I like Java, but it's horribly verbose, and it takes longer to get any code written in it than in just about any other language I know.