Neatness Counts
By David Schneider
Messy computer code is normally frowned on—but not always
DOI: 10.1511/2007.65.213
Budding programmers have long been taught the value of writing computer code that is easy for anyone to follow, thus allowing modifications, upgrades and bug fixes to be made easily. One common technique is to intersperse comments to describe various elements of the program. Another useful strategy is to assign names to variables, methods and subroutines that reflect their function. For example, instead of writing the somewhat cryptic "For I = 1 to N," a responsible programmer might use "For file_index = 1 to number_of_files." Such changes don't, of course, affect how the software works, but they are nevertheless considered important because they make a sequence of computer instructions to some extent self-documenting.
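To make the naming advice concrete, here is a minimal sketch in Python. The two functions are hypothetical (the article's own example is a BASIC-style loop header). Both add up a list of file sizes and differ only in their names. Note that Python's range counts from 0 rather than from 1.

```python
# Cryptic version: it works, but a reader must guess what f, n, s and t mean.
def f(n, s):
    t = 0
    for i in range(n):
        t += s[i]
    return t


# Self-documenting version: identical behavior, descriptive names.
def total_file_size(number_of_files, file_sizes):
    total_bytes = 0
    for file_index in range(number_of_files):
        total_bytes += file_sizes[file_index]
    return total_bytes


sizes = [1200, 350, 4096]
print(f(3, sizes))                # 5646
print(total_file_size(3, sizes))  # 5646
```

Idiomatic Python would simply call sum(file_sizes). The explicit loop is kept here only to mirror the "For file_index = 1 to number_of_files" example.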
Recently, programmers have also been learning the value of making their code less readable, a practice known as obfuscation. You might think that no one would aspire to create deliberately messy and impenetrable computer programs, but the ability to generate hard-to-follow code has become a hot commodity. Many companies now sell software designed expressly to turn a series of neat and logical computer instructions into reams of seeming gibberish.
The motivation for code obfuscation grows from a fundamental change that has taken place in the way many programs are distributed. Traditionally, software developers would write an application in some high-level programming language, say, C++, and then "compile" it, which is to say translate it into low-level instructions that the processor on a particular machine can run. Users would be given only the compiled version of the program, not the source code. Although with special software it is possible to decompile a program (transform the executable version back into source code), much is lost in translation: In particular, the embedded comments and helpful name assignments are not retained. Hence programmers could rest assured that as long as they didn't give out source code, outsiders couldn't easily unravel the software's inner workings.
The problem for software developers these days is that many of the languages now in common use don't compile into low-level, hard-to-read machine code. Instead the source code is transformed into an intermediate-level language, which is what gets distributed to the end user, where it runs on a "virtual machine" created by software resident on the user's computer. Java works this way, as do programs written for Microsoft's .NET framework, which is built into the company's new Windows Vista operating system, released to the public in January.
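Python offers a convenient way to see the issue firsthand, because it too compiles source code into an intermediate bytecode that a virtual machine executes. Python bytecode is neither Java bytecode nor .NET's intermediate language, so treat this only as an analogy. The sketch below compiles the hypothetical total_file_size function and inspects the resulting code object through the standard dis module and the code object's co_name and co_varnames attributes; the variable names survive compilation, while the comment does not.

```python
import dis


def total_file_size(number_of_files, file_sizes):
    # This comment is dropped during compilation; the names below are kept.
    total_bytes = 0
    for file_index in range(number_of_files):
        total_bytes += file_sizes[file_index]
    return total_bytes


code = total_file_size.__code__
# The compiled code object still records the function's name and the
# names of its arguments and local variables.
print(code.co_name)
print(code.co_varnames)

# The disassembly refers to variables by their original names.
# (The exact instructions differ between Python versions.)
dis.dis(total_file_size)
```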
Because the intermediate-level language fed into these virtual machines preserves a great deal of information that was in the original source code, such as the names of classes and methods, decompilation becomes a serious problem for those trying, say, to avoid exposing security vulnerabilities or to prevent competitors from stealing parts of their code. The solution is to add an obfuscation step, which muddles things enough that the decompiled code becomes difficult for a person to understand or reuse, although a computer can still carry out the instructions and produce exactly the same results as if no obfuscation had been attempted.
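The obfuscation step can be sketched in a few lines, again in Python. The hypothetical RenameLocals class below uses Python's standard ast module to apply one of the simplest obfuscating transformations, identifier renaming: every argument and local variable gets an opaque name such as _0 or _1, and comments disappear because the syntax tree never records them. This is a toy under stated assumptions (a single function, no nested scopes, globals or attributes), not a description of how any commercial obfuscator works. It deliberately keeps the function's own name so that callers can still find it. Running both versions on the same input checks that they compute the same result.

```python
import ast

SOURCE = '''
def total_file_size(number_of_files, file_sizes):
    # Add up the sizes of all the files.
    total_bytes = 0
    for file_index in range(number_of_files):
        total_bytes += file_sizes[file_index]
    return total_bytes
'''


class RenameLocals(ast.NodeTransformer):
    """Toy obfuscator: give each argument and local variable an opaque name.

    Assumes simple code: only positional arguments are renamed, and nested
    functions, globals and attribute names are not handled.
    """

    def __init__(self):
        self.mapping = {}

    def _opaque(self, name):
        # Hand out _0, _1, _2, ... in order of first appearance.
        if name not in self.mapping:
            self.mapping[name] = f"_{len(self.mapping)}"
        return self.mapping[name]

    def visit_FunctionDef(self, node):
        # node.name is left alone so callers can still find the function.
        for arg in node.args.args:
            arg.arg = self._opaque(arg.arg)
        self.generic_visit(node)
        return node

    def visit_Name(self, node):
        if isinstance(node.ctx, ast.Store):
            # A new binding inside the function: rename it.
            node.id = self._opaque(node.id)
        elif node.id in self.mapping:
            # A use of an already renamed name; built-ins such as range are untouched.
            node.id = self.mapping[node.id]
        return node


tree = RenameLocals().visit(ast.parse(SOURCE))
obfuscated = ast.unparse(tree)  # ast.unparse requires Python 3.9 or later
print(obfuscated)

# Execute both versions and compare their results.
original_ns, obfuscated_ns = {}, {}
exec(SOURCE, original_ns)
exec(obfuscated, obfuscated_ns)
sizes = [1200, 350, 4096]
print(original_ns["total_file_size"](3, sizes))    # 5646
print(obfuscated_ns["total_file_size"](3, sizes))  # 5646
```

Real obfuscation tools typically go well beyond renaming, but even this small transformation removes the self-documenting names and comments that the article's first paragraph recommends.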
Sebastian Holst of PreEmptive Solutions, an Ohio company that sells obfuscation software, points out that although the problem that obfuscation addresses has long been well known to programmers, many other people involved in corporate information technology are just now realizing that the use of Java and .NET can pose a security risk. "The technologists know this—it's like caller ID these days," says Holst, giving another example where formerly private information is now open to examination, "but the IT-risk people don't yet understand, and the coders don't mention it because it makes for more work."
Even before obfuscation became a valuable service to the software industry, some programmers enjoyed seeing how difficult they could make their code appear. Indeed, for more than two decades hackers have vied for top honors in that category by entering their best efforts in the International Obfuscated C Code Contest, known as the IOCCC for short. Landon Curt Noll and Larry Bassel, both then programmers at National Semiconductor, created this curious programming competition—the longest-running contest on the Internet—in 1984. Entrants must submit complete C programs no longer than 4,096 bytes that, according to IOCCC guidelines, "show the importance of programming style, in an ironic way." Noll, who currently works as a cryptographer and security expert for NeoScale Systems of Milpitas, California, explains: "What we are saying is that programs that work are not good enough."
Acceptance of entries for the 19th IOCCC closed at the end of February, although winners are not likely to be announced for a few months yet. Past champions have included a flight simulator, a program that plots the positions of the four Galilean moons of Jupiter and a text-to-Pig Latin translator whose source code has the added attraction of looking like a pig. There was also the world's shortest self-replicating C program, a file containing zero bytes of code: When compiled and run, it prints nothing, which is an exact copy of its own empty source. Clearly, the contestants have a lot of fun formulating their entries. "We have a lot of fun, too," says Noll. "That's why we continue to do it."
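A self-replicating program (often called a quine) must print an exact copy of its own source text. The zero-byte IOCCC winner meets that requirement trivially; a nonempty one has to carry a description of itself. The sketch below is in Python rather than C and uses a common two-line construction; run_and_capture is a hypothetical helper that executes a program and returns whatever it printed, so the output can be compared with the source.

```python
import io
from contextlib import redirect_stdout

# The two-line program stored in this string prints exactly its own source.
QUINE_SOURCE = (
    "s = 's = %r\\nprint(s %% s)'\n"
    "print(s % s)\n"
)


def run_and_capture(source):
    """Execute Python source code and return everything it printed."""
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        exec(source, {})
    return buffer.getvalue()


print(run_and_capture(QUINE_SOURCE) == QUINE_SOURCE)  # True

# The zero-byte trick has a Python analogue: an empty program prints
# nothing, which is also its own (empty) source.
print(run_and_capture("") == "")  # True
```

The construction works because %r inserts the string's own quoted representation into itself, while %% collapses to a single percent sign.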