DuQu Mystery Language Solved With the Help of Crowdsourcing

A group of researchers who recently asked the public for help in figuring out a mysterious language used in the DuQu virus have solved the puzzle, thanks to crowdsourcing help from programmers who wrote in to offer suggestions and clues.

The language, used to write the DuQu component that communicates with command-and-control servers, turns out to be a special type of C code compiled with the Microsoft Visual Studio Compiler 2008.

Researchers at Kaspersky Lab, who put out the call for help two weeks ago after failing to figure out the language on their own, said they received more than 200 comments to a blog post they wrote seeking help, and more than 60 direct emails from programmers and others who made suggestions.

DuQu, an espionage tool that followed in the wake of the infamous Stuxnet code, had been analyzed extensively since its discovery last year. But one part of the code remained a mystery – an essential component of the malware that communicates with command-and-control servers and has the ability to download additional payload modules and execute them on infected machines.

Kaspersky researchers were unable to determine the language in which the communication module was written and published a blog post asking programmers for help. Identification of the language would help them build a profile of DuQu’s authors.

While other parts of DuQu were written in the C++ programming language and were compiled with Microsoft’s Visual C++ 2008, this part was not. Kaspersky also ruled out Objective C, Java, Python, Ada, Lua and many other languages the researchers knew.

Most commenters who responded to Kaspersky’s plea thought the code was a variant of LISP. But the reader who pointed the researchers in the right direction was a commenter identifying himself as Igor Skochinsky, who wrote in a thread posted to Reddit.com that he was certain the code was generated with the Microsoft Visual Studio Compiler and offered some cogent reasons for believing so. Two other people who sent Kaspersky direct emails made crucial contributions when they suggested that the code appeared to be generated from a custom object-oriented C dialect, referred to as OO C, using special extensions.

This led the researchers to test various combinations of compilers and source code over a few days until they found the combination that produced a binary matching the style of the code in DuQu.

The magic combination was C code compiled with Microsoft Visual Studio Compiler 2008 using the /O1 and /Ob1 compiler options to keep the code small.

“Visual C can optimize for speed and it can optimize for size, or it can do some kind of balance between the two,” says Costin Raiu, director of Kaspersky’s Global Research and Analysis Team. “But they wanted obviously the smallest possible size of code” to get it onto victim machines via an exploit.

A custom framework allowed DuQu’s authors to meld C code with object-oriented programming.
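For readers unfamiliar with the idea, object-oriented C generally means pairing a plain struct with a table of function pointers that act as its methods. The sketch below is only a generic illustration of that pattern. Every name in it (Channel, ChannelVtbl, channel_init and so on) is hypothetical, and it makes no claim to resemble the framework DuQu's authors actually built.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical illustration only: none of these names come from DuQu. */

typedef struct Channel Channel;

/* The "method table": function pointers shared by every object of a class. */
typedef struct ChannelVtbl {
    size_t (*send)(Channel *self, const char *msg);
    void   (*destroy)(Channel *self);
} ChannelVtbl;

/* The "object": a pointer to its method table plus per-object state. */
struct Channel {
    const ChannelVtbl *vtbl;
    size_t bytes_sent;
    int open;
};

/* Method: counts the bytes of msg as sent; returns 0 if the channel is closed. */
static size_t channel_send(Channel *self, const char *msg) {
    assert(self != NULL && msg != NULL);
    if (!self->open) {
        return 0;
    }
    size_t n = strlen(msg);
    self->bytes_sent += n;
    return n;
}

/* Method: marks the channel as closed. */
static void channel_destroy(Channel *self) {
    assert(self != NULL);
    self->open = 0;
}

static const ChannelVtbl channel_vtbl = { channel_send, channel_destroy };

/* Explicit "constructor": the caller must invoke it by hand. */
void channel_init(Channel *self) {
    assert(self != NULL);
    self->vtbl = &channel_vtbl;
    self->bytes_sent = 0;
    self->open = 1;
}
```

A caller would declare a Channel, pass its address to channel_init, and then invoke methods through the table, for example c.vtbl->send(&c, "hello"). Swapping in a different table gives a different "class" behind the same interface.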

The use of object-oriented C to write the event-driven code in DuQu reveals something about the programmers who coded this part of DuQu – they were probably old-school coders, Kaspersky’s researchers say. The programming style is uncommon for malware and is more often found in professionally produced commercial software created ten years ago, Raiu says. The techniques make DuQu stand out “like a gem from the large mass of ‘dumb’ malicious program we normally see,” the Kaspersky researchers note.

The idea that the coders are “old school” is also supported by their use of C over the more modern C++ language. Some commenters told Kaspersky that coders who were actively programming a decade ago didn’t like C++ because, when compiled, it was known to produce code that could be unpredictable.

“When you write C code, you can be sure that the program will be executed the way you intend it to,” Raiu says. “With C++ it’s a bit different. In C++ you have some language features, for instance constructors, which will be executed transparently by the language. So you will never code a constructor directly. Instead, the compiler codes the constructor for you [and] basically you lose control of the whole thing. You can’t be sure that your code will be executed in the way intended.”
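The point about explicit control can be seen in a small C sketch. Here nothing is set up or torn down unless the programmer writes the call, whereas a C++ constructor would run automatically when the object is created. The Buffer type and its functions are hypothetical names chosen for illustration, not anything recovered from DuQu.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical illustration only: none of these names come from DuQu. */
typedef struct Buffer {
    unsigned char data[16];
    size_t len;
    int initialized;
} Buffer;

/* Setup happens only when the caller explicitly asks for it. */
void buffer_init(Buffer *b) {
    assert(b != NULL);
    b->len = 0;
    b->initialized = 1;
}

/* Appends one byte; returns 1 on success, 0 if uninitialized or full. */
int buffer_push(Buffer *b, unsigned char byte) {
    assert(b != NULL);
    if (!b->initialized || b->len >= sizeof b->data) {
        return 0;
    }
    b->data[b->len++] = byte;
    return 1;
}

/* Teardown is likewise an ordinary, visible function call. */
void buffer_clear(Buffer *b) {
    assert(b != NULL);
    b->len = 0;
    b->initialized = 0;
}
```

Every step in the object's lifetime appears as a line in the calling code, so the order of execution can be read directly from the source. The cost is that forgetting a call such as buffer_init is the programmer's problem to catch.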

The preference for C suggests that whoever coded this part of DuQu was conservative, precise, and wanted 100 percent assurance that the code would work the way they wanted it to work.

But there is one other reason DuQu’s old-school programmers might have preferred C over C++: its versatility. When C++ was initially developed, it was not standardized, and C++ code wouldn’t compile with every compiler. C was more flexible. DuQu was delivered to Windows machines using a Microsoft Word zero-day exploit. But Raiu thinks DuQu’s programmers might have chosen C because they wanted to make sure that their code could be compiled with any compiler on any platform, suggesting they were thinking ahead to other ways in which their code might be used.

“Obviously when you create such a complex espionage tool, you take into account that maybe some day you will run it on servers, maybe you will want to run it on mobile phones or God knows what other devices, so you just want to make sure your code will work everywhere,” he says.