Identify Backdoors in Firmware By Using Automatic String Analysis

The Industrial Control Systems Cyber Emergency Response Team (ICS-CERT) this Friday published an advisory about some backdoors I found in two programmable gateways from TURCK, a leading German manufacturer of industrial automation products.

http://ics-cert.us-cert.gov/advisories/ICSA-13-136-01

Using hard-coded account credentials in industrial devices is a bad idea. I can understand the temptation among manufacturers to include a backdoor “support” mechanism in the firmware for a product such as this. This backdoor allows them to troubleshoot problems remotely with minimal inconvenience to the customer.

On the other hand, it is only a matter of time before somebody discovers these ‘secret’ backdoors. In many cases, it takes very little time.

The TURCK backdoor is similar to other backdoors that I discussed at Black Hat and in previous blog posts. The common denominator is this: you do not need physical access to an actual device to find its backdoor. All you have to do is download device firmware from the vendor’s website. Once you download the firmware, you can reverse engineer it and learn some interesting secrets.

For example, consider the TURCK firmware. The binary file contains a small header, which simplifies the reverse engineering process:

The offset 0x1c identifies the base address. Directly after the header are the ARM opcodes. This is all the information you need to load the firmware into an IDA disassembler and debugger.

A brief review revealed that the device has its own FTP server, among other services. I also discovered several hard-coded credentials that are added when the system starts. Together, the FTP server and hard-coded credentials are a dangerous combination. An attacker with those credentials can completely compromise this device.

Find Hidden Credentials Fast

You can find hidden credentials such as this using manual analysis. But I have a method to discover hidden credentials more quickly. It all started during a discussion with friends at the LaCon conference regarding these kinds of flaws. One friend (hello @p0f ! ) made an obvious but interesting comment: “You could just find these backdoors by running the ‘strings’ command on the firmware file, couldn’t you?”

This simple observation is correct. All of the backdoors that are already public (such as Schneider, Siemens, and TURCK) could have been identified by looking at the strings … if you know common IT/embedded syntax. You still have to verify that potential backdoors are valid. But you can definitely identify suspicious elements in the strings.

There is a drawback to this basic approach. Firmware usually contains thousands of strings and it can take a lot of time to sift through them. It can be much more time consuming than simply loading the firmware in IDA, reconstructing symbols, and finding the ‘interesting’ functions.

So how can one collect intelligence from strings in less time? The answer is an analysis tool we created and use at IOActive called Stringfighter.

How Does Stringfighter Work?

Imagine dumping strings from a random piece of firmware. You end up with the following list of strings:

[Creating socket]

ERROR: %n

Initializing

Decompressing kernel

GetSrvAddressById

NO_ADDRESS_FOUND

Xi$11ksMu8!Lma

From your security experience you notice that the last string seems different from the others—it looks like a password rather than a valid English word or a symbol name. I wanted to automate this kind of reasoning.

This is not code analysis. We only assume certain common patterns (such as sequential strings and symbol tables) usually found in this kind of binary. We also assume that compilers under certain circumstances put strings close to their references.

As in every decision process, the first step is to filter input based on selected features. In this case we want to discover ‘isolated’ strings. By ‘isolated’ I’m referring to ‘out-of-context’ elements that do not seem related to other elements.

An effective algorithm known as the ‘Levenshtein distance’ is frequently used to measure string similarity:

http://en.wikipedia.org/wiki/Levenshtein_distance

To apply this algorithm we create a window of n bytes that scans for ‘isolated’ strings. Inside that window, each string is checked against its ‘neighbors’ based on several metrics including, but not limited to, the Levenshtein distance.

However, this approach poses some problems. After removing blocks of ‘related strings,’ some potentially isolated strings are false positive results. For example, an ‘isolated’ string may be related to a distant ‘neighbor.’

We therefore need to identify additional blocks of ‘related strings’ (especially those belonging to symbol tables) formed between distant blocks that later become adjacent.

To address this issue we build a ‘first-letter’ map and run this until we can no longer remove additional strings.

At this point we should have a number of strings that are not related. However, how can we decide whether the string is a password? I assume developers use secure passwords, so we have to find strings that do not look like natural words.

We can consider each string as a first-order Markov source. To do this, use the following formula to calculate its entropy rate:

http://en.wikipedia.org/wiki/Entropy_(information_theory)#Data_as_a_Markov_process

We need a large dictionary of English (or any other language) words to build a transition matrix. We also need a black list to eliminate common patterns.

We also want to avoid the false positives produced by symbols. To do this we detect symbols heuristically and ‘normalize’ them by summing the entropy for each of its tokens rather than the symbol as a whole.

These features help us distinguish strings that look like natural language words or something more interesting … such as undocumented passwords.

The following image shows a Stringfighter analysis of the TURCK BL20 firmware. The first string in red is highlighted because the tool detected that it is interesting.

In this case it was an undocumented password, which we eventually confirmed by analyzing the firmware.

The following image shows a Stringfighter analysis of the Schneider NOE 771 firmware. It revealed several backdoor passwords (VxWorks hashes). Some of these backdoor passwords appear in red.

Stringfighter is still a proof-of-concept prototype. We are still testing the ideas behind this tool. But it has exceeded our initial expectations and has detected backdoors in several devices. The TURCK backdoor identified in the CERT advisory will not be the last one identified by IOActive Labs. See you in the next advisory!