RESEARCH | September 22, 2015

Is Stegomalware in Google Play a Real Threat?

For several decades, the science of steganography has been used to hide malicious code (useful in intrusions) or to create covert channels (useful in information leakage). Nowadays, steganography can be applied to almost any logical/physical medium (format files, images, audio, video, text, protocols, programming languages, file systems, BIOS, etc.). If the steganographic algorithms are well designed, the hidden information is really difficult to detect. Detecting hidden information, malicious or not, is so complex that the study of steganalytic algorithms (detection) has been growing. You can see the growth in scientific publications (source: Scholar Google) and research investment by governments or institutions.
In fact, since the attacks on September 11, 2001, there has been a lot of discussion on the possibility of terrorists using this technology. See:
 
 
 
 
In this post, I would like to illustrate steganography’s ability to hide data in Android applications. In this experiment, I focus on Android applications published in Google Play, leaving aside alternative markets with lower security measures, where it is easier to introduce malicious code.
 
 
Is it possible to hide information on Google Play or in the Android apps released in it?
 
The answer is easy: YES! Simple techniques have been documented, from hiding malware by renaming the file extension (Android / tr DroidCoupon.A – 2011, Android / tr SmsZombie.A – 2012, Android / tr Gamex.A – 2013) to more sophisticated procedures (AngeCryption – BlackHat Europe October2014).
 
Let me show some examples in more depth:
 
 
Google Play Web (https://play.google.com)
 
Google Play includes a webpage for each app with information such as a title, images, and a text description. Each piece of information could conceal data using steganography (linguistic steganography, image steganography, etc.). In fact, I am going to “work” with digital images and demonstrate how Google “works” when there is hidden information inside of files.
 
To do this, I will use two known steganographic techniques: adding information to the end of file (EOF) and hiding information in the least significant bit (LSB) of each pixel of the image.
      
 
          PNG Images
 
You can upload PNG images to play.google.com that hide information using EOF or LSB techniques. Google does not remove this information.
 
For example, I created a sample app (automatically generated – https://play.google.com/store/apps/details?id=com.wMyfirstbaskeballgame) and uploaded several images (which you can see on the web) with hidden messages. In one case, I used the OpenStego steganographic tool (http://www.openstego.com/) and in another, I added the information at the end of an image with a hex editor.
 
The results can be seen by performing the following steps (analyzing the current images “released” on the website):
Example 1: PNG with EOF
 
 
Step 2: Loot at the end of the file 🙂
Example 2: PNG with LSB
 
 
Step 2: Recover the hidden information using Openstego (key=alfonso)
 
JPEG Images
If you try to upload a steganographic JPEG image (EOF or LSB) to Google Play, the hidden information will be removed. Google reprocesses the image before publishing it. This does not necessarily mean that it is not possible to hide information in this format. In fact, in social networks such as Facebook, we can “avoid” a similar problem with Secret Book or similar browser extensions. I’m working on it…
 
https://chrome.google.com/webstore/detail/secretbook/plglafijddgpenmohgiemalpcfgjjbph?hl=en-GB
 
In summary, based on the previous proofs, I can say that Google Play allows information to be hidden in the images of each app. Is this useful? It could be used to exchange hidden information (covert channel using the Google Play). The main question is whether an attacker could use information masked for some evil purpose. Perhaps they could use images to encode executable code that “will exploit” when you are visiting the web (using for example polyglots + stego exploits) or another idea. Time will tell…
 
 
 
APK Steganography
 
Applications uploaded toGoogle Play are not modified by the market. That is, an attacker can use any of the existing resources in an APK to hide information, and Google does not remove that information. For example, machine code (DEX), PNG, JPEG, XML, and so on.
 
 
Could it be useful to hide information on those resources?
 
An attacker might want to conceal malicious code on these resources and hinder automatic detection (static and dynamic analysis) that focuses on the code (DEX). A simple example would be an application that hides a specific phone number in an image (APT?).
 
The app verifies the phone number, and after a few clicks in a specific screen on a mobile phone, checks if the number is equal to that stored in the picture. If the numbers match, you can start leaking information (depending on the permissions allowed in the application).
 
I want to demonstrate the potential of Android stegomalware with a PoC. Instead of developing it, I will analyze an active sample that has been on Google Play since June 9, 2014. This stegomalware was developed by researchers at Universidad Carlos III de Madrid, Spain (http://www.uc3m.es). This PoC hides a DEX file (executable code) in an image (resource) of the main app. When the app is running, and the user performs a series of actions, the image recovers the “new” DEX file. This code runs and connects to a URL with a payload (in this case harmless). The “bad” behavior of this application can only be detected if we analyze the resources of the app in detail or simulate the interaction the app used for triggering the connection to the URL.
 
Let me show how this app works (static manual analysis):
Step 1: Download the APK to our local store. This requires a tool, such as an APK downloader extension or a specific web as http://apps.evozi.com/apk-downloader/
Step 2. Unzip the APK (es.uc3m.cosec.likeimage.apk)
Step 3. Using the Stegdetect steganalytic tool (https://github.com/abeluck/stegdetect) we can detect hidden information in the image “likeimage.jpg”. The author used the F5 steganographic tool (https://code.google.com/p/f5-steganography/).
 
es.uc3m.cosec.likeimageresdrawable-hdpilikeimage.jpg
likeimage.jpg : f5(***)
Step 4. To analyze (reverse engineer) what the app is doing with this image, I use the dex2jar and jd tools.
Step 5. Analyzing the code, we can observe the key used to hide information in the image. We can recover the hidden content to a file (bicho.dex).
java -jar f5.jar x -p cosec -e bicho.dex likeimage.jpg
 
Step 6. Analyzing the new file (bicho.dex), we can observe the connection to http://cosec-uc3m.appspot.com/likeimage for downloading a payload.
 
Step 7. Analyzing the code and payload, we can demonstrate that it is inoffensive.
ZGV4CjAzNQDyUt1DKdvkkcxqN4zxwc7ERfT4LxRA695kAgAAcAAAAHhWNBIAAAAAAAAAANwBAAAKAAAAcAAAAAQAAACYAAAAAgAAAKgAAAAAAAAAAAAAAAMAAADAAAAAAQAAANgAAABsAQAA+AAAACgBAAAwAQAAMwEAAD0BAABRAQAAZQEAAKUBAACyAQAAtQEAALsBAAACAAAAAwAAAAQAAAAHAAAAAQAAAAIAAAAAAAAABwAAAAMAAAAAAAAAAAABAAAAAAAAAAAACAAAAAEAAQAAAAAAAAAAAAEAAAABAAAAAAAAAAYAAAAAAAAAywEAAAAAAAABAAEAAQAAAMEBAAAEAAAAcBACAAAADgACAAEAAAAAAMYBAAADAAAAGgAFABEAAAAGPGluaXQ+AAFMAAhMTUNsYXNzOwASTGphdmEvbGFuZy9PYmplY3Q7ABJMamF2YS9sYW5nL1N0cmluZzsAPk1BTElDSU9VUyBQQVlMT0FEIEZST00gVEhFIE5FVDogVGhpcyBpcyBhIHByb29mIG9mIGNvbmNlcHQuLi4gAAtNQ2xhc3MuamF2YQABVgAEZ2V0UAAEdGhpcwACAAcOAAQABw4AAAABAQCBgAT4AQEBkAIAAAALAAAAAAAAAAEAAAAAAAAAAQAAAAoAAABwAAAAAgAAAAQAAACYAAAAAwAAAAIAAACoAAAABQAAAAMAAADAAAAABgAAAAEAAADYAAAAASAAAAIAAAD4AAAAAiAAAAoAAAAoAQAAAyAAAAIAAADBAQAAACAAAAEAAADLAQAAABAAAAEAAADcAQAA
 
The code that runs the payload:
 
Is Google detecting these “stegomalware”?
 
Well, I don’t have the answer. Clearly, steganalysis science has its limitations, but there are other ways to monitor strange behaviors in each app. Does Google do it? It is difficult to know, especially if we focus on “mutant” applications. Mutant applications are applications whose behavior could easily change. Detection would require continuous monitoring by the market. For example, for a few months I have analyzed a special application, including its different versions and the modifications that have been published, to observe if Google does anything with it. I will show the details:
Step 1. The mutant app is “Holy Quran video and MP3” (tr.com.holy.quran.free.apk). Currently at https://play.google.com/store/apps/details?id=tr.com.holy.quran.free

Step 2.  Analyzing the current and previous version of this app, I discover connections to specific URLs (images files). Are these truly images? Not all.

Step 3. Two URLs that the app connects to are very interesting. In fact, they aren’t images but SQLite databases (with messages in Turkish). This is the trivial steganography technique of simply renaming the file extension. The author changed the content of these files:

Step 4. If we analyze these databases, it is possible to find curious messages. For example, recipes with drugs.

Is Google aware of the information exchanged using their applications? This example does not cease to be a mere curiosity, but such procedures might violate the policy of publication of certain applications on the market or more serious things.
 
Figure: Recipes inside the file io.png (SQLite database)

In summary, this small experiment shows that we can hide information on Google Play and Android apps in general. This feature can be used to conceal data or implement specific actions, malicious or not. Only the imagination of an attacker will determine how this feature will be used…


Disclaimer: part of this research is based on a previous research by the author at ElevenPaths
RESEARCH | July 24, 2015

Differential Cryptanalysis for Dummies

Recently, I ventured into the crazy world of differential cryptanalysis purely to find out what the heck it was all about. In this post, I hope to reassure you that this strange and rather cool technique is not as scary as it seems. Hopefully, you’ll be attacking some ciphers of your own in no time!

A differential cryptanalysis attack is a method of abusing pairs of plaintext and corresponding ciphertext to learn about the secret key that encrypted them, or, more precisely, to reduce the amount of time needed to find the key. It’s what is called a chosen plaintext attack; the attacker has access to plaintext and corresponding ciphertext.

We’re going to attack a simple block cipher I stole using another explanation of the technique (links are available at the bottom of the post). This cipher is meant to exhibit a few rudimentary elements of a Substitution-Permutation Network (SPN) (minus some common elements for simplicity), while highlighting the power of the differential technique. So, let’s get started.

I think a good place to start is to look at the block cipher and check out what makes it tick. Here is a quick schematic of the algorithm:

Where:

  • C  is the cipher-text
  • K_0, K_1 are the first and second round keys, respectively
  • P is the plaintext
  • SBOX is the substitution box.

As you can probably guess from looking at the schematic,: the cipher takes in two 2 sub-keys (K_0,K_1) and a piece of plain-text. It, it then XORs the plain-text with the first key, K_0, then, afterwards pulls the plain-text through a substitution box (SBOX) (we will talk about this little gadget soon). From there, it then takes the output of the SBOX, and XORs it with K_01,  and then pulls it through the SBOX again to produce the file piece of final cipher-text. So now that we know how the cipher works, and we can begin formulating our attack.!

To start off with, let’s try writing down this cipher in an algebraic form: 

C=SBOX(SBOX(P⨁K_0 )⨁K_1 )

(1)

Great! We can start talking about some of the weird properties of this cipher.

We know that when it comes to the cryptographic security of an encryption function the only thing that should be relied on for Confidentiality is your secret key (these are the words of a great Dutch cryptographer a long time ago [https://en.wikipedia.org/wiki/Auguste_Kerckhoffs]). This means the algorithm, including the workings of the SBOXs, will be known (or rather, in our analysis, we should always assume it will be known) by our attacker. Anything secret in our algorithm cannot be relied on for security. More specifically, any operation that is completely reversible (requiring only knowledge of the ciphertext) is useless!

When it comes to the use of substitution boxes, unless something we don’t know meaningfully obscures the output of the SBOX, using it will not add any security. It will be reduced to a mere reversible “encoding” of its input.

Looking at the algebraic form in (1), we can see the cipher pulls the input through the SBOX again in its last operation before returning it as ciphertext. This operation adds no security at all, especially in the context of a differential attack, because it is trivial to reverse the effect of the last SBOX. To make this clear, imagine knowing the output of the SBOX, but not being capable of knowing the input. Given the context of the last SBOX, this is impossible if you know the ciphertext. We can reduce the encryption function and focus on what actually adds security. The function becomes:

C= SBOX(P⨁K_0 )⨁K_1
(2)

Ironically, the SBOX in this expression (2), is now the only thing that adds security, so it will be the target of our analysis. Consider what would happen if we could know the output of the SBOX, including the plaintext/ciphertext pairs. The entire security of the cipher would fly out the window. After a few XOR operations, we would be able to work out the second key, and by extrapolation, the first key.

But how would we know the SBOX output? Well, what about analyzing the way the SBOX behaves by observing how differences between inputs map to differences in outputs. This insight is at the heart of differential cryptanalysis! Why choose this property? The only thing we have access to is plaintext/ciphertext pairs in a differential cryptanalysis attack (thems the rules), so we need to derive our information from this data. Also, XOR differences have favorable properties under XOR, namely they are unaffected by them. If we look at how the SBOX maps these differences, this is what we get:

∆x
∆y
count
[(x,x^’), (y,y^’)]
15
4
1
[(0, 15), (3, 7)]
15
7
1
[(3, 12), (10, 13)]
15
6
2
[(4, 11), (4, 2)] [(5, 10), (9, 15)]
5
15
1
[(3, 6), (10, 5)]
5
10
2
[(0, 5), (3, 9)] [(1, 4), (14, 4)]
3
15
1
[(1, 2), (14, 1)]
3
12
2
[(5, 6), (9, 5)] [(13, 14), (12, 0)]
3
10
2
[(8, 11), (8, 2)] [(12, 15), (13, 7)]
5
2
1
[(11, 14), (2, 0)]
5
4
1
[(8, 13), (8, 12)]
5
7
1
[(2, 7), (1, 6)]
5
6
1
[(9, 12), (11, 13)]
5
8
1
[(10, 15), (15, 7)]
4
15
1
[(10, 14), (15, 0)]
4
12
1
[(3, 7), (10, 6)]
1
7
1
[(14, 15), (0, 7)]
1
1
1
[(12, 13), (13, 12)]
Where
  • ∆x = x⊕x^’
  • ∆y = y⊕y^’
  • x^’,x is the input pair to the SBOX
  • y’,y is the output pair of from the SBOX
  • And ∆x→ ∆y is a differential characteristic
  • Count is the number of times the differential characteristic appears
 
This table is a small sample of the data collected from the SBOX.

This mapping of the XOR difference in the input to the XOR difference in the output is called a differential characteristic. Hence the name, “differential” cryptanalysis.! We can now use this data to steal the secret key.! Given the cipher-text/plain-text pairs, we need to find one encrypted under our target key that satisfies one of the differential characteristics. To find that pair you need to do the following:

Consider a sample differential characteristic a →b (this can be any differential):
  1. Choose (or generate) one plain-text P, and produce another plain-text P^’=P⨁a , where a is an input differential (notice that the XOR difference of P and P’ is a!)
  2. Encrypt the two plain-texts (P,P’) to get (C,C’)
  3. If C⨁C’ = b,  then you have a good pair!

If you know that a →b is the differential that suits your text pair, you now know what the potential input/output pairs are for the SBOX, because of thanks to the analysis we did earlier on. ! So now given these pairs, you can try different calculations for the key. You will likely have a couple possible keys, but one of them will definitely be the correct one.

For example, let’s say we have the following setup:

The keys K_0, K_1 : 0,6
 (p,p’) :=> (c,c’) : input_diff :=> output_diff : [count, [(sI,sI’),(sO,sO’)]… ]
 (3,12) -> (11, 12) : 15 :=> 7 : [1, [[(3, 12), (10, 13)]]]
 (5,10) -> (9, 15) : 15 :=> 6 : [2, [[(4, 11), (4, 2)], [(5, 10), (9, 15)]]]
 (4,11) -> (4, 2) : 15 :=> 6 : [2, [[(4, 11), (4, 2)], [(5, 10), (9, 15)]]]
 (5,10) -> (9, 15) : 15 :=> 6 : [2, [[(4, 11), (4, 2)], [(5, 10), (9, 15)]]]
 (3,6) -> (3, 12) : 5 :=> 15 : [1, [[(3, 6), (10, 5)]]]

The script that generated this output is available at the end of the post.

We need to select a plain-text/cipher-text pair and use the information associated to with them the pair to calculate the key.  So for example let’s use, using (3,12) => (11,12), we can then calculate the K_0, K_1 in the following way:

Given then that (from the last operation of the cipher):

C=K_1⊕ S_o

We know that (because of how XOR works):

K_1=C⊕ S_o

Although we have a couple of choices for those values, so we need to run through them.
Given that:
K_0,K_1=(11,12)
C=(11,12) this is a tuple because we are using a pair of cipher-texts
P=(3,12)
S_i=(3,12)
S_o=(10,13)
We can then try to calculate the second round key as:
K_1=C[0]⊕ S_o [0]=1   wrong!
K_1=C[0]⊕ S_o [1]=6   right!
(3)
We can also calculate K_0, namely:
K_0=P[1]⊕ S_i [0]=   wrong!
K_0=C[0]⊕ S_o [1]=   wrong!
K_0=C[0]⊕ S_o [0]=0    right!
(4)
We’ve just derived the two sub keys.! Cryptanalysis successful!
The weird behavior of the SBOX under differential mapping is what makes differential cryptanalysis possible, but it this is not to say that this is the ONLY way to perform differential cryptanalysis. The property we are using here (XOR differences) is merely one application of this concept. You should think about differential cryptanalysis as leveraging any operation in an encryption function that can be used to “explain” differences in input/output pairs. So when you’re done reading about this application, look at some of the other common operations in encryption functions and think about what else, besides SBOXs, can be abused in this way; what other properties of the input/outputs pairs have interesting behaviors under common operations?.
Also, it may be useful to think about how you can extend this application of differential cryptanalysis in order to attack multi-round ciphers (i.e. ciphers that are built using multiple rounds of computation of ciphers like the one we’ve attacked here).
Further Reading and References:
Some Scripts to play around with:

RESEARCH | October 23, 2014

Bad Crypto 101

This post is part of a series about bad cryptography usage . We all rely heavily on cryptographic algorithms for data confidentiality and integrity, and although most commonly used algorithms are secure, they need to be used carefully and correctly. Just as holding a hammer backwards won’t yield the expected result, using cryptography badly won’t yield the expected results either.

 

To refresh my Android skillset, I decided to take apart a few Android applications that offer to encrypt personal files and protect them from prying eyes. I headed off to the Google Play Store and downloaded the first free application it recommended to me. I decided to only consider free applications, since most end users would prefer a cheap (free) solution compared to a paid one.

 

I stumbled upon the Encrypt File Free application by MobilDev. It seemed like the average file manager/encryption solution a user would like to use. I downloaded the application and ran it through my set of home-grown Android scanning scripts to look for possible evidence of bad crypto. I was quite surprised to see that no cryptographic algorithms where mentioned anywhere in the source code, apart from MD5. This looked reasonably suspicious, so I decided to take a closer look.

 

Looking at the JAR/DEX file’s structure, one class immediately gets my attention: the Crypt class in the com.acr.encryptfilefree package. Somewhere halfway into the class you find an interesting initializer routine:

 

public void init()
    {
        encrypt[0] = (byte)-61;
        encrypt[1] = (byte)64;
        encrypt[2] = (byte)43;
        encrypt[3] = (byte)-67;
        encrypt[4] = (byte)29;
        encrypt[5] = (byte)15;
        encrypt[6] = (byte)118;
        encrypt[7] = (byte)-50;
        encrypt[8] = (byte)-28;
        encrypt[9] = (byte)6;
        encrypt[10] = (byte)-75;
        encrypt[11] = (byte)87;
        encrypt[12] = (byte)-128;
        encrypt[13] = (byte)40;
        encrypt[14] = (byte)13;
        encrypt[15] = (byte)63;
        encrypt[16] = (byte)124;
        encrypt[17] = (byte)-68;
        …
        decrypt[248] = (byte)52;
        decrypt[249] = (byte)-51;
        decrypt[250] = (byte)123;
        decrypt[251] = (byte)105;
        decrypt[252] = (byte)-112;
        decrypt[253] = (byte)-86;
        decrypt[254] = (byte)-38;
        decrypt[255] = (byte)81;
    }

 

Continuing forward through the code, you immediately spot the following routine:
    // cleaned up decompiler code
    public byte[] decrypt(byte[] inputBuffer)
    {
        byte[] output = new byte[inputBuffer.length];
        int i = 0;
        while(i < inputBuffer.length)
        {
            int temp = inputBuffer[i];
            if(temp >= -128)
            {
                int temp = inputBuffer[i];
                if(temp < 128)
                {
                    int inputByte = inputBuffer[i];
                    int lookupPosition = 128 + inputByte;
                    int outputByte = decrypt[lookupPosition];
                    output[i] = (byte)i4;
                }
            }
            i = i + 1;
        }
        return output;

    }

 

This is basically a pretty bad substitution cipher. While looking through the code, I noticed that the values set in the init() function aren’t really used in production, they’re just test values which are likely a result of testing the code while it’s being written up. Since handling signedness is done manually, it is reasonable to assume that the initial code didn’t really work as expected and that the developer poked it until it gave the right output. Further evidence of this can be found in one of the encrypt() overloads in that same class, which contains a preliminary version of the file encryption routines.

 

Going further through the application reveals that the actual encryption logic is stored in the Main$UpdateProgress.class file, and further information is revealed about the file format itself. Halfway through the doInBackground(Void[] a) function you discover that the file format is basically the following:

 

The password check on the files itself turns out to be just a branch instruction, which can be located in Main$15$1.class and various other locations. Using this knowledge, an attacker could successfully modify a copy of the application that would allow unrestricted access to all of the files encoded by any password. Apart from rolling your own crypto, this is one of the worst offenses in password protecting sensitive data: make sure you use the password as part of the key for the data, and not as part of a branch instruction. Branches can be patched, keys need to be brute forced, or, in the event of a weak algorithm, calculated.

 

The substitution vector in the file is remarkably long–it seems that the vector stored is 1024 bytes. But, don’t be fooled–this is a bug. Only the first 256 bytes are actually used, the rest of them are simply ignored during processing. If we forget for a moment that a weak encoding is used as a substitute for encryption, and assume this is a real algorithm, reducing the key space from 1024 byte to 256 byte would be a serious compromise. Algorithms have ended up in the “do not use” corner for lesser offenses.

 

 

Yvan Janssens