INSIGHTS | March 6, 2012

Enter the Dragon(Book), Part 1

By Shane Macaulay

This is a fairly large topic, so I’ve summarized it here in a somewhat narrative, blog-friendly way.

A few years ago I was reading a blog post about STL memory allocators (http://blogs.msdn.com/b/vcblog/archive/2008/08/28/the-mallocator.aspx). Since memory allocators are a source of extreme security risk, I took the author’s statement, “I’ve carefully implemented all of the integer overflow checks and so forth that would be required in real production code,” as a bit of a challenge.

After playing with permutations of the code I was able to get this STL allocator to fail. What was interesting to me is that I wasn’t only causing failures in my test code; I was also able to crash the compiler and linker.

Exploiting a compiler is nothing new; Reflections on Trusting Trust by Ken Thompson is of course the preeminent work on this topic. In a nutshell, a compiler can be built to insert a known, subtle backdoor into the applications it compiles, so that even valid, flawless source code produces a backdoored binary. Very interesting and tricky.

David A. Wheeler has a page dedicated to his PhD dissertation (http://www.dwheeler.com/trusting-trust/) that proposes a fairly simple technique known as Diverse Double-Compiling (DDC), where you compile all code with a safe/trusted compiler to validate your possibly-untrusted compiler output. Sounds simple and effective enough?

Enter the dragon(book), or rather the C specification. I am not a language lawyer (and I do not even play one on T.V.), but what’s interesting about the C specification is that there are significant portions of state that are left to the imagination of the compiler writer (i.e. undefined operations). What if you could exploit this behavior in a deterministic way? What if you could exploit this behavior in a cross-compiler-deterministic way?

It would seem then that you would have the perfect backdoor, undetectable by DDC techniques or even manual inspection.

After some time, and after checking with vendors on the security sensitivity of this class of problems, I found out that there was unlikely to be a “fix” (unless the C specification is altered). This gave me a clear conscience to publish the method.

The attached code is the code I used to win the backdoor hiding contest @ DEFCON (http://defcon.org). It is a library class written in C++/CLI that exposes a number of methods that allow for the loading/saving of data to a disk file.

See if you can find the backdoor. I’ll post the explanation and details on the flaw soon.

// eBookLib.cpp : main project file.
// Project requirements
// Add/Remove/Query eBooks
// One code file (KISS in effect)
//

//
// **** Mostly generated from Visual Studio project templates ****
//
#define WIN32_LEAN_AND_MEAN
#define _WIN32_WINNT 0x501

#include <windows.h>
#include <stdio.h>
#include <wchar.h>

#include <msclr\marshal.h>

#using <Microsoft.VisualC.dll>
#using <System.dll>
#using <System.Core.dll>

using namespace System;
using namespace System::IO;
using namespace System::Threading;
using namespace System::Threading::Tasks;
using namespace System::Reflection;
using namespace System::Diagnostics;
using namespace System::Globalization;
using namespace System::Collections::Generic;
using namespace System::Security::Permissions;
using namespace System::Runtime::InteropServices;
using namespace System::IO::MemoryMappedFiles;
using namespace System::IO;
using namespace System::Runtime::CompilerServices;

using namespace msclr;
using namespace msclr::interop;

//
// General Information about an assembly is controlled through the following
// set of attributes. Change these attribute values to modify the information
// associated with an assembly.
//
[assembly:AssemblyTitleAttribute("eBookLib")];
[assembly:AssemblyDescriptionAttribute("")];
[assembly:AssemblyConfigurationAttribute("")];
[assembly:AssemblyCompanyAttribute("Microsoft")];
[assembly:AssemblyProductAttribute("eBookLib")];
[assembly:AssemblyCopyrightAttribute("Copyright (c) Microsoft 2010")];
[assembly:AssemblyTrademarkAttribute("")];
[assembly:AssemblyCultureAttribute("")];
//
// Version information for an assembly consists of the following four values:
//
// Major Version
// Minor Version
// Build Number
// Revision
//
// You can specify all the values or you can default the Revision and Build Numbers
// by using the '*' as shown below:

[assembly:AssemblyVersionAttribute("1.0.*")];

[assembly:ComVisible(false)];

[assembly:CLSCompliantAttribute(true)];

[assembly:SecurityPermission(SecurityAction::RequestMinimum, UnmanagedCode = true)];

////////////////////////////////////////////////////////////////////////////////////////////////////////////////
// Native structures used from legacy system,
// define the disk storage for our ebook,
//
// The file specified by the constructor is read from and loaded automatically, it is also auto saved when closed.
////////////////////////////////////////////////////////////////////////////////////////////////////////////////
enum eBookFlag
{
NOFLAG = 0,
ACTIVE = 1,
PENDING_REMOVE = 2
};

typedef struct _eBookAccountingData
{
// Binary Data, may include nulls
char PurchaseOrder[ACCOUNTING_SIZE];
char RecieptData[ACCOUNTING_SIZE];
size_t PurchaseOrderLength;
size_t RecieptDataLength;
} eBookAccountingData, *PeBookAccountingData;

typedef struct _eBookPublicData
{
wchar_t ISBN[BUFSIZ];
wchar_t MISC[BUFSIZ];
wchar_t ShortName[BUFSIZ];
wchar_t Author[BUFSIZ];
wchar_t LongName[BUFSIZ];
wchar_t PathToFile[MAX_PATH];
int Rating;
int SerialNumber;
} eBookPublicData, *PeBookPublicData;

typedef struct _eBook
{
eBookFlag Flag;
eBookAccountingData Priv;
eBookPublicData Pub;
} eBook, *PeBook;

// define managed analogues for native/serialized types
namespace Client {
namespace ManagedEbookLib {
[System::FlagsAttribute]
public enum class ManagedeBookFlag : int
{
NOFLAG = 0x0,
ACTIVE = 0x1,
PENDING_REMOVE = 0x2,
};

public ref class ManagedEbookPublic
{
public:
__clrcall ManagedEbookPublic()
{
ISBN = MISC = ShortName = Author = LongName = PathToFile = String::Empty;
}
Int32 Rating;
String^ ISBN;
String^ MISC;
Int32 SerialNumber;
String^ ShortName;
String^ Author;
String^ LongName;
String^ PathToFile;
};

public ref class ManagedEbookAccounting
{
public:
__clrcall ManagedEbookAccounting()
{
PurchaseOrder = gcnew array<Byte>(0);
RecieptData = gcnew array<Byte>(0);
}
array<Byte>^ PurchaseOrder;
array<Byte>^ RecieptData;
};

public ref class ManagedEbook
{
public:
__clrcall ManagedEbook()
{
Pub = gcnew ManagedEbookPublic();
Priv = gcnew ManagedEbookAccounting();
}
ManagedeBookFlag Flag;
ManagedEbookPublic^ Pub;
ManagedEbookAccounting^ Priv;
array<Byte^>^ BookData;
};
}
}

using namespace Client::ManagedEbookLib;

// extend marshal library for native/managed inter-op
namespace msclr {
namespace interop {
template<>
inline ManagedEbookAccounting^ marshal_as<ManagedEbookAccounting^, eBookAccountingData> (const eBookAccountingData& Src)
{
ManagedEbookAccounting^ Dest = gcnew ManagedEbookAccounting;

if(Src.PurchaseOrderLength > 0 && Src.PurchaseOrderLength < sizeof(Src.PurchaseOrder))
{
Dest->PurchaseOrder = gcnew array<Byte>((int) Src.PurchaseOrderLength);
Marshal::Copy(static_cast<IntPtr>(Src.PurchaseOrder[0]), Dest->PurchaseOrder, 0, (int) Src.PurchaseOrderLength);
}

if(Src.RecieptDataLength > 0 && Src.RecieptDataLength < sizeof(Src.RecieptData))
{
Dest->RecieptData = gcnew array<Byte>((int) Src.RecieptDataLength);
Marshal::Copy(static_cast<IntPtr>(Src.RecieptData[0]), Dest->RecieptData, 0, (int) Src.RecieptDataLength);
}

return Dest;
};
template<>
inline ManagedEbookPublic^ marshal_as<ManagedEbookPublic^, eBookPublicData> (const eBookPublicData& Src) {
ManagedEbookPublic^ Dest = gcnew ManagedEbookPublic;
Dest->Rating = Src.Rating;
Dest->ISBN = gcnew String(Src.ISBN);
Dest->MISC = gcnew String(Src.MISC);
Dest->SerialNumber = Src.SerialNumber;
Dest->ShortName = gcnew String(Src.ShortName);
Dest->Author = gcnew String(Src.Author);
Dest->LongName = gcnew String(Src.LongName);
Dest->PathToFile = gcnew String(Src.PathToFile);
return Dest;
};
template<>
inline ManagedEbook^ marshal_as<ManagedEbook^, eBook> (const eBook& Src) {
ManagedEbook^ Dest = gcnew ManagedEbook;

Dest->Priv = marshal_as<ManagedEbookAccounting^>(Src.Priv);
Dest->Pub = marshal_as<ManagedEbookPublic^>(Src.Pub);
Dest->Flag = static_cast<ManagedeBookFlag>(Src.Flag);

return Dest;
};
}
}

// Primary user namespace
namespace Client
{
namespace ManagedEbooks
{
// “Store” is Client::ManagedEbooks::Store()
public ref class Store
{
private:
String^ DataStore;
List<ManagedEbook^>^ Books;
HANDLE hFile;

// serialization from disk
void __clrcall LoadDB()
{
Books = gcnew List<ManagedEbook^>();
eBook AeBook;
DWORD red = 0;

marshal_context^ x = gcnew marshal_context();
hFile = CreateFileW(x->marshal_as<const wchar_t*>(DataStore), GENERIC_READ | GENERIC_WRITE, FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE, 0, OPEN_ALWAYS, 0, 0);

if(hFile == INVALID_HANDLE_VALUE)
return;

do {
ReadFile(hFile, &AeBook, sizeof(eBook), &red, NULL);

if(red == sizeof(eBook))
Books->Add(marshal_as<ManagedEbook^>(AeBook));

} while(red == sizeof(eBook));
}

// scan hay for anything that matches needle
bool __clrcall MatchBook(ManagedEbook ^hay, ManagedEbook^ needle)
{
// check numeric values first
if(hay->Pub->Rating != 0 && hay->Pub->Rating == needle->Pub->Rating)
return true;
if(hay->Pub->SerialNumber != 0 && hay->Pub->SerialNumber == needle->Pub->SerialNumber)
return true;

// scan each string
if(!String::IsNullOrEmpty(hay->Pub->ISBN) && hay->Pub->ISBN->Contains(needle->Pub->ISBN))
return true;
if(!String::IsNullOrEmpty(hay->Pub->MISC) && hay->Pub->MISC->Contains(needle->Pub->MISC))
return true;
if(!String::IsNullOrEmpty(hay->Pub->ShortName) && hay->Pub->ShortName->Contains(needle->Pub->ShortName))
return true;
if(!String::IsNullOrEmpty(hay->Pub->Author) && hay->Pub->Author->Contains(needle->Pub->Author))
return true;
if(!String::IsNullOrEmpty(hay->Pub->LongName) && hay->Pub->LongName->Contains(needle->Pub->LongName))
return true;
if(!String::IsNullOrEmpty(hay->Pub->PathToFile) && hay->Pub->PathToFile->Contains(needle->Pub->PathToFile))
return true;
return false;
}

// destructor
__clrcall !Store()
{
Close();
}

// serialization to disk happens here
void __clrcall _Close()
{
if(hFile == INVALID_HANDLE_VALUE)
return;

SetFilePointer(hFile, 0, NULL, FILE_BEGIN);
for each(ManagedEbook^ book in Books)
{
eBook save;
DWORD wrote=0;
marshal_context^ x = gcnew marshal_context();
ZeroMemory(&save, sizeof(save));

save.Pub.Rating = book->Pub->Rating;
save.Pub.SerialNumber = book->Pub->SerialNumber;
save.Flag = static_cast<eBookFlag>(book->Flag);

swprintf_s(save.Pub.ISBN, sizeof(save.Pub.ISBN), L"%s", x->marshal_as<const wchar_t*>(book->Pub->ISBN));
swprintf_s(save.Pub.MISC, sizeof(save.Pub.MISC), L"%s", x->marshal_as<const wchar_t*>(book->Pub->MISC));
swprintf_s(save.Pub.ShortName, sizeof(save.Pub.ShortName), L"%s", x->marshal_as<const wchar_t*>(book->Pub->ShortName));
swprintf_s(save.Pub.Author, sizeof(save.Pub.Author), L"%s", x->marshal_as<const wchar_t*>(book->Pub->Author));
swprintf_s(save.Pub.LongName, sizeof(save.Pub.LongName), L"%s", x->marshal_as<const wchar_t*>(book->Pub->LongName));
swprintf_s(save.Pub.PathToFile, sizeof(save.Pub.PathToFile), L"%s", x->marshal_as<const wchar_t*>(book->Pub->PathToFile));

if(book->Priv->PurchaseOrder->Length > 0)
{
pin_ptr<Byte> pin = &book->Priv->PurchaseOrder[0];

save.Priv.PurchaseOrderLength = min(sizeof(save.Priv.PurchaseOrder), book->Priv->PurchaseOrder->Length);
memcpy(save.Priv.PurchaseOrder, pin, save.Priv.PurchaseOrderLength);
pin = nullptr;
}

if(book->Priv->RecieptData->Length > 0)
{
pin_ptr<Byte> pin = &book->Priv->RecieptData[0];

save.Priv.RecieptDataLength = min(sizeof(save.Priv.RecieptData), book->Priv->RecieptData->Length);
memcpy(save.Priv.RecieptData, pin, save.Priv.RecieptDataLength);
pin = nullptr;
}

WriteFile(hFile, &save, sizeof(save), &wrote, NULL);
if(wrote != sizeof(save))
return;
}
CloseHandle(hFile);
hFile = INVALID_HANDLE_VALUE;
}

protected:

// destructor forwards to the disposable interface
virtual __clrcall ~Store()
{
this->!Store();
}

public:

// possibly hide this
void __clrcall Close()
{
_Close();
}

// constructor
__clrcall Store(String^ DataStoreDB)
{
DataStore = DataStoreDB;
LoadDB();
}

// add ebook
void __clrcall Add(ManagedEbook^ eBook)
{
Books->Add(eBook);
}

// remove ebook
void __clrcall Remove(ManagedEbook^ eBook)
{
Books->Remove(eBook);
}

// get query list
List<ManagedEbook^>^ __clrcall Query(ManagedEbook^ eBook)
{
List<ManagedEbook^>^ rv = gcnew List<ManagedEbook^>();

for each(ManagedEbook^ book in Books)
{
if(MatchBook(book, eBook))
rv->Add(book);
}
return rv;
}
};
}
}

INSIGHTS | February 24, 2012

IOActive’s IOAsis at RSA 2012

By IOActive

This is not the usual technical post. This is an invitation to an important event if you are going to RSA 2012 and want to escape the chaos and experience the luxury at IOAsis while enjoying great technical talks and meeting with industry experts. If you want to feel like a VIP and have a great time then don’t miss this opportunity!

We have scheduled some really interesting talks such as:

Firmware analysis of Industrial Devices with IOActive researcher Ruben Santamarta
Mobile Security in the Enterprise with IOActive VP, David Baker and IOActive Principal Consultant, Ilja van Sprundel
The Social Aspect of Pen Testing with IOActive Managing Consultant, Ryan O’Horo
Battling Compliance in the Cloud with IOActive Principal Compliance Consultant, Robert Zigweid

We hope to see you there!

INSIGHTS | February 17, 2012

Estimating Password and Token Entropy (Randomness) in Web Applications

By Ryan O'Horo

Entropy

“In information theory, entropy is a measure of the uncertainty associated with a random variable. In this context, the term usually refers to the Shannon entropy, which quantifies the expected value of the information contained in a message, usually in units such as bits. In this context, a ‘message’ means a specific realization of the random variable.” [1]

1. http://en.wikipedia.org/wiki/Entropy_%28information_theory%29

I find myself analyzing password and token entropy quite frequently and I’ve come to rely upon Wolfram Alpha and Burp Suite Pro to get my estimates for these values. It’s understandable why we’d want to check a password’s entropy. It gives us an indication of how long it would take an attacker to brute force it, whether in a login form or a stolen database of hashes. However, an overlooked concern is the entropy contained in tokens for session and object identifiers. These values can also be brute forced to steal active sessions and gain access to objects to which we do not have permission. Not only are these tokens sometimes too short, they sometimes also contain much less entropy than appears.

Estimating Password Entropy

Wolfram Alpha has a keyword specifically for analyzing passwords.

http://www.wolframalpha.com/input/?i=password+strength+f00b4r^LYFE

Estimating Token Entropy

Solving [ characters ^ length = 2 ^ x ] for x converts an arbitrary string value to bits of entropy. The solution is x = length × log2(characters), which is awkward to work out by hand, so I use Wolfram Alpha to estimate it.

e.g. 1tdrtahp4y8201att8i414a7km has the formula:

http://www.wolframalpha.com/input/?i=36^26+%3D+2^x

Click “Approximate Form” under the “Real solution”.

The password strength calculator also works okay on tokens, and we’ll see a similar result:

http://www.wolframalpha.com/input/?i=password+strength+1tdrtahp4y8201att8i414a7km
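The token estimate above can also be computed directly, since characters ^ length = 2 ^ x has the closed form x = length × log2(characters). The following is a minimal sketch; the function name tokenEntropyBits is made up for illustration, and it assumes every character is drawn uniformly and independently from the alphabet, which gives a best-case figure only.

```cpp
#include <cmath>
#include <cstddef>

// Best-case bits of entropy for a token of `length` symbols drawn uniformly
// from an alphabet of `alphabetSize` symbols.
// Solves alphabetSize^length = 2^x for x, i.e. x = length * log2(alphabetSize).
double tokenEntropyBits(std::size_t alphabetSize, std::size_t length)
{
    if (alphabetSize < 2 || length == 0)
        return 0.0;  // a one-symbol alphabet or an empty token carries no entropy
    return static_cast<double>(length) *
           std::log2(static_cast<double>(alphabetSize));
}
```

For the 26-character token above, with a 36-symbol alphabet (a-z and 0-9), tokenEntropyBits(36, 26) comes out at roughly 134.4 bits.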

BUT! Analysis of a single token is not enough to measure /effective/ entropy. Burp Suite Sequencer will run the proper entropy analysis tests on batches of session identifiers to estimate this value. Send your application login request (or whatever request generates a new token value) to the Sequencer and configure the Sequencer to collect the target token value. Start collecting and set the “Auto-Analyze” box to watch as Burp runs its tests.

A sample token “1tdrtahp4y8201att8i414a7km” from this application has an estimated entropy of 134.4 bits, but FIPS analysis of a batch of 2000 of these identifiers shows an effective entropy of less than 45 bits!

Not only that, but the tokens range in length from 21 to 26 characters; some are much shorter than we originally thought.

Burp will show you many charts, but these bit-level analysis charts will give you an idea of where the tokens are failing to meet expected entropy.

You can spot a highly non-random value near the middle of the token (higher is better), and the varying length of the tokens drags down entropy near the end. The ASCII-based character set used in the token has one or more unused or underused bits, as seen in the interspersed areas of very low entropy.
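To get a rough feel for where a batch of tokens loses entropy without Burp, one option is to measure the Shannon entropy of the characters seen at each position. This is only a sketch and not what Burp Sequencer does (its FIPS tests work at the bit level); the function names are hypothetical, and summing per-position figures assumes the positions are independent, so treat the total as an optimistic estimate.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Shannon entropy (in bits) of the characters observed at position `pos`
// across a batch of tokens. Tokens too short to have that position are skipped.
double positionEntropyBits(const std::vector<std::string>& tokens, std::size_t pos)
{
    std::map<char, std::size_t> counts;
    std::size_t total = 0;
    for (const std::string& token : tokens) {
        if (pos < token.size()) {
            ++counts[token[pos]];
            ++total;
        }
    }
    if (total == 0)
        return 0.0;

    double bits = 0.0;
    for (const auto& entry : counts) {
        const double p = static_cast<double>(entry.second) / static_cast<double>(total);
        bits -= p * std::log2(p);
    }
    return bits;
}

// Sum of the per-position estimates over the longest token in the batch.
// Assumes positions are independent, so this overstates the real figure
// whenever characters are correlated with each other.
double batchEntropyBits(const std::vector<std::string>& tokens)
{
    std::size_t maxLen = 0;
    for (const std::string& token : tokens)
        maxLen = std::max(maxLen, token.size());

    double bits = 0.0;
    for (std::size_t pos = 0; pos < maxLen; ++pos)
        bits += positionEntropyBits(tokens, pos);
    return bits;
}
```

A position that always holds the same character contributes 0 bits, which is how a fixed or highly predictable field in the middle of a token shows up.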

In the case illustrated above I would ask the client to change the way randomness is supplied to the token and/or increase the token complexity with a hashing function, which should increase attack resistance.

Remember, for session or object identifiers, you want to get close to 128 bits of /effective/ entropy to prevent brute forcing. This is a guideline set by OWASP and is in line with most modern web application frameworks.

If objects persist for long periods or are very numerous (in the millions) you’ll want more entropy to maintain the same level of safety as a session identifier, which is more ephemeral. An example of persistent objects (on the order of years) which rely on high entropy tokens would be Facebook photo URLs. Photos marked private are still publicly accessible, but Facebook counts on the fact that their photo URLs have high entropy.

The following URL has at least 160 bits of entropy:

https://fbcdn-sphotos-a.akamaihd.net/hphotos-ak-ash4/398297_10140657048323225_750784224_11609676_1712639207_n.jpg

For passwords, the analysis is a little more subjective, but Wolfram Alpha gives you a good estimate. You can use this password analysis for encryption keys or passphrases as well, e.g. if they are provided as part of a source code audit.

Happy Hacking!

INSIGHTS | February 8, 2012

I can still see your actions on Google Maps over SSL

By Vincent Berg

A while ago, yours truly gave two talks on SSL traffic analysis: one at 44Con and one at RuxCon. A demonstration of the tool was also given at last year’s BlackHat Arsenal by two of my co-workers. The presented research and tool may not have been as groundbreaking as some of the other talks at those conferences, but attendees seemed to like it, so I figured it might make some good blog content.

Traffic analysis is definitely not a new field, neither in general nor when applied to SSL; a lot of great work has been done by reputable research outlets, such as Microsoft Research with researchers like George Danezis. What recent traffic analysis research has tried to show is that there are enormous amounts of useful information to be obtained by an attacker who can monitor the encrypted communication stream.

A great example of this can be found in the paper with the slightly cheesy title Side-Channel Leaks in Web Applications: a Reality Today, a Challenge Tomorrow. The paper discusses some approaches to traffic analysis on SSL-encrypted web applications and applies them to real-world systems. One of the approaches enables an attacker to build a database that contains traffic patterns of the AutoComplete function in drop-down form fields (like Google’s Auto Complete). Another great example is the ability to—for a specific type of stock management web application—reconstruct pie charts in a couple of days and figure out the contents of someone’s stock portfolio.

After discussing these attack types with some of our customers, I noticed that most of them seemed to have some difficulty grasping the potential impact of traffic analysis on their web applications. The research papers I referred them to are quite dry and they’re also written in dense, scientific language that does nothing to ease understanding. So, I decided to just throw some of my dedicated research time out there and come up with a proof of concept tool using a web application that everyone knows and understands: Google Maps.

Since ignorance is bliss, I decided to just jump in and try to build something without even running the numbers on whether it would make any sense to try. I started by running Firefox and Firebug in an effort to make sense of all the JavaScript voodoo going on there. I quickly figured out that Google Maps works by using a grid system in which PNG images (referred to as tiles) are laid out. Latitude and longitude coordinates are converted to x and y values depending on the selected zoom level; this gives a three dimensional coordinate system in which each separate (x, y, z)-triplet represents two PNG images. The first image is called the overlay image and contains the town, river, highway names and so forth; the second image contains the actual satellite data.
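As an illustration of the coordinate conversion described above, here is a minimal sketch using the widely documented Web Mercator tile scheme. I am assuming Google Maps tiles follow that scheme; the TileCoord struct and latLonToTile function are names made up for this example and are not taken from the tool.

```cpp
#include <algorithm>
#include <cmath>

// One (x, y, z) triplet identifying a tile; z is the zoom level.
struct TileCoord {
    int x;
    int y;
    int z;
};

// Converts latitude/longitude in degrees to tile indices at a zoom level,
// assuming the standard Web Mercator tiling (2^zoom tiles per axis,
// x growing eastwards from -180 degrees, y growing southwards from the top).
TileCoord latLonToTile(double latDeg, double lonDeg, int zoom)
{
    const double pi = 3.14159265358979323846;
    const double tilesPerAxis = std::ldexp(1.0, zoom);  // 2^zoom

    // Web Mercator is only defined up to about +/-85.05 degrees of latitude.
    latDeg = std::max(-85.05112878, std::min(85.05112878, latDeg));
    const double latRad = latDeg * pi / 180.0;

    int x = static_cast<int>(std::floor((lonDeg + 180.0) / 360.0 * tilesPerAxis));
    int y = static_cast<int>(std::floor(
        (1.0 - std::asinh(std::tan(latRad)) / pi) / 2.0 * tilesPerAxis));

    // Keep edge cases such as lon = 180 inside the grid.
    const int maxIndex = static_cast<int>(tilesPerAxis) - 1;
    x = std::max(0, std::min(maxIndex, x));
    y = std::max(0, std::min(maxIndex, y));
    return TileCoord{x, y, zoom};
}
```

Each zoom step doubles the number of tiles per axis, which is why the zoom level acts as the third coordinate of the triplet.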

Once I had this figured out the approach became simple: scrape a lot of satellite tiles and build a database of the image sizes using the tool GMapCatcher. I then built a tool that uses libpcap to approximate the image sizes by monitoring the SSL encrypted traffic on the wire. The tool tries to match the image sizes to the recorded (x,y,z)-triplets in the database and then tries to cluster the results into a specific region. This is notoriously difficult to do since one gets so many false positives if the database is big enough. Add to this the fact that it is next to impossible to scrape the entire Google Maps database since, first, they will ban you for generating so much traffic and, second, you will have to store many petabytes of image data.
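The matching step can be pictured roughly as follows. This is a simplified sketch and not the actual tool: the TileRecord layout, the region label, and the tolerance parameter are all assumptions made for illustration, and the "clustering" is reduced to a simple vote per region.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// One scraped tile: its (x, y, z) triplet plus a hypothetical region label
// (for example the name of a city profile).
struct TileRecord {
    int x;
    int y;
    int z;
    std::string region;
};

// Database keyed by image size in bytes; several tiles may share a size.
using TileDb = std::multimap<std::size_t, TileRecord>;

// For every observed (approximate) image size, find all tiles whose recorded
// size lies within +/- tolerance bytes and give their region one vote.
// Returns the region with the most votes, or an empty string if nothing matched.
std::string guessRegion(const TileDb& db,
                        const std::vector<std::size_t>& observedSizes,
                        std::size_t tolerance)
{
    std::map<std::string, std::size_t> votes;
    for (std::size_t size : observedSizes) {
        const std::size_t low = size > tolerance ? size - tolerance : 0;
        const auto first = db.lower_bound(low);
        const auto last = db.upper_bound(size + tolerance);
        for (auto it = first; it != last; ++it)
            ++votes[it->second.region];
    }

    std::string best;
    std::size_t bestVotes = 0;
    for (const auto& entry : votes) {
        if (entry.second > bestVotes) {
            best = entry.first;
            bestVotes = entry.second;
        }
    }
    return best;
}
```

The false-positive problem is visible even here: the wider the tolerance and the bigger the database, the more unrelated tiles fall inside each size window and pollute the vote.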

With a little bit of cheating—I used a large browser screen so I would have more data to work with—I managed to make the movie Proof of Concept – SSL Traffic Analysis of Google Maps.

As shown in the movie, the tool has a database that contains city profiles including Paris, Berlin, Amsterdam, Brussels, and Geneva. The tool runs on the right and on the left is the browser accessing Google Maps over SSL. In the first attempt, I load the city of Paris and zoom in a couple of times. On the second attempt I navigate to Berlin and zoom in a few times. On both occasions the tool manages to correctly guess the locations that the browser is accessing.

Please note that it is a shoddy proof of concept, but it shows the concept of SSL traffic analysis pretty well. It also might be easier to understand for less technically inclined people, as in “An attacker can still figure out what you’re looking at on Google Maps” (with the addendum that it’s never going to be 100% perfect and that my shoddy proof of concept has lots of room for improvement).

For more specific details on this please refer to the IOActive white paper Traffic Analysis on Google Maps with GMaps-Trafficker or send me a tweet at @santaragolabs.

WHITEPAPER |

Traffic Analysis on Google Maps with GMaps-Trafficker

By IOActive

This paper describes a high-level approach to identifying which geographical coordinates a user sees on Google Maps when using an SSL-encrypted channel. Provided you have built the correct profile, the GMaps-Trafficker tool allows you to identify which geographical coordinates a user is looking at on Google Maps, even though the user is accessing Google Maps over SSL.

INSIGHTS | February 3, 2012

Solving a Little Mystery

By Ruben Santamarta

Firmware analysis is a fascinating area within the vast world of reverse engineering, although not a very widespread one. Sometimes you end up at an impasse until you notice a minor (or major) detail you initially overlooked. That’s why sharing methods and findings is a great way to advance in this field.

While looking for certain information during a session of reversing, I came across this great post. There is little to add except for solving the ‘mystery’ behind that simple filesystem and mentioning a couple of technical details.

This file system is part of Wind River’s web server architecture for embedded devices, so you will likely find it inside firmware based on VxWorks. It is known as MemFS (watch out, not the common MemFS) or the Wind River management file system, and basically allows devices to serve files via the embedded web server without needing an ‘actual’ file system, since this one resides in the device’s non-volatile memory.

VxWorks provides pagepack, a tool used to transform any file intended to be served by a WindWeb server into C code. Therefore, a developer just compiles everything into the same firmware image.

From a reverser’s point of view, what we should find is the following structure:

There are a few things here worth mentioning:

The header is not necessarily 12 bytes; it can be 8, so the third field seems optional.
The first 4 bytes look like a flag field that may indicate, among other things, whether a file’s data will be compressed or not (1 = Compressed, 2 = Plain).
The signature can vary between firmwares since it is defined by the constant ‘HTTP_UNIQUE_SIGNATURE’. In fact, we may find this signature twice inside a firmware: the first one due to the .h where it is defined (close to other strings such as the web server banner) and the second one already as part of the MemFS.

Hope these additional details help you on your future research.

INSIGHTS | January 17, 2012

A free Windows Vulnerability for the NSA

By Cesar Cerrudo

Some months ago at Black Hat USA 2011 I presented this interesting issue in the workshop “Easy and Quick Vulnerability Hunting in Windows,” and now I’m sharing a more detailed explanation with everyone in this blog post.

In Windows 7 or Windows 2008, in the folder C:\Windows\Installer there are many installer files (from already installed applications) with what appear to be random names. When run, some of these installer files (like Microsoft Office Publisher MUI (English) 2007) will automatically elevate privileges and try to install when any Windows user executes them. Since the applications are already installed, there’s no problem, at least in theory.

However, an interesting issue arises during the installation process when running this kind of installer: a temporary file is created in C:\Users\username\AppData\Local\Temp, which is the temporary folder for the current user. The created file is named Hx????.tmp (where ???? seem to be random hex numbers), and it seems to be a COM DLL from the Microsoft Help Data Services Module, whose original name is HXDS.dll. This DLL is later loaded by the msiexec.exe process running under the System account that is launched by the Windows Installer service during the installation process.

When the DLL file is loaded, the code in the DLL file runs as the System user with full privileges. At first sight this seems to be an elevation of privileges vulnerability since the folder where the DLL file is created is controlled by the current user, and the DLL is then loaded and run under the System account, meaning any user could run code as the System user by replacing the DLL file with a specially-crafted one before the DLL is loaded and executed.

Analysis reveals that the issue is not easily exploitable since the msiexec.exe process generates an MD5 hash of the DLL file and compares it with a known-good MD5 hash value that is read from a file located in C:\Windows\Installer, which is only readable and writable by System and Administrators accounts.

In order to exploit this issue, an attacker needs to replace the DLL file with a modified DLL file that contains exploit code and still matches the valid MD5 hash. The attacker’s DLL will then be run under the System account, allowing privilege elevation and operating system compromise. The problem is that this is not a simple attack: it’s an attack on the MD5 hashing algorithm referred to as a second-preimage attack, for which there are no practical attacks that I know of, so it’s effectively impossible for a regular attacker to generate a file with the same MD5 hash as the existing DLL file.

The reason for the title of this post comes from the fact that intelligence agencies, which are known for their cracking technologies and power, probably could perform this attack and build a local elevation of privileges 0day exploit for Windows.

I don’t know why Microsoft continues using MD5; it has been banned by the Microsoft SDL since 2005, and it seems there has been some component oversight or these components have been built without following SDL guidance. Who knows in what other functionality MD5 continues to be used by Microsoft, allowing abuse by intelligence agencies.

Note: When installing some Windows updates, the Windows Installer service also creates the same DLL file in the C:\windows\temp folder, possibly allowing the same attack.

The following YouTube links provide more technical details and video demonstrations about this vulnerability.


INSIGHTS | January 9, 2012

Common Coding Mistakes – Wide Character Arrays

By IOActive

This post contains a few of my thoughts on common coding mistakes we see during code reviews when developers deal with wide character arrays. Manipulating wide character strings is reasonably easy to get right, but there are plenty of “gotchas” still popping up. Coders should make sure they take care because a few things can slip your mind when dealing with these strings and result in mistakes.

A little bit of background:

The term wide character generally refers to character data types with a width larger than a byte (the width of a normal char). The actual size of a wide character varies between implementations, but the most common sizes are 2 bytes (e.g. Windows) and 4 bytes (e.g. Unix-like OSes). Wide characters usually represent a particular character using one of the Unicode character sets: in Windows this will be UTF-16 and for Unix-like systems, whose wide characters are twice the size, this will usually be UTF-32.

Windows seems to love wide character strings and has made them standard. As a result, many Windows APIs have two versions: functionNameA and functionNameW, an ANSI version and a wide char string version, respectively. If you’ve done any development on Windows systems, you’ll definitely be no stranger to wide character strings.

There are definite advantages to representing strings as wide char arrays, but there are a lot of mistakes to make, especially if you’re used to developing on Unix-like systems or you forget to consider the fact that one character does not equal one byte.

For example, consider the following scenario, where an unsuspecting Windows developer begins to parse a packet that follows their proprietary network protocol. The code shown takes a UTF-16 string length in bytes (unsigned int) from the packet and performs a bounds check. If the check passes, a string of the specified length (assumed to be a UTF-16 string) is copied from the packet buffer to a fresh wide char array on the heap.

[ … ]

if(packet->dataLen > 34 || packet->dataLen < sizeof(wchar_t)) bailout_and_exit();

size_t bufLen = packet->dataLen / sizeof(wchar_t);

wchar_t *appData = new wchar_t[bufLen];

memcpy(appData, packet->payload, packet->dataLen);

[ … ]

This might look okay at first glance; after all, we’re just copying a chunk of data to a new wide char array. But consider what would happen if packet->dataLen was an odd number. For example, if packet->dataLen = 11, we end up with size_t bufLen = 11 / 2 = 5 since the remainder of the division will be discarded.

So, a five-element-long wide character buffer is allocated into which the memcpy() copies 11 bytes. Since five wide chars on Windows is 10 bytes (and 11 bytes are copied), we have an off-by-one overflow. To avoid this, the modulo operator should be used to check that packet->dataLen is even to begin with; that is:

if(packet->dataLen % 2) bailout_and_exit();
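Putting the bounds check and the alignment check together, a corrected version of the parse might look like the sketch below. The Packet struct and the copyWidePayload name are hypothetical stand-ins for the elided protocol code, and the check uses sizeof(wchar_t) rather than a literal 2 so that it also holds where wide characters are four bytes.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical packet layout mirroring the fields used in the snippet above.
struct Packet {
    unsigned int dataLen;          // payload length in BYTES, attacker controlled
    const unsigned char* payload;  // raw payload bytes
};

const std::size_t kMaxPayloadBytes = 34;  // same upper bound as the original check

// Copies the payload into `out` as wide characters.
// Returns false (and leaves `out` untouched) when the length is out of range
// or is not a whole number of wide characters.
bool copyWidePayload(const Packet& packet, std::vector<wchar_t>& out)
{
    if (packet.dataLen > kMaxPayloadBytes || packet.dataLen < sizeof(wchar_t))
        return false;
    if (packet.dataLen % sizeof(wchar_t) != 0)
        return false;  // odd/partial lengths are what caused the overflow

    // The element count and the byte count now describe exactly the same region.
    out.assign(packet.dataLen / sizeof(wchar_t), L'\0');
    std::memcpy(out.data(), packet.payload, packet.dataLen);
    return true;
}
```

With this check, a dataLen of 11 is rejected outright instead of being rounded down to a five-element buffer.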

Another common occurrence is to forget that the NULL terminator on the end of a wide character buffer is not a single NULL byte: it’s two NULL bytes (or 4, on a UNIX-like box). This can lead to problems when the usual len + 1 bytes are allocated instead of the (len + 1) * sizeof(wchar_t) bytes that are required to hold len wide characters plus the extra NULL byte(s) needed to terminate wide char arrays, for example:

int alloc_len = len + 1;

wchar_t *buf = (wchar_t *)malloc(alloc_len);

memset(buf, 0x00, len);

wcsncpy(buf, srcBuf, len);

If srcBuf had len wide chars in it, all of these would be copied into buf, but wcsncpy() would not NULL terminatebuf. With normal character arrays, the added byte (which will be a NULL because of the memset) would be the NULL terminator and everything would be fine. But since wide char strings need either a two- or four-byte NULL terminator (Windows and UNIX, respectively), we now have a non-terminated string that could cause problems later on.
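
One way to avoid the terminator problem is to size the allocation in elements rather than bytes and to terminate explicitly instead of relying on wcsncpy(). The sketch below assumes len is a count of wide chars; dup_wide is a hypothetical name, not something from the original code.

```c
#include <stdint.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: duplicates at most len wide chars of src into a new
 * buffer that is always terminated. Returns NULL on failure. */
wchar_t *dup_wide(const wchar_t *src, size_t len)
{
    /* Guard the (len + 1) * sizeof(wchar_t) multiplication below. */
    if (src == NULL || len >= SIZE_MAX / sizeof(wchar_t))
        return NULL;

    /* len elements for the data plus one full wchar_t for the terminator. */
    wchar_t *buf = malloc((len + 1) * sizeof(wchar_t));
    if (buf == NULL)
        return NULL;

    wcsncpy(buf, src, len);   /* copies at most len wide chars            */
    buf[len] = L'\0';         /* wcsncpy() may not terminate, so we do it */
    return buf;
}
```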

Some developers also slip up when they wrongly interchange the number of bytes and the number of characters. That is, they use the number of bytes as a copy length when what the function was asking for was the number of characters to copy; for example, something like the following is pretty common:

int destLen = (stringLen * sizeof(wchar_t)) + sizeof(wchar_t);

wchar_t *destBuf = (wchar_t *)malloc(destLen);

MultiByteToWideChar(CP_UTF8, 0, srcBuf, stringLen, destBuf, destLen);

[ do something ]

The MultiByteToWideChar API is documented at <http://msdn.microsoft.com/en-us/library/windows/desktop/dd319072%28v=vs.85%29.aspx>.

The problem with the sample shown above is that the sixth parameter to MultiByteToWideChar is the size of the destination buffer in wide characters, not in bytes, which is what the call above passes. Our destination length is out by a factor of two here (or four on UNIX-like systems, generally), so the function believes the buffer is larger than it really is and we can end up overrunning it. These sorts of mistakes result in overflows and they’re surprisingly common.
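
For the Windows call, the fix is to pass the element count (destLen / sizeof(wchar_t)) rather than destLen. Since MultiByteToWideChar is Windows-only, the sketch below uses the standard mbstowcs() as a stand-in; its third argument is likewise a count of wide characters, not bytes. The helper name to_wide is hypothetical, and the conversion depends on the current locale.

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: converts a multibyte string (current locale) to a
 * newly allocated wide string. Returns NULL on failure. */
wchar_t *to_wide(const char *src)
{
    if (src == NULL)
        return NULL;

    /* Each wide char consumes at least one source byte, so strlen(src) + 1
     * elements is always enough room, terminator included. */
    size_t dest_count = strlen(src) + 1;             /* elements, not bytes */
    wchar_t *dest = calloc(dest_count, sizeof(wchar_t));
    if (dest == NULL)
        return NULL;

    /* Pass the element count here, never dest_count * sizeof(wchar_t). */
    size_t written = mbstowcs(dest, src, dest_count);
    if (written == (size_t)-1) {                     /* invalid sequence */
        free(dest);
        return NULL;
    }

    dest[dest_count - 1] = L'\0';                    /* belt and braces */
    return dest;
}
```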

The same sort of mistake can also be made when using “safe” wide char string functions, like wcsncpy(), for example:

unsigned int destLen = (stringLen * sizeof(wchar_t)) + sizeof(wchar_t);

wchar_t destBuf[destLen];

memset(destBuf, 0x00, destLen);

wcsncpy(destBuf, srcBuf, sizeof(destBuf));

Although using sizeof(destBuf) for the maximum destination size would be fine if we were dealing with normal characters, this doesn’t work for wide character buffers. sizeof(destBuf) returns the number of bytes in destBuf, not the number of wide chars, which means the wcsncpy() call above can end up writing twice as many bytes to destBuf as it can hold. Again, an overflow.
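
A common way to sidestep the bytes-versus-elements mix-up for fixed-size arrays is to derive the element count from the array itself. The sketch below is only one way to do it; the WCOUNT macro and the bounded_wcopy function are illustrative names, not standard APIs.

```c
#include <wchar.h>

/* Number of elements in a true array (not a pointer). */
#define WCOUNT(arr) (sizeof(arr) / sizeof((arr)[0]))

/* Hypothetical helper: copies src into dest, writing at most dest_count
 * wide chars including the terminator. Returns the number of wide chars
 * copied, not counting the terminator. */
size_t bounded_wcopy(wchar_t *dest, size_t dest_count, const wchar_t *src)
{
    if (dest == NULL || src == NULL || dest_count == 0)
        return 0;

    size_t n = 0;
    while (n < dest_count - 1 && src[n] != L'\0') {
        dest[n] = src[n];
        n++;
    }
    dest[n] = L'\0';   /* always terminated, even when src is truncated */
    return n;
}
```

A call then reads bounded_wcopy(destBuf, WCOUNT(destBuf), srcBuf), with no byte counts in sight.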

The other wide char equivalent string manipulation functions are also prone to misuse in the same ways as their normal char counterparts. When auditing, look for all the wide char equivalents of the usual suspects, such as swprintf, wcscpy, wcsncpy, etc. There also are a few wide char-specific APIs that are easily misused; take, for example, wcstombs(), which converts a wide char string to a multi-byte string. The prototype looks like this:

size_t wcstombs(char *restrict s, const wchar_t *restrict pwcs, size_t n);

It does bounds checking, so the conversion stops when n bytes have been written to s or when a NULL terminator is encountered in pwcs (the source buffer). If an error occurs, i.e. a wide char in pwcs can’t be converted, the conversion stops and the function returns (size_t)-1; otherwise, the number of bytes written (not counting the NULL terminator) is returned. MSDN considers wcstombs() to be deprecated, but there are still a few common ways to mess up when using it, and they all revolve around not checking return values.

If a bad wide character is encountered in the conversion and you’re not expecting a negative number to be returned, you could end up under-indexing your array; for example:

int i;

i = wcstombs( … );  // wcstombs() can return (size_t)-1, which becomes -1 in an int

buf[i] = '\0';

If a bad wide character is found during conversion, the destination buffer will not be NULL terminated and may contain uninitialized data if you didn’t zero it or otherwise initialize it beforehand.

Additionally, if the return value is n, the destination buffer won’t be NULL terminated, so any string operations later carried out on or using the destination buffer could run past the end of the buffer. Two possible consequences are a potential page fault if an operation runs off the end of a page or potential memory corruption bugs, depending on how the destination buffer is used later. Developers should avoid wcstombs() and use wcstombs_s() or another, safer alternative. Bottom line: always read the docs before using a new function since APIs don’t always do what you’d expect (or want) them to do.
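
Where wcstombs() can’t be avoided, both failure modes described above can be caught by checking the return value before the buffer is used. This is a minimal sketch; the wrapper name narrow and its 0 / -1 return convention are assumptions made for illustration.

```c
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical wrapper: converts src into dst (dst_size bytes).
 * Returns 0 on success with dst terminated, or -1 on any failure, in
 * which case dst is left as an empty string. */
int narrow(char *dst, size_t dst_size, const wchar_t *src)
{
    if (dst == NULL || src == NULL || dst_size == 0)
        return -1;

    size_t n = wcstombs(dst, src, dst_size);

    /* (size_t)-1: a wide char could not be converted.
     * n == dst_size: the buffer was filled and no terminator was written. */
    if (n == (size_t)-1 || n == dst_size) {
        dst[0] = '\0';
        return -1;
    }
    return 0;
}
```

Keeping the result in a size_t, rather than an int, also means the error value can never be mistaken for a valid negative index.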

Another thing to watch out for is accidentally interchanging wide char and normal char functions. A good example would be incorrectly using strlen() on a wide character string instead of wcslen()—since wchar strings are chock full of NULL bytes, strlen() isn’t going to return the length you were after. It’s easy to see how this can end up causing security problems if a memory allocation is done based on a strlen() that was incorrectly performed on a wide char array.
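
To make the difference concrete, here is a small sketch that contrasts the two calls. Both function names are hypothetical; the first one is deliberately wrong and is there only to show what the mistake computes.

```c
#include <string.h>
#include <wchar.h>

/* BUG, for illustration only: treats a wide string as a byte string.
 * For ASCII text, strlen() stops at the first zero byte, which is inside
 * the first wide char. */
size_t misused_strlen(const wchar_t *s)
{
    return strlen((const char *)s);
}

/* Correct: bytes needed to store s, including its wide terminator. */
size_t wide_storage_bytes(const wchar_t *s)
{
    return (wcslen(s) + 1) * sizeof(wchar_t);
}
```

For a string like L"hello", the first function reports at most one byte, whereas the second reports the space for six wide chars that an allocation would actually need.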

Mistakes can also be made when trying to develop cross-platform or portable code—don’t hardcode the presumed length of wchars. In the examples above, I have assumed sizeof(wchar_t) = 2; however, as I’ve said a few times, this is NOT necessarily the case at all, since many UNIX-like systems have sizeof(wchar_t) = 4.

Making these assumptions about width could easily result in overflows when they are violated. Let’s say someone runs your code on a platform where wide characters aren’t two bytes in length, but are four; consider what would happen here:

wchar_t *destBuf = (wchar_t *)malloc(32 * 2 + 2);

wcsncpy(destBuf, srcBuf, 32);

On Windows, this would be fine since there’s enough room in destBuf for 32 wide chars + NULL terminator (66 bytes). But as soon as you run this on a Linux box, where wide chars are four bytes, wcsncpy() will write 4 * 32 = 128 bytes into the 66-byte buffer, resulting in a pretty obvious overflow.

So don’t make assumptions about how large wide characters are supposed to be since it can and does vary. Always use sizeof(wchar_t) to find out.
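
As a rough sketch, the 32-character example above could be written portably like this. NAME_CHARS and copy_name are names invented for the example; the point is only that the size comes from sizeof(wchar_t) rather than a hardcoded 2.

```c
#include <stdlib.h>
#include <wchar.h>

enum { NAME_CHARS = 32 };   /* capacity in wide chars, not bytes */

/* Hypothetical helper: copies up to NAME_CHARS wide chars of src into a
 * new, terminated buffer. Returns NULL on failure. */
wchar_t *copy_name(const wchar_t *src)
{
    if (src == NULL)
        return NULL;

    /* calloc() takes the element size from the platform and zero-fills,
     * so the extra element is already a valid terminator. */
    wchar_t *dest = calloc(NAME_CHARS + 1, sizeof(wchar_t));
    if (dest == NULL)
        return NULL;

    wcsncpy(dest, src, NAME_CHARS);   /* never touches dest[NAME_CHARS] */
    return dest;
}
```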

When you’re reviewing code, keep your eye out for the use of unsafe wide char functions, and ensure the math is right when allocating and copying memory. Make sure you check your return values properly and, most obviously, read the docs to make absolutely sure you’re not making any mistakes or missing something important.