TL;DR: I built a tool that scans Docker Hub images matching a given keyword in order to find secrets. Using the tool, I found numerous AWS credentials, SSH private keys, databases, API keys, etc. It’s an interesting tool to add to the bug hunter / pentester arsenal, not only for the possibility of finding secrets, but also for fingerprinting an organization. On the other hand, if you are a DevOps or security engineer, you might want to integrate the scan engine into your CI/CD pipeline for your Docker images.
GET THE TOOL: https://github.com/matiassequeira/docker_explorer
The idea for this work came up while I was open-sourcing a project I was collaborating on. Apart from migrating the source code to a public GitHub repository, we prepared a ready-to-go VM that we uploaded to an S3 bucket, plus a few Docker images that we pushed to Docker Hub. A couple of days later, over a weekend to be more specific, we got a warning from AWS stating that our SNS resources were being (ab)used: more than 75k emails had been sent in a few hours. Clearly, our AWS credentials had been exposed.
Once we had deleted all the potentially exposed credentials and replaced them in our environments, we started to dig into the cause of the incident and realized that the credentials had been pushed to GitHub along with the source code due to a miscommunication within the team. As expected, the credentials had also leaked into the VM, but the Docker images were fine. Anyhow, this got me thinking about the possibility of scanning an entire Docker image repository, the same way attackers do with source code repositories. Before starting to code anything, I had to check whether it was even possible.
Getting a list of images
The first thing I had to check was whether it was possible to retrieve a list of images matching a specific keyword. By inspecting the site's API calls through a proxy, I found Docker Hub's v2 search API.
The only limitation I found with the v2 API was that it wouldn’t retrieve anything beyond page number 100. So, given the maximum page size of 100, I wouldn’t be able to scan more than 10k images per keyword. This API also allows you to sort the list by pull count, so by retrieving the repositories with the fewest pulls (in other words, typically the newest repositories) we have a greater chance of finding something fresh. Although there’s a v1 of the API with many other interesting filters, it wouldn’t allow me to retrieve more than ~2.5k images.
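As a rough sketch, that pagination can be scripted like this. The exact endpoint path and the `ordering` parameter are assumptions about the public hub.docker.com v2 API, so treat them as illustrative:

```python
# Build the paginated Docker Hub v2 search URLs for one keyword.
# Endpoint path and parameter names are assumptions, not guaranteed API facts.
BASE = "https://hub.docker.com/v2/search/repositories/"

def search_urls(keyword, pages=100, page_size=100):
    """One URL per fetchable page: at most 100 pages of 100 results (~10k images)."""
    return [
        f"{BASE}?query={keyword}&page={page}&page_size={page_size}&ordering=pull_count"
        for page in range(1, pages + 1)
    ]
```

Iterating these URLs with any HTTP client until the API stops returning results reproduces the ~10k-images-per-keyword ceiling described above.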
After getting the image names, and since not all the images had a `latest` version, I had to make a second request for each image to the endpoint that lists its versions (tags).
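Assuming the same v2 API shape, the tag-listing URL for a repository can be built like this (the URL layout is an assumption; the `library/` namespace handling for official images is the main subtlety):

```python
# Build the (assumed) Docker Hub v2 tag-listing URL for a repository.
def tags_url(repo, page_size=100):
    # Official images ("nginx") live under the implicit "library" namespace.
    full_name = repo if "/" in repo else f"library/{repo}"
    return f"https://hub.docker.com/v2/repositories/{full_name}/tags/?page_size={page_size}"
```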
Once I had the `image:version`, the only thing left was to pull the image, create a temporary container, and dump its filesystem.
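Those three steps map directly onto standard docker CLI commands. A sketch that builds the command lines (the helper and the container-naming scheme are illustrative; in the real tool these would be handed to something like `subprocess.run`):

```python
# Given image and tag, produce the docker commands that pull the image,
# create a stopped container from it, dump its filesystem to a tarball,
# and clean up afterwards to avoid filling the disk.
def dump_commands(image, tag, out_dir="/tmp/dumps"):
    ref = f"{image}:{tag}"
    name = f"scan_{image.replace('/', '_')}_{tag}"
    return [
        ["docker", "pull", ref],
        ["docker", "create", "--name", name, ref],  # no need to actually run it
        ["docker", "export", name, "-o", f"{out_dir}/{name}.tar"],
        ["docker", "rm", name],                     # remove the temporary container
        ["docker", "rmi", ref],                     # remove the image, free disk space
    ]
```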
Analyzing the image dump
This was one of the most important parts of my research, because it determined the engine I would use for scanning the images. Rather than trying to build a scan engine myself, which is a big enough problem on its own, I started to look for the most suitable existing tool: one that could work on an entire filesystem and look for a wide variety of secrets. After doing some research, I found many options, but sadly, most of them were oriented toward GitHub secrets, had a high rate of false positives, or evaluated secrets as isolated strings (without considering the `var=value` format).
While discussing this with one of my colleagues, he mentioned a tool that had been published less than a week earlier, called Whispers, a great tool by Artëm Tsvetkov and Christian Martorella. After some testing, I found the tool very convenient for several reasons:
- Contains many search rules (AWS secrets, GitHub secrets, etc.) that assess potential secrets in `var=value` format
- Findings are classified by type, impact (minor, major, critical, blocker), file location, etc.
- Allows you to add more rules and plugins
- Written in Python3 and thus very easy to modify
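To illustrate why `var=value` matching cuts false positives (a minimal illustration, not Whispers' actual rule set): requiring a suspicious variable name next to a plausible value avoids flagging every random high-entropy string in isolation.

```python
import re

# Minimal key=value secret matcher: a suspicious variable name must sit
# next to a plausible value, instead of flagging isolated strings.
PATTERN = re.compile(
    r"(?i)\b(aws_secret_access_key|api_key|password)\b\s*[=:]\s*['\"]?"
    r"([A-Za-z0-9/+]{10,})"
)

def find_candidates(text):
    return [(m.group(1), m.group(2)) for m in PATTERN.finditer(text)]
```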
Developing the tool
Once I had everything in place, I started to work on a script to automate the process so I could scan thousands of images. While using and testing the tool, I came up with additional requirements, such as letting the user choose the number of CPU cores to use, limiting Whispers' execution time, storing logs separately for each container, and deleting containers and images to avoid filling up the disk.
Also, in order to maximize the number of findings, minimize the number of false positives, and ease data triage, I made a couple of modifications to the standard Whispers:
- Added a rule for Azure secrets
- Excluded many directories and files
- Saved files with potential secrets into directories
Running the tool
With the tool pretty much ready to analyze bulk images, I signed up for two different DigitalOcean accounts and claimed $100 in credit for each. Later, I spun up two servers, set up the tool in each environment, and ran the tool using a large set of keywords/targets.
The keywords/images I aimed to scan were mainly related to technologies that handle or have a high probability of containing secrets, such as:
- DevOps software (e.g. Kubernetes / K8s / Compose / Swarm / Rancher)
- Cloud services (e.g. AWS / EC2 / CloudFront / SNS / AWS CLI / Lambda)
- CI/CD software (e.g. Jenkins / CircleCI / Shippable)
- Code repositories (e.g. GitLab / GitHub)
- Servers (e.g. NGINX / Apache / HAProxy)
After a month of running the tool, I found myself with a total of 20 GB of zipped data ready to triage. To clean all of that data with consistent, reusable criteria, I had to develop an extra set of tools. Among the rules and considerations, the most important were:
- Created a list of false-positive strings that were reported as AWS access keys
- Deleted AWS strings containing the string “EXAMPLE”
- Discarded all the potential passwords that were not alphanumeric and shorter than 10 chars
- Discarded passwords containing the string “password”
- Discarded test, dummy, or incorrect SSH private keys
- Deleted duplicate keys/values for each image, to lessen the amount of manual checking
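Those rules can be sketched as simple filters. The thresholds and the false-positive list below are illustrative, and passwords are kept only when alphanumeric and at least 10 characters, which is one strict reading of the rules above:

```python
# Illustrative triage filters; FALSE_POSITIVES would hold the strings that
# repeatedly showed up as bogus AWS access keys during triage.
FALSE_POSITIVES = {"AKIAIOSFODNN7EXAMPLE"}  # AWS's documentation sample key

def keep_aws_key(key):
    return key not in FALSE_POSITIVES and "EXAMPLE" not in key.upper()

def keep_password(pw):
    return len(pw) >= 10 and pw.isalnum() and "password" not in pw.lower()

def dedupe(findings):
    """findings: iterable of (key, value) pairs for one image; keep first of each."""
    seen, unique = set(), []
    for kv in findings:
        if kv not in seen:
            seen.add(kv)
            unique.append(kv)
    return unique
```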
After many weeks of data triage, I found a wide variety of secrets, such as 200+ AWS accounts (of which 64 were still alive and 14 of these were root), 1,500+ valid SSH keys, Azure keys, several databases, .npmrc tokens, Docker Hub accounts, PyPI repository keys, many SMTP servers, reCAPTCHA secrets, Twitter API keys, Jira keys, Slack keys, and a few others.
Among the most notable findings was the entire infrastructure (SMTP servers, AWS keys, Twitter keys, Facebook keys, Twilio keys, etc.) of a US-based software company with approximately 300 employees. I reached out to the company but, unfortunately, did not hear back. I also identified an Argentinian software company focused on healthcare that had a few proofs of concept with valid AWS credentials.
The most commonly overlooked files were `~/.aws/credentials`, Python scripts with hardcoded credentials, the Linux `.bash_history` file, and a variety of .yml files.
So, what can I use the tool for?
If you are a bug bounty hunter or a pentester, you can use the tool with different keywords, such as the organization name, platform name, or developers’ names (or nicknames) involved in the program you are targeting. The impact of finding secrets can range from a data breach to the unauthorized use of resources (sending spam campaigns, Bitcoin mining, DDoS attack orchestration, etc.).
Security recommendations for Docker images creation
If you work in DevOps, development, or systems administration and currently use cloud infrastructure, these tips might come in handy when creating Docker images:
- Always try to use a fresh, clean Docker image.
- Delete your SSH private key and SSH authorized keys:

```
sudo shred myPrivateSSHKey.pem authorized_keys
rm myPrivateSSHKey.pem authorized_keys
```
- Delete `~/.aws/credentials` using the above method.
- Clean your bash history: `shred ~/.bash_history && rm ~/.bash_history && history -c`
- Don’t hardcode secrets in the code. Instead, use environment variables and inject them at the moment of container creation. Also, when possible, use mechanisms provided by container orchestrators to store/use secrets and don’t hardcode them in config files.
- Perform a visual inspection of your home directory, and don’t forget hidden files and directories (e.g. `ls -la ~`).
- If you are using a free version of Docker Hub, assume that everything within the image is public.
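For the environment-variable approach mentioned above, a minimal sketch (the variable name is a placeholder): the application reads the secret from its environment at runtime, and you inject it at container creation, e.g. `docker run -e DB_PASSWORD=... myimage`.

```python
import os

def get_secret(name):
    """Fetch a secret injected via the environment (e.g. `docker run -e NAME=value`)."""
    value = os.environ.get(name)
    if value is None:
        raise RuntimeError(f"secret {name} is not set; inject it at container creation")
    return value
```

Nothing secret ends up baked into the image layers this way; the value only exists in the running container.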
Update, September 2020: while writing this blog post, Docker Hub announced that starting in November 2020 it will rate-limit image pulls. But there’s always a way ;).
GET THE TOOL: https://github.com/matiassequeira/docker_explorer