INSIGHTS, RESEARCH | December 6, 2021

Cracking the Snapcode

A Brief Introduction to Barcodes

Barcodes are used everywhere: trains, planes, passports, post offices… you name it. And just as numerous as their applications are the systems themselves. Everybody’s seen a UPC barcode like this one:

But what about one like this on a package from UPS? 

This is a MaxiCode matrix, and though it looks quite different from the UPC barcode, it turns out that these systems use many common techniques for storing and reading data. Both consist of black or white “modules” which serve different purposes depending on their location. Some modules are used to help with orientation when scanning the barcode, some act as data storage, and some provide error correction in case the modules are obscured. (I won’t address how the error correction algorithms work, but those who are interested can read more here [3].)

The diagram above shows the orientation patterns used in UPC barcodes to designate the start, middle, and end of the barcode, as well as how the data-storage modules are encoded. The last digit of a UPC barcode is not used to store data, serving instead as a checksum to verify that no errors were made when printing or reading the barcode. 
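
As an aside, that check digit follows the standard, publicly documented UPC-A formula. A quick Python sketch of the calculation:

def upc_check_digit(first_11_digits):
    # Standard UPC-A check digit: digits in odd positions (1st, 3rd, ...) are
    # weighted by 3, even positions by 1; the check digit brings the total to a
    # multiple of 10.
    digits = [int(d) for d in first_11_digits]
    total = 3 * sum(digits[0::2]) + sum(digits[1::2])
    return (10 - total % 10) % 10

# e.g. upc_check_digit("03600029145") returns 2, matching the full code 036000291452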

Though they look quite different, MaxiCode matrices employ the same mechanisms:

I want to stop here for a moment and just appreciate the intricacy of this system. The tinkerer in me can’t help but wonder, How could someone possibly figure all this out? For better or for worse, there is no need to figure it out, since MaxiCode is public domain and Wikipedia has all the answers. But wouldn’t that be an interesting puzzle?

If you answered no, here’s a QR code for your troubles:

For those of you still reading, I’d like to introduce another barcode system, and the guest of honor in today’s adventure: Snapcode.

Snapcode is a proprietary 2D barcode system that can trigger a variety of actions when scanned in the Snapchat app. Snapcodes can add a friend, unlock image filters, follow a link, and more. Unlike MaxiCode, however, there is no public documentation about how the Snapcode system works! Thus the scene is set. Driven merely by curiosity, I set out to answer the following questions: 

1. What data do Snapcodes encode?

2. How do Snapcodes encode data?

3. What actions can be triggered when these codes are scanned?

Chapter 1: Our Adventure Begins

The Tale of the Treasure

The first question I had to answer was, Is it even possible? Figuring out how Snapcodes encode data is impossible without first knowing what data they encode. In the hopes of uncovering a reliable correlation between the data underlying a Snapcode and the Snapcode itself, I generated the following URL Snapcodes that would navigate to the same address when scanned. If the Snapcodes store the URL directly, then they should look very similar.

To aid in the process of ingesting these images, I wrote a simple Python script that I will reference periodically throughout this tale [6]. The “scan” method checks each position that could contain a dot and stores it as a 1 (present) or 0 (empty) in a 2D array. This allowed me to efficiently ingest, process, and visualize the data, like in the image below. This image was generated by putting a black dot where both Snapcodes had a dot, a white dot if neither Snapcode had a dot, and red if one had a dot and the other did not:
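
In outline, that comparison step looks something like this sketch (diff_image is an illustrative name; it assumes the Pillow imaging library and two dot matrices already produced by the scan method):

from PIL import Image, ImageDraw

def diff_image(dots_a, dots_b, cell=20):
    # Black where both Snapcodes have a dot, red where exactly one does,
    # and the white background shows through where neither does.
    rows, cols = len(dots_a), len(dots_a[0])
    img = Image.new("RGB", (cols * cell, rows * cell), "white")
    draw = ImageDraw.Draw(img)
    for r in range(rows):
        for c in range(cols):
            if dots_a[r][c] and dots_b[r][c]:
                color = "black"
            elif dots_a[r][c] != dots_b[r][c]:
                color = "red"
            else:
                continue
            draw.ellipse([c * cell, r * cell, (c + 1) * cell, (r + 1) * cell], fill=color)
    return img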

This first trial showed quite a few red dots, suggesting that there may not be any connection between the Snapcode and the URL it represents. Hoping for a clearer correlation, I tried another type of Snapcode which adds a user as a friend when scanned. Repeating the experiment with the add-friend Snapcodes of two users with similar names (“aaaac123456789” and “aaaad123456789”) showed a more promising result.

Generating the same type of secondary Snapcode gave the following matrix:

The top and bottom show quite a bit of red, but take a look at the regions just to the left and right of the center. There is almost no red! From this, I drew two conclusions. First, the add-friend Snapcodes store, potentially among other data, some form of the username. Second, the dots to the left and right of the center are the ones used to encode this data, since this is where the highest correlation occurs. 

There is still a long way to go, but we have taken an important first step. Fundamentally, we know that there is in fact something to find within these dots, and on top of that, the fact that we know what is being stored may help us down the line.

What’s Below Deck?

In addition to the Snapcodes, another area to explore was of course the Snapchat app. Just from playing around with the app, I knew that it had the ability to generate and read these codes, so perhaps a closer look would uncover something useful to my pursuit. Using the Android Debug Bridge [7], I pulled the Android package file (APK) from a phone with Snapchat installed. An APK is a ZIP file that contains many different types of information, but of greatest interest to me was the compiled Java code. From the many tools available to decompile the code and reverse engineer the app, I chose to use JADX [8].

After some time poking around the decompiled Java code, I found that the app referenced several methods from a Java Native Interface (JNI) library used to produce the Snapcode images. This library was packaged along with the compiled Java files and provided the following functions that can be called from Java code:

String nativeGenerateWithVersion(long j, int i, byte[] bArr);

String nativeGenerateDotsOnlyWithVersion(long j, int i, byte[] bArr);

These methods took (among other arguments) a byte array containing the underlying data, and returned an SVG image of the Snapcode. If I could call these methods with data that I controlled, perhaps I could determine what exactly each of the dots means.

Chapter 2: The Treasure Map

As any treasure-hunter knows, it’s important to be lazy, er, resourceful. Snapchat was kind enough to provide all the code I needed to construct a map: the Snapcode library, the logic to load it, and the method signatures to create the Snapcode images. A little paring down and I had my very own Android app [9] that could create Snapcodes with any data I wanted. The question was, What data?

Some helpful error messages told me that each Snapcode stored 16 bytes of data, presumably mapping to 16 groupings of eight dots. To light these byte-groups up one at a time, I passed the function an array with one byte set to -1 (which Java represents as b11111111 using two’s complement) and the rest set to 0. The result was a sequence of Snapcodes with one of these groupings lit up at a time.
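
A minimal sketch of that first probe set, shown here as Python lists for brevity (in practice these were Java byte arrays handed to the native generate method):

# One array per byte: light up a single byte-group at a time.
probe_set_1 = []
for i in range(16):
    data = [0] * 16
    data[i] = -1      # -1 is 0xFF as a Java byte, i.e. all eight bits set
    probe_set_1.append(data)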

Notice that some groups of dots are always present, some light up only once throughout the set, and some turn off and on sporadically. It seems plausible that these regions are respectively acting as orientation patterns, data storage, and error correction, just as we saw in the UPC and MaxiCode standards. To more clearly show the byte groupings, the orientation patterns and error correction dots have been removed:

A different set of byte arrays can be used to determine the order of the dots within each of these groupings: setting one bit in each byte to 1 and the rest to 0. This can be achieved with a series of byte arrays with each byte in the array being set to the same power of 2. For example, the array is filled with all 1s (b00000001) to identify the lowest bit in each byte, all 2s (b00000010) for the second bit, all 4s (b00000100) for the third bit, and so on.
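
The second probe set can be built the same way; again a sketch using Python lists in place of Java byte arrays:

# One array per bit position: all bytes 0x01, then all 0x02, 0x04, ... 0x80.
probe_set_2 = [[1 << bit] * 16 for bit in range(8)]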

Pieced together correctly, these two sets of data provide a perfect map between a Snapcode and the bit-string of data it represents. From the first set of Snapcodes, we identified the grouping of bits that made up each byte as well as the order of the bytes. From the second, we learned the ordering of the bits within each byte. The dot corresponding to bit X of byte Y, then, would be the dot that is present in both Snapcode Y of the first set (groupings) and Snapcode X of the second set (orderings).

For my script, this map took the form of a list of coordinates. The bit-string was constructed by checking the corresponding positions in the Snapcode grid one by one, adding a value of 1 to the bit-string if there was a dot in that position and a 0 if not.

DATA_ORDER = [(16,5), (17,6), (17,5), (16,6), (18,5), (18,6), (0,7), (1,8), (1,7), (0,8), (2,7), (2,8), (16,3), (17,4), (17,3), (16,4), (18,3),(18,4),(0,5),(1,6), (0,6), (1,5), (2,6), (2,5), (4,16), (5,17), (5,16), (4,17), (4,18), (5,18), (4,0), (5,1), (4,1), (5,0), (4,2), (5,2), (16,16), (17,16), (16,17), (17,17), (16,18), (18,16), (16,0), (17,1), (16,1), (17,2), (16,2), (18,2), (14,16), (15,17), (14,17), (15,18), (14,18), (15,16), (14,0), (15,1), (14,1), (15,2), (14,2), (15,0), (0,3), (1,4), (1,3), (0,4), (2,3), (2,4), (12,16), (13,17), (12,17), (13,18), (12,18), (13,16), (12,0), (13,1), (12,1), (13,2), (12,2), (13,0), (8,16), (9,17), (8,17), (9,18), (8,18), (9,16), (8,0), (9,1), (8,1), (9,2), (8,2), (9,0), (3,13), (4,14), (3,14), (3,15), (4,15), (5,15), (3,3), (4,3), (3,4), (4,4), (3,5), (5,3), (15,13), (14,14), (15,14), (13,15), (14,15), (15,15), (13,3), (14,4), (15,3), (14,3), (15,4), (15,5), (10,16), (11,17), (10,17), (11,18), (10,18), (11,16), (10,0), (11,1), (10,1), (11,2), (10,2), (11,0), (0,2), (1,2)]

Reordering the dot matrix (a 2D array of 1s and 0s) into a bit-string using this data structure looked something like this:

def reorder_bits(dots):
    # dots is the 2D array of 1s and 0s produced by the scan method
    return [dots[row][col] for (row, col) in DATA_ORDER]

It wasn’t exactly pretty, but the pieces were coming together. At this point, I knew the add-friend Snapcodes somehow stored the username, and I knew how to reorder the dots into a series of bits. The final transformation, how those bits were being decoded into characters, was all that remained.

Chapter 3: Lost at Sea

Making Headway?

The methodology from here was a bit fuzzy. I created an account with the desired username, fed the account’s Snapcode into my script, and out popped a string of 1s and 0s for me to… do something with. As in the previous phase, the choice of input was the crux of the matter. I began with usernames that seemed interesting on their own, like ones consisting of a single character repeated many times. The first two usernames, “aaaaaaaaaaaaa4m” and “zzzzzzzzzzzzz4m”, had the respective bit-string representations:

01000000100000000000001100000000010000000101000100010100010001010101000100010100010001010101000100010100010001010010000100101100 
00000000000000000000001010000001011000000001110011000111011100010001110011000111011100010001110011000111011100010010010000101100

Staring at 1s and 0s, hoping to find something, was a particular kind of fun. You can’t help but see patterns in the data, but it can be difficult to know whether they are just in your imagination or if you are really on to something. If you’d like, take a few minutes and see what you can find before reading on. What I took away from this first experiment was the following:

...[010100010001010001000101][010100010001010001000101][010100010001010001000101]0010000100101100

...[000111001100011101110001][000111001100011101110001][000111001100011101110001]0010010000101100

The only patterns that I could identify appeared in the last 88 bits of the string. Both strings had a sequence of 24 bits (bits 41 to 64, shown in brackets) that repeated three times, followed by a final sequence of 16 bits. 14 of these last 16 bits were the same between the two bit-strings. I also noticed that a similar pattern could be found in the usernames:

[aaaa][aaaa][aaaa]a4m 
[zzzz][zzzz][zzzz]z4m

Finding patterns in the bit-string was exciting on its own, but finding matching patterns in the two representations of the data suggested the presence of a clear path forward in converting the bits to characters. However, try as I might to find a connection, these patterns led nowhere. Every one of my (sometimes harebrained) theories on how these bits may have been converted to letters proved fruitless.

Where Are We?

Having hit a dead end, I changed tack and tried to learn more about what constituted a valid Snapchat username. According to Snapchat’s documentation [10], usernames must consist of 3-15 characters chosen from an alphabet of 39: lowercase letters, digits, and the three symbols “.”, “-”, and “_”. Furthermore, they must begin with a letter, end with a letter or number, and contain at most one non-alphanumeric character.
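
As a rough illustration, those rules can be captured in a small validator (my own reconstruction of the documented constraints, not Snapchat’s actual implementation):

import re

# 3-15 characters: starts with a lowercase letter, ends with a letter or digit,
# and contains at most one of the symbols '.', '-' or '_'.
USERNAME_RE = re.compile(r"^[a-z][a-z0-9._-]{1,13}[a-z0-9]$")

def is_valid_username(name):
    return (USERNAME_RE.fullmatch(name) is not None
            and sum(c in "._-" for c in name) <= 1)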

A little math shows that representing a single character from this 39-letter alphabet would require six bits, since 2^5 (32) < 39 < 2^6 (64). 15 characters, then, would require 90 bits. However, as far as I could tell, these 15 characters were being encoded in the 88 bits where I noticed the patterns. No other similarities showed up in the two bit-strings. How else could they be encoded, if not separately using six bits per character?

Some background research turned up an encoding scheme in the QR code standard that solves a similar problem. Using an alphabet of 45 characters, QR’s alphanumeric encoding scheme [11] treats pairs of characters as two-digit base-45 numbers and encodes the resulting value into binary. The result is two characters per 11 bits, rather than one per six bits! Hypothesizing that the creators of the Snapcode system may have done something similar, I tried each of the possible permutations for decoding sets of X bits into N characters using an alphabet of size 39, but none of them produced strings that showed any pattern resembling the underlying username.
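
For reference, here is a minimal sketch of QR’s alphanumeric scheme; the base-39 variants I tried against the Snapcode bits followed the same shape, just with a different alphabet:

QR_ALPHANUMERIC = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:"

def qr_alphanumeric_bits(text):
    # Each pair of characters becomes a two-digit base-45 number stored in 11 bits;
    # a trailing single character is stored in 6 bits.
    bits = ""
    for i in range(0, len(text) - 1, 2):
        pair = 45 * QR_ALPHANUMERIC.index(text[i]) + QR_ALPHANUMERIC.index(text[i + 1])
        bits += format(pair, "011b")
    if len(text) % 2:
        bits += format(QR_ALPHANUMERIC.index(text[-1]), "06b")
    return bits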

This was just one of many rabbit holes I went down. I learned a great deal about other barcode encoding schemes and came up with many ways the engineers may have optimized the usage of those 88 bits, but with regards to decoding the Snapcode I was dead in the water.

Chapter 4: ‘X’ Marks the Spot

Land, Ho!

With a strategy as fuzzy as “staring at bits,” it should be no surprise that the final breakthrough came when I found a way to better present the data on which I was relying. Snapchat provides a mechanism for generating new Snapcodes and deactivating old ones, in case an old Snapcode is leaked and the user is receiving unwanted friend requests. Using this tool, I generated five Snapcodes for each of the accounts and combined these into a single string using the following rules: each character of this string was assigned a value of “1” if each of the five Snapcodes had a dot in the corresponding position, “0” if none of them had a dot in that position, or “x” if some had a dot and some didn’t. 
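
Combining the five scans looked roughly like this sketch (consensus is an illustrative name; it operates on the bit-strings produced for each Snapcode):

def consensus(bitstrings):
    # "1" where every Snapcode has a dot, "0" where none do, "x" where they disagree.
    combined = []
    for bits in zip(*bitstrings):
        if all(b == "1" for b in bits):
            combined.append("1")
        elif all(b == "0" for b in bits):
            combined.append("0")
        else:
            combined.append("x")
    return "".join(combined)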

Reducing the noise in the data with this new representation made the answer I had been looking for as clear as day. The modified bit-strings looked like this:   

xx0xxxxxxxxxxxxxxxxxxx1xxxxxxxxx010xxxxx0101000100010100010001010101000100010100010001010101000100010100010001010010000100101100 

xxxxxxxxxx0xxxxxxxxxxxx0xxxxxxxx011xxxxx0001110011000111011100010001110011000111011100010001110011000111011100010010010001000010

These three extra bits (the “010” and “011” just before the final run of x’s) were separate from the rest of the data I had been looking at, bringing the total to 91. This meant the process of encoding a username could be done one character at a time. I felt quite silly having spent so much time trying to fit the username into fewer bits rather than looking for more bits that may be used, but I imagine the path of a treasure hunt is seldom a straight one.

Digging for Gold

Because the values of these 91 bits were identical in each of the five Snapcodes, it seemed safe to assume that they somehow contained the username. I continued from here using the Snapcodes of two more users: “abcdefghmnopggg” and “bcdefghnopqhhh”. The first seven characters are sequential and offset by one between the two names, a pattern I was hoping would highlight which bits were being incremented for each character. The respective bit-strings were:

...010xxxxx0101100110011000110001100111100110100000110010001011101010110000000011001000001000100000 

...011xxxxx01100001000110101110011110000001101000100000101111001011101101000010100010001010011x0xx0

Once again, some interesting patterns showed up. Both strings could be split up into segments whose binary values were either the same between the two usernames or off by exactly one:

010 ... 01011 001 1 001100 0 110 00110 01111 001 1 010000 ...
011 ... 01100 001 0 001101 0 111 00111 10000 001 1 010001 ...

Presumably, the segments that were identical between the two strings were the higher bits of the encoded character, whose values we may not expect to change, and the off-by-one segments were the lower bits, whose values would be incremented when representing sequential characters. 

I also noticed that the lengths of these segments followed the sequence 5-3-1-6-1-3-5. A strange pattern, it seemed at first, but it eventually dawned on me that these segments could be paired up to create chunks of six bits, each of which could represent a single character. I began enumerating the possible combinations of these segments, eventually coming across the following set of six-bit chunks:

[001|010] [0|01011] [001100] [00110|1] [001|110] [0|01111] [010000] ...
[001|011] [0|01100] [001101] [00111|0] [001|111] [0|10000] [010001] ...

Converted to decimal, these values show the same characteristics seen in the pair of usernames:

10, 11, 12, 13, 14, 15, 16 ...
11, 12, 13, 14, 15, 16, 17 ...

The second unknown, how these values were being converted into characters, fell quite nicely into place from here. Assuming 10 mapped to ‘a’, 11 to ‘b’, and so on, it felt safe to assume that 0 through 9 mapped to ‘0’ through ‘9’, and 36 through 38 represented the three symbols. Verifying these assumptions and identifying the exact value assigned to each character was achieved by testing them on a range of other usernames.

One final detail fell into place when trying to decode usernames that did not use all 15 available characters. The end of a username was simply marked by any value greater than 38, after which the remaining bits were ignored by the decoding process. QR codes use a similar mechanism, designed to avoid large empty spaces in the barcode that make it unsightly and harder to scan. 

In Python, the process of reordering the bit-string into six-bit chunks took the form of lists of integers whose value indicated the position of a bit in the bit-string. For example, the binary value of the first character was determined by taking bits 46-48 of the bit-string and appending bits 33-35:

USERNAME_DATA = [
    [46, 47, 48, 33, 34, 35],
    [56, 41, 42, 43, 44, 45],
    [50, 51, 52, 53, 54, 55],
    [60, 61, 62, 63, 64, 49],
    [70, 71, 72, 57, 58, 59],
    [80, 65, 66, 67, 68, 69],
    [74, 75, 76, 77, 78, 79],
    [84, 85, 86, 87, 88, 73],
    [94, 95, 96, 81, 82, 83],
    [104, 89, 90, 91, 92, 93],
    [98, 99, 100, 101, 102, 103],
    [108, 109, 110, 111, 112, 97],
    [118, 119, 120, 105, 106, 107],
    [128, 113, 114, 115, 116, 117],
    [122, 123, 124, 125, 126, 127]
]

A dictionary converted the decimal values of these chunks to characters:

CHAR_MAP = {
    0: '0', 1: '1', 2: '2', ..., 9: '9',
    10: 'a', 11: 'b', 12: 'c', ..., 35: 'z',
    36: '-', 37: '_', 38: '.'  
}
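
Putting the pieces together, the dots-to-username path looks roughly like the following sketch (simplified from my script; it treats the values in USERNAME_DATA as 1-based positions into the bit list returned by reorder_bits):

def decode_username(bits):
    username = ""
    for positions in USERNAME_DATA:
        # USERNAME_DATA holds 1-based bit positions, so subtract one when indexing.
        value = int("".join(str(bits[i - 1]) for i in positions), 2)
        if value > 38:    # any value outside the 39-character alphabet marks the end
            break
        username += CHAR_MAP[value]
    return username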

With that, I was at last able to trace the username data through each stage of the decoding process: dots, bits, six-bit chunks, and finally characters.

Chapter 5: A Smooth Sail Home

Tying up Loose Ends

Revisiting my third research question, What actions can be triggered when these codes are scanned?, was simple compared to what I had just been through. Snapchat publicly documented several other types of Snapcodes that were easy to interact with, like URL Snapcodes and content Snapcodes to unlock in-app content. Others I had to read about, like ones that are used to pair Snapchat’s “Spectacles” devices to your phone [12]. 

I found the above Snapcode on a mysterious page of Snapchat’s website, which contained only the title “Snapchat Update.” Scanning it in the app did nothing on my phone, but presumably it would update the Snapchat app if it was out of date. I spent a good deal of time trying to reverse engineer the app to determine how this Snapcode is handled, and whether there were any other undocumented functions a Snapcode may invoke, but I was unable to find anything.

One final loose end that a curious reader may have identified was the mechanism for deactivating old Snapcodes mentioned in the previous chapter. Having several Snapcodes for each of the test users, I compared the values of the non-username bits both across accounts (e.g. the first Snapcode for each account) and within accounts (i.e. the sequence of Snapcodes for a single account). No discernible patterns showed up, which led me to hypothesize that the Snapcodes were differentiated by some sort of random key in the non-username bits. In this scenario, each account would be associated with one “active” key at a time, and the Snapchat app would only perform the add-friend function if the Snapcode with that user’s active key was scanned.

A Last Golden Nugget

I decided to see what else I could find in the other types of Snapcode I could easily create, but neither of them showed any patterns between the underlying data and the resulting Snapcode. As seen earlier, URL Snapcodes change drastically even when creating two that redirect to the same URL, and the content Snapcodes show no correlation between the barcode and the content pack metadata like the author, name, etc.

Exploring Snapchat’s website eventually led me to the following URL:

https://www.snapchat.com/unlock/?type=SNAPCODE&uuid=c4bf0e0ec8384a06b22f67edcc02d1c3

On this page, there is a Snapcode labeled “Gold Fish Lens” that presumably once unlocked a Snapchat lens, though this no longer works when scanning it in the app. However, the HTTP parameter “uuid=c4bf0e0ec8384a06b22f67edcc02d1c3” jumped out as a possible piece of data that was being stored in this type of Snapcode. Sure enough, converting the dots to a bit-string (just as we did with the username) and then converting this bit-string to a hexadecimal string resulted in this exact UUID!
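
That conversion is straightforward; a sketch operating on the bit list produced by reorder_bits:

def bits_to_hex(bits):
    # Group the 128 bits into nibbles and render each as one hex digit.
    return "".join(format(int("".join(str(b) for b in bits[i:i + 4]), 2), "x")
                   for i in range(0, len(bits), 4))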

I found a similar piece of data when creating a URL Snapcode [14]. The initial response includes a “scannableId” in a similar format to the UUID. This value is then used in a subsequent request to pull up the image of the resulting Snapcode, leading me to believe it serves the same general purpose. 

Based on these findings, I hypothesized the following workflow: whenever a new lens filter or sticker pack is created or a new URL Snapcode is requested, a UUID is generated and stored in a database along with any associated information like the content pack name or URL. When a Snapcode of one of these types is scanned in the app, it prompts a web request including this UUID to query the database and determine what action to perform.

There was nothing more I could (legally) try to definitively confirm this hypothesis, so I guess I’ll just have to wait for Snapchat to tell me if I got it right.

Final Considerations

Reflecting on this exercise, I came up with a few personal takeaways, as well as some thoughts for organizations who have their own proprietary barcode system or perhaps are considering implementing one. 

The first implication for barcode systems is that if they are used to store recognizable data, they can feasibly be cracked. Had I not known what data was stored in the add-friend Snapcodes, this entire project would have been dead in the water. It may be impossible to keep the barcode-to-bit-string transformation entirely secret if you need that functionality in client-accessible software (like the Snapchat mobile app), but knowing that transformation alone is not enough to crack the barcode system if you don’t know what data it stores.

This makes Snapchat’s UUID system a great way to avoid leaking potentially sensitive information and significantly decrease the risk of the barcodes being reverse engineered in the first place. If the bits are translated directly to a hexadecimal UUID then perhaps there’s a chance of guessing how to decode the UUID as I did, but without access to the database that value is meaningless.

Conversely, storing any sensitive information in a barcode is a very bad idea, for obvious reasons. Even Snapchat’s inclusion of username data is potentially dangerous. Recall the solution they came up with for the case where an old Snapcode is leaked and the user is receiving unwanted friend requests: a malicious user can extract the username from the leaked Snapcode and find the updated Snapcode at this URL: https://www.snapchat.com/add/USERNAME. Snapchat does have other controls to prevent unwanted friend requests, but the ability to find a user’s active Snapcode effectively nullifies the current mitigation. (I did disclose this to Snapchat, and they have acknowledged and accepted the issue.)

As for my personal takeaways, the first is that sometimes it pays to have a wide range of skills, even if your experience is minimal. The skills involved in this challenge included reverse engineering, scripting, Android internals and development, web development, and logic. I am far from an expert in any of these domains, but in a challenge where the solution is not just a matter of one’s depth of technical knowledge, any angle of attack you can find is important. This could be as simple as knowing that a tool exists to do something you need.

Finally, I’d like to think I learned a lesson about making assumptions. When the information surrounding a problem is incomplete, some assumptions are required to make any progress at all, but it’s important to revisit these every so often to make sure they aren’t sending you down a rabbit hole. A breadth-first approach, exploring several possible solutions together as opposed to one at a time in depth, may have lessened the pain of realizing that days of work were useless.

I am sure you learned more than you ever wanted to know about Snapcodes, but I thank you for joining me for this adventure!

GUEST BLOG | October 6, 2021

The Risk of Cross-Domain Sharing with Google Cloud’s IAM Policies | Chris Cuevas and Erik Gomez, SADA

We’re part of the security resources at SADA, a leading Google Cloud Premier Partner. Our backgrounds are notably diverse, and we appreciate the need for visibility into your core access controls.

If you’re involved in securing your enterprise’s Google Cloud Platform (GCP) environment, the organization policy for Domain Restricted Sharing (DRS) is ideally already a well-regarded tool in your security toolbox. If DRS hasn’t yet made its way into your arsenal, please take a moment after reading this post to review these docs.

While we’re not covering DRS in depth here, we will be discussing related concepts. We believe it is crucial for an enterprise to maintain full visibility into which identities have access to its GCP resources. DRS is intended to prevent external or non-enterprise-managed identities from obtaining or being granted Identity and Access Management (IAM) role bindings within your GCP environment.

If we take this one step further, we believe an enterprise should also maintain visibility into how its managed identities are used within external GCP environments. That concern is the basis of this post, in which we’ll raise a number of issues.

The SADA security team has found a feature of IAM that presents challenges with detection and mitigation. We’ll refer to this IAM feature as Cross-Domain Sharing (XDS).

Introduction to XDS

Today, external parties with GCP environments can provide IAM role bindings to your enterprise’s managed identities. These IAM policies can be set and made effective without your knowledge or awareness, resulting in GCP resources being accessed beyond the boundaries of your enterprise. While we agree there are a number of valid use cases for these XDS IAM policies, we are not comfortable with the lack of enterprise visibility.

Malicious actors are constantly seeking new avenues to gain any type of foothold within a targeted organization. Targeting Cloud DevOps engineers and SREs with social engineering attacks yields high rewards, as these employees hold elevated privileges and trusted relationships.

Acknowledging this mindset, let’s consider the following:

Alice (alice@external.org) views internal.org as a prime target for a social engineering campaign combined with her newly discovered XDS bug. She quickly spins up a new GCP project called “Production Secrets” and adds a GCP IAM role binding to it for Bob (bob@internal.org) (see the diagram below).

Alice then initiates a social engineering campaign targeting Bob, informing him of the new “Production Secrets” project. As Alice is not part of the internal.org organization, the “Production Secrets” project is presented in Bob’s list of available GCP Projects without an organization association. And, if Bob searches for “Production Secrets” using the search bar of the GCP cloud console, the project will again be presented with no clear indicators it’s not actually affiliated with the internal.org GCP organization. With Bob not wanting to miss any team deadlines related to adopting the new “Production Secrets” project, he migrates secrets over and begins creating new ones within the “Production Secrets” project. Alice rejoices as internal.org’s secrets are now fully disclosed and available for additional attacks.

[Diagram: Cross-Domain Sharing (XDS) example]

If your organization’s identities are being used externally, would you be able to prevent, or even detect, this type of activity? If Bob connects to this external project, what other attacks could he be vulnerable to in this scenario?

Keeping in mind Google Cloud’s IAM identities or “members” in IAM Policies can include users, groups, and domains, bad actors can easily increase their target scope from a single user identity to your entire enterprise. Once the nefarious GCP Project “Production Secrets” is in place and accessible by everyone in your enterprise with GCP environment access, the bad actors can wait for unintended or accidental access while developing more advanced phishing ruses.

Now, the good news!

The team at Google Cloud has been hard at work, and they recently released a new GCP Organization Policy constraint specifically to address this concern. The Organization Policy constraint “constraints/resourcemanager.accessBoundaries”, once enabled, removes this concern as a broad phishing vector by not presenting external and no-organization GCP Projects within the Cloud Console and associated APIs. While this approach does not address all risks related to XDS, it does reduce the effective target scope.

Before you run and enable this constraint, remember there are valid use cases for XDS, and we recommend identifying all XDS projects and assessing if they are valid, or if they may be adversely affecting your enterprise’s managed identities. This exercise may help you identify external organizations that are contractors, vendors, partners, etc. and should be included in the Organization Policy constraint.

To further reduce the chances of successful exfiltration of your enterprise’s sensitive data from existing GCP resources via XDS abuse, consider also implementing Google Cloud’s VPC Service Controls (VPC-SC).

Is your GCP environment at risk, or do you have security questions about your GCP environment? SADA and IOActive are here to help. Contact SADA for a Cloud Security Assessment and IOActive for a Cloud Penetration Test.

Chris Cuevas, Sr Security Engineer, SADA
Erik Gomez, Associate CTO, SADA


Note: This concern has been responsibly reported to the Google Cloud Security team.

EDITORIAL | August 3, 2021

Counterproliferation: Doing Our Part

IOActive has always done its part in preventing the misuse of our work.

IOActive’s mission is to make the world a safer and more secure place. In the past, we’ve worked to innovate in the responsible disclosure process, with the most visible and memorable example being Dan Kaminsky’s research into DNS.[1] This involved one of the first uses of widespread, multiparty coordinated responsible disclosure, which quickly became the gold standard as referenced in CERT’s Guide to Responsible Disclosure.[2]

We don’t always talk publicly about our non-technical innovations, since they frequently aren’t as interesting as the groundbreaking cybersecurity research our team delivers. However, a couple of recent events have prompted us to speak a bit about some of these less glamorous, but nonetheless extremely important innovations. First, we were deeply saddened by the passing of Dan Kaminsky, and would like to share how we’re building upon his legacy of non-technical innovation in vulnerability research. Second, there was a significant disclosure, covered by global media organizations, regarding the misuse of weaponized mobile phone vulnerabilities, packaged with surveillance tools, to target journalists and others for political purposes rather than for lawful purposes consistent with basic human rights.

What We’re Doing

There are three primary elements to our policies that prevent the misuse of the vulnerabilities we discover.

Responsible Disclosure

IOActive has always had a policy of responsible disclosure. We transparently publish our policy on our website for everyone to see.[3] Over time, we’ve taken additional innovative steps to enhance this disclosure process.

We’ve built upon Dan’s innovation in responsible disclosure by sharing our research with impacted industries through multinational Information Sharing and Analysis Centers (ISACs).[4] Likewise, we’ve worked to confidentially disclose more of our pre-release research to our clients when it may impact them. As our consultants and researchers find new and innovative ways to break things, we’ll find new and innovative ways to disclose their work and associated consequences, with the goal of facilitating the best outcomes for all stakeholders.

Policy on the Sale of Vulnerabilities

IOActive is very clear on this simple policy, both publicly and with our clients: we do not sell vulnerabilities.

A well-developed market for vulnerabilities has existed for some time.[5] Unfortunately, other cybersecurity firms do sell vulnerabilities, and may not have the necessary ethical compartmentalization and required policies in place to safeguard the security and other interests of their clients and the public at large.

While we support the bug bounty concept, which can help reduce the likelihood of vulnerability sales and support the independent research community, as a commercial service, bug bounties do not adequately address concerns such as personnel vetting or the testing of resources only available onsite at a client.

Contractual Responsible Disclosure Requirement

As a standard practice in our commercial work, we require the ability to report vulnerabilities we discover in third-party products externally only to the affected manufacturers, in addition to the client, to ensure that an identified defect can be properly corrected. IOActive offers to coordinate this disclosure process to the manufacturers on behalf of our clients.

This normally leads to a virtuous cycle of improved security for everyone through our commercial work. Any vulnerability discovery benefits not only the client, but the entire ecosystem, both of whom in turn benefit from the vulnerability discovery work we do for other clients.

Every person reading this post has benefited from better security in the products and services they and their organizations use every day, due to the combination of our fantastic consultants and clients who support doing the right thing for the ecosystem.

Fundamentally, when a vulnerability is corrected, that risk is retired for everyone who updates to the secure version and prevents the weaponization of the vulnerability. When those fixes are pushed out through an automated update process, the benefits accrue without any active effort on the part of end users or system maintainers.

How to Help

Make it Easy to Receive Disclosures

As a prolific vulnerability discloser, we see a wide spectrum of maturity in receiving and handling vulnerability disclosures. We must often resort to creative and time-intensive efforts to locate a contact who will respond to our attempts to disclose a vulnerability. Occasionally, we run into a dead end and are unable to make productive contact with organizations.

Here’s a short list of actions that will help make it easy to collect vulnerability information your organization really needs:

  1. Run a Vulnerability Disclosure Program. A vulnerability disclosure management program provides bidirectional, secure communication between the discloser and the impacted organization in a formal, operationalized manner. You can run such a program with internal resources or outsource it to a commercial firm providing managed vulnerability disclosure program services.
  2. Be Easy to Find. It should be simple and effortless for a researcher to find details on the disclosure process for any organization. A good test is to search for “<Your Organization Name> Vulnerability Disclosure” or “<Your Organization Name> Vulnerability Report” in a search engine. Ideally, your public disclosure page should appear in the first page or two of results.

Cesar Cerrudo, CTO of IOActive Labs, has a more in-depth post discussing how to get the best outcomes from working with researchers during the vulnerability disclosure process in his post, 10 Laws of Disclosure.[6]

Working with Security Firms

When you’re selecting a security firm for vulnerability discovery work, you should know what they will do with any vulnerabilities they find. Here are a few core questions for which any firm should have detailed, clear answers:

  • Does the company have a responsible disclosure policy?
  • What is the company’s policy regarding the sale of vulnerabilities?
  • Does the company require responsible disclosure of the vulnerabilities it discovers during client work?
  • How does the company handle third-party responsible disclosure for its clients?

Participate in the Discussion

The global norms around the sale and weaponization of cybersecurity vulnerabilities, as well as their integration into surveillance tools, are being established today. More constructive, thoughtful public debate today can prevent the current deleterious conduct from becoming a standard of global behavior with its associated dystopic outcomes through inattention and inaction.


References

[1] https://www.cnet.com/tech/services-and-software/security-bites-107-dan-kaminsky-talks-about-responsible-vulnerability-disclosure/
[2] https://resources.sei.cmu.edu/asset_files/SpecialReport/2017_003_001_503340.pdf
[3] https://ioactive.com/disclosure-policy/
[4] https://www.nationalisacs.org/
[5] https://www.rand.org/content/dam/rand/pubs/research_reports/RR600/RR610/RAND_RR610.pdf
[6] https://ioactive.com/10-laws-of-disclosure/

INSIGHTS, RESEARCH | July 30, 2021

Breaking Protocol (Buffers): Reverse Engineering gRPC Binaries

The Basics

gRPC is an open-source RPC framework from Google which leverages automatic code generation to allow easy integration with a number of languages. Architecturally, it follows the standard pattern seen in many other RPC frameworks: services are defined which determine the available RPCs. It uses HTTP/2 as its transport, and supports plain HTTP as well as HTTPS for secure communication. Services and messages, which act as the structures passed to and returned by defined RPCs, are defined as protocol buffers. Protocol buffers are a common serialization solution, also designed by Google.

Protocol Buffers

Serialization using protobufs is accomplished by defining services and messages in .proto files, which are then used by the protoc protocol buffer compiler to generate boilerplate code in whatever language you’re working in. An example .proto file might look like the following:

// Declares which syntax version is to follow; read by protoc
syntax = "proto3";

// package name allows for namespacing to avoid conflicts
// between message types. Will also determine namespace in C++
package stringmanipulation;


// The Service definition: this specifies what RPCs are offered
// by the service
service StringManipulation {

    // First RPC. RPC definitions are like function prototypes:
    // RPC name, argument types, and return type are specified.
    rpc reverseString (StringRequest) returns (StringReply) {}

    // Second RPC. There can be arbitrarily many defined for
    // a service.
    rpc uppercaseString (StringRequest) returns (StringReply) {}
}

// Example of a message definition, containing only scalar values.
// Each message field has a defined type, a name, and a field number.
message innerMessage {
    int32 some_val = 1;
    string some_string = 2;
}

// It is also possible to specify an enum type. This can
// be used as a member of other messages.
enum testEnumeration {
    ZERO = 0;
    ONE = 1;
    TWO = 2;
    THREE = 3;
    FOUR = 4;
    FIVE = 5;
}

// messages can contain other messages as field types.
message complexMessage {
    innerMessage some_message = 1;
    testEnumeration innerEnum = 2;
}

// This message is the type used as the input to both defined RPCs.
// Messages can be arbitrarily nested, and contain arbitrarily complex types.
message StringRequest {
    complexMessage cm = 1;
    string original = 2;
    int64 timestamp = 3;
    int32 testval = 4;
    int32 testval2 = 5;
    int32 testval3 = 6;
}

// This message is the type for the return value of both defined RPCs.
message StringReply {
    string result = 4;
    int64 timestamp = 2;
    complexMessage cm = 3;
}

There is a lot more to protocol buffers and the available options; if you’re interested, Google has a very good language guide.

gRPC

gRPC is an RPC implementation designed to use protobufs to take care of all the boilerplate necessary for implementation, as well as to provide functions to manage the connection between the RPC server and its clients. The majority of compiled code in a gRPC server binary will likely be either gRPC library code or autogenerated classes, stubs, etc. created with protoc. Only the actual implementation of the RPCs is required of the developer, accomplished by extending the base Service class generated by protoc based on the definitions in the .proto files.

Transport

gRPC uses HTTP/2 for transport, which can either be on top of a TLS connection or in the clear. gRPC also supports mTLS out of the box. The type of channel used is configured by the developer while setting up the server/client.

Authentication

As mentioned above, gRPC supports mTLS, wherein both the server and the client are identified based on exchanged TLS certificates. This appears to be the most common authentication mechanism seen in the wild (though “no authentication” is also popular). gRPC also supports Google’s weird ALTS, which I’ve never actually seen used, as well as token-based authentication.

It is also possible that the built-in authentication mechanisms will be eschewed for a custom authentication mechanism. Such a custom implementation is of particular interest from a security perspective, as the need for a custom mechanism suggests a more complex (and thus more error prone) authentication requirement.

gRPC Server Implementation

The following will be an overview of the major parts of a gRPC server implementation in C++. A compiled gRPC server binary can be extremely difficult to follow, thanks to the extensive automatically generated code and heavy use of gRPC library functions. Understanding the rough structure that any such server will follow (important function calls and their arguments) will greatly improve your ability to make sense of things and identify relevant sections of code which may present an attack surface.

Server Setup

The following is the setup boilerplate for a simple gRPC server. While a real implementation will likely be more complex, the function calls seen here will be the ones to look for in unraveling the code.

void RunServer() {
    std::string listen = "127.0.0.1:50006";
    // This is the class defined to implement RPCs, will be covered later
    StringManipulationImpl service;

    ServerBuilder builder;

    builder.AddListeningPort(listen, grpc::InsecureServerCredentials());
    builder.RegisterService(&service);

    std::unique_ptr<grpc::Server> server(builder.BuildAndStart());
    std::cout << "Server listening on port: " << listen << "\n";
    server->Wait();
}
  • builder.AddListeningPort: This function sets up the listening socket as well as handling the transport setup for the channel.
    • arg1: addr_uri: a string composed of the IP address and port to listen on, separated by a colon. i.e. “127.0.0.1:50001”
    • arg2: creds: The credentials associated with the server. The function call used here to generate credentials will indicate what kind of transport is being used, as follows:
      • InsecureServerCredentials: No encryption; plain HTTP2
      • SslServerCredentials: TLS is in use, meaning the client can verify the server and communication will be encrypted. If client authentication (mTLS) is to be used, relevant options will be passed to this function call. For example, setting opts.client_certificate_request to GRPC_SSL_REQUEST_AND_REQUIRE_CLIENT_CERTIFICATE_AND_VERIFY will require the client supply a valid certificate. Any potential vulnerabilities at this point will be in the options passed to the SslServerCredentials constructor, and will be familiar to any consultant. Do they verify the client certificate? Are self-signed certificates allowed? etc., standard TLS issues.
  • builder.RegisterService: This crucial function is what determines what services (and thereby what RPC calls) are available to a connecting client. This function is called as many times as there are services. The argument to the function is an instance of the class which actually implements the logic for each of the RPCs: custom code. This is the main point of interest for any gRPC server code review or static analysis, as it will contain the client’s own implementation, where the likelihood of mistakes and errors will be higher.

RPC Implementation

The following is the implementation of the StringManipulationImpl instance passed to RegisterService above.

class StringManipulationImpl : public stringmanipulation::StringManipulation::Service {
    Status reverseString(ServerContext *context, 
                         const StringRequest *request, 
                         StringReply *reply) {


        std::string original = request->original();
        std::string working_copy = original;
        std::reverse(working_copy.begin(), working_copy.end());
        reply->set_result(working_copy);

        struct timeval tv;
        gettimeofday(&tv, NULL);

        printf("[%ld|%s] reverseString(\"%s\") -> \"%s\"\n", 
                tv.tv_sec, 
                context->peer().c_str(), 
                request->original().c_str(), 
                working_copy.c_str());

        return Status::OK;
    }

    Status uppercaseString(ServerContext *context, 
                           const StringRequest *request, 
                           StringReply *reply) {

        std::string working_copy = request->original();
        for (auto &c: working_copy) c = toupper(c);
        reply->set_result(working_copy.c_str());

        struct timeval tv;
        gettimeofday(&tv, NULL);

        printf("[%ld|%s] uppercaseString(\"%s\") -> \"%s\"\n", 
                tv.tv_sec, 
                context->peer().c_str(), 
                request->original().c_str(), 
                working_copy.c_str());

        return Status::OK;

    }
};

Here we see the implementation for each of the two defined RPCs for the StringManipulation service. This is accomplished by extending the base service class generated by protoc. gRPC implementation code like this will often follow this naming scheme, or something like it — the service name, appended by “Impl,” “Implementation,” etc.

Static Analysis

Finding Interesting Logic

These functions, generally, are among the most interesting targets in any test of a gRPC service. The bulk of the logic baked into a gRPC binary will be library code, and these functions are the ones that actually parse and handle the data transmitted via the gRPC link. These functions can be located and categorized by looking for calls to builder.RegisterService.

Here we see just one call, because the example is simple, but in a more complex implementation there may be many calls to this function. Each one represents a particular service being made available, and will allow for tracking down the implementations of each RPC for those services. Navigating to the cross-reference address, we see that an object is being passed to this function. Keep in mind this binary has been pre-annotated for clarity and the initial output of the reverse engineering tool will likely be less clear. However, the function calls we care about should be clear enough to follow without much effort.

We see that before being passed to RegisterService, the stringManipulationImplInstance (name added by me) is being passed to a function, StringManipulationImpl::StringManipulationImpl. Based both on the context and the demangled name, this is a constructor for whatever class this is. We can see the constructor itself is very simple: 

The function calls another constructor (the base class constructor) on the passed object, then sets the value at object offset 0. In C++, this offset is usually (and in this case) reserved for the class’s vtable. Navigating to that address, we can see it:

Because this binary is not stripped, the actual names of the functions (matching the RPCs) are displayed. With a stripped binary, this is not the case, however an important quirk of the gRPC implementation results in the vtables for service implementations always being structured in a particular way, as follows.

  • The first two entries in the vtable are constructor/destructors.
  • Each subsequent entry is one of the custom RPC implementations, in the order that they appear in the .proto file. This means that if you are in possession of the .proto file for a particular service, even if a binary is stripped, you can quickly identify which implementation corresponds to which RPC. And if you don’t have the .proto file, but do have the binary, there is tooling available which is very effective at recovering .proto files from gRPC binaries, which will be covered later. This is helpful not only because you may get a hint at what the RPC does based on its name, but also because you will know the exact types of each of the arguments.

Anatomy of an RPC

There are a few details which will be common to all RPC implementations which will aid greatly in reverse engineering these functions. The first are the arguments to the functions:

  • Argument 1: Return value, usually of type grpc::Status. This is a C++ ABI thing, see section 3.1.3.1 of the Itanium C++ ABI Spec. Tracking sections of the code which write to this argument may be helpful in understanding authorization logic which may be baked into the function, for example if a function is called, and depending on its return value, arg1 is set to either grpc::Status::OK or grpc::Status::CANCELLED, that function may have something to do with access controls.
  • Argument 2: The this pointer. Points to the instance of whatever service class the RPC is a method on.
  • Argument 3: ServerContext. From the gRPC documentation:
    A ServerContext or CallbackServerContext allows the code implementing a service handler to:

    • Add custom initial and trailing metadata key-value pairs that will be propagated to the client side.
    • Control call settings such as compression and authentication.
    • Access metadata coming from the client.
    • Get performance metrics (i.e., census).

    We can see in this function that the context is being accessed in a call to ServerContextBase::peer, which retrieves metadata containing the client’s IP and port. For the purposes of reverse engineering, that means that accesses of this argument (or method calls on it) can be used to access metadata and/or authentication information associated with the client calling the RPC. So, it may be of interest regarding authentication/authorization auditing. Additionally, if metadata is being parsed, look for data parsing/memory corruption etc. issues there.
  • Argument 4: RPC call argument object. This object will be of the input type specified by the .proto file for a given RPC. So in this example, this argument would be of type stringmanipulation::StringRequest. Generally, this is the data that the RPC will be parsing and manipulating, so any logic associated with handling this data is important to review for data parsing issues or similar that may lead to vulnerabilities.
  • Argument 5: RPC call return object. This object will be of the return type specified by the .proto file for a given RPC. So in this example, this argument would be of type stringmanipulation::StringReply. This is the object which is manipulated prior to return to the client.

Note: In addition to unary RPCs (a single request object and single response object), gRPC also supports streaming RPCs. In the case of unidirectional streams, i.e. where only one of the request or response is a stream, the number of arguments and order is the same, and only the type of one of the arguments will differ. For client-side streaming (i.e. the request is streamed) Argument 4 will be wrapped with a ServerReader, so in this example it will be of type ServerReader<StringRequest>. For Server side streaming (streamed response), it will be wrapped with a ServerWriter, so ServerWriter<StringReply>.

For bidirectional streams, where both the request and the response are streamed, the number of arguments differs. Rather than a separate argument for request and response, the function only has four arguments, with the fourth being a ServerReaderWriter wrapping both types. In this example, ServerReaderWriter<StringRequest, StringReply>. See the gRPC documentation for more information on these wrappers. The C++ Basics Tutorial has some good examples.

Protobuf Member Accesses in C++

The classes generated by protoc for each of the input/output types defined in the .proto file are fairly simple. Scalar typed members are stored by value as member variables inside the class instance. Non-scalar values are stored as pointers to the member. The class includes (among other things) the following functions for getting and setting members:

  • .<member>(): get the value of the field with name <member>. This is applicable to all types, and will return the value itself for scalar types and a pointer to the member for complex/allocated types.
  • .set_<member>(value_to_set): set the value for a type which does not require allocation. This includes scalar fields and enums.
  • .set_allocated_<member>(value_to_set): set the value for a complex type, which requires allocation and setting of its own member values prior to setting in the request or reply. This is for composite/nested types.

The actual implementation for these functions is fairly uncomplicated, even for allocated types, and basically boils down to accessing the value of a pointer at some offset into the object whose member is being retrieved or set. These functions will not be named in a stripped binary, but are easy to spot.

The getters take the request message (in this example, request) as the sole argument, pass it through a couple of nested function calls, and eventually make an access to some offset into the message. Based on the offset, you can determine which field is being accessed (with the help of the generated pb.h files, whose generation is covered later), and can thus identify the function and its return value.

The implementation for complex types is similar, adding a small amount of extra code to account for allocation issues.

Setter functions follow an almost identical structure, with the only difference being that they take the response message (in this example, reply) as the first argument and the value to set the field to as the second argument. 

And again, the only difference for complex type setters is a bit of extra logic to handle allocation when necessary.

Reconstructing Types

The huge amount of automatically generated code used by gRPC is a great annoyance to a prospective reverse engineer, but it can also be a great ally. Because the manner in which the .proto files are integrated into the final binary is uniform, and because the binary must include this information in some form to correctly deserialize incoming messages, it is possible in most cases to extract a complete reconstruction of the original .proto file from any software which uses gRPC for communication, whether that be a client or server.

This can be done manually with some studying up on protobuf FileDescriptors, but more than likely this will not be necessary — someone has probably already written something to do it for you. For this guide, the Protobuf Toolkit (pbtk) will be used, but a more extensive list of available software for extracting .proto structures from gRPC clients and servers is included in the Tooling section.

Generating .proto Files

By feeding the server binary we are working with into pbtk, the following .proto file is generated.

syntax = "proto3";

package stringmanipulation;

service StringManipulation {
    rpc reverseString(StringRequest) returns (StringReply);
    rpc uppercaseString(StringRequest) returns (StringReply);
}

message innerMessage {
    int32 some_val = 1;
    string some_string = 2;
}

message complexMessage {
    innerMessage some_message = 1;
    testEnumeration innerEnum = 2;
}

message StringRequest {
    complexMessage cm = 1;
    string original = 2;
    int64 timestamp = 3;
    bool testval = 4;
    bool testval2 = 5;
    bool testval3 = 6;
}

message StringReply {
    string result = 4;
    int64 timestamp = 2;
    complexMessage cm = 3;
}

enum testEnumeration {
    ZERO = 0;
    ONE = 1;
    TWO = 2;
    THREE = 3;
    FOUR = 4;
    FIVE = 5;
}

Referring back to the original .proto example at the beginning, we can see this is a perfect match, even preserving the order of RPC declarations and message fields. This is important because we can now begin to correlate vtable members with RPCs by name and argument types. However, while we know the types of arguments being passed to each RPC, we do not know how each field is ordered inside the C++ object for each type. Annoyingly, the order of member variables in the generated class for a given type appears to be correlated neither with the order of definition in the .proto file nor with the field numbers specified.

However, auto-generated code comes to the rescue again. While the order of member variables does not appear to be tied to the .proto file at all, it does appear to be deterministic, based on analysis of numerous gRPC binaries. protoc uses some consistent metric for ordering the fields when generating the .pb.h header files, which are the source of truth for class/structure layout in the final binary. And conveniently, now that we have a .proto file in hand, we can generate these headers.

Defining Message Structures

The command protoc --cpp_out=. <your_generated_proto_file>.proto will compile the .proto file into the corresponding pb.cc and pb.h files. Here we’re interested in the headers. There is quite a bit of cruft to sift through in these files, but the general structure is easy to follow. Each type defined in the .proto file gets defined as a class, which includes all methods and member variables. The member variables are what we are interested in, since we need to know their order and C++ type in order to map out structures for each of them while reverse engineering.

The member variable declarations can be found at the very bottom of the class declaration, under a comment which reads @@protoc_insertion_point(class_scope:<package>.<type name>)

// @@protoc_insertion_point(class_scope:stringmanipulation.StringRequest)
 private:
  class _Internal;

  template <typename T> friend class ::PROTOBUF_NAMESPACE_ID::Arena::InternalHelper;
  typedef void InternalArenaConstructable_;
  typedef void DestructorSkippable_;
  ::PROTOBUF_NAMESPACE_ID::internal::ArenaStringPtr original_;
  ::stringmanipulation::complexMessage* cm_;
  ::PROTOBUF_NAMESPACE_ID::int64 timestamp_;
  bool testval_;
  bool testval2_;
  bool testval3_;
  mutable ::PROTOBUF_NAMESPACE_ID::internal::CachedSize _cached_size_;
  friend struct ::TableStruct_stringmanipulation_2eproto;

The member fields defined in the .proto file will always start at offset sizeof(size_t) * 2 bytes from the class object, so 8 bytes for 32 bit, and 16 bytes for 64 bit. Thus, for the above class (StringRequest), we can define the following struct for static analysis:

// Assuming a 64-bit architecture; on 32-bit targets the pointer sizes (and
// therefore the offsets) will differ.
struct __attribute__((packed)) StringRequest {
    uint8_t dontcare[0x10];        // 0x00: internal protobuf bookkeeping
    void *original_string;         // 0x10: 'original' string pointer
    struct complexMessage *cm;     // 0x18: nested message; define this struct too,
                                   //       using the same pb.h inspection technique
    int64_t timestamp;             // 0x20
    uint8_t testval;               // 0x28
    uint8_t testval2;              // 0x29
    uint8_t testval3;              // 0x2a
};

Note: protobuf classes are packed, meaning there is no padding added between members to ensure 4- or 8-byte alignment. For example, in the above structure, the three bools will be found one after another at offsets 0x28, 0x29, and 0x2a, rather than at 0x28, 0x2c, and 0x30 as would be the case with 4-byte-aligned padding. Ensure that your reverse engineering tool knows this when defining structs.

Once structures have been correctly defined for each of the types, it becomes quite easy to determine what each function and variable is. Take the first example from the Protobuf Member Accesses section, now updated to accept an argument of type StringRequest:

It’s clear now that this function is the getter for StringRequest.original, a string. Applying this technique to the rest of the RPC, changing function and variable names as necessary, produces a fairly easy-to-follow decompilation:

From here, it is as simple as standard static analysis to look for any vulnerabilities which might be exploited in the server, whether it be in incoming data parsing or something else.

Active Testing

Most of the active testing/dynamic analysis to be performed against gRPC is fairly self-explanatory, and is essentially just fuzzing/communicating over a network protocol. If the .proto files are available (or the server or client binary is available, and thus the .proto files can be generated), they can be provided to a number of existing gRPC tools to communicate with the server. If no server, client, or .protos are available, it is still possible to reconstruct the .proto to some extent from captured gRPC messages. Resources for various techniques and tools for actively testing a gRPC connection can be found in the Tooling section below.
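
As one concrete option, a minimal client sketch is shown below. It assumes the .proto reconstructed earlier has been fed back through protoc with the grpc_cpp_plugin to regenerate stubs; the header name, address, port, and payload values are placeholders:

#include <cstdio>
#include <grpcpp/grpcpp.h>
#include "stringmanipulation.grpc.pb.h"  // assumed filename for the regenerated stubs

// Minimal hand-rolled client for exercising the reverseString RPC; a useful
// starting point for targeted fuzzing of individual fields.
int main() {
    auto channel = grpc::CreateChannel("localhost:50051",
                                       grpc::InsecureChannelCredentials());
    auto stub = stringmanipulation::StringManipulation::NewStub(channel);

    stringmanipulation::StringRequest request;
    request.set_original("AAAA");  // field to mutate while fuzzing
    request.set_timestamp(0);

    stringmanipulation::StringReply reply;
    grpc::ClientContext ctx;
    grpc::Status status = stub->reverseString(&ctx, request, &reply);

    if (status.ok()) {
        std::printf("result: %s\n", reply.result().c_str());
    } else {
        std::printf("rpc failed: %s\n", status.error_message().c_str());
    }
    return 0;
}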

Tooling

  • Protofuzz – ProtoFuzz is a generic fuzzer for Google’s Protocol Buffers format. Takes a proto specification and outputs mutations based on that specification. Does not actually connect to the gRPC server, just produces the data.
  • Protobuf Toolkit – From the pbtk README:

pbtk (Protobuf toolkit) is a full-fledged set of scripts, accessible through an unified GUI, that provides two main features:

  1. Extracting Protobuf structures from programs, converting them back into readable .protos, supporting various implementations:
    • All the main Java runtimes (base, Lite, Nano, Micro, J2ME), with full Proguard support,
    • Binaries containing embedded reflection metadata (typically C++, sometimes Java and most other bindings),
    • Web applications using the JsProtoUrl runtime.
  2. Editing, replaying and fuzzing data sent to Protobuf network endpoints, through a handy graphical interface that allows you to edit live the fields for a Protobuf message and view the result.
  • grpc-tools/grpc-dump – grpc-dump is a grpc proxy capable of deducing protobuf structure if no .protos are provided. Can be used similarly to mitmdump. grpc-tools includes other useful tools, including the grpc-proxy go library which can be used to write a custom proxy if grpc-dump does not suit the needs of a given test.
  • Online Protobuf Decoder – Will pull apart arbitrary protobuf data (without requiring a schema), displaying the hierarchical content.
  • Awesome gRPC – A curated list of useful resources for gRPC.

Resources

INSIGHTS | July 19, 2021

Techspective Podcast – The Value of Red and Purple Team Engagements

Episode 070. Tony Bradley of Techspective chats with John Sawyer, IOActive Director of Services, Red Team, about the wide-ranging effects of alert fatigue, the COVID-19 pandemic, physical security, and more, all of which directly affect cybersecurity resiliency and the efficacy and benefits of red/purple team and pen-testing services.

GUEST BLOG | June 9, 2021

Cybersecurity Alert Fatigue: Why It Happens, Why It Sucks, and What We Can Do About It | Andrew Morris, GreyNoise

Introduction

“Although alert fatigue is blamed for high override rates in contemporary clinical decision support systems, the concept of alert fatigue is poorly defined. We tested hypotheses arising from two possible alert fatigue mechanisms: (A) cognitive overload associated with amount of work, complexity of work, and effort distinguishing informative from uninformative alerts, and (B) desensitization from repeated exposure to the same alert over time.”

Ancker, Jessica S., et al. “Effects of workload, work complexity, and repeated alerts on alert fatigue in a clinical decision support system.” BMC Medical Informatics and Decision Making, vol. 17, no. 1, 2017.

My name is Andrew Morris, and I’m the founder of GreyNoise, a company devoted to understanding the internet and making security professionals more efficient. I’ve probably had a thousand conversations with Security Operations Center (SOC) analysts over the past five years. These professionals come from many different walks of life and a diverse array of technical backgrounds and experiences, but they all have something in common: they know that false positives are the bane of their jobs, and that alert fatigue sucks.

The excerpt above is from a medical journal focused on drug alerts in a hospital, not a cybersecurity publication. What’s strangely refreshing about seeing these issues in industries outside of cybersecurity is being reminded that alert fatigue has numerous and challenging causes. The reality is that alert fatigue occurs across a broad range of industries and situations, from healthcare facilities to construction sites and manufacturing plants to oil rigs, subway trains, air traffic control towers, and nuclear plants.

I think there may be some lessons we can learn from these other industries. For example, while there are well over 200 warning and caution situations for Boeing aircraft pilots, the company has carefully prioritized their alert system to reduce distraction and keep pilots focused on the most important issues to keep the plane in the air during emergencies.

Many cybersecurity companies cannot say the same. Often these security vendors will oversimplify the issue and claim to solve alert fatigue, but frequently make it worse. The good news is that these false-positive and alert fatigue problems are neither novel nor unique to our industry.

In this article, I’ll cover what I believe are the main contributing factors to alert fatigue for cybersecurity practitioners, why alert fatigue sucks, and what we can do about it.

Contributing Factors

Alarm fatigue or alert fatigue occurs when one is exposed to a large number of frequent alarms (alerts) and consequently becomes desensitized to them. Desensitization can lead to longer response times or missing important alarms.

https://en.wikipedia.org/wiki/Alarm_fatigue

Technical Causes of Alert Fatigue

Overmatched, misleading or outdated indicator telemetry

Low-fidelity alerts are the most obvious and common contributor to alert fatigue. This results in over-alerting on events with a low probability of being malicious, or matching on activity that is actually benign.

One good example of this is low-quality IP block lists – these lists identify “known-bad IP addresses,” which should be blocked by a firewall or other filtering mechanism. Unfortunately, these lists are often under-curated or completely uncurated output from dynamic malware sandboxes.

Here’s an example of how a “known-good” IP address can get onto a “known-bad” list: A malicious binary being detonated in a sandbox attempts to check for an Internet connection by pinging Google’s public DNS server (8.8.8.8). This connection attempt might get mischaracterized as command-and-control communications, with the IP address incorrectly added to the known-bad list. These lists are then bought and sold by security vendors and bundled with security products that incorrectly label traffic to or from these IP addresses as “malicious.”

Low-fidelity alerts can also be generated when a reputable source releases technical indicators that can be misleading without additional context. Take, for instance, the data accompanying the United States Cybersecurity and Infrastructure Security Agency (CISA)’s otherwise excellent 2016 Grizzly Steppe report. The CSV/STIX files contained a list of 876 IP addresses, including 44 Tor exit nodes and four Yahoo mail servers, which, if loaded blindly into a security product, would raise alerts every time the organization’s network attempted to route an email to a Yahoo email address. As Kevin Poulsen noted in his Daily Beast article calling out the authors of the report, “Yahoo servers, the Tor network, and other targets of the DHS list generate reams of legitimate traffic, and an alarm system that’s always ringing is no alarm system at all.”

Another type of low-fidelity alert is the overmatched or over-sensitive heuristic, as seen below:

Alert: “Attack detected from remote IP address 1.2.3.4: IP address detected attempting to brute-force RDP service.”
Reality: A user came back from vacation and got their password wrong three times.

Alert: “Ransomware detected on WIN-FILESERVER-01.”
Reality: The file server ran a scheduled backup job.

Alert: “TLS downgrade attack detected by remote IP address: 5.6.7.8.”
Reality: A user with a very old web browser attempted to use the website.

It can be challenging for security engineering teams to construct correlation and alerting rules that accurately identify attacks without triggering false positives due to overly sensitive criteria.

Legitimate computer programs do weird things

Before I founded GreyNoise, I worked on the research and development team at Endgame, an endpoint security company later acquired by Elastic. One of the most illuminating realizations I had while working on that product was just how many software applications are programmed to do malware-y-looking things. I discovered that tons of popular software applications were shipped with unsigned binaries and kernel drivers, or sketchy-looking software packers and crypters.

These are all examples of a type of supply chain integrity risk, but unlike SolarWinds, which shipped compromised software, these companies are delivering software built using sloppy or negligent software components.

Another discovery I made during my time at Endgame was how common it is for antivirus software to inject code into other processes. In a vacuum, this behavior should (and would) raise all kinds of alerts to a host-based security product. However, upon investigation by an analyst, this was often determined to be expected application behavior: a false positive.

Poor security product UX

For all the talent that security product companies employ in the fields of operating systems, programming, networking, and systems architecture, they often lack skills in user-experience and design. This results in security products often piling on dozens—or even hundreds—of duplicate alert notifications, leaving the user with no choice but to manually click through and dismiss each one. If we think back to the Boeing aviation example at the beginning of this article, security product UIs are often the equivalent of trying to accept 100 alert popup boxes while landing a plane in a strong crosswind at night in a rainstorm. We need to do a better job with human factors and user experience.

Expected network behavior is a moving target

Anomaly detection is a strategy commonly used to identify “badness” in a network. The theory is to establish a baseline of expected network and host behavior, then investigate any unplanned deviations from this baseline. While this strategy makes sense conceptually, corporate networks are filled with users who install all kinds of software products and connect all kinds of devices. Even when hosts are completely locked down and the ability to install software packages is strictly controlled, the IP addresses and domain names with which software regularly communicates fluctuate so frequently that it’s nearly impossible to establish any meaningful or consistent baseline.

There are entire families of security products that employ anomaly detection-based alerting with the promise of “unmatched insight” but often deliver mixed or poor results. This toil ultimately rolls downhill to the analysts, who either open an investigation for every noisy alert or numb themselves to the alerts generated by these products and ignore them. As a matter of fact, a recent survey by Critical Start found that 49% of analysts turn off high-volume alerting features when there are too many alerts to process.

Home networks are now corporate networks

The pandemic has resulted in a “new normal” of everyone working from home and accessing the corporate network remotely. Before the pandemic, some organizations were able to protect themselves by aggressively inspecting north-south traffic coming in and out of the network, on the assumption that all intra-company traffic was inside the perimeter and “safe.” Today, however, the entire workforce is outside the perimeter, and aggressive inspection tends to generate alert storms and lots of false positives. If this perimeter-only security model wasn’t dead already, the pandemic has certainly killed it.

Cyberattacks are easier to automate

A decade ago, successfully exploiting a computer system involved a lot of work. The attacker had to profile the target computer system, go through a painstaking process to select the appropriate exploit for the system, account for things like software version, operating system, processor architecture and firewall rules, and evade host- and system-based security products.

In 2020, there are countless automated exploitation and phishing frameworks both open source and commercial. As a result, exploitation of vulnerable systems is now cheaper, easier and requires less operator skill.

Activity formerly considered malicious is being executed at internet-wide scale by security companies

Companies in the “Attack Surface Management” cybersecurity sub-industry identify vulnerabilities in their customers’ Internet-facing systems and alert them accordingly. This is a good thing, not a bad thing, but the issue is not what these companies do; it’s how they do it.

Most Attack Surface Management companies constantly scan the entire internet to identify systems with known vulnerabilities and organize the returned data by vulnerability and network owner. In previous years, an unknown remote system checking for vulnerabilities on a network perimeter was a powerful indicator of an oncoming attack. Now, alerts raised from this activity provide less actionable value to analysts and happen more frequently as more of these companies enter the market.

The internet is really noisy

Hundreds of thousands of devices, malicious and benign, are constantly scanning, crawling, probing, and attacking every single routable IP address on the entire internet for various reasons. The more benign use cases include indexing web content for search engines, searching for malware command-and-control infrastructure, the above-mentioned Attack Surface Management activity, and other internet-scale research. The malicious use cases are similar: take a reliable, common, easy-to-exploit vulnerability, attempt to exploit every single vulnerable host on the entire internet, then inspect the successfully compromised hosts to find accesses to interesting organizations.

At GreyNoise, we refer to the constant barrage of Internet-wide scan and attack traffic that every routable host on the internet sees as “Internet Noise.” This phenomenon causes a significant amount of pointless alerts on internet-facing systems, forcing security analysts to constantly ask “is everyone on the internet seeing this, or just us?” At the end of the day, there’s a lot of this noise: over the past 90 days, GreyNoise has analyzed almost three million IP addresses opportunistically scanning the internet, with 60% identified as benign or unknown, and only 40% identified as malicious.

Non-Technical Causes of Alert Fatigue

Fear sells

An unfortunate reality of human psychology is that we fear things that we do not understand, and there is absolutely no shortage of scary things we do not understand in cybersecurity. It could be a recently discovered zero-day threat, or a state-sponsored hacker group operating from the shadows, or the latest zillion-dollar breach that leaked 100 million customer records. It could even be the news article written about the security operations center that protects municipal government computers from millions of cyberattacks each day. Sales and marketing teams working at emerging cybersecurity product companies know that fear is a strong motivator, and they exploit it to sell products that constantly remind users how good of a job they’re doing.

And nothing justifies a million-dollar product renewal quite like security “eye candy,” whether it’s a slick web interface containing a red circle with an ever-incrementing number showing the amount of detected and blocked threats, or a 3D rotating globe showing “suspicious” traffic flying in to attack targets from many different geographies. The more red that appears in the UI, the scarier the environment, and the more you need their solution. Despite the fact that these numbers often serve as “vanity metrics” to justify product purchases and renewals, many of these alerts also require further review and investigation by the already overworked and exhausted security operations team.

The stakes are high

Analysts are under enormous pressure to identify cyberattacks targeting their organization, and stop them before they turn into breaches. They know they are the last line of defense against cyber threats, and there are numerous stories about SOC analysts being fired for missing alerts that turn into data breaches.

In this environment, analysts are always worried about what they missed or what they failed to notice in the logs, or maybe they’ve tuned their environment to the point where they can no longer see all of the alerts (yikes!). It’s not surprising that analysts’ worry about missing an incident has increased. A recent survey by FireEye called this “Fear of Missing Incidents” (FOMI). They found that three in four analysts are worried about missing incidents, and one in four worry “a lot” about missing incidents. The same goes for their supervisors – more than six percent of security managers reported losing sleep due to fear of missing incidents.

Is it any wonder that security analysts exhibit serious alert fatigue and burnout, and that SOCs have extremely high turnover rates?

Everything is a single pane of glass

Security product companies love touting a “single pane of glass” for complete situational awareness. This is a noble undertaking, but the problem is that most security products are really only good at a few core use cases and then trend towards mediocrity as they bolt on more features. At some point, when an organization has surpassed twenty “single panes of glass,” the problem has become worse.

More security products are devoted to “preventing the bad thing” than “making the day to day more efficient”

There are countless security products that generate new alerts and few security products that curate, deconflict or reduce existing alerts. There are almost no companies devoted to reducing drag for Security Operations teams. Too many products measure their value by their customers’ ability to alert on or prevent something bad, and not by making existing, day-to-day security operations faster and more efficient.

Product pricing models are attached to alert/event volume

Like any company, security product vendors are profit-driven. Many product companies are heavily investor-backed and have large revenue expectations. As such, Business Development and Sales teams often price products with scaling or tiered pricing models based on usage-oriented metrics like gigabytes of data ingested or number of alerts raised. The idea is that, as customers adopt and find success with these products, they will naturally increase usage, and the vendor will see organic revenue growth as a result.

This pricing strategy is often necessary when the cost of goods sold increases with heavier usage, like when a server needs additional disk storage or processing power to continue providing service to the customer.

But an unfortunate side effect of this pricing approach is that it creates an artificial vested interest in raising as many alerts or storing as much data as possible. And it reduces the incentive to build the capabilities for the customer to filter and reduce this “noisy” data or these tactically useless alerts.

If the vendor’s bottom line depends on as much data being presented to the user as possible, then they have little incentive to create intelligent filtering options. As a result, these products will continue to firehose analysts, further perpetuating alert fatigue.

False positives drive tremendous duplication of effort

Every day, something weird happens on a corporate network and some security product raises an alert to a security analyst. The alert is investigated for some non-zero amount of time, is determined to be a false positive caused by some legitimate application functionality, and is dismissed. The information on the incident is logged somewhere deep within a ticketing system and the analyst moves on.

The implications of this are significant. This single security product (or threat intelligence feed) raises the same time-consuming false-positive alert on every corporate network where it is deployed around the world when it sees this legitimate application functionality. Depending on the application, the duplication of effort could be quite staggering. For example, for a security solution deployed across 1000 organizations, an event generated from unknown network communications that turns out to be a new Office 365 IP address could generate 500 or more false positives. If each takes 5 minutes to resolve, that adds up to a full week of effort.

Nobody collaborates on false positives

Traditional threat intelligence vendors only share information about known malicious software. Intelligence sharing organizations like Information Sharing and Analysis Centers (ISACs), mailing lists, and trust groups have a similar focus. None of these sources of threat intelligence focus on sharing information related to confirmed false-positive results, which would aid others in quickly resolving unnecessary alerts. Put another way: there are entire groups devoted to reducing the effectiveness of a specific piece of malware or threat actor between disparate organizations. However, no group supports identifying cases when a benign piece of software raises a false positive in a security product.

Security products are still chosen by the executive, not the user

This isn’t unusual. It is a vestige of the old days. Technology executives maintain relationships with vendors, resellers and distributors. They go to a new company and buy the products they are used to and with which they’ve had positive experiences.

Technologies like Slack, Dropbox, Datadog, and other user-first technology product companies disrupted and dominated their markets quickly because they allowed enterprise prospects to use their products for free. They won over these prospects with superior usability and functionality, allowing users to be more efficient. While many technology segments have adopted this “product-led” revolution, it hasn’t happened in security yet, so many practitioners are stuck using products they find inefficient and clunky.

Why You Should Care

The pain of alert fatigue can manifest in several ways:

  1. Death (or burnout) by a thousand cuts, leading to stress and high turnover
  2. Lack of financial return to the organization
  3. Compromises or breaches missed by the security team

There is a “death spiral” pattern to the problem of alert fatigue: at its first level, analysts spend more and more time reviewing and investigating alerts that provide diminishing value to the organization. Additional security products or feeds are purchased that generate more “noise” and false positives, increasing the pressure on analysts. The increased volume of alerts from noisy security products causes the SOC to need a larger team, with the SOC manager trying to grow a highly skilled team of experts while many of them are overwhelmed, burned out, and at risk of leaving.

From the financial side of things, analyst hours spent investigating pointless alerts are a complete waste of security budget. The time and money spent on noisy alerts and false positives is often badly needed in other areas of the security organization to support new tools and resources. Security executives face a difficult challenge in cost-justifying the investment when good analysts are being fed bad data.

And worst of all, alert fatigue contributes to missed threats and data breaches. In terms of human factors, alert fatigue can create a negative mindset leading to rushing, frustration, mind not on the task, or complacency. As I noted earlier, almost 50% of analysts who are overwhelmed will simply turn off the noisy alert sources. All of this contributes to an environment where threats are more easily able to sneak through an organization’s defenses.

What can we do about it?

The analyst

Get to “No” faster. To some extent, analysts are the victim of the security infrastructure in their SOC. The part of the equation they control is their ability to triage alerts quickly and effectively. So from a pragmatic viewpoint, find ways to use analyst expertise and time as effectively as possible. In particular, find tools and resources that help you rule out alerts as fast as possible.

The SOC manager

Tune your alerts. There is significant positive ROI value to investing in tuning, diverting, and reducing your alerts. Tune your alerts to reduce over-alerting. Leverage your Purple Team to assist and validate your alert “sensitivity.” Focus on the critical TTPs of threat actors your organization faces, and audit your attack surface and automatically filter out what doesn’t matter. These kinds of actions can take a tremendous load off your analyst teams and help them focus on the things that do matter.

The CISO

More is not always better. Analysts are scarce, valuable resources. They should be used to investigate the toughest, most sophisticated threats, so use the proper criteria for evaluating potential products and intelligence feeds, and make sure you understand the potential negatives (false positives, over-alerting) as well as the positives. Be skeptical when you hear about a single pane of glass. And focus on automation to resolve as many of the “noise” alerts as possible.

Security vendors

Focus on the user experience. Security product companies need to accept the reality that they cannot solve all of their users’ security problems unilaterally, and think about the overall analyst experience. Part of this includes treating integrations as first-class citizens, and deprioritizing dashboards. If everything is a single pane of glass, nothing is a single pane of glass—this is no different than the adage that “if everyone is in charge, then no one is in charge.” Many important lessons can be learned from others who have addressed UI/UX issues associated with alert fatigue, such as healthcare and aviation.

The industry

More innovation is needed. The cybersecurity industry is filled with some of the smartest people in the world, but lately we’ve been bringing a knife to a gunfight. The bad guys are scaling their attacks tremendously via automation, dark marketplaces, and advanced technologies like artificial intelligence and machine learning. The good guys have been spending all their time in a painfully fragmented and broken security environment, focused entirely on identifying the signal and not at all on reducing the noise. This has left analysts struggling to manually muscle through overwhelming volumes of alerts. We need some of security’s best and brightest to turn their amazing brains to the problem of reducing the noise in the system and drive innovation that helps analysts focus on what matters most.

Conclusion

Primary care clinicians became less likely to accept alerts as they received more of them, particularly as they received more repeated (and therefore probably uninformative) alerts.

–  Ancker, et al.

Our current approach to security alerts, requiring analysts to process ever-growing volumes, just doesn’t scale, and security analysts are paying the price with alert fatigue, burnout, and high turnover. I’ve identified a number of the drivers of this problem, and our next job is to figure out how to solve it. One great area to start is to figure out how other industries have improved their approach, with aviation being a good potential model. With some of these insights in mind, we can figure out how to do better in our security efforts by doing less.

Andrew Morris
Founder of GreyNoise

WHITEPAPER | May 17, 2021

Cross-Platform Feature Comparison

For an Intel-commissioned study, IOActive compared security-related technologies from both the 11th Gen Intel Core vPro mobile processors and the AMD Ryzen PRO 4000 series mobile processors, as well as highlights from current academic research where applicable.

Our comparison was based on a set of objectives bundled into five categories: Below the OS, Platform Update, Trusted Execution, Advanced Threat Protection, and Crypto Extension. Based on IOActive research, we conclude that AMD offers no corresponding technologies in the categories where Intel offers features, while Intel and AMD have equivalent capabilities in the Trusted Execution category.

EDITORIAL | April 8, 2021

Trivial Vulnerabilities, Serious Risks

Introduction

The digital transformation brought about by the social distancing and isolation caused by the global COVID-19 pandemic was both extremely rapid and unexpected. From shortening the distance to our loved ones to reengineering entire business models, we’re adopting and scaling new solutions that are as fast-evolving as they are complex. The full impact of the decisions and technological shifts we’ve made in such a short time will take us years to fully comprehend.

Unfortunately, there’s a darker side to this rapid innovation and growth which is often performed to strict deadlines and without sufficient planning or oversight – over the past year, cyberattacks have increased drastically worldwide [1]. Ransomware attacks rose 40% to 199.7 million cases globally in Q3 alone [2], and 2020 became the “worst year on record” for data breaches by the end of Q2 [1].

In 2020, the U.S. government suffered a series of attacks targeting several institutions, including security agencies, Congress, and the judiciary, combining into what was arguably the “worst-ever US government cyberattack” and also affecting major tech companies.

The attacks were reported in detail [3], drawing the attention of the mass media [4]. A recent article by Kari Paul and Lois Beckett in The Guardian stated [5]:

“Key federal agencies, from the Department of Homeland Security to the agency that oversees America’s nuclear weapons arsenal, were reportedly targeted, as were powerful tech and security companies including Microsoft. Investigators are still trying to determine what information the hackers may have stolen, and what they could do with it.”

In November of last year, the Brazilian judicial system faced its own personal chapter of this story. The Superior Court of Justice, the second-highest of Brazil’s courts, had over 1,000 servers taken over and backups destroyed in a ransomware attack [6]. As a result of the ensuing chaos, their infrastructure was down for about a week.

Adding insult to injury, shortly afterward, Brazil’s Superior Electoral Court also suffered a cyberattack that threatened and delayed recent elections [7].

In this post, we will briefly revisit key shifts in cyberattack and defense mechanisms that followed the technological evolution of the past several decades. We will then illustrate how, even after a series of innovations and enhancements in the field, simple security issues still pose major threats today, and certainly will tomorrow.

We will conclude by presenting a cautionary case study [25] of the trivial vulnerability that could have devastated the Brazilian Judicial System.

The Ever-changing ROI of Cyberattacks

Different forms of intrusion technology have come into and out of vogue with attackers over the decades since security threats have been in the public consciousness.

In the 1980s, default logins and guest accounts gave attackers carte blanche access to systems across the globe. In the 1990s and early 2000s, plentiful pre-authentication buffer overflows could be found everywhere.

Infrastructure was designed flatly then, with little compartmentalization in mind, leaving computers, clients and servers alike, vastly exposed to the Internet. With no ASLR [8] or DEP/NX [9] in sight, exploiting Internet-shattering vulnerabilities was a matter of a few hours or days of work; access was rarely hard to obtain for those who wanted it.

In the 2000s, things started to change. The rise of the security industry, Bill Gates’ famous 2002 memo [10], and the growing full-disclosure movement (which led the charge in the appropriate regulation of vulnerabilities) brought with them a full stack of security practices covering everything from software design and development to deployment and testing.

By 2010, security assessments, red-teaming exercises, and advanced protection mechanisms were common standards among developed industries. Nevertheless, zero-day exploits were still widely used for both targeted and mass attacks.

Between 2010 and 2015, non-native web applications and virtualized solutions multiplied. Over the following years, as increasing computing power permitted, hardware was built with robust chain-of-trust [11], virtualization [12], and access control capabilities. Software adopted strongly typed languages, with verification and validation [13] a part of code generation and runtime procedures. Network technologies were designed to support a variety of solutions for segregation and orchestration, with fine-grained controls.

From 2015 onwards, applications were increasingly deployed in decentralized infrastructures, along with ubiquitous web applications and services, and the Cloud started to take shape. Distributed multifactor authentication and authorization models were created to support users and components of these platforms.

These technological and cultural shifts conveyed changes to the mechanics of zero-day-based cyberattacks.

Today at IOActive, we frequently find complex, critical security issues in our clients’ products and systems. However, turning many of those bugs into reliable exploits can take massive amounts of effort. Most of the time, a start-to-end compromise would depend on entire chains of vulnerabilities to support a single attack.

In parallel to the past two decades of security advancements, cyberattacks adapted and evolved alongside them in what many observers compare to a “cyber-arms race,” with scopes on the private and government sectors.

While the major players in cyber warfare have virtually unlimited resources, for the majority of mid-tier cyber-attackers the price of such over-engineering simply doesn’t pay for itself. With better windows of opportunity elsewhere, attackers are instead increasingly relying on phishing, data breaches, asset exposures, and other relatively low-tech intrusion methods.

Simple Issues, Serious Threats: Today and Tomorrow

Technologies and Practices Today

While complex software vulnerabilities remain a threat today, increasingly devastating attacks are being leveraged from simple security issues. The reasons for this can vary, but it often results from recently adopted technologies and practices:

  • Cloud services [14], now sine qua non, make it hard to track assets, content, and controls [15] in the overly agile DevOps lifecycle
  • Third-party chains of trust become weaker as they grow (we recently saw a critical code-dependency attack based on typosquatting [16])
  • Weak MFA mechanisms based on telephony, SMS, and instant messengers are leveraged for identity theft and authentication bypasses
  • Collaborative development via public repositories often leaks API keys and other secrets by mistake
  • Interconnected platforms create an ever-growing supply-chain complex that must be validated across multiple vendors

New Technologies and Practices Tomorrow

Tomorrow should bring interesting new shades to this watercolor landscape:

What they didn’t tell you about AI (Thanks @mbsuiche)

Old Technologies and Practices Today and Tomorrow

There is another factor contributing to the way simple security issues continue to present major threats today and will tomorrow. It echoes silently from a past where security scrutiny wasn’t standard practice.

Large governmental, medical, financial, and industrial control systems all have one thing in common: they’re a large stack of interconnected components. Many of these are either legacy components or making use of ancient technologies that lack minimal security controls.

A series of problems face overstretched development teams who often need to be overly agile and develop “full stack” applications: poor SDLC, regression bugs, lack of unit tests, and short deadlines all contribute to code with simple and easily exploitable bugs that make it into production environments. Although tracking and sanitizing such systems can be challenging to industries and governments, a minor mistake can cause real disasters.

Case Study [view the file here]

The Brazilian National Justice Council (CNJ) maintains a Judicial Data Processing System capable of facilitating the procedural activities of magistrates, judges, lawyers, and other participants in the Brazilian legal system with a single platform, making it ubiquitous as a result.

The CNJ Processo Judicial Eletrônico (CNJ PJe) system processes judicial data, with the objective of fulfilling the needs of the organs of the Brazilian Judiciary Power: Superior, Military, Labor, and Electoral courts; the courts of both the Federal Union and the individual states themselves; and the specialized justice systems that handle ordinary law and employment tribunals on both the federal and state level.

The CNJ PJeOffice software allows access to a user’s workspace through digital certificates, where individuals are provided with specific permissions, access controls, and scope of access in accordance with their roles. The primary purpose of this application is to guarantee legal authenticity and integrity to documents and processes through digital signatures.

Read the IOActive case study of the CNJ PJe vulnerabilities, which fully details a scenario that posed significant risks to users of the Brazilian Judicial System.

Conclusion

While information security has evolved considerably over the past several decades, producing solid engineering, process, and cultural solutions, new directions in the way we depend upon and use technology will come with issues that are not necessarily new or complex.

Despite their simplicity, attacks arising from these issues can have a devastating impact.

How people work and socialize, the way businesses are structured and operated, even ordinary daily activities are changing, and there’s no going back. The post-COVID-19 world is yet to be known.

Apart from the undeniable scars and changes that 2020 imposed on our lives, one thing is certain: information security has never been so critical.

References

[4] https://apnews.com/article/coronavirus-pandemic-courts-russia-375942a439bee4f4b25f393224d3d778

[5] https://www.theguardian.com/technology/2020/dec/18/orion-hack-solarwinds-explainer-us-government

[6] https://www.theregister.com/2020/11/06/brazil_court_ransomware/

[7] https://www.tse.jus.br/imprensa/noticias-tse/2020/Novembro/tentativas-de-ataques-de-hackers-ao-sistema-do-tse-nao-afetaram-resultados-das-eleicoes-afirma-barroso

[8] https://en.wikipedia.org/wiki/Address_space_layout_randomization

[9] https://en.wikipedia.org/wiki/Executable_space_protection

[10] https://www.wired.com/2002/01/bill-gates-trustworthy-computing/

[11] https://en.wikipedia.org/wiki/Trusted_Execution_Technology

[12] https://en.wikipedia.org/wiki/Hypervisor

[13] https://en.wikipedia.org/wiki/Software_verification_and_validation

[14] https://cloudsecurityalliance.org/blog/2020/02/18/cloud-security-challenges-in-2020/

[15] https://ioactive.com/guest-blog-docker-hub-scanner-matias-sequeira/

[16] https://medium.com/@alex.birsan/dependency-confusion-4a5d60fec610

[17] https://act-on.ioactive.com/acton/attachment/34793/f-87b45f5f-f181-44fc-82a8-8e53c501dc4e/0/-/-/-/-/LoRaWAN%20Networks%20Susceptible%20to%20Hacking.pdf

[18] https://act-on.ioactive.com/acton/fs/blocks/showLandingPage/a/34793/p/p-003e/t/form/fm/0/r//s/?ao_gatedpage=p-003e&ao_gatedasset=f-2c315f60-e9a2-4042-8201-347d9766b936

[19] https://ioactive.com/wp-content/uploads/2018/05/IOActive_HackingCitiesPaper_cyber-security_CesarCerrudo-1.pdf

[20] https://ioactive.com/wp-content/uploads/2018/05/Hacking-Robots-Before-Skynet-Paper_Final.pdf

[21] https://ioactive.com/wp-content/uploads/2018/05/IOActive_Compromising_Industrial_Facilities_from_40_Miles_Away.pdf

[22] https://ioactive.com/pdfs/IOActive_SATCOM_Security_WhitePaper.pdf

[23] https://ioactive.com/wp-content/uploads/2018/08/us-18-Santamarta-Last-Call-For-Satcom-Security-wp.pdf

[24] https://www.belfercenter.org/publication/AttackingAI

[25] https://act-on.ioactive.com/acton/attachment/34793/f-e426e414-e895-4fb0-971f-4fa432e5ad9b/1/-/-/-/-/IOA-casestudy-CNJ-PJe.pdf