INSIGHTS | March 26, 2014

A Bigger Stick To Reduce Data Breaches

On average I receive a postal letter from a bank or retailer every two months telling me that I’ve become the unfortunate victim of a data theft or that my credit card is being re-issued to protect against future fraud. When I quiz my friends and colleagues on the topic, it would seem that they too suffer the same fate on a recurring schedule. It may not be that surprising to some folks. 2013 saw over 822 million private records exposed according to the folks over at DatalossDB – and that’s just the ones that were disclosed publicly.

It’s clear to me that something is broken and it’s only getting worse. When it comes to the collection of personal data, too many organizations have a finger in the pie and are ill-equipped (or ill-prepared) to protect it. In fact I’d question why they’re collecting it in the first place. All too often these organizations – of which I’m supposedly a customer – are collecting personal data about “my experience” doing business with them and are hoping to figure out how to use it to their profit (effectively turning me into a product). If these corporations were some bloke visiting a psychologist, they’d be diagnosed with a hoarding disorder. For example, consider what criteria the DSM-5 diagnostic manual uses to identify the disorder:

  • Persistent difficulty discarding or parting with possessions, regardless of the value others may attribute to these possessions.
  • This difficulty is due to strong urges to save items and/or distress associated with discarding.
  • The symptoms result in the accumulation of a large number of possessions that fill up and clutter active living areas of the home or workplace to the extent that their intended use is no longer possible.
  • The symptoms cause clinically significant distress or impairment in social, occupational, or other important areas of functioning.
  • The hoarding symptoms are not due to a general medical condition.
  • The hoarding symptoms are not restricted to the symptoms of another mental disorder.

Whether or not the organizations hoarding personal data know how to profit from it, it’s clear that even the biggest of them are increasingly inept at protecting it. The criminals that are pilfering the data certainly know what they’re doing. The gray market for identity laundering has expanded phenomenally since I talked about it at Black Hat in 2010.

We can moan all we like about the state of the situation now, but we’ll be crying in the not too distant future when statistically we progress from being a victim of data loss to being a victim of (unrecoverable) fraud.

The way I see it, there are two core components to dealing with the spiraling problem of data breaches and the disclosure of personal information. We must deal with the “what data are you collecting and why?” questions, and incentivize corporations to take much more care protecting the personal data they’ve been entrusted with.

I feel that the data hoarding problem can be dealt with fairly easily. At the end of the day it’s about transparency and the ability to “opt out”. If I was to choose a role model for making a sizable fraction of this threat go away, I’d look to the basic components of the UK’s Data Protection Act as being the cornerstone of a solution – especially here in the US. I believe the key components of personal data collection should encompass the following:

  • Any organization that wants to collect personal data must have a clearly identified “Data Protection Officer” who not only is a member of the executive board, but is personally responsible for any legal consequences of personal data abuse or data breaches.
  • Before data can be collected, the details of the data sought for collection, how that data is to be used, how long it would be retained, and who it is going to be used by, must be submitted for review to a government or legal authority. I.e. some third-party entity capable of saying this is acceptable use – a bit like the ethics boards used for medical research etc.
  • The specifics of what data a corporation collects and what they use that data for must be publicly visible. Something similar to the nutrition labels found on packaged foods would likely be appropriate – so the end consumer can rapidly discern how their private data is being used.
  • Any data being acquired must include a date of when it will be automatically deleted and removed.
  • At any time any person can request a copy of any and all personal data held by a company about themselves.
  • At any time any person can request the immediate deletion and removal of all data held by a company about themselves.
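
To make the last three points a little more tangible, here is a minimal Python sketch of what "every record carries its own deletion date" could look like in practice. The `PersonalRecord` type, its field names, and the three helper functions are all hypothetical names invented for illustration; nothing here is drawn from the Data Protection Act itself.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PersonalRecord:
    subject_id: str   # whose data this is
    field_name: str   # what was collected
    purpose: str      # why it was collected
    delete_on: date   # the date it must be automatically deleted


def purge_expired(records, today):
    """Keep only records whose deletion date has not yet arrived."""
    return [r for r in records if r.delete_on > today]


def copy_for_subject(records, subject_id):
    """Everything held about one person (the 'request a copy' right)."""
    return [r for r in records if r.subject_id == subject_id]


def erase_subject(records, subject_id):
    """Remove everything held about one person (the 'request deletion' right)."""
    return [r for r in records if r.subject_id != subject_id]


# Invented sample data.
store = [
    PersonalRecord("alice", "email", "order receipts", date(2014, 6, 30)),
    PersonalRecord("alice", "postcode", "delivery", date(2015, 1, 1)),
    PersonalRecord("bob", "email", "order receipts", date(2015, 1, 1)),
]

store = purge_expired(store, today=date(2014, 7, 1))
print(len(copy_for_subject(store, "alice")))  # alice's email record has expired
store = erase_subject(store, "alice")
print([r.subject_id for r in store])
```

The design point is simply that a record without a `delete_on` value cannot be constructed, so retention is decided at collection time rather than as an afterthought.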

If such governance existed for the collection and use of personal data, then the remaining big item is enforcement. You’d hope that the morality and ethics of corporations would be enough to ensure they protected the data entrusted to them with the vigor necessary to fight off the vast majority of hackers and organized crime, but this is the real world. Apparently the “big stick” approach needs to be reinforced.

A few months ago I delved into how the fines being levied against organizations that had been remiss in doing all they could to protect their customers’ personal data should be bigger and divvied up. Essentially I’d argue that half of the fine should be pumped back into the breached organization and used for increasing their security posture.

Looking at the fines being imposed upon the larger organizations (that could have easily invested more in protecting their customers’ data prior to their breaches), the amounts are laughable. No noticeable financial pain occurs, so why should we be surprised if (and when) it happens again? I’ve become a firm believer that the fines businesses incur should be based upon a percentage of valuation. Why should a twenty-billion-dollar business face the same fine for losing 200,000,000 personal records as a ten-million-dollar business does for losing 50,000 personal records? If the fine was something like two percent of valuation, I can tell you that the leadership of both companies would focus more firmly on the task of keeping your data and mine much safer than they do today.
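
For the curious, the arithmetic behind that two-percent suggestion is easy to sketch in Python. The valuations and record counts are the two examples from the paragraph above; the function name `valuation_fine` and the flat-rate formula are my own illustrative assumptions, not any regulator’s actual method.

```python
from fractions import Fraction


def valuation_fine(valuation_usd: int, rate: Fraction = Fraction(2, 100)) -> int:
    """Return a fine in whole dollars as a fixed fraction of company valuation."""
    return int(valuation_usd * rate)


# The two hypothetical companies from the text.
companies = {
    "large": {"valuation": 20_000_000_000, "records_lost": 200_000_000},
    "small": {"valuation": 10_000_000, "records_lost": 50_000},
}

for name, c in companies.items():
    fine = valuation_fine(c["valuation"])
    per_record = fine / c["records_lost"]
    print(f"{name}: fine ${fine:,} (${per_record:.2f} per lost record)")
```

Under those assumptions the large company would owe $400,000,000 and the small one $200,000, so the pain scales with the size of the business rather than being a flat fee.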

INSIGHTS | February 27, 2014

Beware Your RSA Mobile App Download

It’s been half a decade since Apple launched their iPhone campaign titled “There’s an app for that”. In the years following, the mobile app stores (from all the major players) have continued to blossom to the point that not only are there several thousand apps that help light your way (i.e. by keeping the flash running bright), but every company, cause, group, or notable event is expected to publish their own mobile application.
 
Today there are several hundred good “rapid development” kits that allow any newbie to craft and release their own mobile application and several thousand small professional software development teams that will create one on your behalf. These bespoke mobile applications aren’t the types of products that their owners are expecting to make much (if any) money off of. Instead, these apps are generally helpful tools that appeal to a particular target audience.
 
Now, while the cynical side of me would like to point out that some people should never be trusted with tools as lofty as HTML and setting up WordPress sites–let alone building a mobile app–many corporate marketing teams I’ve dealt with have not only drunk the “There’s an app for that” Kool-Aid, they appear to bathe in the stuff each night. As such, a turnkey approach to app production is destined to involve many sacrifices and, at the top of the sacrificial pillar, data security and integrity continue to reign supreme.
 
A few weeks ago I noticed that, in the run up to the RSA USA 2014 conference, a new mobile application was conceived and thrust upon the Apple and Google app stores and electronically marketed to the world at large. Maybe it was a reaction to being spammed with a never-ending tirade of “come see us at RSA” emails, or it was topical off the back of a recent blog on the state of mobile banking application security, or maybe both. I asked some of the IOActive consulting team who had a little bench-time between jobs to have a poke at the freshly minted “RSA Conference 2014” mobile application.
 
 
 
The Google Play app store describes the RSA Conference 2014 application like this:

“With the RSA Conference Mobile App, you can stay connected with all Conference activities, view the event catalog, manage session schedules and engage with colleagues and peers while onsite using our social and professional networking tools. You’ll have access to dynamic agenda updates, venue maps, exhibitor listing and more!”

Now, I wasn’t expecting the application to be particularly interesting–it’s not as if it was a transactional banking application etc.–but I would have thought that RSA (or whoever they tasked with commissioning the application) would have at least applied some basic elbow grease so as to not potentially embarrass themselves. Alas, that was not to be the case.
 
The team came back rather quickly with a half-dozen security issues. Technically the highest impact vulnerability had to do with the app being vulnerable to man-in-the-middle attacks, where an attacker could inject additional code into the login sequence and phish credentials. If we were dealing with a banking application, then heads would have been rolling in an engineering department, but this particular app has only been downloaded a few thousand times, and I seriously doubt that some evil hacker is going to take the time out of their day to target this one application (out of tens of millions) to try to phish credentials to a conference.
 
It was the second most severe vulnerability that caught my eye though. The RSA Conference 2014 application downloads a SQLite DB file that is used to populate the visual portions of the app (such as schedules and speaker information) but, for some bizarre reason, it also contains information of every registered user of the application–including their name, surname, title, employer, and nationality.
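
If you want to check a bundled database like this yourself, a few lines of Python with the standard `sqlite3` module will do it. The sketch below is a hedged illustration only: the table and column names are invented stand-ins (I am not reproducing the app’s real schema), and the `PII_HINTS` list is just a crude keyword heuristic.

```python
import sqlite3

# Column-name fragments that suggest personal data (illustrative, not exhaustive).
PII_HINTS = ("name", "surname", "title", "employer", "nationality", "email")


def find_pii_columns(conn: sqlite3.Connection) -> dict[str, list[str]]:
    """Map each table to the columns whose names look like personal data."""
    flagged = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info returns one row per column; index 1 is the column name.
        columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
        hits = [c for c in columns if any(h in c.lower() for h in PII_HINTS)]
        if hits:
            flagged[table] = hits
    return flagged


# Stand-in for a downloaded file: a hypothetical schema built in memory.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sessions (id INTEGER, room TEXT, starts_at TEXT)")
conn.execute(
    "CREATE TABLE attendees (id INTEGER, first_name TEXT, surname TEXT, "
    "title TEXT, employer TEXT, nationality TEXT)"
)
print(find_pii_columns(conn))
```

Pointing `sqlite3.connect()` at a real file path instead of `:memory:` would run the same check against an actual database pulled from a device.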
 
 
 
I have no idea why the app developers chose to do that, but I’m pretty sure that the folks who downloaded and installed the application are unlikely to have thought that their details were being made public and published in this way. Marketers love this kind of information though!
 
Some readers may think I’m targeting RSA, and in a small way I guess I am. Security flaws in mobile applications (particularly these rapidly developed and targeted apps) are endemic, and I think the RSA example helps prove the point that there are often inherent risks in even the most benign applications.
 
I’m betting that RSA didn’t even create the application themselves. The Google Play store indicates that a company called QuickMobile was the developer. With one small click it’s possible to get a list of all the other applications QuickMobile has created, which I would assume were built on their clients’ behalf.
 
 
 
As you can see from above, there are lots of popular brands and industry conferences employing their app creation services. I wonder if many of them share the same vulnerabilities as the RSA Conference 2014 application?
 
Here’s a little bit of advice to any corporate marketing team. If you’re going to release your own mobile application, the security and integrity of that application are your responsibility. While you can’t outsource that, you can get another organization to assess the application on your behalf.
 
In the meantime, readers of this blog may want to refrain from downloading the RSA Conference 2014 (and related) mobile applications–unless you’re a hacker or marketing team that wants to acquire a free list of conference attendees’ names, positions, and employers.

INSIGHTS | August 20, 2013

FDA Medical Device Guidance

Last week the US Food and Drug Administration (FDA) finally released a couple of important documents. The first is their guidance on using radio frequency wireless technology in medical devices (replacing a draft from January 3, 2007), and the second is their new (draft) guidance on premarket submission for management of cybersecurity in medical devices.

The wireless technology guidance document seeks to address many of the risks and vulnerabilities that have been disclosed in medical devices (embedded or otherwise) in recent years – in particular those with embedded RF wireless functionality…

The recommendations in this guidance are intended for RF wireless medical devices including those that are implanted, worn on the body or other external wireless medical devices intended for use in hospitals, homes, clinics, clinical laboratories, and blood establishments.  Both wireless induction-based devices and radiated RF technology device systems are within the scope of this guidance.

The FDA wishes medical device manufacturers to consider the design, testing and use of wireless medical devices…

In the design, testing, and use of wireless medical devices, the correct, timely, and secure transmission of medical data and information is important for the safe and effective use of both wired and wireless medical devices and device systems. This is especially important for medical devices that perform critical functions such as those that are life-supporting or life-sustaining. For wirelessly enabled medical devices, risk management should include considerations for robust RF wireless design, testing, deployment, and maintenance throughout the life cycle of the product.

For most of you reading the IOActive Labs blog, the most important parts of the guidance document are the advice on security and securing “wireless signals and data”. Section 3.d covers this…

Security of RF wireless technology is a means to prevent unauthorized access to patient data or hospital networks and to ensure that information and data received by a device are intended for that device. Authentication and wireless encryption play vital roles in an effective wireless security scheme. While most wireless technologies have encryption schemes available, wireless encryption might need to be enabled and assessed for adequacy for the medical device’s intended use. In addition, the security measures should be well coordinated among the medical device components, accessories, and system, and as needed, with a host wireless network. Security management should also consider that certain wireless technologies incorporate sensing of like technologies and attempt to make automatic connections to quickly assemble and use a network (e.g., a discovery mode such as that available in Bluetooth™ communications). For certain types of wireless medical devices, this kind of discovery mode could pose safety and effectiveness concerns, for example, where automatic connections might allow unintended remote control of the medical device. 

FDA recommends that wireless medical devices utilize wireless protection (e.g., wireless encryption, data access controls, secrecy of the “keys” used to secure messages) at a level appropriate for the risks presented by the medical device, its environment of use, the type and probability of the risks to which it is exposed, and the probable risks to patients from a security breach. FDA recommends that the following factors be considered during your device design and development:

* Protection against unauthorized wireless access to device data and control. This should include protocols that maintain the security of the communications while avoiding known shortcomings of existing older protocols (such as Wired Equivalent Privacy (WEP)). 

* Software protections for control of the wireless data transmission and protection against unauthorized access. 

Use of the latest up-to-date wireless encryption is encouraged. Any potential issues should be addressed either through appropriate justification of the risks based on your device’s intended use or through appropriate design verification and validation.
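
To put something concrete next to the phrase “ensure that information and data received by a device are intended for that device”, here is one minimal Python sketch of message authentication using HMAC-SHA256 from the standard library. This is my own illustration, not anything the FDA document specifies: the device IDs, the payload format, and the function names are hypothetical, and a real design would also need encryption, replay protection, and key management.

```python
import hashlib
import hmac
import secrets


def sign_message(key: bytes, device_id: str, payload: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag binding the payload to one device ID."""
    return hmac.new(key, device_id.encode() + b"|" + payload, hashlib.sha256).digest()


def accept_message(key: bytes, my_device_id: str, payload: bytes, tag: bytes) -> bool:
    """Accept only messages whose tag verifies for this device's ID and key."""
    expected = sign_message(key, my_device_id, payload)
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(expected, tag)


key = secrets.token_bytes(32)          # shared secret, assumed provisioned securely
payload = b"set_rate=2.5"              # invented command format
tag = sign_message(key, "pump-001", payload)

print(accept_message(key, "pump-001", payload, tag))          # intended device
print(accept_message(key, "pump-002", payload, tag))          # a different device
print(accept_message(key, "pump-001", b"set_rate=9.9", tag))  # tampered payload
```

Only the first check succeeds; the message is rejected by any other device and by the intended device if the payload has been altered.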

Based upon the parts I’ve highlighted above, you’ll probably be feeling a little foreboding. From a “guidance” perspective, it’s less useful than a teenager with a CISSP qualification. The instructions are so general as to be useless.

If I was the geek charged with waving the security baton at some medical device manufacturer I wouldn’t be happy at all. Effectively the FDA are saying “there are a number of security risks with wireless technologies, here are some things you could think about doing, hope that helps.” Even if you followed all this advice, the FDA could turn around later during your submission for certification and say you did it wrong…

The second document the FDA released last week (Content of Premarket Submissions for Management of Cybersecurity in Medical Devices – Draft Guidance for Industry and Food and Drug Administration Staff) is a little more helpful – at the very least they’re talking about “cybersecurity” and there’s a little more meat for your CISSP folks to chew upon (in fact parts of it read like they’ve been copy-pasted right out of a CISSP training manual).

This guidance has been developed by the FDA to assist industry by identifying issues related to cybersecurity that manufacturers should consider in preparing premarket submissions for medical devices. The need for effective cybersecurity to assure medical device functionality has become more important with the increasing use of wireless, Internet- and network-connected devices, and the frequent electronic exchange of medical device-related health information.

Again, it doesn’t go into any real detail of what device manufacturers should or shouldn’t be doing, but it does set the scene for understanding the scope of part of the threat.

If I was an executive at one of the medical device manufacturers confronted with these FDA Guidance documents for the first time, I wouldn’t feel particularly comforted by them – in fact I’d be more worried about the increased exposure I would have in the future. If a future product of mine was to get hacked, regardless of how closely I thought I was following the FDA guidance, I’d be pretty sure that the FDA could turn around and say that I wasn’t really in compliance.

With that in mind, let me slip on my IOActive CTO hat and clearly state that I’d recommend any medical device manufacturer that doesn’t want to get bitten in the future for failing to follow this FDA “guidance” reach out to a qualified security consulting company to get advice on (and to assess) the security of current and future product lines prior to release.

Engaging with a bunch of third-party experts isn’t just a CYA proposition for your company. Bringing to bear an external (impartial) security authority would obviously add extra weight to the approval process, proving the company’s technical diligence and working “above and beyond” the security checkbox of the FDA guidelines. Just as importantly though, securing wireless technologies against today’s and tomorrow’s threats isn’t something that can be done by an internal team (or a flock of CISSPs) – you really do need to call in the experts with a hacker’s eye for security… Ideally a company with a pedigree in cutting-edge security research, and I know just who to call…

INSIGHTS | June 20, 2013

FDA Safety Communication for Medical Devices

The US Food and Drug Administration (FDA) released an important safety communication targeted at medical device manufacturers, hospitals, medical device user facilities, health care IT and procurement staff, along with biomedical engineers, in which they warn of the risk of failure due to cyberattack – such as through malware or unauthorized access to configuration settings in medical devices and hospital networks.
Have you ever been to view a much anticipated movie based upon an exciting book you happened to have read when you were younger, only to be sorely disappointed by what the director finally pulled together on the big screen? Well that’s how I feel when I read this newest alert from the FDA. Actually it’s not even called an alert… it’s a “Safety Communication”… it’s analogous to Peter Jackson deciding that his own interpretation of JRR Tolkien’s ‘The Hobbit’ wasn’t really worthy of the title so to forestall criticism he named the movie ‘Some Dwarves and a Hobbit do Stuff’.
This particular alert (and I’m calling it an alert because I can’t lower myself to call it a safety communication any longer) is a long time coming. Almost a decade ago my teams at the time and I raised the red flag over the woeful security of hospital networks. Back in 2005 my then research teams raised new red flags related to the encroachment of unsecured WiFi into medical equipment. For the last couple of years IOActive’s research team has been raising new red flags over the absence of security within implantable medical devices. And then on June 13th 2013 the FDA releases a much watered down alert where the primary recommendations and actions section simply states “[m]any medical devices contain configurable embedded computer systems that can be vulnerable to cybersecurity breaches”. It’s as if the hobbit has been interpreted as a midget with hairy feet.
Yes I joke a little, but I am very disappointed with the status of this alert covering an important topic.
The vulnerabilities being uncovered on a daily basis within hospital networks, medical equipment and implantable devices by professional security teams and researchers are generally more serious than outsiders give them credit for. Much of the public cybersecurity discussion as it relates to the medical field to date has been about people hacking hospital data systems for patient records and, most recently, the threat of targeted slayings of people who happen to have vulnerable implanted insulin pumps and heart defibrillators. While both are certainly possible, they’re what I would associate with fringe events.
I believe that the biggest and most likely threats lie in non-malicious actors – the tinkerers, the cyber-crooks, and the “in the wrong place at the wrong time” events. These medical systems are so brittle that even the slightest knock or tire-kicking can cause them to fail. I’ll give you some examples:
  • Wireless heart and drug monitoring stations within emergency wards that have open WiFi connections; where anyone with an iPhone searching for an Internet connection can make an unauthenticated connection and have their web browser bring up the admin portal of the station.
  • Remote surgeon support and web camera interfaces used for emergency operations brought down by everyday botnet malware because someone happened to surf the web one day and hit the wrong site.
  • Internet auditing and scanning services run internationally and encountering medical devices connected directly to the Internet through routable IP addresses – being used as drop-boxes for file sharing groups (oblivious to the fact that it’s a medical device under their control).
  • Common WiFi and Bluetooth auditing tools (available for Android smartphones and tablets) identifying medical devices during simple “war driving” exercises and leaving the discovered devices in a hung state.
  • Medical staff’s iPads without authentication or GeoIP-locking of hospital applications that “go missing” or are borrowed by kids and have applications (and games) installed from vendor markets that conflict with the use of the authorized applications.
  • NFC from smartphones and payment systems that can record, play back and interfere with the communications of implanted medical devices.
These are really just the day-to-day noise of an Internet-connected life – but one that much of the medical industry is currently ill-prepared to defend against. Against an experienced attacker or someone determined to cause harm – well, it’s as one-sided as a lone hobbit versus the combined armies of Middle Earth.
I will give the alert some credit though: it did clarify a rather important point that may have been a stumbling block for many device vendors in the past:
“The FDA typically does not need to review or approve medical device software changes made solely to strengthen cybersecurity.”
IOActive’s experience when dealing with a multitude of vulnerable medical device manufacturers had often been disheartening in the past. A handful of manufacturers have made great strides in securing their devices and controlling software recently – and there has been a change in the hearts and minds over the last 6 months (pun intended) as more publicity has been drawn to the topic. The medical clients we’ve been working most closely with over recent months have made huge leaps in making their latest devices more secure, and their next generation of devices will be setting the standard for the industry for years to come.
In the meantime though, there’s a tremendous amount of work to be done. The FDA’s alert is significant. It is a formal recognition of the poor state of security within the industry – providing some preliminary guidance. It’s just not quite a call to arms I’d have liked to see after so many years – but I guess they don’t want to raise too much fear, nor the ire of vendors that could face long and costly FDA re‑evaluations of their technologies. Gandalf would be disappointed.
(BTW I actually liked Peter Jackson’s rendition of The Hobbit).

INSIGHTS | May 29, 2013

Security 101: Machine Learning and Big Data

The other week I was invited to keynote at the ISSA CISO Forum on Incident Response in Dallas and in the weeks prior to it I was struggling to decide upon what angle I should take. Should I be funny, irreverent, diplomatic, or analytical? Should I plaster slides with the last quarter’s worth of threat statistics, breach metrics, and headline news? Should I quip some anecdote and hope the attending CISOs would have an epiphany that’ll fundamentally change the way they secure their organizations?

In the end I did none of that… instead I decided to pull apart the latest batch of security buzzwords – “Big Data” and “Machine Learning”.

If you attended RSA USA (or any major security vendor/booth conference) this year you can’t have missed the fact that everything from antivirus through to USB memory sticks now comes with a dab of big data, a sprinkling of machine learning, and a dollop of cloud for good measure. Thankfully I’m a cynic; or else I’d have been thrashing around on the ground in an epileptic fit from all the flashy marketing claims and trademarked nonsense phrases.

I guess I’m lucky to be in the position of having had several years of exposure to some of the greatest minds at Georgia Tech as they drummed into me on a daily basis the “what and how” of machine learning in the context of solving many of today’s toughest security problems.

So, it was with that in mind that I thought “If I’m a CISO and everything I know about machine learning and big data came from carefully rehearsed vendor sound bites and glossy pamphlets, would I be able to tell the difference between Chanel No. 5 and cow manure?” The obvious answer would result in some very happy farmers.

What was the net result of this self-reflection and impending deadline? I crafted a short presentation for CISOs… a 101 course on machine learning and big data… and it included ducks.

If you’re in the upper tiers of your organization and you’ve had sales folks pimping you their latest cloud-infused, big data-munching, machine learning, and world-hunger-solving security solution, please carry on reading as I attempt to explain the basics of the latest and greatest in buzzwords…

First of all – some context! In the world of breach detection and incident response there’s a common idiom: “If it walks like a duck, flies like a duck, and quacks like a duck… it must be a duck.”

Now I could easily spend another 5,000 words explaining why such an idiom doesn’t apply to modern security threats, targeted attacks and advanced persistent threats, but you’ll have to wait for a different blog post. Rather, for this 101 lesson, it’s important to understand the concept of “Feature Selection” – which in the case of this idiom includes: walking, flying and quacking.

If you’ve been tasked with dealing with a duck problem, ideally you’d be an aficionado on the feet, wings and sounds of ducks. You’d be able to apply this knowledge to each bird you have the time to focus your attention on and make a determination: Duck, or Not a Duck. As a security professional, you’d be versed in the various attributes of certain threats – and able to derive a conclusion as to the nature of the security problem.

The problem though is that at scale things break down.
What do you do when there are too many to analyze, when time is too short, and when you can’t make out all the duck features from afar? This is typical of the big data problem (and your everyday network traffic). Storing the data is the easy part. Retrieving the data is mechanically complicated, but solvable.

Meanwhile, making decisions and taking actions upon the data is typically the most difficult part. With every doubling of data, your problem grows exponentially.

The traditional method of dealing with the situation has been to employ signature matching systems. In essence, we build rules based upon the features we’ve previously identified as significant and capable of bounding the problem (or duck selection). We then compare these rules against the sample animal and receive a binary answer – Duck, or Not a Duck.
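
A signature in this sense is nothing more than a fixed set of required feature values. The Python sketch below shows the idea using the three duck features from the idiom; the `Animal` type and `DUCK_SIGNATURE` rule are hypothetical names made up for this example, not how any particular IPS stores its rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Animal:
    walks: bool
    flies: bool
    quacks: bool


# A "signature": the feature values a sample must have to be called a duck.
DUCK_SIGNATURE = {"walks": True, "flies": True, "quacks": True}


def matches_signature(sample: Animal, signature: dict[str, bool]) -> bool:
    """Binary answer: every feature in the signature must match exactly."""
    return all(getattr(sample, feature) == value for feature, value in signature.items())


print(matches_signature(Animal(walks=True, flies=True, quacks=True), DUCK_SIGNATURE))
print(matches_signature(Animal(walks=True, flies=True, quacks=False), DUCK_SIGNATURE))
```

The first sample matches and the second does not; there is no notion of “probably a duck”, which is exactly what makes this approach both fast and brittle.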

Signature systems can be very efficient at classification. Just look at your average Intrusion Prevention System (IPS). A problem though lies in the scope of the features that had been defined.

If those features (or parameters) used for classification are too narrow (or too broad) then evasion is not only probable, but guaranteed. In essence, for a threat (or duck) to be classified, it must have been observed in the past or carefully predicted (although that is rare).

From an attacker’s perspective, knowledge of those features and triggering parameters makes it a trivial task to evade or to conduct false flag operations. Just think – hunters do this all the time with their floating duck decoys. Even fellow duck hunters have been known to mistakenly take pot-shots at them too.
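
Both failure modes can be shown with a couple of toy rules in Python. Everything here is invented for illustration (the feature names, the two rules, and the sample birds); the point is only that a rule that is too narrow misses an unseen variant, while a rule that is too broad fires on a decoy.

```python
def narrow_rule(sample: dict) -> bool:
    """Too narrow: matches only the exact variant seen before."""
    return sample["sound"] == "quack" and sample["color"] == "brown"


def broad_rule(sample: dict) -> bool:
    """Too broad: matches anything that floats and has a duck outline."""
    return sample["floats"] and sample["shape"] == "duck"


# A real duck that differs slightly from the one the rule was written for.
unseen_duck = {"sound": "quack", "color": "white", "floats": True,
               "shape": "duck", "material": "feathers"}

# A hunter's floating decoy.
decoy = {"sound": None, "color": "brown", "floats": True,
         "shape": "duck", "material": "plastic"}

print(narrow_rule(unseen_duck))  # evasion: the real duck slips past
print(broad_rule(decoy))         # false flag: the decoy triggers the rule
```

Anyone who knows which features the rules key on can produce either outcome at will, which is the attacker’s advantage described above.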

Switching pace a little, let’s look at the network.
The big green blob is all the network traffic for an organization for a week. The red blob right-of-center is traffic associated with an active breach, and the lighter red blob with the dotted lines is just general malicious traffic observed within the network. In this two-dimensional view (if I hadn’t color-coded it previously) you’d have a near impossible task differentiating between them. As it is, the malicious traffic is mixed with both the “safe” and “breach” traffic.

The trick in differentiating between the network traffic types lies in increasing the dimensionality of the problem. What was a two-dimensional blob suddenly becomes much clearer when an appropriate view or perspective to the data is added. In the context of the above diagram, the addition of a z-axis and an extension into the third dimension allows the observer (i.e. analyst) to easily differentiate between the traffic types – for example, the axis could represent “country code of destination IP address”. In this context, the appropriate feature selection can greatly simplify the detection problem. Choosing appropriate features is important – nay, it’s critical!
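
Here is a deliberately tiny Python sketch of that idea. The four flows, their coordinates, and the placeholder country code "ZZ" are all invented; the check I use for "separable" (no value shared between the two labels) is a toy stand-in for what a real analysis would do.

```python
# Each flow: (x, y, destination_country). x and y stand in for the two plotted axes.
flows = [
    ((0.50, 0.52, "US"), "safe"),
    ((0.51, 0.50, "US"), "safe"),
    ((0.50, 0.51, "ZZ"), "breach"),
    ((0.52, 0.50, "ZZ"), "breach"),
]


def separable_by(index: int) -> bool:
    """True if no value of the feature at `index` appears under both labels."""
    safe = {features[index] for features, label in flows if label == "safe"}
    breach = {features[index] for features, label in flows if label == "breach"}
    return safe.isdisjoint(breach)


print(separable_by(0))  # x alone: the classes overlap
print(separable_by(1))  # y alone: the classes overlap
print(separable_by(2))  # the added third dimension splits them cleanly
```

In the two plotted dimensions the safe and breach traffic sit on top of each other, while the added feature separates them at a glance.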

This is where advances in machine learning over the last half-decade have really come to the fore in computer science and more recently in information security.

Without getting into any of the math behind the various machine learning algorithms or techniques, the key concept you need to understand is “training”. It can mean many things to many a mathematician, but since we’re likely not one of those, what training means in our context is that we already have samples of what we’re going to be looking for, and samples of things we know we’re definitely not interested in. The better we define and populate these training sets, the more precise the machine learning system we’re employing will be in differentiating between them – and potentially classifying other contenders.

So, in this example we’ve taken a bunch of ducks and grouped them together. They become our “+ve class” – which basically means these are the things we’re interested in. But equally important is our “-ve class” – our collection of things we know not to be ducks. In many cases our -ve class may be more important than our +ve class because it contains all those false positives and “nearly” ducks – the things that may have caught us out once before.

One function of a good machine learning system is to automatically determine which attributes make the most sense in differentiating between your +ve and -ve classes.
While our poor old hunter (or analyst) was working with three features – walks, flies, and talks – the computer-based system may have reviewed all the attributes that were available and determined which ones are the most useful in differentiating between “ducks” and “not ducks”. In many cases the system will have weighted the various features (or attributes) to indicate which ones are more predictive of each class.

For example, texture may be a good indicator of “not a duck” – since none of the +ve class were made from plastic or wood. Meanwhile, a feature such as “wing length” may not be such a good criterion and will be weighted so that it has no influence on determining whether a duck is a duck or not – or may be dropped by the system entirely.
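As a rough illustration of feature weighting, the sketch below scores each feature by how differently it shows up in the +ve and -ve training sets. The samples, the feature names, and the scoring rule itself (absolute difference in occurrence rates) are assumptions chosen for simplicity; real machine learning systems use far more sophisticated measures.

```python
# Hypothetical training samples; feature names and values are invented.
positive = [  # the +ve class: things we know are ducks
    {"walks": 1, "flies": 1, "talks": 1, "plastic_or_wood": 0, "long_wings": 1},
    {"walks": 1, "flies": 1, "talks": 1, "plastic_or_wood": 0, "long_wings": 0},
    {"walks": 1, "flies": 0, "talks": 1, "plastic_or_wood": 0, "long_wings": 1},
    {"walks": 1, "flies": 1, "talks": 1, "plastic_or_wood": 0, "long_wings": 0},
]
negative = [  # the -ve class: decoys and other "nearly" ducks
    {"walks": 0, "flies": 0, "talks": 0, "plastic_or_wood": 1, "long_wings": 1},
    {"walks": 0, "flies": 0, "talks": 1, "plastic_or_wood": 1, "long_wings": 0},
    {"walks": 1, "flies": 1, "talks": 1, "plastic_or_wood": 1, "long_wings": 1},
    {"walks": 0, "flies": 0, "talks": 0, "plastic_or_wood": 1, "long_wings": 0},
]


def rate(samples, feature):
    """Fraction of samples in which the feature is present."""
    return sum(s[feature] for s in samples) / len(samples)


def feature_weights(pos, neg):
    """Weight each feature by how differently it occurs in the two classes."""
    return {f: abs(rate(pos, f) - rate(neg, f)) for f in pos[0]}


weights = feature_weights(positive, negative)
for name, weight in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{name:16s} {weight:.2f}")
```

With this toy data the texture feature gets the highest weight and wing length gets a weight of zero, which is the kind of feature a system might drop entirely.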

The number of features reviewed, assessed and weighted by the machine learning system will eventually be determined by the type of data being used and how “rich” it is. For example, in the network security realm we may be feeding the system with collated samples of firewall logs, DNS traffic samples, IP blacklists/whitelists, IPS alerts, etc. It’s important to note though that the “richer” the data set (i.e. the more possible features there could be), the more complex the problem is for the computer to solve and the longer it’ll take to train the system.

Now let’s switch back to that “big data” we originally talked about. In the duck realm we’re talking about all the birds within a national park (say). Meanwhile, in the network security realm, we may be talking about all the traffic observed in real-time across a corporate network and all the alerting instrumentation (e.g. firewalls, IPS, etc.).

I’m going to do some hand-waving here because it can get rather complex and there are a lot of proprietary tweaks that can be undertaken… but in one representation we can get our trained system to automatically group and cluster events on our network.
Using our original training data, or some other previously labeled datasets, it’s possible to automatically label the clusters output by a machine learning system.
For example, in the graphic above we see a number of unique clusters (or blobs if you insist). Through automatic labeling we know that the green blobs are types of ducks, the red blobs are various groupings of not ducks, and the gray blobs are previously unknown or unlabeled clusters – each one mathematically distinct from the others – based upon the features the system chose.
What the system can also do is assign a probability that the unknown clusters are associated with our +ve or -ve training sets. For example, in this particular graphical representation the proximity of the unlabeled clusters to labeled (and classified) clusters allows the system to assign a probability of whether the cluster is a duck or not a duck – even though the system had never seen these things before (i.e. “birds” the system hasn’t encountered before).
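One simple way to picture that proximity-based probability is sketched below. The centroids, the two-feature space, and the inverse-distance scoring rule are all assumptions made for illustration; this is not how any particular product computes its scores.

```python
import math

# Hypothetical centroids of already-labeled clusters in a two-feature space.
labeled = {
    "duck": [(1.0, 1.0), (2.0, 1.5)],
    "not_duck": [(8.0, 8.0), (9.0, 7.0)],
}


def nearest_distance(point, centroids):
    """Distance from a point to the closest centroid in a list."""
    return min(math.dist(point, c) for c in centroids)


def duck_probability(point, eps=1e-9):
    """Score an unlabeled cluster by inverse distance to each labeled class.

    The closer the point sits to a "duck" cluster relative to a
    "not duck" cluster, the closer the score is to 1.0.
    """
    w_duck = 1.0 / (nearest_distance(point, labeled["duck"]) + eps)
    w_not = 1.0 / (nearest_distance(point, labeled["not_duck"]) + eps)
    return w_duck / (w_duck + w_not)


unknown_cluster = (2.5, 2.0)  # a gray blob sitting near the green ones
print(f"probability of duck: {duck_probability(unknown_cluster):.2f}")
```

An unlabeled cluster near the ducks scores close to 1.0, and one near the decoys scores close to 0.0, even though the system has never seen either before.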
The next (and near final) stage is to manually label these new clusters. For example, we ask an ornithologist to look at each cluster of “ducks” and “not ducks” in turn and to label them… “rubber duckies”, “robot duckies”, and “Madagascar mallard ducks”.

Then, to improve our machine learning system further, we add these newly labeled clusters to our +ve and -ve training sets… and the system continues to learn and become more precise over time.

In addition, since we’ve now labeled these clusters, in the future we’re able to automatically flag new additions to these clusters and correctly label the duck (or threat).

And, if we’re a really smart CISO, we can use this clustering system (and labeled clusters) to automatically alert us to new threats or to initiate automatic network security actions – e.g. enable blocking of a new malicious URL, stop blocking a new cloud service delivering official updates to applications, etc.
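The flagging and alerting step can be sketched as a nearest-cluster lookup with a distance cut-off. The cluster labels, centroids, threshold, and action names below are hypothetical; anything that lands too far from every labeled cluster is handed back to a human for labeling rather than acted upon.

```python
import math

# Hypothetical labeled cluster centroids and the response tied to each label.
clusters = {
    "malicious_url": (9.0, 9.0),
    "official_update_service": (1.0, 1.0),
}
actions = {
    "malicious_url": "block",
    "official_update_service": "allow",
}


def classify(event, max_distance=2.0):
    """Assign an event to the nearest labeled cluster, if it is close enough."""
    label, centroid = min(
        clusters.items(), key=lambda kv: math.dist(event, kv[1])
    )
    if math.dist(event, centroid) > max_distance:
        # Too far from anything we know: leave it for an analyst to label.
        return "unlabeled", "send to analyst"
    return label, actions[label]


for event in [(8.5, 9.2), (1.2, 0.8), (5.0, 5.0)]:
    print(event, classify(event))
```

The cut-off matters: without it, every new event would be forced into an existing label, and the system would never surface anything genuinely new.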

The application of machine learning techniques to the toughest security problems facing business today has come along in leaps and bounds recently. However, as with any buzzword that falls into the hands of marketers and gets manipulated until it squeaks and glitters, or oozes onto every product in this year’s price list, senior technical staff need to take added care not to be bamboozled by well-practiced but meaningless word salad.

A little understanding of the concepts behind big data and machine learning can not only cut through the latest batch of sales promises, but can also form the basis of constructing a new generation of meaningful breach detection and incident response solutions.
INSIGHTS | March 25, 2013

SQL Injection in the Wild

As attack vectors go, very few are as significant as obtaining the ability to insert bespoke code into an application and have it automatically execute upon “inaccessible” backend systems. In the Web application arena, SQL Injection vulnerabilities are often the scariest threat that developers and system administrators come face to face with (albeit way too regularly). In fact, the OWASP Top-10 list of Web threats puts SQL Injection in first place.

More often than not, when security professionals discuss SQL Injection threats and attack vectors, they focus upon the Web application context. So it was a bit of fun last week when I came across a photo of a slightly unorthodox SQL Injection attempt – that of someone attempting to subvert a traffic monitoring system by crafting a rather novel vehicle license plate.

My original tweet got retweeted a couple of thousand times – which just goes to show how many security nerds there are out there in the twitterverse.

This “in the wild” SQL Injection attempt was based upon the premise that video cameras are actively monitoring traffic on a road, reading license plates, and issuing driver warnings, tickets or fines as deemed appropriate by local law enforcement.
At some point the video captures of the passing vehicle’s license plate must be converted to text and stored – almost certainly in some kind of backend database. The hacker who devised this attack hoped that the process would be vulnerable to SQL Injection – and so crafted a simple SQL statement that could potentially cause the backend database to drop (i.e. “delete”) the table containing all of the license plate information.
Whether or not this particular attempt worked, I have no idea (probably not, if I had to guess); but it does nicely help to draw attention to this category of vulnerability.
As surveillance systems become more capable – digitally storing information, distilling meta-data from image captures, and sharing observation data between systems – they open many new doors for mischievous and malicious attack.
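For readers who haven’t seen the mechanics up close, here is a minimal sketch using Python’s built-in sqlite3 module and an in-memory database. The table name, the plate text, and the way the backend builds its query are all assumptions for illustration; I have no knowledge of how any real traffic system is implemented.

```python
import sqlite3


def make_db():
    """Create a throwaway in-memory database with a hypothetical plates table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE plates (plate TEXT)")
    return db


def table_exists(db):
    """True if the plates table is still present in the database."""
    row = db.execute(
        "SELECT name FROM sqlite_master WHERE type='table' AND name='plates'"
    ).fetchone()
    return row is not None


# Hypothetical text as it might come out of the image-to-text step.
ocr_text = "ABC123'); DROP TABLE plates; --"

# Unsafe: the plate text is pasted straight into the SQL string, so the
# quote in the plate ends the value and the rest runs as a new statement.
unsafe_db = make_db()
unsafe_db.executescript(f"INSERT INTO plates (plate) VALUES ('{ocr_text}');")
print("table survives unsafe insert:", table_exists(unsafe_db))  # False

# Safer: the plate text is bound as a parameter and treated purely as data.
safe_db = make_db()
safe_db.execute("INSERT INTO plates (plate) VALUES (?)", (ocr_text,))
print("table survives bound insert:", table_exists(safe_db))  # True
```

In the second case the odd-looking plate is simply stored as an odd-looking string, which is exactly what you want.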
The physical nature of these systems, coupled with the complexities of integration with legacy monitoring and reporting systems, often makes them open to attacks that would be classed as fairly simple in the world of Web application security.
A common failure of system developers is to assume that the physical constraints of the data acquisition process are less flexible than they are. For example, if you’re developing a traffic monitoring system it’s easy to assume that license plates are a fixed size and shape, and can only contain 10 alphanumeric characters. Meanwhile, the developers of the third-party image processing code made no such assumptions and will digitize any image. It reminds me a little of the story in which reuse of some object-oriented code a decade ago resulted in kangaroos firing Stinger missiles during a military training simulation.
While the image above is amusing, I’ve encountered similar problems before when physical tracking systems integrate with digital backend processes – opening the door to embarrassing and fraudulent events. For example, in the past I’ve encountered similar SQL Injection vulnerabilities within systems such as:
  • Toll booths reading RFID tags mounted on vehicle windshields – where the tag readers would accept up to 2k of data from each tag (even though the system was only expecting a 16-digit number).
  • Credit card readers that would accept pre-paid cards with negative balances – which resulted in the backend database crediting the wrong accounts.
  • RFID inventory tracking systems – where a specially crafted RFID token could automatically remove all record of the previous hours’ worth of inventory logging information from the database allowing criminals to “disappear” with entire truckloads of goods.
  • Luggage barcode scanners within an airport – where specially crafted barcodes placed upon the baggage would be automatically conferred the status of “manually checked by security personnel” within the backend tracking database.
  • Shipping container RFID inventory trackers – where SQL statements could be embedded to adjust fields within the backend database to alter Customs and Excise tracking information.
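A common thread in the examples above is that the backend trusted whatever the reader handed it. As a minimal sketch, the checks below enforce the formats the system actually expects before anything touches the database. The two formats (a 16-digit tag number and a plate of up to 10 alphanumeric characters) come from the examples in this post; the function names are hypothetical, and validation like this complements parameterized queries rather than replacing them.

```python
import re

# Expected formats, taken from the examples in the text. The exact rules
# for any real deployment would differ and are assumptions here.
TAG_RE = re.compile(r"[0-9]{16}")        # a 16-digit RFID tag number
PLATE_RE = re.compile(r"[A-Z0-9]{1,10}")  # up to 10 alphanumeric characters


def valid_tag(payload):
    """Accept only a payload that is exactly 16 ASCII digits."""
    return TAG_RE.fullmatch(payload) is not None


def valid_plate(text):
    """Accept only 1 to 10 uppercase letters or digits."""
    return PLATE_RE.fullmatch(text) is not None


print(valid_tag("1234567890123456"))                  # a well-formed tag
print(valid_tag("1" * 2048))                          # an oversized payload
print(valid_plate("ABC123"))                          # an ordinary plate
print(valid_plate("ABC123'); DROP TABLE plates; --"))  # a crafted plate
```

Rejecting the oversized tag and the crafted plate at the point of acquisition means the rest of the pipeline never has to reason about them.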

 

Unlike the process of hunting for SQL Injection vulnerabilities within Internet accessible Web applications, you can’t just point an automated vulnerability scanner at the application and have at it. Assessing the security of complex physical monitoring systems is generally not a trivial task and requires some innovative approaches. Experience goes a long way.
— Gunter Ollmann, CTO IOActive Inc.