Mismatch? CVSS, Vulnerability Management, and Organizational Risk

I’ll never forget a meeting I attended where a security engineer demanded IT remediate each of the 30,000 vulnerabilities he had discovered. I know that he wasn’t just dumping an unvetted pile of vulnerabilities on IT; he’d done his best to weed out false-positive results, other errors, and misses before presenting the findings. These were real issues, ranked using the Common Vulnerability Scoring System (CVSS). There can be no doubt that in that huge (and overwhelming) pile were some serious threats to the organization and its digital assets.

The reaction of the IT attendees did not surprise me, nor did the security engineer’s subsequent reaction. It didn’t go well. Presented with that much work, IT refused, describing their already fully loaded plans and referring the security engineer to the CIO. In other words, “Security, take your vulnerabilities and be gone. We have other work to do.”

I’ve seen this same dynamic play out over and over again. Faced with 72,000 unqualified static analysis findings, the application team addressed none of them. Given 130,000 issues across an organization’s entire (scannable) infrastructure, the operations team’s first reaction was to do nothing.

The foregoing, real-world numbers are overwhelming. As one senior architect told me, “It took me an average of 15 minutes to figure out whether each of the five findings was a real issue. None of them was. In order to work through this set of findings, the effort will take about six person-weeks. We don’t have a resource to dedicate to this for six weeks, especially at a high false-positive rate. Much of it will be wasted effort.”

At the same time, we intuitively know that somewhere in those piles of vulnerabilities are issues that will be exploited and whose exploitation will cause serious harm: we know there’s organizational risk in the pile. But how do we find the dangerous needle in the haystack of vulnerability findings?

Knee-jerk security responses don’t help. I cannot count the number of times a security person has flatly stated, “Just patch it.” As though patching is the simplest thing in the world.

Applying a security patch may be simple for a single application; it’s not quite so straightforward when faced with thousands of potential issues across thousands of software components. This is especially true because potential disruption from unexpected side effects must be considered when introducing new software (patches) into complex systems whose failure might have disastrous consequences.

As far as I’ve been able to see, few organizations cope well with tens of thousands of issues, each demanding a fix. Plus, a continuing flow of new issues discovered each day adds to the work queue.

Ultimately, managing cybersecurity risk is a business decision, just as managing any other organizational or operational risk is. When these issues are viewed from different, sometimes adversarial, technical silos, it is not surprising that a consensus understanding of organizational risk management priorities does not coalesce.

Up until recently, industry practice has been to prioritize issues based upon CVSS base score. However, research from 2014 indicates that using the CVSS base score may be no better than “choosing at random.”¹ Maybe that’s why even organizations with fairly mature security and operations functions continue to be compromised through unpatched vulnerabilities.

Perhaps we’re fixing the wrong issues. Are there attributes that will help find the most likely exploits?

If we rely solely on CVSS, especially published CVSS base scores, then yes, we will prioritize many issues that will rarely, perhaps never, be exploited by real attackers. The ground-breaking 2014 academic analysis by Allodi and Massacci² found that CVSS base scores have been a poor predictor of exploitation. Their results have since been validated by some vendor-funded studies.³ CVSS has certainly proven useful for potential severity ratings, but using it as a predictor, or worse, as a risk calculation, seems to be a mistake, despite the prevalence of the practice.

If not CVSS, then how can we identify the issues most likely to be exploited? Allodi and Massacci found that the addition of an exploit to an Exploit Kit dramatically increases the likelihood of use “in the wild,” that is, by real-world attackers. Their second strong predictor is Dark Web chatter about a vulnerability and its exploitation. When these two activities happen in tandem, one should fix the issue, and soon. This aligns well with the intuitive insights of today’s security practitioners who focus on threat intelligence and on assessing their security posture through adversary emulation assessments such as red team exercises.
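As a rough illustration of the "in tandem" rule described above, the following minimal Python sketch flags findings for which both predictors are present, whatever their CVSS base score. The `Finding` record, its field names, and the sample data are hypothetical; how an organization would actually learn that an exploit sits in an Exploit Kit or is being discussed on the Dark Web is assumed to come from its own threat intelligence and is not shown.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    """Hypothetical record for one vulnerability finding."""
    vuln_id: str             # placeholder identifier, not a real CVE
    cvss_base: float         # published severity score
    in_exploit_kit: bool     # assumed input from threat intelligence
    dark_web_chatter: bool   # assumed input from threat intelligence


def fix_soon(findings):
    """Return findings where both predictors occur together."""
    return [f for f in findings if f.in_exploit_kit and f.dark_web_chatter]


# Made-up sample data for illustration only.
findings = [
    Finding("VULN-A", 9.8, False, False),
    Finding("VULN-B", 5.3, True, True),
    Finding("VULN-C", 4.0, True, False),
]

urgent_ids = [f.vuln_id for f in fix_soon(findings)]
print(urgent_ids)
```

In this made-up data, the medium-severity finding is the only one flagged, while the critical-severity finding with no sign of attacker interest is not.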

That should be easy, right? Unfortunately, processing Dark Web chatter proves non-trivial. Commercial products⁴ might not provide quite the right information, meaning that users must craft their own searches. Search capabilities in these products vary dramatically from full regular expressions to simple keyword searches. Buyer beware.

However, a recent announcement may signal the path forward. The Cyentia Institute and Kenna Security announced the release of their Exploit Prediction Scoring System (EPSS)⁵ and the research from which EPSS was built. Kenna Security is supplying the data from which the EPSS calculator works. EPSS employs further predictors than the two primary ones named by Allodi and Massacci; please see the EPSS research⁶ to learn more. EPSS may be vulnerability management’s better mousetrap.

EPSS includes the CVSS severity score, but it offers an entirely different view into the potential for active vulnerability misuse by real attackers. Don’t mistake CVSS for EPSS. They deliver very different facets of the vulnerability picture. Severity is our best guess as to how bad successful exploitation might be in a normalized, generalized case. CVSS lacks context, and that absence is often glaring. In comparison, EPSS attempts to tell us which vulnerabilities attackers will try, producing a percentage prediction of how likely exploitation will be at the time of calculation.

In the research (please see endnotes), exploitation of high-severity issues is actually much rarer than the misuse of low and medium issues. That may come as a surprise. One reason for the preference for low and medium issues might be the ease of crafting exploits. Plus, attackers hesitate to use issues that require significant setup and preconditions. Instead, they routinely string together issues that in isolation aren’t all that impactful. But taken as a set of steps, the “kill chain,” several low and medium issues can lead to full compromise. A quick survey of a few of MITRE’s ATT&CK Threat Groups⁷ demonstrates how techniques are used to generate a kill chain.

When we rely upon CVSS severity as our priority, we fix the issues that in the most generalized case might cause the most damage, scheduling the lower severities for some later date. This is precisely the problem predictive analysis addresses: identify those issues in which attackers are interested, and prioritize those. It turns out that quite often, some low and medium severity issues are the ones to worry about.

Remove attacker leverage by patching some kill chain steps, and we raise the cost or even prevent chained attacks. But we can only do that if we know which issues, irrespective of their potential severity, attackers are considering. EPSS and predictive models, in general, may offer users a way to sift attacker-preferred issues from the chaff of overwhelming vulnerability queues.
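To make the contrast between severity-first and prediction-first prioritization concrete, here is a small Python sketch that ranks the same findings two ways. Everything in it is an assumption for illustration: the identifiers, the scores, and the `exploit_probability` field, which stands in for whatever predictive score an organization has available. It is not the EPSS model or its data.

```python
# Made-up findings; "exploit_probability" is a hypothetical predictive score
# between 0 and 1, not a value taken from EPSS.
findings = [
    {"id": "VULN-1", "cvss": 9.8, "exploit_probability": 0.02},
    {"id": "VULN-2", "cvss": 7.5, "exploit_probability": 0.10},
    {"id": "VULN-3", "cvss": 5.4, "exploit_probability": 0.71},
    {"id": "VULN-4", "cvss": 3.1, "exploit_probability": 0.45},
]


def top_n(items, key, n):
    """Return the ids of the n items with the highest value for `key`."""
    ranked = sorted(items, key=lambda item: item[key], reverse=True)
    return [item["id"] for item in ranked[:n]]


by_severity = top_n(findings, "cvss", 2)
by_likelihood = top_n(findings, "exploit_probability", 2)

print("Severity-first queue:  ", by_severity)
print("Prediction-first queue:", by_likelihood)
```

With these invented numbers the two queues do not overlap at all: the severity-first queue schedules the low and medium issues for later, while the prediction-first queue puts them at the top.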

I must warn readers that there are problems with EPSS. Today, all one can get is a single, point-in-time predictive score through a web browser interface. One-at-a-time scoring isn’t how vulnerability management must work in order to scale and provide just-in-time information. Unless a score is high enough to act upon when calculated, any score’s increase over time is the quantity to watch. Each vulnerability’s score needs to be monitored in order to identify issues that exceed the organization’s risk tolerance. Going to a website and checking tens of thousands of issues one at a time isn’t really workable.
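The kind of monitoring described above might look something like the following Python sketch. It assumes an organization has somehow collected a series of predictive scores per vulnerability over time; no such bulk feed is described in this post, so the `history` data, the identifiers, and the `tolerance` and `min_rise` thresholds are all hypothetical.

```python
def needs_attention(history, tolerance=0.5, min_rise=0.1):
    """Flag vulnerabilities whose latest score exceeds the tolerance,
    or whose score has risen notably since first observed.

    `history` maps a vulnerability id to its scores, oldest first.
    """
    flagged = {}
    for vuln_id, scores in history.items():
        latest = scores[-1]
        rise = latest - scores[0]
        if latest >= tolerance:
            flagged[vuln_id] = "exceeds tolerance"
        elif rise >= min_rise:
            flagged[vuln_id] = "rising"
    return flagged


# Made-up score histories for illustration only.
history = {
    "VULN-1": [0.05, 0.06, 0.05],   # flat and low
    "VULN-2": [0.10, 0.25, 0.40],   # climbing, not yet over tolerance
    "VULN-3": [0.30, 0.45, 0.62],   # now over tolerance
}

flagged = needs_attention(history)
for vuln_id, reason in flagged.items():
    print(vuln_id, reason)
```

The thresholds here are arbitrary; each organization would set them from its own risk tolerance, and a real process would rerun the check every time new scores arrive.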

If EPSS is going to be of use, there must be some automation for organizations to periodically check scores. The threat landscape is dynamic, so any solution must be equally dynamic. I hope that Cyentia and Kenna Security will provide a service or API through which organizations can monitor predictive score changes over time, and at scale.

EPSS is tightly coupled to the management of vulnerabilities. It would be a major error to apply EPSS, or any vulnerability misuse prediction method, to other aspects of organizational risk management. As always, every organization needs a robust and thorough understanding of its risk tolerances and skilled people dedicated to managing risk, and must adopt a rigorous and proven risk scoring mechanism, for instance, The Open Group standard: Factor Analysis of Information Risk (FAIR)⁸.

Importantly, EPSS will not supersede human risk analysis. EPSS, and CVSS as well, are adjuncts to human analysis, not replacements. Well-resourced attackers appear to be using more so-called zero-day vulnerabilities⁹, that is, vulnerabilities unknown before use and not yet fixed. To confront zero-days we must rely on our threat intelligence gathering and contextual risk analysis. Human threat modeling continues to be one of the best techniques for assessing potential danger from the unexpected appearance of a possible threat vector.

The Cyentia researchers indicated to me that Kenna Security owns the data used by EPSS. I attempted to contact someone at Kenna Security multiple times for this article, but Kenna Security has, unfortunately, not responded.

IOActive offers a full range of security consulting services, including vulnerability management, risk assessment, software security, and threat modeling.

Hopefully, this post helps your organization deal with your unmitigated vulnerability queue and better translate it into definable organizational and operational risks. Effective vulnerability management has the potential to free up resources that can be applied to other aspects of a robust cyber-risk program.

Cheers,
/brook s.e. Schoenfield
Master Security Architect
Director of Advisory Services

[1] Allodi, Luca & Massacci, Fabio. (2014). Comparing Vulnerability Severity and Exploits Using Case-Control Studies. ACM Transactions on Information and System Security. 17. 1-20. 10.1145/2630069. Thanks to Luis Servin (@lfservin) for the reference to this academic paper.

[2] http://seconomicsproject.eu/sites/default/files/seconomics/public/content-files/downloads/Comparing Vulnerabilities and Exploits using case-control studies.pdf

[3] NopSec, Inc’s 2016 and 2018 State of Vulnerability Risk Management Reports: https://www.nopsec.com/

[4] There is an open-source, public Dark Web search engine, DarkSearch.io. DarkSearch doesn’t offer full regular expressions, but it does offer several keyword and grouping enhancements.

[5] https://www.kennaresearch.com/tools/epss-calculator/

[6] Prioritization to Prediction, Cyentia Institute, and Kenna Security: https://www.kennasecurity.com/prioritization-to-prediction-report/images/Prioritization_to_Prediction.pdf

[7] https://mitre-attack.github.io/attack-navigator/enterprise/

[8] https://www.opengroup.org/forum/security-forum-0/risk-management

[9] Please see https://www.fireeye.com/blog/threat-research/2020/04/zero-day-exploitation-demonstrates-access-to-money-not-skill.html