RESEARCH | October 18, 2016

Let’s Terminate XML Schema Vulnerabilities

XML eXternal Entity (XXE) attacks are a common threat to applications using XML schemas, either actively or unknowingly. That is because we continue to use XML schemas that can be abused in multiple ways. Programming languages and libraries use XML schemas to define the expected contents of XML documents, SAML authentications or SOAP messages. XML schemas were intended to constrain document definitions, yet they have introduced multiple attack avenues.

XML parsers should be prepared to manage two types of problematic XML documents: malformed files and invalid files. Malformed files do not follow the World Wide Web Consortium (W3C) specification. This results in unexpected consequences for any software that parses the file. Invalid files abuse XML schemas and exploit actions that are built into programming languages and libraries. This may unwillingly put any software relying on these technologies at risk.

I analyzed possible attacks affecting XML and discovered new attacks in the previous categories. Among them:

  • Schema version disclosure: Parsers may disclose type and version information when retrieving remote schemas.
  • Unrestrictive schemas: Major companies still rely on schemas that cannot provide the restrictions required for their XML documents. DTD schemas continue to be common, even though they are often unable to accomplish their goal.
  • Improper data validation: As companies move away from DTD schemas and attempt to implement better restrictions, they may provide incomplete protection for their systems. XML schemas provide highly granular constraints, and if they are not properly defined, the approved document may affect the underlying security of applications.

Some of the proposed examples provided by the W3C include vulnerable methods of defining XML documents. To illlustrate with a basic example, the W3C states that an application can use a DTD to verify that XML data is valid, and provides the following example:

<?xml version=”1.0″?>
<!DOCTYPE note [
  <!ELEMENT note (to,from,heading,body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT heading (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don’t forget me this weekend</body>
</note>

This example document contains an embedded DTD that constrains the contents to the previously defined elements. However, those elements do not provide a constraint on how large they are or what type of data they contain. Moreover, since the DTD declaration can simply be changed or removed from the previous document, you can even have a valid, well-formed document without an XML schema.

For more details and greater depth on this topic, view the full white paper, Assessing and Exploiting XML Schema’s Vulnerabilities.

As always, we would love to hear from other security types who might have a differing opinion. All of our positions are subject to change through exposure to compelling arguments and/or data.