Does your web application make use of local storage? If so, then like many developers you may well be making the assumption that when you read from local storage, it will only contain the data that you put there. As Steffens et al. show in this paper, that’s a dangerous assumption! The storage aspect of local storage makes possible a particularly nasty form of attack known as a persistent client-side cross-site scripting attack. Such an attack, once it has embedded itself in your browser one time (e.g. that one occasion you quickly had to jump on the coffee shop wifi), continues to work on all subsequent visits to the target site (e.g., once you’re back home on a trusted network).
In an analysis of the top 5000 Alexa domains, 21% of sites that make use of data originating from storage were found to contain vulnerabilities, of which at least 70% were directly exploitable using the models described in this paper.
Our analysis shows that more than 8% of the top 5,000 domains are potentially susceptible to a Persistent Client-Side XSS vulnerability. Moreover, considering only such domains which make use of tainted data in dangerous sinks, a staggering 21% (418/1,946) are vulnerable. Considering only the top 1,000 domains, we even found that 119 of them contained an unfiltered and unverified flow from cookies or Local Storage to an execution sink… we believe these results are lower bounds on the actual number of potentially vulnerable sites.
Persistent client-side XSS attacks
There are two basic requirements for a storage-based XSS attack. First, there must be a vulnerable path that permits an attacker to control the data written into local storage. Second, the page must use the data from local storage at a sink in an exploitable manner (e.g., without sanitisation).
Storing a malicious payload
There are two main routes an attacker can use to persist malicious payloads: an in-network attacker can hijack connections over HTTP, or the attacker can lure the victim to visit a website under the attacker’s control.
HTTP-based network attacks can be prevented by using HTTPS, and by sites setting the HTTP Strict Transport Security (HSTS) header with the includeSubDomains option. Unfortunately, not all users and not all sites take these measures. Say a user connects to the “coffee shop wifi”…
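For sites, the mitigation is a one-line response header. A minimal sketch, assuming an Express-style middleware signature (the function and parameter names here are illustrative):

```javascript
// Hypothetical middleware that instructs browsers to only ever contact
// this origin (and its subdomains) over HTTPS, closing the window for
// in-network payload injection on subsequent visits.
function hsts(req, res, next) {
  res.setHeader(
    "Strict-Transport-Security",
    "max-age=31536000; includeSubDomains" // one year, covering subdomains
  );
  next();
}
```

Note that HSTS only protects from the first successful HTTPS visit onwards; the preload list exists to close the first-visit gap.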
If the attacker can instead lure a victim to a website the attacker controls (maybe via ads etc.), then for example the victim’s browser can be forced to load a vulnerable page on a third-party site containing a reflected Client-Side Cross-Site Scripting flaw. The payload sent to the vulnerable site triggers a flow which stores attacker-controlled content in local storage.
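In code, such a reflected-to-persistent flow can be as small as this (a hypothetical sketch; the parameter, key, and URL are invented):

```javascript
// A page that persists a URL parameter to Local Storage without any
// validation. Loaded once with an attacker-crafted URL, the payload
// survives in storage across all future visits.
function storeUserPreference(locationSearch, storage) {
  const params = new URLSearchParams(locationSearch);
  const pref = params.get("pref");   // attacker-controlled via the URL
  if (pref !== null) {
    storage.setItem("pref", pref);   // persisted unchecked
  }
}

// e.g. the attacker lures the victim to a URL such as:
//   https://victim.example/settings?pref=%3Cscript%3Ealert(1)%3C%2Fscript%3E
```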
Once the data is in local storage, the set of potential exploits is similar to any other scenario in which a web page contains attacker-controlled content. Except that in this case, developers seem much less aware of the need to sanitise the input, implicitly trusting the data they pull from local storage. Several sites, for example, use local storage to store code or JSON which they then eval. That should probably raise a red flag regardless. Consider the following more benign-looking example though:
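The pattern looks roughly like this (a hypothetical reconstruction; the function and key names are invented):

```javascript
// A value read from Local Storage is concatenated into markup with no
// checking or encoding -- the storage object is implicitly trusted.
function renderProfileLink(storage) {
  const target = storage.getItem("profileUrl");
  // In the real page this string would land in the DOM, e.g. via innerHTML.
  return '<a href="' + target + '">My profile</a>';
}

// Intended use:
renderProfileLink({ getItem: () => "/profile/42" });
// → <a href="/profile/42">My profile</a>

// With attacker-planted storage, the payload closes the attribute and the
// tag, then injects arbitrary script:
renderProfileLink({
  getItem: () => '"></a><script>alert(document.domain)</script>',
});
```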
Here the value retrieved from storage is neither checked nor encoded, so the attacker can break out of the <a> tag and inject any script of their choosing.
Protecting a malicious payload
Do these vulnerabilities exist in the wild?
The authors crawled the Alexa top 5000 domains (up to 1000 sub-pages each, maximum depth 2, only public pages – i.e., nothing behind a login). The resulting dataset contained 12.5M documents. The crawling was done with a modified Chromium that reports invocations with tainted data (i.e., controlled by the crawling engine) to numerous sinks.
The following table shows the vulnerable flows found from source to sink, where the source is an HTTP request, a cookie, or local storage.
From this set, a total of 906 domains have vulnerable cookie flows, and 654 domains have vulnerable Local Storage flows.
For these domains, the authors then tested whether stored values appear on a page (i.e., whether there is an exploitable flow from local storage).
More than half of the domains that had a flow from Local Storage to a sink could be exploited, indicating that little care is taken in ensuring the integrity and format of such data.
Now it’s just a matter of putting the two parts together: we’re looking for sites with vulnerable flows from an attacker controlled source to local storage, coupled with an exploitable flow from local storage. In total 65 domains had this deadly combination. “Since our crawlers neither log in nor try to cover all available code paths, the number of sites susceptible to such Client-Side XSS flaws is likely higher.”
A case study:
In our study, we found the single sign-on part of a major Chinese website network to be susceptible to both a persistent and a reflected Client-Side XSS flaw. While abusing the reflected XSS could have been used to exfiltrate the cookies of the user, these were protected with the HttpOnly flag. Given the fact that the same origin also made insecure use of persisted code from Local Storage, however, rather than trying to steal the cookie, we built a proof of concept that extracted credentials from the login field right before the submission of the credentials to the server.
When using local storage for unstructured data, always use context-aware sanitisation (i.e., apply the appropriate encoding to prevent escaping) before inserting the data in the DOM.
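As a minimal illustration of encoding for the HTML context (illustrative only; in production prefer a vetted library, or assign untrusted strings via textContent rather than building markup by hand):

```javascript
// A minimal HTML encoder. Applied before insertion, it prevents stored
// values from terminating attributes or opening new tags.
function escapeHTML(untrusted) {
  const map = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;" };
  return untrusted.replace(/[&<>"']/g, (ch) => map[ch]);
}

const stored = '"></a><script>alert(1)</script>'; // attacker-planted value
const safe = '<a href="' + escapeHTML(stored) + '">My profile</a>';
// safe no longer contains an attribute breakout or a live script tag
```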
When using local storage for structured data, use JSON.parse instead of eval. (eval is more liberal in what it accepts: 27 of the domains in the study use data formats resembling JSON that can be parsed with eval, but not by JSON.parse. To which I say, fix your format!)
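The difference is easy to demonstrate (a small sketch; the stored values are invented):

```javascript
// JSON.parse only accepts strict JSON, while eval executes arbitrary
// JavaScript -- which is exactly why eval on stored data is dangerous.
const strict = '{"theme": "dark"}';
const sloppy = "{theme: 'dark'}";   // a JS object literal, not valid JSON

const theme = JSON.parse(strict).theme; // → "dark"

let rejected = false;
try {
  JSON.parse(sloppy);               // throws a SyntaxError
} catch (e) {
  rejected = true;
}
// eval("(" + sloppy + ")") would accept it -- and would just as happily
// execute a malicious payload planted in Local Storage.
```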
There is no general solution here, and we have to look at individual use cases. CloudFlare’s ‘Rocket Loader’ for example caches external scripts in local storage. A safer alternative would be to use service workers (see section VI.C in the paper for a sketch of the implementation).
Several vulnerabilities we discovered were caused by third-party code. We notified those parties which were responsible for at least three vulnerable domains. As of this writing, the four largest providers have acknowledged the issues and/or deployed fixes for the flawed code.