Cookie Swap Party: Abusing First-Party Cookies for Web Tracking

Overview. Most of the JavaScript (JS) code on a website is provided by third parties for purposes such as analytics and advertising. However, this third-party JS executes in the context of the first party. This means third-party JS runs with first-party privileges, which enables unanticipated behavior that is detrimental to users' privacy. In this paper, the authors quantify the prevalence and usage of one such behavior: third-party JS accessing first-party cookies.

Problem Background. A user's cookies can be accessed by first and third parties in two ways: (1) HTTP request headers and (2) the document.cookie API. The former has been studied in detail by researchers [], and there are several ways to stop third parties from accessing cookies via HTTP requests, e.g. ad blockers and privacy-preserving browsers. However, since the latter is a browser API that third-party JS calls while running in the context of the first party, it is non-trivial for browsers to restrict its use by third parties. The authors refer to cookies set by third parties through the document.cookie API as external cookies.

Motivation. To see why external cookies are problematic, consider how a third party can use them to identify a returning user on a website: (1) a user visits a website, and a cookie is created for that website; (2) a third party on that website adds an ID to the user's cookie for this website; (3) when the user returns, the cookie for this website still contains the ID, so the third party can tell it is the same user. Identifying a returning user is a tracking issue, but it is not as bad as tracking a user across websites, which external cookies also enable. If the ID set by the third party is derived from the user's fingerprint (a value that is unique to this user across the web), the third party can identify the same user on every website where it is present. Furthermore, since any party on the page can access external cookies via document.cookie, thirdparty_1 can read an external cookie set by thirdparty_2 and the two can sync their data, gaining more knowledge about the user. This motivates an analysis of how external cookies are created and who reads them later on.
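
The three scenarios above (returning user, cross-site tracking, and syncing between third parties) can be illustrated with a small Python simulation. This is only a sketch: the FirstPartyJar class, the script functions, the cookie name "_tid", the site names, and the fingerprint string are all hypothetical stand-ins, and a real tracker would run as JavaScript against document.cookie rather than a Python dictionary.

```python
import hashlib


class FirstPartyJar:
    """Toy stand-in for the cookies that document.cookie exposes on one site."""

    def __init__(self, site):
        self.site = site
        self.cookies = {}  # cookie name -> value, shared by every script on the page


def fingerprint_id(fingerprint):
    # Hypothetical ID derivation: hash the fingerprint so that the same
    # browser yields the same ID on every site.
    return hashlib.sha256(fingerprint.encode()).hexdigest()[:16]


def tracker_script(jar, fingerprint, cookie_name="_tid"):
    """Hypothetical third-party script running with first-party privileges."""
    if cookie_name not in jar.cookies:
        jar.cookies[cookie_name] = fingerprint_id(fingerprint)
    return jar.cookies[cookie_name]


def partner_script(jar, cookie_name="_tid"):
    """A different third party on the same page reads the same cookie."""
    return jar.cookies.get(cookie_name)


fp = "Mozilla/5.0|1920x1080|UTC+1"  # made-up fingerprint inputs
news = FirstPartyJar("news.example")
shop = FirstPartyJar("shop.example")

first_visit = tracker_script(news, fp)   # ID is stored in the news.example jar
return_visit = tracker_script(news, fp)  # same jar, so the same ID comes back
other_site = tracker_script(shop, fp)    # new jar, but the fingerprint gives the same ID
synced = partner_script(news)            # a second third party sees the ID too
```

All four values are identical here, which is exactly the problem: one ID links the return visit, the visit to a different site, and the partner's view of the user.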

Setup. In this work, researchers from NCSU and UIC modify Google Chrome's source code to add functionality that lets the browser flag the use of external cookies. They do so with a technique called taint analysis, which marks (taints) a resource, in this case a cookie value, when it is created and logs every later access to it. This lets them build a provenance graph for external cookies. Using the modified Chrome, they crawl the top 10,000 Alexa-ranked websites to measure the prevalence and usage of external cookies.

Creating Provenance. Building the provenance graph for external cookies has two major parts: first, marking cookies when they are set by third-party JS through the document.cookie API; second, flagging when these cookies are read thereafter. For the first part, whenever a script sets or updates a cookie and the domain serving the script does not match the first-party domain, the browser marks the cookie as tainted by the script's domain. For the second part, whenever a network request is generated, the browser (based on some heuristics) searches the URL for tainted cookie values and records the domain of the URL as well. Combining the two, the authors can measure both the prevalence and the usage of external cookies.
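
To make the two parts concrete, here is a minimal Python sketch of the idea. It is not the authors' Chrome patch: the TaintingBrowser class and its method names are hypothetical, domains are compared by exact string equality (a real browser would presumably compare registrable domains), and the minimum-length check is an assumed placeholder for the unspecified heuristics mentioned above.

```python
from urllib.parse import urlsplit

MIN_VALUE_LENGTH = 8  # assumed heuristic: ignore short values that match by chance


class TaintingBrowser:
    """Toy model of a browser that taints cookies set by third-party scripts."""

    def __init__(self, first_party):
        self.first_party = first_party
        self.cookies = {}  # cookie name -> value
        self.taint = {}    # cookie name -> domain of the script that set it
        self.edges = []    # provenance edges: (setter, cookie name, receiver)

    def set_cookie(self, script_domain, name, value):
        # Part 1: mark the cookie if the script's domain is not the first party.
        self.cookies[name] = value
        if script_domain != self.first_party:
            self.taint[name] = script_domain

    def send_request(self, url):
        # Part 2: search the outgoing URL for tainted cookie values and
        # record which domain receives them.
        receiver = urlsplit(url).hostname
        for name, setter in self.taint.items():
            value = self.cookies[name]
            if len(value) >= MIN_VALUE_LENGTH and value in url:
                self.edges.append((setter, name, receiver))


browser = TaintingBrowser("news.example")
browser.set_cookie("news.example", "session", "s3ss10n-value")   # first party, not tainted
browser.set_cookie("tracker.example", "_tid", "a1b2c3d4e5f6")    # external cookie
browser.send_request("https://partner.example/sync?uid=a1b2c3d4e5f6")
browser.send_request("https://cdn.example/lib.js")
```

After these calls, browser.edges holds the single edge ("tracker.example", "_tid", "partner.example"): the cookie was set by one third party and its value was sent to another. A collection of such edges is what the post calls the provenance graph.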

Prevalence Results. Of the 10,000 sites in the crawl, the authors found external cookies on 97%. On these websites, they recorded 13,323 non-session external cookies, keyed by <JS_domain, cookie_name>. Next, using well-established heuristics from previous research, they show that 31% of these external cookies contained tracking IDs. Having external cookies on nearly all of the top 10K websites, with roughly one third of those cookies carrying tracking IDs, is an alarming result for the reasons given in the Motivation paragraph above.

Usage Results. It is clear that external cookies are being created at a high rate. This motivates measuring how these cookies are used (if at all) and by whom. Among the 4,212 cookies that had tracking IDs, 3,256 were identified in a network request, enabling the creation of the provenance graph discussed earlier. 2,354 of these external cookies were read by a third party other than the one that generated them. This suggests a high rate of collusion between third parties via external cookies. Even more problematic, 3 of these external cookies were based on the user's fingerprint.
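
The counts above are easier to compare as proportions. The short Python snippet below derives them from the numbers quoted in this post; the percentages are computed here rather than taken from the paper, and treating the 2,354 cookies as a subset of the 3,256 seen in network requests is an assumption about how the counts nest.

```python
external_cookies = 13_323     # non-session external cookies found in the crawl
with_tracking_id = 4_212      # of those, cookies containing a tracking ID
seen_in_request = 3_256       # tracking-ID cookies found in a network request
read_by_other_party = 2_354   # read by a third party other than the setter


def share(part, whole):
    """Percentage of `whole` that `part` represents, to one decimal place."""
    return round(100 * part / whole, 1)


tracking_share = share(with_tracking_id, external_cookies)
request_share = share(seen_in_request, with_tracking_id)
cross_party_share = share(read_by_other_party, seen_in_request)

print(f"tracking IDs among external cookies: {tracking_share}%")
print(f"tracking-ID cookies seen in a request: {request_share}%")
print(f"of those, read by a different third party: {cross_party_share}%")
```

Under that assumption, about 31.6% of external cookies carry tracking IDs, about 77.3% of those show up in a network request, and about 72.3% of the ones seen in a request are read by a different third party.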

Concluding Remarks. Summing up, the authors show a significant problem in the tracking ecosystem that has been overlooked by the community. They show how easy it is for third parties not only to track and fingerprint users but also to share this information with their partners, without having to worry about any mitigation technique (as none exists).