TrackerSift: Untangling Mixed Tracking and Functional Web Resources

Tracking Lists. EasyList and EasyPrivacy (EL, EP) are two of the best-known open-source filter lists; they drive privacy-preserving browser extensions such as uBlock, AdBlock and Ghostery. The lists consist of URL patterns that belong to online advertisers and trackers, and the extensions block network requests that match these rules. However, the lists have two limitations: (1) slow maintenance, because only a handful of contributors curate them, and (2) an inability to block mixed resources. The focus of this paper is analyzing and mitigating the latter issue.

What are Mixed Resources? To circumvent blocking lists, trackers either change their network location (domain, URL, etc.) or mix tracking content with functional content, e.g., by serving both from the same network endpoint or CDN. A resource that serves both kinds of content is called a mixed resource. Mixed resources are problematic for blocking lists: blocking them breaks website functionality, while allowing them lets tracking through. This paper presents TrackerSift, a framework that untangles the functional and tracking content served by mixed resources.

TrackerSift. At a high level, the framework analyzes request URLs hierarchically, at progressively finer granularities. At each level it either classifies a resource as tracking or functional or, if the resource is mixed, analyzes it at the next finer level. The levels, in order of increasing granularity, are domain, hostname, script and method.

Before analyzing the URLs, TrackerSift uses EL and EP as ground truth to label each request URL as functional or tracking. Then, at level 1, it extracts the domain from each URL. If a domain has significantly more tracking URLs than functional ones, it is classified as a tracking domain; if it has significantly more functional URLs, it is classified as a functional domain. Otherwise, it is classified as a mixed domain and passed to level 2. The same test is repeated with a different deciding factor at each level: the hostname at level 2, the script that generated the request at level 3, and the method inside that script responsible for the request at level 4. Blocking lists can leverage the untangled mixed resources to create rules based on the level at which each resource was untangled.
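The level-by-level procedure can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the request dictionaries, the `classify` and `untangle` names, and the `ratio` threshold (a stand-in for "significantly more") are all assumptions made for this example, and the labels are assumed to have already been assigned from EL and EP.

```python
from collections import Counter, defaultdict

# Granularity levels, coarsest first (taken from the description above).
LEVELS = ("domain", "hostname", "script", "method")


def classify(tracking, functional, ratio=10.0):
    """Classify one resource from its request counts.

    `ratio` is a hypothetical threshold for "significantly more".
    """
    if tracking > 0 and tracking >= ratio * functional:
        return "tracking"
    if functional > 0 and functional >= ratio * tracking:
        return "functional"
    return "mixed"


def untangle(requests, ratio=10.0):
    """Return (verdicts, unresolved) for a list of labeled requests.

    Each request is a dict with the keys in LEVELS plus "label",
    which is either "tracking" or "functional".
    """
    verdicts = {}
    pending = list(requests)
    for depth, level in enumerate(LEVELS, start=1):
        # Count labels per resource; the key is the path down to this level.
        counts = defaultdict(Counter)
        for r in pending:
            counts[tuple(r[name] for name in LEVELS[:depth])][r["label"]] += 1
        for key, c in counts.items():
            verdicts[(level, key)] = classify(c["tracking"], c["functional"], ratio)
        # Only requests from mixed resources move on to the next, finer level.
        pending = [
            r for r in pending
            if verdicts[(level, tuple(r[name] for name in LEVELS[:depth]))] == "mixed"
        ]
    return verdicts, pending


def req(label, domain, hostname, script, method):
    """Build one labeled request (hypothetical record layout)."""
    return {"label": label, "domain": domain, "hostname": hostname,
            "script": script, "method": method}


# Made-up sample data: one pure tracking domain and one mixed domain.
sample = (
    [req("tracking", "ads.example", "px.ads.example", "a.js", "send")] * 10
    + [req("functional", "cdn.example", "static.cdn.example", "app.js", "load")] * 10
    + [req("tracking", "cdn.example", "t.cdn.example", "t.js", "beacon")] * 10
)
verdicts, unresolved = untangle(sample)
```

In this made-up sample, `ads.example` is settled at the domain level, while `cdn.example` is mixed at the domain level and only separates into a functional and a tracking hostname at level 2. Requests that remain mixed even at the method level are returned in `unresolved`.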

Analysis. To characterize mixed resources and evaluate TrackerSift, the authors crawled 100k sites and collected network requests together with their stack traces. By processing these requests with TrackerSift, they untangled ~25,000 requests.

Applicability. Untangling at levels 1, 2 and 3 is ideal for blocking lists, since their rule syntax already supports domain-, hostname- and script-based options and new rules can be created quickly. Method-based untangling is slightly more complex: it requires generating surrogate scripts with the tracking methods removed. TrackerSift can provide both new rules and surrogate scripts based on its analysis.
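As a rough illustration of how an untangled tracking resource might become a list rule, the sketch below emits Adblock Plus-style filters, where `||` anchors a rule at a domain name, `^` is a separator placeholder, and `$script` restricts the rule to script requests. The function name `to_filter_rule` and the mapping from levels to rule shapes are assumptions for this example; the source does not say what form TrackerSift's generated rules take.

```python
from urllib.parse import urlsplit


def to_filter_rule(level, resource):
    """Turn a tracking resource into an Adblock Plus-style filter rule.

    `level` is the granularity at which the resource was untangled;
    `resource` is a domain or hostname, or a full script URL at the script level.
    Returns None at the method level, which needs a surrogate script instead.
    """
    if level in ("domain", "hostname"):
        # Block every request to this domain or hostname.
        return f"||{resource}^"
    if level == "script":
        # Block only this script file; drop the scheme and any query string.
        parts = urlsplit(resource)
        return f"||{parts.netloc}{parts.path}$script"
    return None


rules = [
    to_filter_rule("domain", "ads.example"),
    to_filter_rule("hostname", "t.cdn.example"),
    to_filter_rule("script", "https://cdn.example/js/t.js"),
]
```

The method level has no one-line equivalent, which is why it returns `None` here: blocking a single method means shipping a modified copy of the script rather than adding a filter.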

Concluding Remarks. This paper addresses a crucial gap in blocking-list research. TrackerSift improves privacy while retaining website functionality, which makes blocking more practical to apply. It also encourages improving blocking lists through more complex and granular analysis.