
    COOKIEGRAPH: Measuring and Countering First-Party Tracking Cookies

    Recent privacy protections by browser vendors aim to limit the abuse of third-party cookies for cross-site tracking. While these countermeasures against third-party cookies are widely welcome, there are concerns that they will result in advertisers and trackers abusing first-party cookies instead. We provide the first empirical evidence of how first-party cookies are abused by advertisers and trackers by conducting a differential measurement study on 10K websites with third-party cookies allowed and blocked. We find that advertisers and trackers implement cross-site tracking despite third-party cookie blocking by storing identifiers, based on probabilistic and deterministic attributes, in first-party cookies. As opposed to third-party cookies, outright first-party cookie blocking is not practical because it would result in major breakage of legitimate website functionality. We propose CookieGraph, a machine learning approach that can accurately and robustly detect first-party tracking cookies. CookieGraph detects first-party tracking cookies with 91.06% accuracy, outperforming the state-of-the-art CookieBlock approach by 10.28%. We show that CookieGraph is fully robust against cookie name manipulation while CookieBlock's accuracy drops by 15.68%. We also show that CookieGraph does not cause any major breakage while CookieBlock causes major breakage on 8% of the websites with SSO logins. Our deployment of CookieGraph shows that first-party tracking cookies are used on 93.43% of the 10K websites. We also find that the most prevalent first-party tracking cookies are set by major advertising entities such as Google as well as many specialized entities such as Criteo

    Actions speak louder than words: Semi-supervised learning for browser fingerprinting detection

    As online tracking continues to grow, existing anti-tracking and fingerprinting detection techniques that require significant manual input must be augmented. Heuristic approaches to fingerprinting detection are precise but must be carefully curated. Supervised machine learning techniques proposed for detecting tracking require manually generated label-sets. Seeking to overcome these challenges, we present a semi-supervised machine learning approach for detecting fingerprinting scripts. Our approach is based on the core insight that fingerprinting scripts have similar patterns of API access when generating their fingerprints, even though their access patterns may not match exactly. Using this insight, we group scripts by their JavaScript (JS) execution traces and apply a semi-supervised approach to detect new fingerprinting scripts. We detail our methodology and demonstrate its ability to identify the majority of scripts (≥94.9%) identified by existing heuristic techniques. We also show that the approach expands beyond detecting known scripts by surfacing candidate scripts that are likely to include fingerprinting. Through an analysis of these candidate scripts we discovered fingerprinting scripts that were missed by heuristics and for which there are no heuristics. In particular, we identified over one hundred device-class fingerprinting scripts present on hundreds of domains. To the best of our knowledge, this is the first time device-class fingerprinting has been measured in the wild. These successes illustrate the power of a sparse vector representation and semi-supervised learning to complement and extend existing tracking detection techniques
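
    The core insight above (similar but not identical API access patterns) can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not specify the grouping or learning algorithm, so this example assumes a simple stand-in, namely sparse API-count vectors compared by cosine similarity against a few already-labeled scripts. The function names, the threshold value, and the trace format (a list of API names per script) are all hypothetical.

```python
from collections import Counter
from math import sqrt


def trace_vector(trace):
    """Sparse vector: API name -> number of accesses in one execution trace."""
    return Counter(trace)


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[api] for api, count in u.items() if api in v)
    norm_u = sqrt(sum(c * c for c in u.values()))
    norm_v = sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def candidate_scripts(traces, known_fingerprinters, threshold=0.7):
    """Return unlabeled scripts whose trace resembles a known fingerprinter.

    traces: dict mapping script name -> list of accessed API names.
    known_fingerprinters: set of script names (keys of `traces`) already
    labeled, e.g. by a heuristic. The threshold is an assumed value.
    """
    vectors = {name: trace_vector(t) for name, t in traces.items()}
    candidates = []
    for name, vec in vectors.items():
        if name in known_fingerprinters:
            continue
        if any(cosine(vec, vectors[k]) >= threshold for k in known_fingerprinters):
            candidates.append(name)
    return sorted(candidates)
```

    In this sketch, a script whose trace shares three of four API accesses with a labeled fingerprinting script is surfaced as a candidate for manual review, while a script with no overlapping API accesses is not.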

    Automated discovery of privacy violations on the web

    Online tracking is increasingly invasive and ubiquitous. Tracking protection provided by browsers is often ineffective, while solutions based on voluntary cooperation, such as Do Not Track, haven't had meaningful adoption. Knowledgeable users may turn to anti-tracking tools, but even these more advanced solutions fail to fully protect against the techniques we study. In this dissertation, we introduce OpenWPM, a platform we developed for flexible and modular web measurement. We've used OpenWPM to run large-scale studies leading to the discovery of numerous privacy violations across the web and in emails. These discoveries have curtailed the adoption of tracking techniques, and have informed policy debates and browser privacy decisions. In particular, we present novel detection methods and results for persistent tracking techniques, including: device fingerprinting, cookie syncing, and cookie respawning. Our findings include sophisticated fingerprinting techniques never before measured in the wild. We've found that nearly every new API is misused by trackers for fingerprinting. The misuse is often invisible to users and publishers alike, and in many cases was not anticipated by API designers. We take a critical look at how the API design process can be changed to prevent such misuse in the future. We also explore the industry of trackers which use PII-derived identifiers to track users across devices, and even into the offline world. To measure these techniques, we develop a novel bait technique, which allows us to spoof the presence of PII on a large number of sites. We show how trackers exfiltrate the spoofed PII through the abuse of browser features. We find that PII collection is not limited to the web: the act of viewing an email also leaks PII to trackers. Overall, about 30% of emails leak the recipient's email address to one or more third parties. Finally, we study the ability of a passive eavesdropper to leverage tracking cookies for mass surveillance. If two web pages embed the same tracker, then the adversary can link visits to those pages from the same user even if the user's IP address varies. We find that the adversary can reconstruct 62-73% of a typical user's browsing history
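
    The linking argument at the end of this abstract (two page visits carrying the same tracker's identifier belong to the same user, regardless of IP address) can be sketched as transitive clustering. This is an illustration under stated assumptions, not the dissertation's method: the observation format `(visit, tracker, cookie_id)`, the function name, and the use of a union-find structure are all hypothetical choices made for the example.

```python
from collections import defaultdict


def link_visits(observations):
    """Cluster page visits that share a (tracker, cookie identifier) pair.

    observations: iterable of (visit, tracker, cookie_id) tuples, as an
    assumed model of what a passive eavesdropper sees on the wire.
    Returns a sorted list of sorted visit clusters.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_seen = {}  # (tracker, cookie_id) -> first visit carrying it
    for visit, tracker, cookie_id in observations:
        find(visit)  # register the visit even if it links to nothing
        key = (tracker, cookie_id)
        if key in first_seen:
            union(first_seen[key], visit)
        else:
            first_seen[key] = visit

    clusters = defaultdict(set)
    for visit in list(parent):
        clusters[find(visit)].add(visit)
    return sorted(sorted(cluster) for cluster in clusters.values())
```

    Linking is transitive in this sketch: if one visit shares an identifier from one tracker with a second visit, and the second shares an identifier from a different tracker with a third, all three end up in the same cluster.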

    No boundaries: data exfiltration by third parties embedded on web pages

    We investigate data exfiltration by third-party scripts directly embedded on web pages. Specifically, we study three attacks: misuse of browsers’ internal login managers, social data exfiltration, and whole-DOM exfiltration. Although the possibility of these attacks was well known, we provide the first empirical evidence based on measurements of 300,000 distinct web pages from 50,000 sites. We extend OpenWPM’s instrumentation to detect and precisely attribute these attacks to specific third-party scripts. Our analysis reveals invasive practices such as inserting invisible login forms to trigger autofilling of the saved user credentials, and reading and exfiltrating social network data when the user logs in via Facebook login. Further, we uncovered password, credit card, and health data leaks to third parties due to wholesale collection of the DOM. We discuss the lessons learned from the responses to the initial disclosure of our findings and fixes that were deployed by the websites, browser vendors, third-party libraries and privacy protection tools