Search CORE

30 research outputs found

COOKIEGRAPH: Measuring and Countering First-Party Tracking Cookies

Author: Englehardt Steven
Iqbal Umar
Munir Shaoor
Shafiq Zubair
Siby Sandra
Troncoso Carmela
Publication venue
Publication date: 02/09/2022
Field of study

Recent privacy protections by browser vendors aim to limit the abuse of third-party cookies for cross-site tracking. While these countermeasures against third-party cookies are widely welcome, there are concerns that they will result in advertisers and trackers abusing first-party cookies instead. We provide the first empirical evidence of how first-party cookies are abused by advertisers and trackers by conducting a differential measurement study on 10K websites with third-party cookies allowed and blocked. We find that advertisers and trackers implement cross-site tracking despite third-party cookie blocking by storing identifiers, based on probabilistic and deterministic attributes, in first-party cookies. As opposed to third-party cookies, outright first-party cookie blocking is not practical because it would result in major breakage of legitimate website functionality. We propose CookieGraph, a machine learning approach that can accurately and robustly detect first-party tracking cookies. CookieGraph detects first-party tracking cookies with 91.06% accuracy, outperforming the state-of-the-art CookieBlock approach by 10.28%. We show that CookieGraph is fully robust against cookie name manipulation while CookieBlock's accuracy drops by 15.68%. We also show that CookieGraph does not cause any major breakage while CookieBlock causes major breakage on 8% of the websites with SSO logins. Our deployment of CookieGraph shows that first-party tracking cookies are used on 93.43% of the 10K websites. We also find that the most prevalent first-party tracking cookies are set by major advertising entities such as Google as well as many specialized entities such as Criteo

arXiv.org e-Print Archive

Actions speak louder than words: Semi-supervised learning for browser fingerprinting detection

Author: Bird Sarah
Englehardt Steven
Lopatka Martin
Mishra Vikas
Rudametkin Walter
Willoughby Rob
Zeber David
Publication venue
Publication date: 09/03/2020
Field of study

As online tracking continues to grow, existing anti-tracking and fingerprinting detection techniques that require significant manual input must be augmented. Heuristic approaches to fingerprinting detection are precise but must be carefully curated. Supervised machine learning techniques proposed for detecting tracking require manually generated label-sets. Seeking to overcome these challenges, we present a semi-supervised machine learning approach for detecting fingerprinting scripts. Our approach is based on the core insight that fingerprinting scripts have similar patterns of API access when generating their fingerprints, even though their access patterns may not match exactly. Using this insight, we group scripts by their JavaScript (JS) execution traces and apply a semi-supervised approach to detect new fingerprinting scripts. We detail our methodology and demonstrate its ability to identify the majority of scripts (

\geqslant

94.9%) identified by existing heuristic techniques. We also show that the approach expands beyond detecting known scripts by surfacing candidate scripts that are likely to include fingerprinting. Through an analysis of these candidate scripts we discovered fingerprinting scripts that were missed by heuristics and for which there are no heuristics. In particular, we identified over one hundred device-class fingerprinting scripts present on hundreds of domains. To the best of our knowledge, this is the first time device-class fingerprinting has been measured in the wild. These successes illustrate the power of a sparse vector representation and semi-supervised learning to complement and extend existing tracking detection techniques

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

Antipsychotic drugs in community-based sheltered-care homes

Author: Carscallen
Clark
Clark
Crane
Eisenberg
Englehardt
Epstein
Gardos
Gross
Hirsch
Hogarty
Hollister
Klein
Klein
Leff
Lennard
Overall
Pasamanick
Prien
Prien
Prien
Rappaport
Scheff
Segal
Segal
Segal
Steven P. Segal
Susan Chandler
Tuteur
Uri Aviram
Weiner
Publication venue: 'Elsevier BV'
Publication date
Field of study

Crossref

Automated discovery of privacy violations on the web

Author: Englehardt Steven
Publication venue: Princeton, NJ : Princeton University
Publication date: 01/01/2018
Field of study

Online tracking is increasingly invasive and ubiquitous. Tracking protection provided by browsers is often ineffective, while solutions based on voluntary cooperation, such as Do Not Track, haven't had meaningful adoption. Knowledgeable users may turn to anti-tracking tools, but even these more advanced solutions fail to fully protect against the techniques we study. In this dissertation, we introduce OpenWPM, a platform we developed for flexible and modular web measurement. We've used OpenWPM to run large-scale studies leading to the discovery of numerous privacy violations across the web and in emails. These discoveries have curtailed the adoption of tracking techniques, and have informed policy debates and browser privacy decisions. In particular, we present novel detection methods and results for persistent tracking techniques, including: device fingerprinting, cookie syncing, and cookie respawning. Our findings include sophisticated fingerprinting techniques never before measured in the wild. We've found that nearly every new API is misused by trackers for fingerprinting. The misuse is often invisible to users and publishers alike, and in many cases was not anticipated by API designers. We take a critical look at how the API design process can be changed to prevent such misuse in the future. We also explore the industry of trackers which use PII-derived identifiers to track users across devices, and even into the offline world. To measure these techniques, we develop a novel bait technique, which allows us to spoof the presence of PII on a large number of sites. We show how trackers exfiltrate the spoofed PII through the abuse of browser features. We find that PII collection is not limited to the web--the act of viewing an email also leaks PII to trackers. Overall, about 30% of emails leak the recipient's email address to one or more third parties. Finally, we study the ability of a passive eavesdropper to leverage tracking cookies for mass surveillance. If two web pages embed the same tracker, then the adversary can link visits to those pages from the same user even if the user's IP address varies. We find that the adversary can reconstruct 62-73% of a typical user's browsing history

Dataspace

Recommended from our members

Online Tracking: A 1-million-site Measurement and Analysis

Author: Englehardt Steven
Narayanan Arvind
Publication venue
Publication date: 01/10/2016
Field of study

We present the largest and most detailed measurement of online tracking conducted to date, based on a crawl of the top 1 million websites. We make 15 types of measurements on each site, including stateful (cookie-based) and stateless (fingerprinting-based) tracking, the effect of browser privacy tools, and the exchange of tracking data between different sites ("cookie syncing"). Our findings include multiple sophisticated fingerprinting techniques never before measured in the wild. This measurement is made possible by our open-source web privacy measurement tool, OpenWPM, which uses an automated version of a full-fledged consumer browser. It supports parallelism for speed and scale, automatic recovery from failures of the underlying browser, and comprehensive browser instrumentation. We demonstrate our platform's strength in enabling researchers to rapidly detect, quantify, and characterize emerging online tracking behaviors

Princeton University Open Access Repository

Crossref

Recommended from our members

Battery Status Not Included: Assessing Privacy in Web Standards

Author: Englehardt Steven
Narayanan Arvind
Olejnik Lukasz
Publication venue
Publication date: 01/01/2017
Field of study

The standardization process is core to the development of the open web. Until 2013, the process rarely included privacy review and had no formal privacy requirements. But today the importance of privacy engineering has become apparent to standards bodies such as the W3C as well as to browser vendors. Standards groups now have guidelines for privacy assessments, and are including privacy reviews in many new specifications. However, the standards community does not yet have much practical experience in assessing privacy. In this paper we systematically analyze the W3C Battery Status API to help inform future privacy assessments. We begin by reviewing its evolution — the initial specification, which only cursorily addressed privacy, the discovery of surprising privacy vulnerabilities as well as actual misuse in the wild, followed by the removal of the API from major browser engines, an unprecedented move. Next, we analyze web measurement data from late 2016 and confirm that the majority of scripts used the API for fingerprinting. Finally, we draw lessons from this affair and make recommendations for improving privacy engineering of web standards

Princeton University Open Access Repository

No boundaries: data exfiltration by third parties embedded on web pages

Author: Acar Gunes
Englehardt Steven
Narayanan Arvind
Publication venue: 'Walter de Gruyter GmbH'
Publication date: 01/10/2020
Field of study

We investigate data exfiltration by third-party scripts directly embedded on web pages. Specifically, we study three attacks: misuse of browsers’ internal login managers, social data exfiltration, and whole-DOM exfiltration. Although the possibility of these attacks was well known, we provide the first empirical evidence based on measurements of 300,000 distinct web pages from 50,000 sites. We extend OpenWPM’s instrumentation to detect and precisely attribute these attacks to specific third-party scripts. Our analysis reveals invasive practices such as inserting invisible login forms to trigger autofilling of the saved user credentials, and reading and exfiltrating social network data when the user logs in via Facebook login. Further, we uncovered password, credit card, and health data leaks to third parties due to wholesale collection of the DOM. We discuss the lessons learned from the responses to the initial disclosure of our findings and fixes that were deployed by the websites, browser vendors, third-party libraries and privacy protection tools

Directory of Open Access Journals