Locality-Sensitive Hashing Does Not Guarantee Privacy! Attacks on Google's FLoC and the MinHash Hierarchy System
Recently proposed systems aim at achieving privacy using locality-sensitive
hashing. We show how these approaches fail by presenting attacks against two
such systems: Google's FLoC proposal for privacy-preserving targeted
advertising and the MinHash Hierarchy, a system for processing mobile users'
traffic behavior in a privacy-preserving way. Our attacks refute the pre-image
resistance, anonymity, and privacy guarantees claimed for these systems.
In the case of FLoC, we show how to deanonymize users using Sybil attacks and
how to reconstruct 10% or more of the browsing history for 30% of its users
using Generative Adversarial Networks, analyzing only the hashes used by FLoC.
For MinHash, we precisely identify the movement of a subset of individuals and,
on average, we can narrow users' movements down to just 10% of the possible
geographic area, again using just the hashes. In addition, we refute their
differential privacy claims.
Comment: 14 pages, 9 figures, submitted to PETS 202
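The core leakage the paper exploits is that locality-sensitive hashes are designed to collide for similar inputs, so the hashes themselves reveal similarity between the underlying sets. The following minimal MinHash sketch in Python illustrates this property (it is an illustration of the general technique, not the paper's attack code; the example domains are made up):

```python
import hashlib

def minhash(items, num_hashes=64):
    """Compute a MinHash signature: for each of num_hashes seeded hash
    functions, keep the minimum hash value over the input set."""
    sig = []
    for seed in range(num_hashes):
        sig.append(min(
            int(hashlib.sha256(f"{seed}:{it}".encode()).hexdigest(), 16)
            for it in items
        ))
    return sig

def estimate_jaccard(sig_a, sig_b):
    """The fraction of matching signature positions estimates the
    Jaccard similarity of the original sets."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

# Two browsing histories that share most of their domains produce
# signatures that agree on most positions -- similarity leaks even
# when an observer sees only the hashes.
hist_a = {"news.example", "shop.example", "mail.example", "blog.example"}
hist_b = {"news.example", "shop.example", "mail.example", "video.example"}
sim = estimate_jaccard(minhash(hist_a), minhash(hist_b))
```

An observer holding only signatures can thus cluster users by behavioral similarity, which is exactly the kind of information a pre-image-resistant or anonymity-preserving scheme would have to hide.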
S-GBDT: Frugal Differentially Private Gradient Boosting Decision Trees
Privacy-preserving learning of gradient boosting decision trees (GBDT) has
the potential for strong utility-privacy tradeoffs on tabular data, such as
census data or medical metadata: classical GBDT learners can extract
non-linear patterns from small-sized datasets. The state-of-the-art notion of
provable privacy is differential privacy, which requires that the impact of
single data points be limited and deniable. We introduce a novel
differentially private GBDT learner and utilize four main techniques to improve
the utility-privacy tradeoff. (1) We use an improved noise scaling approach
with tighter accounting of the privacy leakage of a decision tree leaf than
prior work, resulting in noise that in expectation scales with , for
data points. (2) We integrate individual Rényi filters into our method to
learn from data points that have been underutilized during iterative
training, which -- potentially of independent interest -- yields a natural yet
effective approach to learning on streams of non-i.i.d. data. (3) We
incorporate random decision tree splits to concentrate the privacy budget on
learning leaves. (4) We deploy subsampling for privacy amplification.
Our evaluation shows, for the Abalone dataset ( training data points), a
-score of for , which the closest prior work only
achieved for . On the Adult dataset ( training data
points) we achieve a test error of for , which the
closest prior work only achieved for . For the Abalone dataset
at we achieve an -score of , which is very close to
the -score of for the nonprivate version of GBDT. For the Adult
dataset at we achieve a test error which is very
close to the test error of the nonprivate version of GBDT.
Comment: The first two authors contributed equally to this work
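The basic mechanism behind noised leaf values can be sketched as follows: clip each example's gradient to bound its influence (the sensitivity), then perturb the leaf's gradient sum with Laplace noise calibrated to sensitivity/ε. This is the standard Laplace mechanism as a minimal illustration; S-GBDT's actual noise scaling, Rényi accounting, and filters are tighter than this sketch.

```python
import random

def laplace_noise(scale):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)

def dp_leaf_value(gradients, epsilon, clip=1.0):
    """Differentially private leaf value: clip per-example gradients so
    that adding or removing one example changes the sum by at most
    `clip` (the sensitivity), then add Laplace noise with scale
    sensitivity/epsilon before averaging."""
    clipped = [max(-clip, min(clip, g)) for g in gradients]
    noisy_sum = sum(clipped) + laplace_noise(clip / epsilon)
    return noisy_sum / max(len(clipped), 1)

random.seed(0)
grads = [0.4, -0.2, 0.9, 1.7, -0.5]   # 1.7 is clipped to 1.0
value = dp_leaf_value(grads, epsilon=1.0)
```

As ε grows, the noise scale shrinks and the noisy leaf value converges to the true clipped mean (here 0.32), which is the utility-privacy tradeoff the paper optimizes.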
Efficient and Extensible Policy Mining for Relationship-Based Access Control
Relationship-based access control (ReBAC) is a flexible and expressive
framework that allows policies to be expressed in terms of chains of
relationships between entities, as well as attributes of entities. ReBAC policy
mining algorithms have the potential to significantly reduce the cost of
migration from legacy access control systems to ReBAC by partially automating
the development of a ReBAC policy. Existing ReBAC policy mining algorithms
support a policy language with a limited set of operators, which limits their
applicability. This paper presents a ReBAC policy mining algorithm designed to
be both (1) easily extensible (to support additional policy language features)
and (2) scalable. The algorithm is based on Bui et al.'s evolutionary algorithm
for ReBAC policy mining. First, we simplify their algorithm to make it easier
to extend, and we provide a methodology for extending it to handle new policy
language features. However, extending the policy language increases the search
space of candidate policies explored by the evolutionary algorithm, causing
longer running times and/or worse results. To address this problem, we enhance
the algorithm with a feature selection phase that uses a neural network to
identify useful features, and we use the result of feature selection to reduce
the evolutionary algorithm's search space. The new algorithm is easy to extend
and, as shown by our experiments, is more efficient and produces better
policies.
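The overall shape of evolutionary policy mining can be sketched in a few lines: candidate policies are mutated, scored against an access log, and kept when they explain the log better. The toy below is a deliberately simplified illustration (attribute-based conditions instead of relationship chains, frequency-based feature enumeration instead of the paper's neural-network feature selection; all data is invented):

```python
import random

# Toy access log: each entry is (request attributes, granted decision).
# A candidate rule is a frozenset of (attribute, value) conditions; it
# permits a request when all of its conditions hold.
LOG = [
    ({"role": "doctor", "dept": "cardio",  "rel": "treating"}, True),
    ({"role": "doctor", "dept": "cardio",  "rel": "none"},     False),
    ({"role": "nurse",  "dept": "cardio",  "rel": "treating"}, True),
    ({"role": "clerk",  "dept": "billing", "rel": "none"},     False),
]

def fitness(rule):
    """Number of correct decisions the rule makes over the log."""
    score = 0
    for attrs, granted in LOG:
        permits = all(attrs.get(a) == v for a, v in rule)
        score += (permits == granted)
    return score

def mutate(rule, features):
    """Toggle one condition, drawn only from the selected feature set --
    restricting this set is how feature selection shrinks the search space."""
    rule = set(rule)
    cond = random.choice(features)
    rule.discard(cond) if cond in rule else rule.add(cond)
    return frozenset(rule)

def mine(features, generations=200, seed=0):
    random.seed(seed)
    best = frozenset()
    for _ in range(generations):
        cand = mutate(best, features)
        if fitness(cand) >= fitness(best):
            best = cand
    return best

# Stand-in for the paper's neural-network feature selection phase:
# simply enumerate every (attribute, value) pair seen in the log.
features = sorted({(a, v) for attrs, _ in LOG for a, v in attrs.items()})
policy = mine(features)
```

The key design point the paper makes is visible here: every extra feature enlarges the space `mutate` can explore, so pruning features before the evolutionary search directly reduces running time and improves the policies found.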
Transitive primal infon logic: The propositional case, Microsoft Research
Abstract: Primal (propositional) logic PL is the {∧, →} fragment of intuitionistic logic, and primal (propositional) infon logic PIL is a conservative extension of PL with the quotation construct "p said". Logic PIL was introduced by Gurevich and Neeman in 2009 in connection with the DKAL project. The derivation problem for PIL (and therefore for PL) is solvable in linear time, and yet PIL allows one to express many common access control scenarios. The most obvious limitations on the expressivity of logics PL and PIL are the failures of the transitivity rules
(trans0) from x → y and y → z infer x → z,
(trans) from pref (x → y) and pref (y → z) infer pref (x → z),
where pref ranges over quotation prefixes "p said q said . . .". Here we investigate the extension T of PL with the axiom x → x and the inference rule (trans0), as well as the extension qT of PIL with the axiom pref (x → x) and the inference rule (trans).
• [Subformula property] T has the subformula property: if Γ ⊢ y then there is a derivation of y from Γ comprising only subformulas of Γ ∪ {y}. qT has a similar locality property.
• [Complexity] The derivation problems for T and qT are solvable in quadratic time.
• [Soundness and completeness] We define Kripke models for qT (resp. T) and show that the semantics is sound and complete.
• [Small models] T has the one-element-model property: if Γ ⊬ y then there is a one-element counterexample. Similarly small (though not one-element) counterexamples exist for qT.
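One way to see why adding reflexivity and transitivity keeps derivability tractable: for the implicational part, deciding whether x → z follows from a set of implications reduces to reachability in the implication graph. The sketch below illustrates that intuition in Python; it handles only atomic implications, not conjunction or quotation prefixes, so it is a didactic fragment rather than the paper's full quadratic-time algorithm for T or qT.

```python
from collections import defaultdict, deque

def derives(implications, query):
    """Decide whether the implication x -> z follows from the given
    implications using the axiom x -> x (reflexivity) and the rule
    (trans0) (transitivity). This is graph reachability: an edge per
    implication, breadth-first search from x."""
    graph = defaultdict(set)
    for x, z in implications:
        graph[x].add(z)
    start, goal = query
    if start == goal:          # axiom x -> x
        return True
    seen, frontier = {start}, deque([start])
    while frontier:
        node = frontier.popleft()
        for nxt in graph[node]:
            if nxt == goal:
                return True
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return False

rules = [("p", "q"), ("q", "r"), ("r", "s")]
```

For qT the same idea applies within each quotation prefix, which is consistent with the locality property stated above.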
Automating Cookie Consent and GDPR Violation Detection
The European Union’s General Data Protection Regulation (GDPR) requires websites to inform users about personal data collection and request consent for cookies. Yet the majority of websites do not give users any choices, and others attempt to deceive them into accepting all cookies. We document the severity of this situation through an analysis of potential GDPR violations in cookie banners in almost 30k websites. We identify six novel violation types, such as incorrect category assignments and misleading expiration times, and we find at least one potential violation in a surprising 94.7% of the analyzed websites.
We address this issue by giving users the power to protect their privacy. We develop a browser extension, called CookieBlock, that uses machine learning to enforce GDPR cookie consent at the client. It automatically categorizes cookies by usage purpose using only the information provided in the cookie itself. At a mean validation accuracy of 84.4%, our model attains prediction quality competitive with expert knowledge in the field. Additionally, our approach differs from prior work by not relying on the cooperation of websites themselves. We empirically evaluate CookieBlock on a set of 100 randomly sampled websites, on which it filters roughly 90% of the privacy-invasive cookies without significantly impairing website functionality.
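Client-side enforcement of this kind can be sketched in miniature: infer a purpose category from the cookie itself, then block cookies whose category the user did not consent to. The toy below uses hand-picked keyword scoring over cookie names purely for illustration; CookieBlock's actual model is a trained classifier, and the categories and keywords here are assumptions, not its rules.

```python
# Illustrative purpose categories and name keywords (invented examples).
KEYWORDS = {
    "necessary":   ["session", "csrf", "auth"],
    "functional":  ["lang", "theme", "pref"],
    "analytics":   ["_ga", "_gid", "stat"],
    "advertising": ["ad", "track", "doubleclick"],
}

def categorize(cookie_name):
    """Pick the category whose keywords best match the cookie name,
    falling back to 'unknown' when nothing matches."""
    name = cookie_name.lower()
    scores = {cat: sum(kw in name for kw in kws)
              for cat, kws in KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"

def should_block(cookie_name, consent=frozenset({"necessary", "functional"})):
    """Block a cookie when its inferred purpose lies outside the
    user's consent choices."""
    cat = categorize(cookie_name)
    return cat not in consent and cat != "unknown"
```

Note that everything here runs on information available at the client, which mirrors the paper's point that enforcement need not rely on the website's cooperation.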