4 research outputs found

    When the signal is in the noise: Exploiting Diffix's Sticky Noise

    Get PDF
    Anonymized data is highly valuable to both businesses and researchers. A large body of research has however shown the strong limits of the de-identification release-and-forget model, where data is anonymized and shared. This has led to the development of privacy-preserving query-based systems. Based on the idea of "sticky noise", Diffix has been recently proposed as a novel query-based mechanism satisfying alone the EU Article~29 Working Party's definition of anonymization. According to its authors, Diffix adds less noise to answers than solutions based on differential privacy while allowing for an unlimited number of queries. This paper presents a new class of noise-exploitation attacks, exploiting the noise added by the system to infer private information about individuals in the dataset. Our first differential attack uses samples extracted from Diffix in a likelihood ratio test to discriminate between two probability distributions. We show that using this attack against a synthetic best-case dataset allows us to infer private information with 89.4% accuracy using only 5 attributes. Our second cloning attack uses dummy conditions that conditionally strongly affect the output of the query depending on the value of the private attribute. Using this attack on four real-world datasets, we show that we can infer private attributes of at least 93% of the users in the dataset with accuracy between 93.3% and 97.1%, issuing a median of 304 queries per user. We show how to optimize this attack, targeting 55.4% of the users and achieving 91.7% accuracy, using a maximum of only 32 queries per user. Our attacks demonstrate that adding data-dependent noise, as done by Diffix, is not sufficient to prevent inference of private attributes. We furthermore argue that Diffix alone fails to satisfy Art. 29 WP's definition of anonymization. [...

    QuerySnout: automating the discovery of attribute inference attacks against query-based systems

    Get PDF
    Although query-based systems (QBS) have become one of the main solutions to share data anonymously, building QBSes that robustly protect the privacy of individuals contributing to the dataset is a hard problem. Theoretical solutions relying on differential privacy guarantees are difficult to implement correctly with reasonable accuracy, while ad-hoc solutions might contain unknown vulnerabilities. Evaluating the privacy provided by QBSes must thus be done by evaluating the accuracy of a wide range of privacy attacks. However, existing attacks against QBSes require time and expertise to develop, need to be manually tailored to the specific systems attacked, and are limited in scope. In this paper, we develop QuerySnout, the first method to automatically discover vulnerabilities in query-based systems. QuerySnout takes as input a target record and the QBS as a black box, analyzes its behavior on one or more datasets, and outputs a multiset of queries together with a rule to combine answers to them in order to reveal the sensitive attribute of the target record. QuerySnout uses evolutionary search techniques based on a novel mutation operator to find a multiset of queries susceptible to lead to an attack, and a machine learning classifier to infer the sensitive attribute from answers to the queries selected. We showcase the versatility of QuerySnout by applying it to two attack scenarios (assuming access to either the private dataset or to a different dataset from the same distribution), three real-world datasets, and a variety of protection mechanisms. We show the attacks found by QuerySnout to consistently equate or outperform, sometimes by a large margin, the best attacks from the literature. We finally show how QuerySnout can be extended to QBSes that require a budget, and apply QuerySnout to a simple QBS based on the Laplace mechanism. Taken together, our results show how powerful and accurate attacks against QBSes can already be found by an automated system, allowing for highly complex QBSes to be automatically tested "at the pressing of a button". We believe this line of research to be crucial to improve the robustness of systems providing privacy-preserving access to personal data in theory and in practice

    Extended Diffix

    No full text
    A longstanding open problem is that of how to get high quality statistics through direct queries to databases containing information about individuals without revealing information specific to those individuals. Diffix is a new framework for anonymous database query that adds noise based on the filter conditions in the query. A previous paper described Diffix for a simplified query semantics. This paper extends that description to include a wide variety of common features found in SQL. It describes attacks associated with various features, and the anonymization steps used to defend against those attacks. This paper describes the version of Diffix used for bounty program sponsored by Aircloak starting December 2017
    corecore