359 research outputs found

    When the signal is in the noise: Exploiting Diffix's Sticky Noise

    Anonymized data is highly valuable to both businesses and researchers. A large body of research has, however, shown the strong limits of the de-identification release-and-forget model, where data is anonymized and shared. This has led to the development of privacy-preserving query-based systems. Based on the idea of "sticky noise", Diffix has recently been proposed as a novel query-based mechanism that alone satisfies the EU Article 29 Working Party's definition of anonymization. According to its authors, Diffix adds less noise to answers than solutions based on differential privacy while allowing for an unlimited number of queries. This paper presents a new class of noise-exploitation attacks, which exploit the noise added by the system to infer private information about individuals in the dataset. Our first attack, the differential attack, uses samples extracted from Diffix in a likelihood ratio test to discriminate between two probability distributions. We show that using this attack against a synthetic best-case dataset allows us to infer private information with 89.4% accuracy using only 5 attributes. Our second attack, the cloning attack, uses dummy conditions whose effect on the output of the query depends strongly on the value of the private attribute. Using this attack on four real-world datasets, we show that we can infer private attributes of at least 93% of the users in the dataset with accuracy between 93.3% and 97.1%, issuing a median of 304 queries per user. We show how to optimize this attack, targeting 55.4% of the users and achieving 91.7% accuracy, using a maximum of only 32 queries per user. Our attacks demonstrate that adding data-dependent noise, as done by Diffix, is not sufficient to prevent inference of private attributes. We furthermore argue that Diffix alone fails to satisfy Art. 29 WP's definition of anonymization. [...
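
The likelihood ratio test mentioned in the abstract above can be illustrated with a generic sketch. This is not the paper's attack and does not model Diffix's noise: it assumes, purely for illustration, that each hypothesis is a Gaussian described by a (mean, std) pair, and the names `gaussian_logpdf`, `log_likelihood_ratio`, and `decide` are hypothetical.

```python
import math

def gaussian_logpdf(x, mean, std):
    """Log-density of a Gaussian N(mean, std^2) at x."""
    return -0.5 * math.log(2 * math.pi * std * std) - (x - mean) ** 2 / (2 * std * std)

def log_likelihood_ratio(samples, h0, h1):
    """Return log L(h1) - log L(h0) for i.i.d. samples.

    h0 and h1 are (mean, std) pairs; the Gaussian form is an assumption
    made for this sketch, not a description of any real system's noise.
    """
    return sum(gaussian_logpdf(x, *h1) - gaussian_logpdf(x, *h0) for x in samples)

def decide(samples, h0, h1, threshold=0.0):
    """Pick the hypothesis favored by the data (threshold 0 = equal priors)."""
    return "h1" if log_likelihood_ratio(samples, h0, h1) > threshold else "h0"

# Samples clustered near 1.0 favor the hypothesis with mean 1.0.
print(decide([1.0, 0.9, 1.1, 1.2, 0.8], h0=(0.0, 1.0), h1=(1.0, 1.0)))  # prints "h1"
```

More samples push the log-likelihood ratio further from zero, which is why an attacker who can gather many noisy answers can separate two candidate distributions with high confidence.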

    SLIM : Scalable Linkage of Mobility Data

    We present a scalable solution to link entities across mobility datasets using their spatio-temporal information. This is a fundamental problem in many applications such as linking user identities for security, understanding privacy limitations of location based services, or producing a unified dataset from multiple sources for urban planning. Such integrated datasets are also essential for service providers to optimise their services and improve business intelligence. In this paper, we first propose a mobility based representation and similarity computation for entities. An efficient matching process is then developed to identify the final linked pairs, with an automated mechanism to decide when to stop the linkage. We scale the process with a locality-sensitive hashing (LSH) based approach that significantly reduces candidate pairs for matching. To realize the effectiveness and efficiency of our techniques in practice, we introduce an algorithm called SLIM. In the experimental evaluation, SLIM outperforms the two existing state-of-the-art approaches in terms of precision and recall. Moreover, the LSH-based approach brings two to four orders of magnitude speedup
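
To show how locality-sensitive hashing can cut down the number of candidate pairs, here is a minimal sketch using MinHash with banding. The abstract does not say which LSH scheme SLIM uses, so MinHash is a swapped-in, generic technique; representing each entity as a set of (cell, time-window) tokens is likewise an assumption, and all function names are hypothetical.

```python
import hashlib
from collections import defaultdict
from itertools import combinations

def _hash(token, seed):
    """Deterministic 64-bit hash of a token under a given seed."""
    data = f"{seed}:{token}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")

def minhash_signature(tokens, num_hashes):
    """MinHash signature of a non-empty token set."""
    return [min(_hash(t, seed) for t in tokens) for seed in range(num_hashes)]

def candidate_pairs(entities, bands=8, rows=2):
    """Return pairs of entity names that share at least one LSH band.

    entities maps a name to a non-empty set of tokens, e.g. (cell, time-window)
    tuples; this representation is assumed for the sketch.
    """
    buckets = defaultdict(set)
    for name, tokens in entities.items():
        sig = minhash_signature(tokens, bands * rows)
        for b in range(bands):
            # Entities whose signatures agree on a whole band land in one bucket.
            key = (b, tuple(sig[b * rows:(b + 1) * rows]))
            buckets[key].add(name)
    pairs = set()
    for members in buckets.values():
        pairs.update(combinations(sorted(members), 2))
    return pairs

entities = {
    "a1": {("c1", 0), ("c2", 1), ("c3", 2)},
    "b1": {("c1", 0), ("c2", 1), ("c3", 2)},  # same trace as a1
    "b2": {("c7", 5), ("c8", 6), ("c9", 7)},  # unrelated trace
}
print(candidate_pairs(entities))
```

Only the pairs returned here would go on to the expensive similarity computation; entities with no overlapping tokens almost never share a bucket, so they are skipped.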

    Uncovering the spatial structure of mobility networks

    The extraction of a clear and simple footprint of the structure of large, weighted and directed networks is a general problem that has many applications. An important example is given by origin-destination matrices, which contain the complete information on commuting flows but are difficult to analyze and compare. We propose here a versatile method which extracts a coarse-grained signature of mobility networks, in the form of a 2×2 matrix that separates the flows into four categories. We apply this method to origin-destination matrices extracted from mobile phone data recorded in thirty-one Spanish cities. We show that these cities essentially differ by their proportion of two types of flows: integrated (between residential and employment hotspots) and random flows, whose importance increases with city size. Finally, the method makes it possible to determine categories of networks and, in the mobility case, to classify cities according to their commuting structure. Comment: 10 pages, 5 figures + Supplementary information
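
The 2×2 signature described in the abstract above can be sketched as follows. The abstract names only the integrated (hotspot to hotspot) and random flows; the labels for the two mixed cells, the way hotspots are supplied (as precomputed sets), and the function name `flow_signature` are assumptions made for this illustration, not the authors' implementation.

```python
def flow_signature(od, origin_hotspots, dest_hotspots):
    """Coarse-grain an origin-destination matrix into a 2x2 matrix of flow shares.

    od maps (origin, destination) to a flow weight. Rows of the result say
    whether the origin is a residential hotspot (yes, no); columns say whether
    the destination is an employment hotspot (yes, no). Hotspot sets are taken
    as given; how to detect them is outside this sketch.
    """
    total = sum(od.values())
    if total <= 0:
        raise ValueError("origin-destination matrix has no flow")
    matrix = [[0.0, 0.0], [0.0, 0.0]]
    for (origin, dest), weight in od.items():
        row = 0 if origin in origin_hotspots else 1
        col = 0 if dest in dest_hotspots else 1
        matrix[row][col] += weight / total
    return matrix

od = {("a", "x"): 6, ("a", "y"): 2, ("b", "x"): 1, ("b", "y"): 1}
sig = flow_signature(od, origin_hotspots={"a"}, dest_hotspots={"x"})
integrated, random_share = sig[0][0], sig[1][1]
print(sig)
```

In this layout `sig[0][0]` is the integrated share and `sig[1][1]` the random share, so two cities can be compared by looking at just four numbers that sum to one.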

    The Use of Mobility Data for Responding to the COVID-19 Pandemic

    As the COVID-19 pandemic continues to upend the way people move, work, and gather, governments, businesses, and public health researchers have looked increasingly at mobility data to support pandemic response. This data, assets that describe human location and movement, generally has been collected for purposes directly related to a company's business model, including optimizing the delivery of consumer services, supply chain management, or targeting advertisements. However, these call detail records, smartphone-mobility data, vehicle-derived GPS, and other mobility data assets can also be used to study patterns of movement. These patterns of movement have, in turn, been used by organizations to forecast disease spread and inform decisions on how best to manage activity in certain locations. Researchers at The GovLab and Cuebiq, supported by the Open Data Institute, identified 51 notable projects from around the globe launched by public sector and research organizations with companies that use mobility data for these purposes. They curated five projects from this list that highlight the specific opportunities (and risks) presented by using this asset. Though few of these highlighted projects have provided public outputs, which makes assessing project success difficult, organizations interviewed considered mobility data to be a useful asset that enabled better public health surveillance, supported existing decision-making processes, or otherwise allowed groups to achieve their research goals. The report below summarizes some of the major points identified in those case studies. While acknowledging that location data can be a highly sensitive data type that can facilitate surveillance or expose data subjects if used carelessly, it finds mobility data can support research and inform decisions when applied toward narrowly defined research questions through frameworks that acknowledge and proactively mitigate risk.
    These frameworks can vary based on the individual circumstances facing data users, suppliers, and subjects. However, there are a few conditions that can enable users and suppliers to promote publicly beneficial and responsible data use and overcome the serious obstacles facing them. For data users (governments and research institutions), functional access to real-time and contextually relevant data can support research goals, even though a lack of data science competencies and of both short- and long-term funding sources represents a major obstacle to this goal. Data suppliers (largely companies), meanwhile, need governance structures and mechanisms that facilitate responsible re-use, including data re-use agreements that define who, what, where, and when, and under what conditions data can be shared. A lack of regulatory clarity and the absence of universal governance and privacy standards have impeded effective and responsible dissemination of mobility data for research and humanitarian purposes. Finally, for both data users and suppliers, we note that collaborative research networks that allow organizations to seek out and provide data can serve as enablers of project success by facilitating exchange of methods and resources, and closing the gap between research and practice. Based on these findings, we recommend the development of clear governance and privacy frameworks, increased capacity building around data use within the public sector, and more regular convenings of ecosystem stakeholders (including the public and data subjects) to broaden collaborative networks. We also propose solutions towards making the responsible use of mobility data more sustainable for long-term impact beyond the current pandemic. A failure to develop regulatory and governance frameworks that can responsibly manage mobility data could lead to a regression to the ad hoc and uncoordinated approaches that previously defined mobility data applications.
    It could also lead to disparate standards about organizations' responsibilities to the public