93 research outputs found
Recommended from our members
Anonymisation of geographical distance matrices via Lipschitz embedding
BACKGROUND: Anonymisation of spatially referenced data has received increasing attention in recent years. Whereas the research focus has been on the anonymisation of point locations, the disclosure risk arising from the publishing of inter-point distances and corresponding anonymisation methods have not been studied systematically.
METHODS: We propose a new anonymisation method for the release of geographical distances between records of a microdata file-for example patients in a medical database. We discuss a data release scheme in which microdata without coordinates and an additional distance matrix between the corresponding rows of the microdata set are released. In contrast to most other approaches this method preserves small distances better than larger distances. The distances are modified by a variant of Lipschitz embedding.
RESULTS: The effects of the embedding parameters on the risk of data disclosure are evaluated by linkage experiments using simulated data. The results indicate small disclosure risks for appropriate embedding parameters.
CONCLUSION: The proposed method is useful if published distance information might be misused for the re-identification of records. The method can be used for publishing scientific-use-files and as an additional tool for record-linkage studies
Spectral anonymization of data
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2007.This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.Includes bibliographical references (p. 87-96).Data anonymization is the process of conditioning a dataset such that no sensitive information can be learned about any specific individual, but valid scientific analysis can nevertheless be performed on it. It is not sufficient to simply remove identifying information because the remaining data may be enough to infer the individual source of the record (a reidentification disclosure) or to otherwise learn sensitive information about a person (a predictive disclosure). The only known way to prevent these disclosures is to remove additional information from the dataset. Dozens of anonymization methods have been proposed over the past few decades; most work by perturbing or suppressing variable values. None have been successful at simultaneously providing perfect privacy protection and allowing perfectly accurate scientific analysis. This dissertation makes the new observation that the anonymizing operations do not need to be made in the original basis of the dataset. Operating in a different, judiciously chosen basis can improve privacy protection, analytic utility, and computational efficiency. I use the term 'spectral anonymization' to refer to anonymizing in a spectral basis, such as the basis provided by the data's eigenvectors. Additionally, I propose new measures of reidentification and prediction risk that are more generally applicable and more informative than existing measures. I also propose a measure of analytic utility that assesses the preservation of the multivariate probability distribution. Finally, I propose the demanding reference standard of nonparticipation in the study to define adequate privacy protection. I give three examples of spectral anonymization in practice. The first example improves basic cell swapping from a weak algorithm to one competitive with state of-the-art methods merely by a change of basis.(cont) The second example demonstrates avoiding the curse of dimensionality in microaggregation. The third describes a powerful algorithm that reduces computational disclosure risk to the same level as that of nonparticipants and preserves at least 4th order interactions in the multivariate distribution. No previously reported algorithm has achieved this combination of results.by Thomas Anton Lasko.Ph.D
Security Infrastructure Technology for Integrated Utilization of Big Data
This open access book describes the technologies needed to construct a secure big data infrastructure that connects data owners, analytical institutions, and user institutions in a circle of trust. It begins by discussing the most relevant technical issues involved in creating safe and privacy-preserving big data distribution platforms, and especially focuses on cryptographic primitives and privacy-preserving techniques, which are essential prerequisites. The book also covers elliptic curve cryptosystems, which offer compact public key cryptosystems; and LWE-based cryptosystems, which are a type of post-quantum cryptosystem. Since big data distribution platforms require appropriate data handling, the book also describes a privacy-preserving data integration protocol and privacy-preserving classification protocol for secure computation. Furthermore, it introduces an anonymization technique and privacy risk evaluation technique. This book also describes the latest related findings in both the living safety and medical fields. In the living safety field, to prevent injuries occurring in everyday life, it is necessary to analyze injury data, find problems, and implement suitable measures. But most cases don’t include enough information for injury prevention because the necessary data is spread across multiple organizations, and data integration is difficult from a security standpoint. This book introduces a system for solving this problem by applying a method for integrating distributed data securely and introduces applications concerning childhood injury at home and school injury. In the medical field, privacy protection and patient consent management are crucial for all research. The book describes a medical test bed for the secure collection and analysis of electronic medical records distributed among various medical institutions. The system promotes big-data analysis of medical data with a cloud infrastructure and includes various security measures developed in our project to avoid privacy violations
Privacy protection in context aware systems.
Smartphones, loaded with users’ personal information, are a primary computing device for many. Advent of 4G networks, IPV6 and increased number of subscribers to these has triggered a host of application developers to develop softwares that are easy to install on the mobile devices. During the application download process, users accept the terms and conditions that permit revelation of private information. The free application markets are sustainable as the revenue model for most of these service providers is through profiling of users and pushing advertisements to the users. This creates a serious threat to users privacy and hence it is important that “privacy protection mechanisms” should be in place to protect the users’ privacy. Most of the existing solutions falsify or modify the information in the service request and starve the developers of their revenue. In this dissertation, we attempt to bridge the gap by proposing a novel integrated CLOPRO framework (Context Cloaking Privacy Protection) that achieves Identity privacy, Context privacy and Query privacy without depriving the service provider of sustainable revenue made from the CAPPA (Context Aware Privacy Preserving Advertising). Each service request has three parameters: identity, context and actual query. The CLOPRO framework reduces the risk of an adversary linking all of the three parameters. The main objective is to ensure that no single entity in the system has all the information about the user, the queries or the link between them, even though the user gets the desired service in a viable time frame. The proposed comprehensive framework for privacy protecting, does not require the user to use a modified OS or the service provider to modify the way an application developer designs and deploys the application and at the same time protecting the revenue model of the service provider. The system consists of two non-colluding servers, one to process the location coordinates (Location server) and the other to process the original query (Query server). This approach makes several inherent algorithmic and research contributions. First, we have proposed a formal definition of privacy and the attack. We identified and formalized that the privacy is protected if the transformation functions used are non-invertible. Second, we propose use of clustering of every component of the service request to provide anonymity to the user. We use a unique encrypted identity for every service request and a unique id for each cluster of users that ensures Identity privacy. We have designed a Split Clustering Anonymization Algorithms (SCAA) that consists of two algorithms Location Anonymization Algorithm (LAA) and Query Anonymization Algorithm (QAA). The application of LAA replaces the actual location for the users in the cluster with the centroid of the location coordinates of all users in that cluster to achieve Location privacy. The time of initiation of the query is not a part of the message string to the service provider although it is used for identifying the timed out requests. Thus, Context privacy is achieved. To ensure the Query privacy, the generic queries (created using QAA) are used that cover the set of possible queries, based on the feature variations between the queries. The proposed CLOPRO framework associates the ads/coupons relevant to the generic query and the location of the users and they are sent to the user along with the result without revealing the actual user, the initiation time of query or the location and the query, of the user to the service provider. Lastly, we introduce the use of caching in query processing to improve the response time in case of repetitive queries. The Query processing server caches the query result. We have used multiple approaches to prove that privacy is preserved in CLOPRO system. We have demonstrated using the properties of the transformation functions and also using graph theoretic approaches that the user’s Identity, Context and Query is protected against the curious but honest adversary attack, fake query and also replay attacks with the use of CLOPRO framework. The proposed system not only provides \u27k\u27 anonymity, but also satisfies the \u3c k; s \u3e and \u3c k; T \u3e anonymity properties required for privacy protection. The complexity of our proposed algorithm is O(n)
- …