A Decision Tree Approach for Assessing and Mitigating Background and Identity Disclosure Risks
The Facebook/Cambridge Analytica data scandal illustrates a type of privacy threat in which an adversary attacks a massive number of people without prior knowledge of their background information. Existing studies typically assume that the adversary knows the background information of the target individuals. This study examines the disclosure risk issue in privacy breaches without such an assumption. We define background disclosure risk and re-identification risk based on the notions of prior and conditional probability, respectively, and integrate the two risk measures into a composite measure using the Minimum Description Length principle. We then develop a decision-tree pruning algorithm to find an appropriate group size, considering the tradeoff between disclosure risk and data utility. Furthermore, we propose a novel tiered generalization method for anonymizing data at the group level. An experimental study demonstrates the effectiveness of our approach.
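The two risk measures described above can be illustrated with a minimal sketch. The function names, the toy records, and the choice of quasi-identifiers are assumptions for illustration, not the paper's actual definitions: background disclosure risk is modeled as a prior probability over a sensitive attribute, and re-identification risk as a worst-case conditional probability within an equivalence group.

```python
from collections import Counter

def background_disclosure_risk(records, attribute):
    """Prior probability of the most common value of a sensitive
    attribute across the whole dataset (the adversary has no
    background knowledge about any specific target)."""
    counts = Counter(r[attribute] for r in records)
    return max(counts.values()) / len(records)

def reidentification_risk(records, quasi_identifiers):
    """Worst-case conditional probability of re-identification:
    1 / size of the smallest equivalence group induced by the
    quasi-identifiers."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return 1.0 / min(groups.values())

records = [
    {"age": "30-39", "zip": "021**", "disease": "flu"},
    {"age": "30-39", "zip": "021**", "disease": "cold"},
    {"age": "40-49", "zip": "021**", "disease": "flu"},
    {"age": "40-49", "zip": "021**", "disease": "flu"},
]
print(background_disclosure_risk(records, "disease"))  # 0.75
print(reidentification_risk(records, ["age", "zip"]))  # 0.5
```

A pruning algorithm like the one in the abstract would grow the equivalence groups (lowering the second measure) until the loss of data utility outweighs the risk reduction.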
Utility-driven assessment of anonymized data via clustering
In this study, clustering is conceived as an auxiliary tool to identify groups of special interest. The approach was applied to a real dataset concerning an entire Portuguese cohort of higher-education Law students. Several anonymized clustering scenarios were compared against the original cluster solution. The clustering techniques were explored as data utility models in the context of data anonymization, using k-anonymity and (ε, δ)-differential privacy as privacy models. The purpose was to assess anonymized data utility by standard metrics, by the characteristics of the groups obtained, and by the relative risk (a relevant metric in social sciences research). For the sake of self-containment, we present an overview of anonymization and clustering methods. We used a partitional clustering algorithm and analyzed several clustering validity indices to understand to what extent the data structure is preserved, or not, after data anonymization. The results suggest that for low-dimensionality/cardinality datasets the anonymization procedure easily jeopardizes the clustering endeavor. In addition, there is evidence that relevant field-of-study estimates obtained from anonymized data are biased.
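The comparison of anonymized clustering scenarios against an original cluster solution can be sketched with a simple validity index. This is a generic illustration, not the paper's actual index set: the Rand index below measures the fraction of record pairs on which two clusterings agree, and the toy labels are invented.

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Fraction of record pairs on which two clusterings agree
    (both in the same cluster, or both in different clusters) --
    a basic external validity index for comparing the original
    cluster solution with the one obtained after anonymization."""
    n = len(labels_a)
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in combinations(range(n), 2)
    )
    return agree / (n * (n - 1) / 2)

original   = [0, 0, 1, 1, 2, 2]   # clusters found on the raw data
anonymized = [0, 0, 1, 2, 2, 2]   # clusters found after k-anonymization
print(rand_index(original, anonymized))  # 0.8
```

A value well below 1.0 on a low-dimensionality dataset would reflect the abstract's finding that anonymization easily jeopardizes the clustering endeavor.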
A vision for global privacy bridges: Technical and legal measures for international data markets
From the early days of the information economy, personal data has been its most valuable asset. Despite data protection laws and an acknowledged right to privacy, trading personal information has become a business equated with "trading oil". Most of this business is done without the knowledge and active informed consent of the people. But as data breaches and abuses are made public through the media, consumers react. They become irritated about companies' data handling practices, lose trust, exercise political pressure and start to protect their privacy with the help of technical tools. As a result, companies' Internet business models that are based on personal data are unsettled. An open conflict is arising between business demands for data and a desire for privacy. As of 2015, no true answer is in sight for how to resolve this conflict. Technologists, economists and regulators are struggling to develop technical solutions and policies that meet businesses' demand for more data while still maintaining privacy. Yet most of the proposed solutions fail to account for market complexity and provide no pathway to technological and legal implementation. They lack a bigger vision for data use and privacy. To break this vicious cycle, we propose and test such a vision of a personal information market with privacy. We accumulate technical and legal measures that have been proposed by technical and legal scholars over the past two decades. And out of this existing knowledge, we compose something new: a four-space market model for personal data.
Privacy-Preserving Design of Data Processing Systems in the Public Transport Context
The public transport network of a region inhabited by more than 4 million people is run by a complex interplay of public and private actors. Large amounts of data are generated by travellers buying and using various forms of tickets and passes. Analysing the data is of paramount importance for the governance and sustainability of the system. This manuscript reports the early results of the privacy analysis being undertaken as part of the analysis of the clearing process in the Emilia-Romagna region, in Italy, which will compute the compensations for tickets bought from one operator and used with another. The manuscript shows by means of examples that the clearing data may be used to violate various privacy aspects regarding users, as well as (technically equivalent) trade secrets regarding operators. The ensuing discussion has a twofold goal. First, after researching possible existing solutions, both by reviewing the literature on general privacy-preserving techniques and by analysing similar scenarios being discussed in various cities across the world, it shows that the former exhibit structural effectiveness deficiencies, while the latter are of limited applicability, typically involving less demanding requirements. Second, it traces a research path towards a more effective approach to privacy-preserving data management in the specific context of public transport, both by refinement of current sanitization techniques and by application of the privacy by design approach.
Available at: https://aisel.aisnet.org/pajais/vol7/iss4/4
PRUDEnce: A system for assessing privacy risk vs utility in data sharing ecosystems
Data describing human activities are an important source of knowledge useful for understanding individual and collective behavior and for developing a wide range of user services. Unfortunately, this kind of data is sensitive, because people's whereabouts may allow re-identification of individuals in a de-identified database. Therefore, Data Providers, before sharing those data, must apply some form of anonymization to lower the privacy risks, but they must also be aware of, and capable of controlling, the data quality, since these two factors are often a trade-off. In this paper we propose PRUDEnce (Privacy Risk versus Utility in Data sharing Ecosystems), a system enabling a privacy-aware ecosystem for sharing personal data. It is based on a methodology for assessing both the empirical (not theoretical) privacy risk associated with users represented in the data, and the data quality that can be guaranteed using only the users not at risk. Our proposal is able to support the Data Provider in the exploration of a repertoire of possible data transformations, with the aim of selecting one specific transformation that yields an adequate trade-off between data quality and privacy risk. We study the practical effectiveness of our proposal over three data formats underlying many services, defined on real mobility data, i.e., presence data, trajectory data and road segment data.
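The empirical risk-versus-utility exploration described above can be sketched as follows. The function names, the presence-data profiles, and the two generalization levels are hypothetical illustrations, not PRUDEnce's actual interface: risk is the probability of re-identifying a user from their (transformed) data profile, and utility is the share of users who remain once those above a risk threshold are excluded.

```python
from collections import Counter

def empirical_risk(profiles):
    """Per-profile re-identification probability: 1 / number of
    users sharing the same transformed data profile."""
    counts = Counter(profiles)
    return {p: 1.0 / c for p, c in counts.items()}

def risk_vs_utility(profiles, threshold):
    """Share of users whose empirical risk is at or below the
    threshold -- the data quality left after excluding users at risk."""
    risk = empirical_risk(profiles)
    safe = sum(1 for p in profiles if risk[p] <= threshold)
    return safe / len(profiles)

# Hypothetical presence data at two spatial/temporal generalization levels.
fine   = [("cellA", 9), ("cellB", 9), ("cellA", 18), ("cellC", 18)]
coarse = [("zone1", "am"), ("zone1", "am"), ("zone1", "pm"), ("zone1", "pm")]
print(risk_vs_utility(fine, 0.5))    # 0.0 -- every profile is unique
print(risk_vs_utility(coarse, 0.5))  # 1.0 -- each profile shared by two users
```

A Data Provider would sweep such transformations and pick the finest one whose retained share is acceptable, which is the trade-off exploration the abstract describes.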
Addressing the Failure of Anonymization: Guidance from the European Union’s General Data Protection Regulation
It is common practice for companies to “anonymize” the consumer data that they collect. In fact, U.S. data protection laws and Federal Trade Commission guidelines encourage the practice of anonymization by exempting anonymized data from the privacy and data security requirements they impose. Anonymization involves removing personally identifiable information (“PII”) from a dataset so that, in theory, the data cannot be traced back to its data subjects. In practice, however, anonymization fails to irrevocably protect consumer privacy due to the potential for deanonymization—the linking of anonymized data to auxiliary information to re-identify data subjects. Because U.S. data protection laws provide safe harbors for anonymized data, re-identified data subjects receive no statutory privacy protections at all—a fact that is particularly troublesome given consumers’ dependence on technology and today’s climate of ubiquitous data collection.
By adopting an all-or-nothing approach to anonymization, the United States has created no means of incentivizing the practice of anonymization while still providing data subjects statutory protections. This Note argues that the United States should look to the risk-based approach taken by the European Union under the General Data Protection Regulation. EU data protection law utilizes multiple tiers of anonymization, which vary in their potential for deanonymization. Under this approach, pseudonymized data—i.e., certain data that has had PII removed but can still be linked to auxiliary information to re-identify data subjects—falls within the scope of the governing law, but receives relaxed requirements designed to incentivize pseudonymization and thereby reduce the risk of data subject identification. This approach both strikes a balance between data privacy and data utility, and affords data subjects the benefit of anonymity in addition to statutory protections ranging from choice to transparency.
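The distinction the Note draws can be made concrete with a minimal pseudonymization sketch. The key, the sample record, and the token length are assumptions for illustration: a keyed hash replaces the PII with a consistent token, so the controller (who holds the key) can still link records, which is precisely why such data stays within the GDPR's scope rather than being treated as anonymous.

```python
import hmac
import hashlib

SECRET_KEY = b"replace-with-a-managed-secret"  # hypothetical, kept by the controller

def pseudonymize(pii_value: str) -> str:
    """Keyed one-way pseudonym: the same input always maps to the
    same token, but without the key the mapping cannot be recomputed
    from auxiliary data alone."""
    return hmac.new(SECRET_KEY, pii_value.encode(), hashlib.sha256).hexdigest()[:16]

record = {"email": "alice@example.com", "purchase": "book"}
record["email"] = pseudonymize(record["email"])  # PII removed, linkability kept
```

Because the key holder can re-identify data subjects, the output is pseudonymized rather than anonymized data, and under the GDPR's tiered approach it receives relaxed, but not zero, obligations.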
Catch, Clean, and Release: A Survey of Obstacles and Opportunities for Network Trace Sanitization
Network researchers benefit tremendously from access to traces of production networks, and several repositories of such network traces exist. By their very nature, these traces capture sensitive business and personal activity. Furthermore, network traces contain significant operational information about the target network, such as its structure, the identity of the network provider, or the addresses of important servers. To protect private or proprietary information, researchers must "sanitize" a trace before sharing it. In this chapter, we survey the growing body of research that addresses the risks, methods, and evaluation of network trace sanitization. Research on the risks of network trace sanitization attempts to extract information from published network traces, while research on sanitization methods investigates approaches that may protect against such attacks. Although researchers have recently proposed both quantitative and qualitative methods to evaluate the effectiveness of sanitization methods, such work has several shortcomings, some of which we highlight in a discussion of open problems. Sanitizing a network trace, however challenging, remains an important method for advancing network-based research.
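One common sanitization step the survey covers is address anonymization. The sketch below is a deliberately simple toy (not a real scheme such as prefix-preserving Crypto-PAn, and the salt and function name are assumptions): it keeps the leading octets so subnet-level structure survives for research, and replaces the host part with a salted hash so the same host maps consistently within one trace release.

```python
import hashlib

SALT = b"per-release-salt"  # hypothetical; would be rotated per published trace

def sanitize_ip(ip: str, keep_octets: int = 2) -> str:
    """Keep the first `keep_octets` octets of an IPv4 address and
    replace the host part with bytes of a salted hash, preserving
    coarse network structure while hiding individual hosts."""
    octets = ip.split(".")
    host = ".".join(octets[keep_octets:])
    digest = hashlib.sha256(SALT + host.encode()).digest()
    masked = [str(b) for b in digest[: 4 - keep_octets]]
    return ".".join(octets[:keep_octets] + masked)

print(sanitize_ip("192.168.10.7"))  # prefix preserved, host part hashed
```

As the chapter's discussion of risks makes clear, consistent mappings like this still leak information (e.g. traffic fingerprinting can re-identify hosts), which is exactly why evaluating sanitization effectiveness remains an open problem.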