647 research outputs found

    A Decision Tree Approach for Assessing and Mitigating Background and Identity Disclosure Risks

    The Facebook/Cambridge Analytica data scandal shows a type of privacy threat where an adversary attacks a massive number of people without prior knowledge of their background information. Existing studies typically assume that the adversary knows the background information of the target individuals. This study examines the disclosure risk issue in privacy breaches without such an assumption. We define the background disclosure risk and re-identification risk based on the notions of prior and conditional probabilities respectively, and integrate the two risk measures into a composite measure using the Minimum Description Length principle. We then develop a decision-tree pruning algorithm to find an appropriate group size considering the tradeoff between disclosure risk and data utility. Furthermore, we propose a novel tiered generalization method for anonymizing data at the group level. An experimental study has been conducted to demonstrate the effectiveness of our approach.

    Utility-driven assessment of anonymized data via clustering

    In this study, clustering is conceived as an auxiliary tool to identify groups of special interest. This approach was applied to a real dataset concerning an entire Portuguese cohort of higher education Law students. Several anonymized clustering scenarios were compared against the original cluster solution. The clustering techniques were explored as data utility models in the context of data anonymization, using k-anonymity and (ε, δ)-differential privacy as privacy models. The purpose was to assess anonymized data utility by standard metrics, by the characteristics of the groups obtained, and by the relative risk (a relevant metric in social sciences research). For self-containment, we present an overview of anonymization and clustering methods. We used a partitional clustering algorithm and analyzed several clustering validity indices to understand to what extent the data structure is preserved, or not, after data anonymization. The results suggest that for low dimensionality/cardinality datasets the anonymization procedure easily jeopardizes the clustering endeavor. In addition, there is evidence that relevant field-of-study estimates obtained from anonymized data are biased.
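
    As a rough illustration of the k-anonymity model named in this abstract (not the authors' pipeline), the sketch below computes the k that a table actually achieves: the size of the smallest group of records sharing the same quasi-identifier values. The function name `k_anonymity`, the column names, and the toy records are all hypothetical.

```python
from collections import Counter


def k_anonymity(records, quasi_identifiers):
    """Return the size of the smallest equivalence class.

    An equivalence class is the set of records that share the same values
    on every quasi-identifier column. Returns 0 for an empty table.
    """
    classes = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(classes.values()) if classes else 0


# Hypothetical, already-generalized student records (illustration only).
records = [
    {"age_band": "20-29", "region": "North", "grade": 14},
    {"age_band": "20-29", "region": "North", "grade": 16},
    {"age_band": "30-39", "region": "South", "grade": 12},
    {"age_band": "30-39", "region": "South", "grade": 15},
    {"age_band": "30-39", "region": "South", "grade": 11},
]

# Smallest group over (age_band, region) has two records.
k = k_anonymity(records, ["age_band", "region"])
```

    Treating more columns as quasi-identifiers can only shrink the groups, which is one reason low-dimensional, low-cardinality data may need heavy generalization to reach a target k.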

    A vision for global privacy bridges: Technical and legal measures for international data markets

    From the early days of the information economy, personal data has been its most valuable asset. Despite data protection laws and an acknowledged right to privacy, trading personal information has become a business equated with "trading oil". Most of this business is done without the knowledge and active informed consent of the people. But as data breaches and abuses are made public through the media, consumers react. They become irritated about companies' data handling practices, lose trust, exercise political pressure and start to protect their privacy with the help of technical tools. As a result, companies' Internet business models that are based on personal data are unsettled. An open conflict is arising between business demands for data and a desire for privacy. As of 2015, no true answer is in sight for how to resolve this conflict. Technologists, economists and regulators are struggling to develop technical solutions and policies that meet businesses' demand for more data while still maintaining privacy. Yet, most of the proposed solutions fail to account for market complexity and provide no pathway to technological and legal implementation. They lack a bigger vision for data use and privacy. To break this vicious cycle, we propose and test such a vision of a personal information market with privacy. We accumulate technical and legal measures that have been proposed by technical and legal scholars over the past two decades. And out of this existing knowledge, we compose something new: a four-space market model for personal data.

    Privacy-Preserving Design of Data Processing Systems in the Public Transport Context

    The public transport network of a region inhabited by more than 4 million people is run by a complex interplay of public and private actors. Large amounts of data are generated by travellers buying and using various forms of tickets and passes. Analysing the data is of paramount importance for the governance and sustainability of the system. This manuscript reports the early results of the privacy analysis which is being undertaken as part of the analysis of the clearing process in the Emilia-Romagna region, in Italy, which will compute the compensations for tickets bought from one operator and used with another. The manuscript shows by means of examples that the clearing data may be used to violate various privacy aspects regarding users, as well as (technically equivalent) trade secrets regarding operators. The ensuing discussion has a twofold goal. First, it shows that after researching possible existing solutions, both by reviewing the literature on general privacy-preserving techniques and by analysing similar scenarios that are being discussed in various cities across the world, the former are found to exhibit structural effectiveness deficiencies, while the latter are found to be of limited applicability, typically involving less demanding requirements. Second, it traces a research path towards a more effective approach to privacy-preserving data management in the specific context of public transport, both by refinement of current sanitization techniques and by application of the privacy by design approach. Available at: https://aisel.aisnet.org/pajais/vol7/iss4/4

    PRUDEnce: A system for assessing privacy risk vs utility in data sharing ecosystems

    Data describing human activities are an important source of knowledge useful for understanding individual and collective behavior and for developing a wide range of user services. Unfortunately, this kind of data is sensitive, because people’s whereabouts may allow re-identification of individuals in a de-identified database. Therefore, Data Providers, before sharing those data, must apply some form of anonymization to lower the privacy risks, but they must also be aware of and capable of controlling the data quality, since these two factors often trade off against each other. In this paper we propose PRUDEnce (Privacy Risk versus Utility in Data sharing Ecosystems), a system enabling a privacy-aware ecosystem for sharing personal data. It is based on a methodology for assessing both the empirical (not theoretical) privacy risk associated with users represented in the data, and the data quality guaranteed only with users not at risk. Our proposal is able to support the Data Provider in the exploration of a repertoire of possible data transformations with the aim of selecting one specific transformation that yields an adequate trade-off between data quality and privacy risk. We study the practical effectiveness of our proposal over three data formats underlying many services, defined on real mobility data, i.e., presence data, trajectory data and road segment data.
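
    To make the idea of an empirical per-user privacy risk concrete, the sketch below uses a deliberately simplified stand-in, not the PRUDEnce methodology itself: it assumes the adversary knows a user's complete set of visited cells, and sets that user's risk to one over the number of users with an identical set. The function names, the 0.5 threshold, and the toy presence data are hypothetical.

```python
from collections import Counter


def reidentification_risk(user_views):
    """Map each user to 1 / (number of users sharing the same observable view).

    Assumption: the adversary knows the user's complete view, so a user whose
    view is unique in the data has risk 1.0.
    """
    counts = Counter(user_views.values())
    return {user: 1.0 / counts[view] for user, view in user_views.items()}


def share_at_risk(risks, threshold):
    """Return the fraction of users whose risk exceeds the threshold."""
    if not risks:
        return 0.0
    return sum(r > threshold for r in risks.values()) / len(risks)


# Hypothetical presence data: the set of cells each user was seen in.
presence = {
    "u1": frozenset({"cell_A", "cell_B"}),
    "u2": frozenset({"cell_A", "cell_B"}),
    "u3": frozenset({"cell_C"}),
    "u4": frozenset({"cell_A"}),
}

risks = reidentification_risk(presence)
at_risk = share_at_risk(risks, threshold=0.5)
```

    A Data Provider could recompute such a distribution after each candidate transformation (coarser cells, shorter time windows) and compare it with the data quality left for the users who are not at risk.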

    Catch, Clean, and Release: A Survey of Obstacles and Opportunities for Network Trace Sanitization

    Network researchers benefit tremendously from access to traces of production networks, and several repositories of such network traces exist. By their very nature, these traces capture sensitive business and personal activity. Furthermore, network traces contain significant operational information about the target network, such as its structure, identity of the network provider, or addresses of important servers. To protect private or proprietary information, researchers must “sanitize” a trace before sharing it. In this chapter, we survey the growing body of research that addresses the risks, methods, and evaluation of network trace sanitization. Research on the risks of network trace sanitization attempts to extract information from published network traces, while research on sanitization methods investigates approaches that may protect against such attacks. Although researchers have recently proposed both quantitative and qualitative methods to evaluate the effectiveness of sanitization methods, such work has several shortcomings, some of which we highlight in a discussion of open problems. Sanitizing a network trace, however challenging, remains an important method for advancing network-based research.
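
    One simple sanitization step, shown here only as an illustrative sketch and not as a method taken from the survey, is to replace each IPv4 address with a keyed-hash pseudonym so that the same address always maps to the same substitute. The function name `pseudonymize_ip`, the key, and the sample trace are hypothetical. This mapping is not prefix-preserving, and truncating the digest to 32 bits means two addresses can occasionally collide.

```python
import hashlib
import hmac
import ipaddress


def pseudonymize_ip(address, key):
    """Map an IPv4 address to a stable pseudonymous IPv4 address.

    The pseudonym is the first four bytes of HMAC-SHA256(key, packed address),
    so it is consistent for a given key but hides the network structure.
    """
    packed = ipaddress.IPv4Address(address).packed
    digest = hmac.new(key, packed, hashlib.sha256).digest()
    return str(ipaddress.IPv4Address(digest[:4]))


# Hypothetical secret key and trace records (documentation address ranges).
key = b"hypothetical-secret-key"
trace = [("192.0.2.10", "198.51.100.7"), ("192.0.2.10", "203.0.113.5")]

sanitized = [(pseudonymize_ip(src, key), pseudonymize_ip(dst, key))
             for src, dst in trace]
```

    Because the mapping is deterministic, flows from the same source remain linkable after sanitization, which preserves some research utility but also leaves room for the inference attacks the chapter describes.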