11 research outputs found

    The Role of Quasi-identifiers in k-Anonymity Revisited

    The concept of k-anonymity, used in the recent literature to formally evaluate the privacy preservation of published tables, was introduced based on the notion of quasi-identifiers (or QI for short). The process of obtaining k-anonymity for a given private table is first to recognize the QIs in the table, and then to anonymize the QI values, the latter being called k-anonymization. While k-anonymization is usually rigorously validated by the authors, the definition of QI remains mostly informal, and different authors seem to have different interpretations of the concept of QI. The purpose of this paper is to provide a formal underpinning of QI and examine the correctness and incorrectness of various interpretations of QI in our formal framework. We observe that in cases where the concept has been used correctly, its application has been conservative; this note provides a formal understanding of the conservative nature in such cases. Comment: 17 pages. Submitted for publication
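    The k-anonymity property the abstract refers to can be illustrated with a minimal check: a table is k-anonymous with respect to a chosen QI set if every combination of QI values occurs in at least k rows. The sketch below is only an illustration of the definition (the attribute names and generalized values are hypothetical, not taken from the paper):

    ```python
    from collections import Counter

    def is_k_anonymous(rows, qi, k):
        """Return True if every QI-value combination appears in at least k rows."""
        counts = Counter(tuple(row[a] for a in qi) for row in rows)
        return all(c >= k for c in counts.values())

    # A small generalized table: zip codes and ages have been coarsened.
    table = [
        {"zip": "130**", "age": "30-39", "disease": "flu"},
        {"zip": "130**", "age": "30-39", "disease": "cold"},
        {"zip": "148**", "age": "20-29", "disease": "flu"},
        {"zip": "148**", "age": "20-29", "disease": "asthma"},
    ]
    print(is_k_anonymous(table, qi=("zip", "age"), k=2))  # True
    ```

    Note that the result depends entirely on which attributes are declared to be the QI, which is exactly the informally-defined step the paper sets out to formalize.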

    Simple and effective method for selecting quasi-identifier

    In this paper, a new method for selecting a quasi-identifier (QI) to achieve k-anonymity for privacy protection is introduced. For this purpose, two algorithms are proposed: Selective, followed by Decompose. The simulation results show that the proposed approach outperforms existing methods, and extensive experimental results on real-world data sets confirm the efficiency and accuracy of our algorithm.

    RiAiR: A Framework for Sensitive RDF Protection

    The Semantic Web and the Linked Open Data (LOD) initiatives promote the integration and combination of RDF data on the Web. In some cases, data need to be analyzed and protected before publication in order to avoid the disclosure of sensitive information. However, existing RDF protection techniques do not ensure that sensitive information cannot be discovered, since all RDF resources are linked in the Semantic Web and the combination of different datasets could produce or disclose unexpected sensitive information. In this context, we propose a framework, called RiAiR, which reduces the complexity of the RDF structure in order to decrease the interaction required of the expert user for the classification of RDF data into identifiers, quasi-identifiers, etc. An intersection process suggests disclosure sources that can compromise the data. Moreover, through a generalization method, we decrease the connections among resources to comply with the main objectives of integration and combination of the Semantic Web. Results show the viability and high performance of the approach in a scenario where heterogeneous and linked datasets are present.

    k-Anonymity in the Presence of External Databases

    The concept of k-anonymity has received considerable attention due to the need of several organizations to release microdata without revealing the identity of individuals. Although all previous k-anonymity techniques assume the existence of a public database (PD) that can be used to breach privacy, none utilizes PD during the anonymization process. Specifically, existing generalization algorithms create anonymous tables using only the microdata table (MT) to be published, independently of the external knowledge available. This omission leads to high information loss. Motivated by this observation we first introduce the concept of k-join-anonymity (KJA), which permits more effective generalization to reduce the information loss. Briefly, KJA anonymizes a superset of MT, which includes selected records from PD. We propose two methodologies for adapting k-anonymity algorithms to their KJA counterparts. The first generalizes the combination of MT and PD, under the constraint that each group should contain at least one tuple of MT (otherwise, the group is useless and discarded). The second anonymizes MT, and then refines the resulting groups using PD. Finally, we evaluate the effectiveness of our contributions with an extensive experimental evaluation using real and synthetic datasets.
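    The constraint in the first KJA methodology (every retained group must be of size at least k and contain at least one MT tuple) can be sketched as follows. This is only an illustration of the grouping constraint, with hypothetical helper names and record tags, not the paper's actual algorithms:

    ```python
    def kja_filter(groups, k):
        """Keep groups of size >= k that contain at least one microdata (MT) tuple.

        Each group is a list of (source, record) pairs with source "MT" or "PD".
        Groups consisting purely of public-database records say nothing about MT
        and are discarded, mirroring the constraint described in the abstract.
        """
        return [
            g for g in groups
            if len(g) >= k and any(src == "MT" for src, _ in g)
        ]

    groups = [
        [("MT", {"age": "2*"}), ("PD", {"age": "2*"})],  # mixed group: kept
        [("PD", {"age": "3*"}), ("PD", {"age": "3*"})],  # PD-only group: discarded
    ]
    print(len(kja_filter(groups, k=2)))  # 1
    ```

    The benefit of padding MT groups with PD records is that generalization can be less aggressive, since the extra tuples help each group reach size k.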

    What the Surprising Failure of Data Anonymization Means for Law and Policy

    Paul Ohm is an Associate Professor of Law at the University of Colorado Law School. He writes in the areas of information privacy, computer crime law, intellectual property, and criminal procedure. Through his scholarship and outreach, Professor Ohm is leading efforts to build new interdisciplinary bridges between law and computer science. Before becoming a law professor, Professor Ohm served as a federal prosecutor for the U.S. Department of Justice in the computer crimes unit. Before law school, he worked as a computer programmer and network systems administrator.

    EPIC: a Methodology for Evaluating Privacy Violation Risk in Cybersecurity Systems

    Cybersecurity Systems (CSSs) play a fundamental role in guaranteeing data confidentiality, integrity, and availability. However, while processing data, CSSs can intentionally or unintentionally expose personal information to people who can misuse it. For this reason, the privacy implications of a CSS should be carefully evaluated. This is a challenging task, mainly because modern CSSs have complex architectures and components. Moreover, data processed by CSSs can be exposed to different actors, both internal and external to the organization. This contribution presents a methodology, called EPIC, that is specifically designed to evaluate privacy violation risks in cybersecurity systems. Differently from other general-purpose guidelines, EPIC is an operational methodology aimed at guiding security and privacy experts with step-by-step instructions, from modeling data exposure in the CSS to the systematic identification of privacy threats and the evaluation of their associated privacy violation risk. This contribution also shows the application of the EPIC methodology to the use case of a large academic organization's CSS protecting over 15,000 hosts.

    Data utility and privacy protection in data publishing

    Data about individuals is being increasingly collected and disseminated for purposes such as business analysis and medical research. This has raised some privacy concerns. In response, a number of techniques have been proposed which attempt to transform data prior to its release so that sensitive information about the individuals contained within it is protected. k-Anonymisation is one such technique that has attracted much recent attention from the database research community. k-Anonymisation works by transforming data in such a way that each record is made identical to at least k-1 other records with respect to those attributes that are likely to be used to identify individuals. This helps prevent sensitive information associated with individuals from being disclosed, as each individual is represented by at least k records in the dataset. Ideally, a k-anonymised dataset should maximise both data utility and privacy protection, i.e. it should allow intended data analytic tasks to be carried out without loss of accuracy while preventing sensitive information disclosure, but these two notions are conflicting and only a trade-off between them can be achieved in practice. Existing works, however, focus on how either the utility or the protection requirement may be satisfied, which often results in anonymised data with an unnecessarily and/or unacceptably low level of utility or protection. In this thesis, we study how to construct k-anonymous data that satisfies both data utility and privacy protection requirements. We propose new criteria to capture utility and protection requirements, and new algorithms that allow k-anonymisations with the required utility/protection trade-off or guarantees to be generated.
Our extensive experiments using both benchmarking and synthetic datasets show that our methods are efficient, can produce k-anonymised data with desired properties, and outperform the state-of-the-art methods in retaining data utility and providing privacy protection.
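    The utility side of the trade-off described above is often quantified with the discernibility metric, a standard information-loss measure in the k-anonymisation literature (shown here as a generic illustration, not necessarily the criteria proposed in this thesis): each record in a group of size s incurs cost s, so coarser groupings score worse.

    ```python
    def discernibility_cost(group_sizes):
        """Discernibility metric: a group of size s contributes s**2 to the cost.

        Lower cost means finer groups and therefore higher data utility;
        larger (more generalized) groups mean more information loss.
        """
        return sum(s * s for s in group_sizes)

    # Two 2-anonymous partitions of the same 8 records:
    print(discernibility_cost([2, 2, 2, 2]))  # 16 (finer groups, higher utility)
    print(discernibility_cost([4, 4]))        # 32 (coarser groups, lower utility)
    ```

    Both partitions satisfy 2-anonymity, which is why a separate protection criterion is needed: utility metrics alone cannot distinguish how much each grouping actually protects sensitive values.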
