
    Improved k-Anonymize and l-Diverse Approach for Privacy Preserving Big Data Publishing Using MPSEC Dataset

    Data exposure and privacy violations may occur when data is exchanged between organizations. Data anonymization offers promising results for limiting such risks. Various k-anonymization and l-diversity methods have been widely used to maintain privacy, but for larger datasets their results are not very promising: existing anonymization algorithms suffer from high information loss and high running time. To overcome these problems, this paper proposes two new models, Improved k-Anonymization (IKA) and Improved l-Diversity (ILD). The IKA model supports large k-values using both symmetric and asymmetric anonymizing algorithms, and is accordingly categorized into Improved Symmetric k-Anonymization (ISKA) and Improved Asymmetric k-Anonymization (IAKA). After the data is anonymized with IKA, the ILD model makes it more diverse, thereby further increasing privacy. This paper presents an implementation of the proposed IKA and ILD models on a real-time big candidate-election dataset acquired from the Madhya Pradesh State Election Commission, India (MPSEC), using Apache Storm, and compares the proposed models with existing algorithms, i.e. Fast clustering-based Anonymization for Data Streams (FADS), Fast Anonymization for Data Stream (FAST), Map Reduce Anonymization (MRA) and Scalable k-Anonymization (SKA). The experimental results show that IKA and ILD markedly reduce information loss and significantly improve running time over the existing approaches while maintaining the privacy-utility trade-off.
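    As a concrete reference point, here is a minimal Python sketch of the baseline k-anonymity and l-diversity checks that IKA and ILD build on; the record layout and field names are illustrative assumptions, not the paper's implementation.

```python
from collections import defaultdict

def equivalence_classes(records, quasi_ids):
    """Group records that share identical quasi-identifier values."""
    groups = defaultdict(list)
    for rec in records:
        groups[tuple(rec[q] for q in quasi_ids)].append(rec)
    return groups

def is_k_anonymous(records, quasi_ids, k):
    """Every quasi-identifier group contains at least k records."""
    return all(len(g) >= k
               for g in equivalence_classes(records, quasi_ids).values())

def is_l_diverse(records, quasi_ids, sensitive, l):
    """Every group contains at least l distinct sensitive values."""
    return all(len({rec[sensitive] for rec in g}) >= l
               for g in equivalence_classes(records, quasi_ids).values())

# Toy generalized records (field names are assumptions for illustration).
records = [
    {"age": "30-39", "zip": "452**", "party": "A"},
    {"age": "30-39", "zip": "452**", "party": "B"},
    {"age": "30-39", "zip": "452**", "party": "C"},
]
print(is_k_anonymous(records, ["age", "zip"], k=3))         # True
print(is_l_diverse(records, ["age", "zip"], "party", l=3))  # True
```

    An anonymizer generalizes quasi-identifier values (e.g., exact age to a range) until both predicates hold; the information loss the paper measures grows with the amount of generalization required.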

    Anonymizing datasets with demographics and diagnosis codes in the presence of utility constraints

    Publishing patient data that contains both demographics and diagnosis codes is essential for performing large-scale, low-cost medical studies. However, preserving the privacy and utility of such data is challenging, because it requires: (i) guarding against identity disclosure (re-identification) attacks based on both demographics and diagnosis codes, (ii) ensuring that the anonymized data remain useful in intended analysis tasks, and (iii) minimizing the information loss incurred by anonymization, to preserve the utility of general analysis tasks that are difficult to determine before data publishing. Existing anonymization approaches are not suitable for this setting, because they cannot satisfy all three requirements. Therefore, in this work, we propose a new approach to deal with this problem. We enforce requirement (i) by applying (k, k^m)-anonymity, a privacy principle that prevents re-identification by attackers who know the demographics of a patient and up to m of their diagnosis codes, where k and m are tunable parameters. To capture requirement (ii), we propose the concept of utility constraints for both demographics and diagnosis codes. Utility constraints limit the amount of generalization and are specified by data owners (e.g., the healthcare institution that performs anonymization). We capture requirement (iii) by employing well-established information loss measures for demographics and for diagnosis codes. To realize our approach, we develop an algorithm that enforces (k, k^m)-anonymity on a dataset containing both demographics and diagnosis codes, in a way that satisfies the specified utility constraints and incurs minimal information loss according to the measures. Our experiments with a large dataset containing more than 200,000 electronic health records show the effectiveness and efficiency of our algorithm.
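    A brute-force sketch of the (k, k^m)-anonymity check described above: every combination of a patient's demographics with up to m of their diagnosis codes must match at least k records. The field names and record layout are assumptions for illustration; the paper's actual algorithm generalizes values to satisfy this property efficiently rather than merely testing it.

```python
from itertools import combinations

def matches(rec, demo_key, demographics, code_subset):
    """Does rec share the demographics and contain every code in code_subset?"""
    return (tuple(rec[d] for d in demographics) == demo_key
            and set(code_subset) <= set(rec["codes"]))

def is_k_km_anonymous(records, demographics, k, m):
    """Check that demographics plus any <= m codes of a record match >= k records."""
    for rec in records:
        demo_key = tuple(rec[d] for d in demographics)
        for size in range(m + 1):  # size 0 checks demographics alone
            for subset in combinations(sorted(rec["codes"]), size):
                support = sum(matches(r, demo_key, demographics, subset)
                              for r in records)
                if support < k:
                    return False
    return True

# Toy records with assumed ICD-style codes.
records = [
    {"age": "40-49", "sex": "F", "codes": {"I10", "E11"}},
    {"age": "40-49", "sex": "F", "codes": {"I10", "E11"}},
]
print(is_k_km_anonymous(records, ["age", "sex"], k=2, m=1))  # True
```

    The check is exponential in m, which is why the paper's contribution is an algorithm that enforces the property with minimal generalization rather than exhaustive verification.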

    Quantification of De-anonymization Risks in Social Networks

    The risks of publishing privacy-sensitive data have received considerable attention recently. Several de-anonymization attacks have been proposed that re-identify individuals even when data anonymization techniques have been applied. However, there is no theoretical quantification relating the data utility preserved by anonymization techniques to the data's vulnerability to de-anonymization attacks. In this paper, we theoretically analyze de-anonymization attacks and provide conditions on the utility of the anonymized data (denoted anonymized utility) under which de-anonymization succeeds. To the best of our knowledge, this is the first work to quantify the relationship between anonymized utility and de-anonymization capability. Unlike previous work, our quantification analysis requires no assumptions about the graph model, thus providing a general theoretical guide for developing practical de-anonymization/anonymization techniques. Furthermore, we evaluate state-of-the-art de-anonymization attacks on a real-world Facebook dataset to show the limitations of previous work. By comparing these experimental results with the theoretically achievable de-anonymization capability derived in our analysis, we further demonstrate the ineffectiveness of previous de-anonymization attacks and the potential of more powerful de-anonymization attacks in the future.
    Comment: Published in International Conference on Information Systems Security and Privacy, 201
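    As a toy illustration of the utility/de-anonymization trade-off the paper quantifies (this is my own sketch, not the paper's analysis or any state-of-the-art attack), the code below perturbs a random graph, scores surviving structural utility as the preserved-edge ratio, and runs a naive degree-matching re-identification attack; the graph parameters are arbitrary.

```python
import random

def degrees(edges, n):
    """Degree sequence of an n-node edge set."""
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

rng = random.Random(0)
n = 200
# A random graph standing in for the published social network.
edges = {(u, v) for u in range(n) for v in range(u + 1, n)
         if rng.random() < 0.05}

for deletion_prob in (0.0, 0.2, 0.5):
    # "Anonymize" by deleting each edge independently.
    anon = {e for e in edges if rng.random() >= deletion_prob}
    utility = len(anon) / len(edges)  # preserved-edge ratio as a crude utility score
    orig_deg, anon_deg = degrees(edges, n), degrees(anon, n)
    # Naive structural attack: map each anonymized node to the original
    # node with the closest degree (ties broken by smallest index).
    hits = sum(
        min(range(n), key=lambda j: (abs(orig_deg[j] - anon_deg[i]), j)) == i
        for i in range(n)
    )
    print(f"deletion={deletion_prob:.1f}  utility={utility:.2f}  "
          f"re-identified={hits}/{n}")
```

    The qualitative pattern, that lowering preserved utility also lowers attack success, is what the paper formalizes with model-free conditions on anonymized utility.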

    Location Anonymization With Considering Errors and Existence Probability

    Mobile devices that can sense their location using GPS or Wi-Fi have become extremely popular. However, many users hesitate to provide their accurate location information to unreliable third parties if doing so means that their identities or sensitive attribute values will be disclosed. Many anonymization approaches, such as k-anonymity, have been proposed to tackle this issue. Existing studies of k-anonymity usually anonymize each user's location so that the anonymized area contains k or more users. Existing studies, however, do not consider location errors or the probability that each user is actually present in the anonymized area. As a result, a specific user might still be identified by untrusted third parties. We propose novel privacy and utility metrics that account for location errors, together with an efficient algorithm for anonymizing the information associated with users' locations. This is the first work to anonymize locations while considering location errors and the probability that each user is actually present in the anonymized area. By means of simulations, we have shown that our proposed method can reduce the risk of a user's attributes being identified while maintaining the utility of the anonymized data.
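    A minimal sketch of the core idea, not the paper's algorithm or metrics: grow the cloaked region until the expected number of users inside it (the sum of per-user existence probabilities) reaches k. Circular regions, the linear overlap estimate, and all parameter values are simplifying assumptions.

```python
import math

def presence_prob(user_xy, error_radius, region_center, region_radius):
    """Crude estimate of the probability that a user whose reported
    position carries the given error radius lies inside a circular
    region: 1 if the error disc is fully inside, 0 if fully outside,
    linearly interpolated across the partial-overlap band."""
    d = math.dist(user_xy, region_center)
    if d + error_radius <= region_radius:
        return 1.0
    if d - error_radius >= region_radius:
        return 0.0
    return (region_radius + error_radius - d) / (2 * error_radius)

def cloak(users, center, k, step=10.0):
    """Grow a circular cloaking region until the *expected* number of
    users inside it (sum of presence probabilities) reaches k."""
    radius = step
    while True:
        expected = sum(presence_prob(xy, err, center, radius)
                       for xy, err in users)
        if expected >= k:
            return radius, expected
        radius += step

# Toy users as ((x, y), error_radius) in meters.
users = [((12.0, 7.0), 15.0), ((40.0, -5.0), 20.0), ((-30.0, 22.0), 10.0)]
radius, expected = cloak(users, center=(0.0, 0.0), k=2)
print(f"cloaked radius={radius:.0f} m, expected users inside={expected:.2f}")
```

    Counting fractional presence rather than point locations is what distinguishes this setting from classic k-anonymity cloaking, where each user either is or is not inside the region.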