
    Advancing Data Privacy: A Novel K-Anonymity Algorithm with Dissimilarity Tree-Based Clustering and Minimal Information Loss

    Anonymization is a crucial privacy protection technique used across technology domains, including cloud storage, machine learning, data mining, and big data, to safeguard sensitive information from unauthorized third-party access. As the significance and volume of data grow exponentially, comprehensive data protection against all threats is of utmost importance. The main objective of this paper is to provide a brief summary of techniques for data anonymization and differential privacy. We propose a new k-anonymity method, which deviates from conventional k-anonymity approaches, to address privacy protection concerns. The algorithm achieves k-anonymity through more efficient clustering. Most clustering algorithms require substantial computation to process data. However, by identifying initial centers that align with the data structure, a superior cluster arrangement can be obtained. Our study presents a Dissimilarity Tree-based strategy for selecting optimal starting centroids and generating more accurate clusters with reduced computing time and a lower Normalised Certainty Penalty (NCP). Compared to other methods, the graphical performance analysis shows that this one reduces the overall information loss in the anonymized dataset by around 20% on average. In addition, the method properly handles both numerical and categorical attributes.
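The abstract above does not define the Normalised Certainty Penalty, so the following is only a minimal sketch under a commonly used formulation: for a numerical attribute, the cluster's value range divided by the attribute's range over the whole table; for a categorical attribute without a generalization hierarchy, zero if the cluster holds one distinct value, otherwise the number of distinct values in the cluster divided by the number of distinct values in the table. The function names (`ncp_numeric`, `ncp_categorical`, `table_ncp`) and the sample records are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

Record = Tuple  # one row; attributes are addressed by position


def ncp_numeric(values: Sequence[float], dom_min: float, dom_max: float) -> float:
    """Range of the cluster's values relative to the whole-table range."""
    if dom_max == dom_min:
        return 0.0
    return (max(values) - min(values)) / (dom_max - dom_min)


def ncp_categorical(values: Sequence[str], domain_size: int) -> float:
    """Assumed flat (no hierarchy) penalty: 0 if nothing is generalized."""
    distinct = len(set(values))
    return 0.0 if distinct <= 1 else distinct / domain_size


def table_ncp(clusters: List[List[Record]],
              numeric_idx: Sequence[int],
              categorical_idx: Sequence[int]) -> float:
    """Size-weighted NCP over all clusters, normalized to the range [0, 1]."""
    records = [r for cluster in clusters for r in cluster]
    n_attrs = len(numeric_idx) + len(categorical_idx)
    total = 0.0
    for cluster in clusters:
        penalty = 0.0
        for i in numeric_idx:
            column = [r[i] for r in records]
            penalty += ncp_numeric([r[i] for r in cluster], min(column), max(column))
        for i in categorical_idx:
            domain_size = len({r[i] for r in records})
            penalty += ncp_categorical([r[i] for r in cluster], domain_size)
        total += len(cluster) * penalty
    return total / (n_attrs * len(records))


# Hypothetical 2-anonymous partition of four (age, occupation) records.
clusters = [
    [(25, "nurse"), (30, "nurse")],
    [(50, "clerk"), (65, "teacher")],
]
k = 2
is_k_anonymous = all(len(c) >= k for c in clusters)
score = table_ncp(clusters, numeric_idx=[0], categorical_idx=[1])
```

A lower score means the clusters generalize less and therefore lose less information; the clustering step itself (including the Dissimilarity Tree-based centroid selection) is not reproduced here.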

    Generation of synthetic data by means of fuzzy c-Regression

    Problems related to data privacy are studied in the areas of privacy-preserving data mining (PPDM) and statistical disclosure control (SDC). Their goal is to avoid the disclosure of sensitive or proprietary information to third parties. In this paper, a new synthetic data generation method is proposed, and its information loss and disclosure risk are measured. The method is based on fuzzy techniques. Informally, a fuzzy c-regression method is applied to the original data set, and synthetic data is released with an information loss and disclosure risk that depend on c. Like other data protection methods, our synthetic data generation procedure allows third parties to perform some statistical computations with a limited risk of disclosure. The trade-off between data utility and data safety of the proposed method is assessed.

    Knowing Your Population: Privacy-Sensitive Mining of Massive Data

    Location and mobility patterns of individuals are important to environmental planning, societal resilience, public health, and a host of commercial applications. Mining telecommunication traffic and transaction data for such purposes is controversial, in particular because it raises issues of privacy. However, our hypothesis is that privacy-sensitive uses are possible and often beneficial enough to warrant considerable research and development efforts. Our work contends that people's behavior can yield patterns of significant commercial and research value. For such purposes, methods and algorithms for mining telecommunication data to extract commonly used routes and locations, articulated through time-geographical constructs, are described in a case study within the area of transportation planning and analysis. From the outset, these were designed to balance the privacy of subscribers against the added value of mobility patterns derived from their mobile communication traffic and transaction data. Our work directly contrasts with the current, commonly held notion that value can only be added to services by directly monitoring the behavior of individuals, as in current attempts at location-based services. We position our work within relevant legal frameworks for privacy and data protection, and show that our methods comply with such requirements and also follow best practice.

    Privacy Preserving Utility Mining: A Survey

    In the big data era, collected data usually contains rich information and hidden knowledge. Utility-oriented pattern mining and analytics have shown a powerful ability to explore such ubiquitous data, which may be collected from various fields and applications, such as market basket analysis, retail, click-stream analysis, medical analysis, and bioinformatics. However, analyzing data that contains sensitive private information raises privacy concerns. To achieve a better trade-off between maximizing utility and preserving privacy, Privacy-Preserving Utility Mining (PPUM) has become a critical issue in recent years. In this paper, we provide a comprehensive overview of PPUM. We first present the background of utility mining, privacy-preserving data mining, and PPUM, then introduce the related preliminaries and problem formulation of PPUM, as well as some key evaluation criteria for PPUM. In particular, we present and discuss the current state-of-the-art PPUM algorithms, as well as their advantages and deficiencies, in detail. Finally, we highlight and discuss some technical challenges and open directions for future research on PPUM. Comment: 2018 IEEE International Conference on Big Data, 10 pages