158,823 research outputs found
Advancing Data Privacy: A Novel K-Anonymity Algorithm with Dissimilarity Tree-Based Clustering and Minimal Information Loss
Anonymization serves as a crucial privacy protection technique employed across various technology domains, including cloud storage, machine learning, data mining and big data to safeguard sensitive information from unauthorized third-party access. As the significance and volume of data grow exponentially, comprehensive data protection against all threats is of utmost importance. The main objective of this paper is to provide a brief summary of techniques for data anonymization and differential privacy.A new k-anonymity method, which deviates from conventional k-anonymity approaches, is proposed by us to address privacy protection concerns. Our paper presents a new algorithm designed to achieve k-anonymity through more efficient clustering. The processing of data by most clustering algorithms requires substantial computation. However, by identifying initial centers that align with the data structure, a superior cluster arrangement can be obtained.Our study presents a Dissimilarity Tree-based strategy for selecting optimal starting centroids and generating more accurate clusters with reduced computing time and Normalised Certainty Penalty (NCP). This method also has the added benefit of reducing the Normalised Certainty Penalty (NCP). When compared to other methods, the graphical performance analysis shows that this one reduces the amount of overall information lost in the dataset being anonymized by around 20% on average. In addition, the method that we have designed is capable of properly handling both numerical and category characteristics
Generation of synthetic data by means of fuzzy c-Regression
Abstract-Problems related to data privacy are studied in the areas of privacy preserving data mining (PPDM) and statistical disclosure control (SDC). Their goal is to avoid the disclosure of sensitive or proprietary information to third parties. In this paper a new synthetic data generation method is proposed and the information loss and disclosure risk are measured. The method is based on fuzzy techniques. Informally, a fuzzy c-regression method is applied to the original data set and synthetic data is released with an appropriate information loss and disclosure risk depending on c. As other data protection methods do, our synthetic data generation procedure allows third parties to do some statistical computations with a limited risk of disclosure. The trade-off between data utility and data safety of our proposed method will be assessed
Knowing Your Population: Privacy-Sensitive Mining of Massive Data
Location and mobility patterns of individuals are important to environmental
planning, societal resilience, public health, and a host of commercial
applications. Mining telecommunication traffic and transactions data for such
purposes is controversial, in particular raising issues of privacy. However,
our hypothesis is that privacy-sensitive uses are possible and often beneficial
enough to warrant considerable research and development efforts. Our work
contends that peoples behavior can yield patterns of both significant
commercial, and research, value. For such purposes, methods and algorithms for
mining telecommunication data to extract commonly used routes and locations,
articulated through time-geographical constructs, are described in a case study
within the area of transportation planning and analysis. From the outset, these
were designed to balance the privacy of subscribers and the added value of
mobility patterns derived from their mobile communication traffic and
transactions data. Our work directly contrasts the current, commonly held
notion that value can only be added to services by directly monitoring the
behavior of individuals, such as in current attempts at location-based
services. We position our work within relevant legal frameworks for privacy and
data protection, and show that our methods comply with such requirements and
also follow best-practice
Privacy Preserving Utility Mining: A Survey
In big data era, the collected data usually contains rich information and
hidden knowledge. Utility-oriented pattern mining and analytics have shown a
powerful ability to explore these ubiquitous data, which may be collected from
various fields and applications, such as market basket analysis, retail,
click-stream analysis, medical analysis, and bioinformatics. However, analysis
of these data with sensitive private information raises privacy concerns. To
achieve better trade-off between utility maximizing and privacy preserving,
Privacy-Preserving Utility Mining (PPUM) has become a critical issue in recent
years. In this paper, we provide a comprehensive overview of PPUM. We first
present the background of utility mining, privacy-preserving data mining and
PPUM, then introduce the related preliminaries and problem formulation of PPUM,
as well as some key evaluation criteria for PPUM. In particular, we present and
discuss the current state-of-the-art PPUM algorithms, as well as their
advantages and deficiencies in detail. Finally, we highlight and discuss some
technical challenges and open directions for future research on PPUM.Comment: 2018 IEEE International Conference on Big Data, 10 page
- …