2,615 research outputs found
DATA CLUSTERING AND MICRO-PERTURBATION FOR PRIVACY-PRESERVING DATA SHARING AND ANALYSIS
Clustering-based data masking approaches are widely used for privacy-preserving data sharing and data mining. Existing approaches, however, cannot cope with the situation where confidential attributes are categorical. For numeric data, these approaches are also unable to preserve important statistical properties such as variance and covariance of the data. We propose a new approach that handles these problems effectively. The proposed approach adopts a minimum spanning tree technique for clustering data and a micro-perturbation method for masking data. Our approach is novel in that it (i) incorporates an entropy-based measure, which represents the disclosure risk of the categorical confidential attribute, into the traditional distance measure used for clustering in an innovative way; and (ii) introduces the notion of cluster-level microperturbation (as opposed to conventional micro-aggregation) for masking data, to preserve the statistical properties of the data. We provide both analytical and empirical justification for the proposed methodology
Privacy Preserving Utility Mining: A Survey
In big data era, the collected data usually contains rich information and
hidden knowledge. Utility-oriented pattern mining and analytics have shown a
powerful ability to explore these ubiquitous data, which may be collected from
various fields and applications, such as market basket analysis, retail,
click-stream analysis, medical analysis, and bioinformatics. However, analysis
of these data with sensitive private information raises privacy concerns. To
achieve better trade-off between utility maximizing and privacy preserving,
Privacy-Preserving Utility Mining (PPUM) has become a critical issue in recent
years. In this paper, we provide a comprehensive overview of PPUM. We first
present the background of utility mining, privacy-preserving data mining and
PPUM, then introduce the related preliminaries and problem formulation of PPUM,
as well as some key evaluation criteria for PPUM. In particular, we present and
discuss the current state-of-the-art PPUM algorithms, as well as their
advantages and deficiencies in detail. Finally, we highlight and discuss some
technical challenges and open directions for future research on PPUM.Comment: 2018 IEEE International Conference on Big Data, 10 page
Building Confidential and Efficient Query Services in the Cloud with RASP Data Perturbation
With the wide deployment of public cloud computing infrastructures, using
clouds to host data query services has become an appealing solution for the
advantages on scalability and cost-saving. However, some data might be
sensitive that the data owner does not want to move to the cloud unless the
data confidentiality and query privacy are guaranteed. On the other hand, a
secured query service should still provide efficient query processing and
significantly reduce the in-house workload to fully realize the benefits of
cloud computing. We propose the RASP data perturbation method to provide secure
and efficient range query and kNN query services for protected data in the
cloud. The RASP data perturbation method combines order preserving encryption,
dimensionality expansion, random noise injection, and random projection, to
provide strong resilience to attacks on the perturbed data and queries. It also
preserves multidimensional ranges, which allows existing indexing techniques to
be applied to speedup range query processing. The kNN-R algorithm is designed
to work with the RASP range query algorithm to process the kNN queries. We have
carefully analyzed the attacks on data and queries under a precisely defined
threat model and realistic security assumptions. Extensive experiments have
been conducted to show the advantages of this approach on efficiency and
security.Comment: 18 pages, to appear in IEEE TKDE, accepted in December 201
Secret charing vs. encryption-based techniques for privacy preserving data mining
Privacy preserving querying and data publishing has been studied in the context of statistical databases and statistical disclosure control. Recently, large-scale data collection and integration efforts increased privacy concerns which motivated data mining researchers to investigate privacy implications of data mining and how data mining can be performed without violating privacy. In this paper, we first provide an overview of privacy preserving data mining focusing on distributed data sources, then we compare two technologies used in privacy preserving data mining. The first technology is encryption based, and it is used in earlier approaches. The second technology is secret-sharing which is recently being considered as a more efficient approach
Enabling Multi-level Trust in Privacy Preserving Data Mining
Privacy Preserving Data Mining (PPDM) addresses the problem of developing
accurate models about aggregated data without access to precise information in
individual data record. A widely studied \emph{perturbation-based PPDM}
approach introduces random perturbation to individual values to preserve
privacy before data is published. Previous solutions of this approach are
limited in their tacit assumption of single-level trust on data miners.
In this work, we relax this assumption and expand the scope of
perturbation-based PPDM to Multi-Level Trust (MLT-PPDM). In our setting, the
more trusted a data miner is, the less perturbed copy of the data it can
access. Under this setting, a malicious data miner may have access to
differently perturbed copies of the same data through various means, and may
combine these diverse copies to jointly infer additional information about the
original data that the data owner does not intend to release. Preventing such
\emph{diversity attacks} is the key challenge of providing MLT-PPDM services.
We address this challenge by properly correlating perturbation across copies at
different trust levels. We prove that our solution is robust against diversity
attacks with respect to our privacy goal. That is, for data miners who have
access to an arbitrary collection of the perturbed copies, our solution prevent
them from jointly reconstructing the original data more accurately than the
best effort using any individual copy in the collection. Our solution allows a
data owner to generate perturbed copies of its data for arbitrary trust levels
on-demand. This feature offers data owners maximum flexibility.Comment: 20 pages, 5 figures. Accepted for publication in IEEE Transactions on
Knowledge and Data Engineerin
- …