4,660 research outputs found
Privacy Preserving Utility Mining: A Survey
In big data era, the collected data usually contains rich information and
hidden knowledge. Utility-oriented pattern mining and analytics have shown a
powerful ability to explore these ubiquitous data, which may be collected from
various fields and applications, such as market basket analysis, retail,
click-stream analysis, medical analysis, and bioinformatics. However, analysis
of these data with sensitive private information raises privacy concerns. To
achieve better trade-off between utility maximizing and privacy preserving,
Privacy-Preserving Utility Mining (PPUM) has become a critical issue in recent
years. In this paper, we provide a comprehensive overview of PPUM. We first
present the background of utility mining, privacy-preserving data mining and
PPUM, then introduce the related preliminaries and problem formulation of PPUM,
as well as some key evaluation criteria for PPUM. In particular, we present and
discuss the current state-of-the-art PPUM algorithms, as well as their
advantages and deficiencies in detail. Finally, we highlight and discuss some
technical challenges and open directions for future research on PPUM.Comment: 2018 IEEE International Conference on Big Data, 10 page
Time Distortion Anonymization for the Publication of Mobility Data with High Utility
An increasing amount of mobility data is being collected every day by
different means, such as mobile applications or crowd-sensing campaigns. This
data is sometimes published after the application of simple anonymization
techniques (e.g., putting an identifier instead of the users' names), which
might lead to severe threats to the privacy of the participating users.
Literature contains more sophisticated anonymization techniques, often based on
adding noise to the spatial data. However, these techniques either compromise
the privacy if the added noise is too little or the utility of the data if the
added noise is too strong. We investigate in this paper an alternative
solution, which builds on time distortion instead of spatial distortion.
Specifically, our contribution lies in (1) the introduction of the concept of
time distortion to anonymize mobility datasets (2) Promesse, a protection
mechanism implementing this concept (3) a practical study of Promesse compared
to two representative spatial distortion mechanisms, namely Wait For Me, which
enforces k-anonymity, and Geo-Indistinguishability, which enforces differential
privacy. We evaluate our mechanism practically using three real-life datasets.
Our results show that time distortion reduces the number of points of interest
that can be retrieved by an adversary to under 3 %, while the introduced
spatial error is almost null and the distortion introduced on the results of
range queries is kept under 13 % on average.Comment: in 14th IEEE International Conference on Trust, Security and Privacy
in Computing and Communications, Aug 2015, Helsinki, Finlan
Enabling Multi-level Trust in Privacy Preserving Data Mining
Privacy Preserving Data Mining (PPDM) addresses the problem of developing
accurate models about aggregated data without access to precise information in
individual data record. A widely studied \emph{perturbation-based PPDM}
approach introduces random perturbation to individual values to preserve
privacy before data is published. Previous solutions of this approach are
limited in their tacit assumption of single-level trust on data miners.
In this work, we relax this assumption and expand the scope of
perturbation-based PPDM to Multi-Level Trust (MLT-PPDM). In our setting, the
more trusted a data miner is, the less perturbed copy of the data it can
access. Under this setting, a malicious data miner may have access to
differently perturbed copies of the same data through various means, and may
combine these diverse copies to jointly infer additional information about the
original data that the data owner does not intend to release. Preventing such
\emph{diversity attacks} is the key challenge of providing MLT-PPDM services.
We address this challenge by properly correlating perturbation across copies at
different trust levels. We prove that our solution is robust against diversity
attacks with respect to our privacy goal. That is, for data miners who have
access to an arbitrary collection of the perturbed copies, our solution prevent
them from jointly reconstructing the original data more accurately than the
best effort using any individual copy in the collection. Our solution allows a
data owner to generate perturbed copies of its data for arbitrary trust levels
on-demand. This feature offers data owners maximum flexibility.Comment: 20 pages, 5 figures. Accepted for publication in IEEE Transactions on
Knowledge and Data Engineerin
Privacy-Preserving Trajectory Data Publishing via Differential Privacy
Over the past decade, the collection of data by individuals, businesses and government agencies has increased tremendously. Due to the widespread of mobile computing and the advances in location-acquisition techniques, an immense amount of data concerning the mobility of moving objects have been generated. The movement data of an object (e.g. individual) might include specific information about the locations it visited, the time those locations were visited, or both. While it is beneficial to share data for the purpose of mining and analysis, data sharing might risk the privacy of the individuals involved in the data. Privacy-Preserving Data Publishing (PPDP) provides techniques that utilize several privacy models for the purpose of publishing useful information while preserving data privacy.
The objective of this thesis is to answer the following question: How can a data owner publish trajectory data while simultaneously safeguarding the privacy of the data and maintaining its usefulness? We propose an algorithm for anonymizing and publishing trajectory data that ensures the output is differentially private while maintaining high utility and scalability. Our solution comprises a twofold approach. First, we generalize trajectories by generalizing and then partitioning the timestamps at each location in a differentially private manner. Next, we add noise to the real count of the generalized trajectories according to the given privacy budget to enforce differential privacy. As a result, our approach achieves an overall epsilon-differential privacy on the output trajectory data. We perform experimental evaluation on real-life data, and demonstrate that our proposed approach can effectively answer count and range queries, as well as mining frequent sequential patterns. We also show that our algorithm is efficient w.r.t. privacy budget and number of partitions, and also scalable with increasing data size
- …