5,449 research outputs found
Privacy Preserving Utility Mining: A Survey
In big data era, the collected data usually contains rich information and
hidden knowledge. Utility-oriented pattern mining and analytics have shown a
powerful ability to explore these ubiquitous data, which may be collected from
various fields and applications, such as market basket analysis, retail,
click-stream analysis, medical analysis, and bioinformatics. However, analysis
of these data with sensitive private information raises privacy concerns. To
achieve better trade-off between utility maximizing and privacy preserving,
Privacy-Preserving Utility Mining (PPUM) has become a critical issue in recent
years. In this paper, we provide a comprehensive overview of PPUM. We first
present the background of utility mining, privacy-preserving data mining and
PPUM, then introduce the related preliminaries and problem formulation of PPUM,
as well as some key evaluation criteria for PPUM. In particular, we present and
discuss the current state-of-the-art PPUM algorithms, as well as their
advantages and deficiencies in detail. Finally, we highlight and discuss some
technical challenges and open directions for future research on PPUM.Comment: 2018 IEEE International Conference on Big Data, 10 page
Prochlo: Strong Privacy for Analytics in the Crowd
The large-scale monitoring of computer users' software activities has become
commonplace, e.g., for application telemetry, error reporting, or demographic
profiling. This paper describes a principled systems architecture---Encode,
Shuffle, Analyze (ESA)---for performing such monitoring with high utility while
also protecting user privacy. The ESA design, and its Prochlo implementation,
are informed by our practical experiences with an existing, large deployment of
privacy-preserving software monitoring.
(cont.; see the paper
Time Distortion Anonymization for the Publication of Mobility Data with High Utility
An increasing amount of mobility data is being collected every day by
different means, such as mobile applications or crowd-sensing campaigns. This
data is sometimes published after the application of simple anonymization
techniques (e.g., putting an identifier instead of the users' names), which
might lead to severe threats to the privacy of the participating users.
Literature contains more sophisticated anonymization techniques, often based on
adding noise to the spatial data. However, these techniques either compromise
the privacy if the added noise is too little or the utility of the data if the
added noise is too strong. We investigate in this paper an alternative
solution, which builds on time distortion instead of spatial distortion.
Specifically, our contribution lies in (1) the introduction of the concept of
time distortion to anonymize mobility datasets (2) Promesse, a protection
mechanism implementing this concept (3) a practical study of Promesse compared
to two representative spatial distortion mechanisms, namely Wait For Me, which
enforces k-anonymity, and Geo-Indistinguishability, which enforces differential
privacy. We evaluate our mechanism practically using three real-life datasets.
Our results show that time distortion reduces the number of points of interest
that can be retrieved by an adversary to under 3 %, while the introduced
spatial error is almost null and the distortion introduced on the results of
range queries is kept under 13 % on average.Comment: in 14th IEEE International Conference on Trust, Security and Privacy
in Computing and Communications, Aug 2015, Helsinki, Finlan
Combinatorial Algorithms for String Sanitization
String data are often disseminated to support applications such as
location-based service provision or DNA sequence analysis. This dissemination,
however, may expose sensitive patterns that model confidential knowledge. In
this paper, we consider the problem of sanitizing a string by concealing the
occurrences of sensitive patterns, while maintaining data utility, in two
settings that are relevant to many common string processing tasks.
In the first setting, we aim to generate the minimal-length string that
preserves the order of appearance and frequency of all non-sensitive patterns.
Such a string allows accurately performing tasks based on the sequential nature
and pattern frequencies of the string. To construct such a string, we propose a
time-optimal algorithm, TFS-ALGO. We also propose another time-optimal
algorithm, PFS-ALGO, which preserves a partial order of appearance of
non-sensitive patterns but produces a much shorter string that can be analyzed
more efficiently. The strings produced by either of these algorithms are
constructed by concatenating non-sensitive parts of the input string. However,
it is possible to detect the sensitive patterns by ``reversing'' the
concatenation operations. In response, we propose a heuristic, MCSR-ALGO, which
replaces letters in the strings output by the algorithms with carefully
selected letters, so that sensitive patterns are not reinstated, implausible
patterns are not introduced, and occurrences of spurious patterns are
prevented. In the second setting, we aim to generate a string that is at
minimal edit distance from the original string, in addition to preserving the
order of appearance and frequency of all non-sensitive patterns. To construct
such a string, we propose an algorithm, ETFS-ALGO, based on solving specific
instances of approximate regular expression matching.Comment: Extended version of a paper accepted to ECML/PKDD 201
- …