Towards trajectory anonymization: a generalization-based approach
Trajectory datasets are becoming popular due to the massive usage of GPS and location-based services. In this paper, we address privacy issues regarding the identification of individuals in static trajectory datasets. We first adopt the notion of k-anonymity to trajectories and propose a novel generalization-based approach for anonymization of trajectories. We further show that releasing anonymized trajectories may still have some privacy leaks. Therefore, we propose a randomization-based reconstruction algorithm for releasing anonymized trajectory data and also present how the underlying techniques can be adapted to other anonymity standards. The experimental results on real and synthetic trajectory datasets show the effectiveness of the proposed techniques.
Clustering with diversity
We consider the clustering with diversity problem: given a set of
colored points in a metric space, partition them into clusters such that each
cluster has at least ℓ points, all of which have distinct colors.
We give a 2-approximation to this problem for any ℓ when the objective
is to minimize the maximum radius of any cluster. We show that the
approximation ratio is optimal unless P = NP, by providing a matching
lower bound. Several extensions to our algorithm have also been developed for
handling outliers. This problem is mainly motivated by applications in
privacy-preserving data publication.
Comment: Extended abstract accepted in ICALP 2010. Keywords: Approximation
algorithm, k-center, k-anonymity, l-diversity
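The constraint and objective described in this abstract can be sketched as a small checker. This is only an illustration of the problem statement, not the authors' 2-approximation algorithm. The names `is_feasible`, `max_radius`, and `ell` (the minimum cluster size) are hypothetical, points are assumed to be Euclidean coordinate tuples, and each cluster's center is assumed to be chosen among its own points.

```python
from math import dist


def is_feasible(clusters, ell):
    """Check the diversity constraint on a proposed clustering.

    clusters: list of clusters, each a list of (point, color) pairs.
    A clustering is feasible when every cluster has at least `ell`
    points and no two points in a cluster share a color.
    """
    for cluster in clusters:
        colors = [color for _, color in cluster]
        if len(cluster) < ell or len(set(colors)) != len(colors):
            return False
    return True


def max_radius(clusters):
    """Return the largest cluster radius (the quantity to be minimized).

    Assumption: the center of a cluster is the member point that
    minimizes the distance to the farthest other member.
    """
    worst = 0.0
    for cluster in clusters:
        points = [point for point, _ in cluster]
        radius = min(max(dist(c, p) for p in points) for c in points)
        worst = max(worst, radius)
    return worst


clusters = [
    [((0, 0), "red"), ((0, 2), "green")],
    [((5, 5), "red"), ((5, 6), "blue")],
]
print(is_feasible(clusters, 2), max_radius(clusters))
```

In a privacy setting, a color would typically stand for a sensitive value, so a feasible cluster groups at least `ell` records with pairwise different sensitive values.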
k-Same-Siamese-GAN: k-Same Algorithm with Generative Adversarial Network for Facial Image De-identification with Hyperparameter Tuning and Mixed Precision Training
Consider a data holder, such as a hospital or a government entity, who has a
privately held collection of personal data, in which the revealing and/or
processing of personally identifiable data is restricted or prohibited by
law. The question "how can we ensure the data holder conceals the identity of each
individual in the imagery of personal data while still preserving certain
useful aspects of the data after de-identification?" then becomes a challenging issue.
In this work, we propose an approach towards high-resolution facial image
de-identification, called k-Same-Siamese-GAN (kSS-GAN), which leverages the
k-Same-Anonymity mechanism, the Generative Adversarial Network, and
hyperparameter tuning methods. Moreover, to speed up model training and reduce
memory consumption, the mixed precision training technique is also applied, so
that kSS-GAN provides guarantees regarding privacy protection on close-form
identities and can be trained much more efficiently as well. Finally, to validate
its applicability, the proposed work has been applied to actual datasets, RafD
and CelebA, for performance testing. Besides protecting the privacy of
high-resolution facial images, the proposed system is also justified for its
ability to automate parameter tuning and to break through the limitation on
the number of adjustable parameters.
Privacy Preservation by Disassociation
In this work, we focus on protection against identity disclosure in the
publication of sparse multidimensional data. Existing multidimensional
anonymization techniques (a) protect the privacy of users either by altering the
set of quasi-identifiers of the original data (e.g., by generalization or
suppression) or by adding noise (e.g., using differential privacy) and/or (b)
assume a clear distinction between sensitive and non-sensitive information and
sever the possible linkage. In many real world applications the above
techniques are not applicable. For instance, consider web search query logs.
Suppressing or generalizing anonymization methods would remove the most
valuable information in the dataset: the original query terms. Additionally,
web search query logs contain millions of query terms which cannot be
categorized as sensitive or non-sensitive since a term may be sensitive for a
user and non-sensitive for another. Motivated by this observation, we propose
an anonymization technique termed disassociation that preserves the original
terms but hides the fact that two or more different terms appear in the same
record. We protect the users' privacy by disassociating record terms that
participate in identifying combinations. This way the adversary cannot
associate with high probability a record with a rare combination of terms. To
the best of our knowledge, our proposal is the first to employ such a technique
to provide protection against identity disclosure. We propose an anonymization
algorithm based on our approach and evaluate its performance on real and
synthetic datasets, comparing it against other state-of-the-art methods based
on generalization and differential privacy.
Comment: VLDB201
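The notion of an "identifying combination" in this abstract can be illustrated with a minimal sketch that flags pairs of terms co-occurring in too few records. This is not the disassociation algorithm itself, which the abstract does not spell out. The function name `rare_pairs`, the threshold `k`, and the restriction to pairs are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def rare_pairs(records, k):
    """Return term pairs that co-occur in fewer than k records.

    records: iterable of term collections (e.g. one user's query terms).
    A pair seen in fewer than k records is treated here as a potentially
    identifying combination; `k` is a hypothetical threshold.
    """
    support = Counter()
    for record in records:
        # Sort the distinct terms so each pair has one canonical form.
        for pair in combinations(sorted(set(record)), 2):
            support[pair] += 1
    return {pair for pair, count in support.items() if count < k}


records = [
    {"flu", "madonna"},
    {"flu", "madonna"},
    {"flu", "viagra"},
]
print(rare_pairs(records, k=2))
```

In this toy log the pair ("flu", "viagra") appears in a single record, so a technique in the spirit of the abstract would keep both terms but hide the fact that they appear together.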