A Clustering-Anonymity Approach for Trajectory Data Publishing Considering both Distance and Direction
Trajectory data contains rich spatio-temporal information about moving objects. Publishing it directly for mining and analysis causes severe privacy disclosure problems. Most existing clustering-anonymity methods cluster trajectories according to either distance-based or direction-based similarity, which leads to high information loss. To bridge this gap, we present a clustering-anonymity approach that considers both types of similarity. Because trajectories may not be synchronized, we first design a trajectory synchronization algorithm. We then quantitatively define two similarity metrics between trajectories, followed by a comprehensive metric. Finally, we propose a privacy-preserving clustering-anonymity algorithm for trajectory data publishing: it groups trajectories into clusters according to the comprehensive similarity metric and then anonymizes the clusters. Experimental results show that our algorithm preserves privacy effectively with low information loss.
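As a rough illustration of what a combined distance-and-direction metric might look like, here is a minimal Python sketch. It is not the paper's definition: the abstract does not give the formulas, so the function names, the `weight` and `distance_scale` parameters, and the choice of mean point distance and mean heading difference are all assumptions made for illustration. The sketch also assumes the two trajectories are already synchronized, i.e. they are equal-length lists of (x, y) points sampled at the same timestamps.

```python
import math

def distance_dissimilarity(a, b):
    """Mean Euclidean distance between time-aligned points (assumed metric)."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)

def headings(traj):
    """Heading angle (radians) of each consecutive segment of a trajectory."""
    return [math.atan2(y2 - y1, x2 - x1)
            for (x1, y1), (x2, y2) in zip(traj, traj[1:])]

def direction_dissimilarity(a, b):
    """Mean heading difference scaled to [0, 1]; 0 = same, 1 = opposite."""
    diffs = []
    for ha, hb in zip(headings(a), headings(b)):
        d = abs(ha - hb) % (2 * math.pi)
        # Take the smaller of the two arcs between the headings.
        diffs.append(min(d, 2 * math.pi - d) / math.pi)
    return sum(diffs) / len(diffs)

def combined_dissimilarity(a, b, weight=0.5, distance_scale=1.0):
    """Hypothetical weighted mix of the two dissimilarities above.

    distance_scale caps and normalizes the distance term to [0, 1];
    weight balances distance against direction. Both are assumptions.
    """
    d = min(distance_dissimilarity(a, b) / distance_scale, 1.0)
    return weight * d + (1 - weight) * direction_dissimilarity(a, b)

east = [(0, 0), (1, 0), (2, 0)]
east_shifted = [(0, 1), (1, 1), (2, 1)]   # parallel, one unit away
west = [(2, 0), (1, 0), (0, 0)]           # same path, opposite direction
print(combined_dissimilarity(east, east_shifted))
print(combined_dissimilarity(east, west, distance_scale=2.0))
```

The two example pairs show why a single criterion can mislead: the shifted trajectory differs only in distance, while the reversed one covers the same path but differs in direction.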
Privacy via the Johnson-Lindenstrauss Transform
Suppose that party A collects private information about its users, where each
user's data is represented as a bit vector. Suppose that party B has a
proprietary data mining algorithm that requires estimating the distance between
users, such as clustering or nearest neighbors. We ask if it is possible for
party A to publish some information about each user so that B can estimate the
distance between users without being able to infer any private bit of a user.
Our method involves projecting each user's representation into a random,
lower-dimensional space via a sparse Johnson-Lindenstrauss transform and then
adding Gaussian noise to each entry of the lower-dimensional representation. We
show that the method preserves differential privacy, where the more privacy is
desired, the larger the variance of the Gaussian noise. Further, we show how to
approximate the true distances between users via only the lower-dimensional,
perturbed data. Finally, we consider other perturbation methods such as
randomized response and draw comparisons to sketch-based methods. While the
goal of releasing user-specific data to third parties is broader than
preserving distances, this work shows that privacy-preserving distance
computation is an achievable goal.
Comment: 24 pages
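A minimal Python sketch of the projection-plus-noise pipeline described in the abstract above, using only the standard library. It is not the paper's construction: the sparse matrix is an Achlioptas-style one (entries +s, 0, -s with probabilities 1/6, 2/3, 1/6) swapped in for illustration, the function names are hypothetical, and `sigma` is a free parameter that is not calibrated to any privacy budget, so the sketch alone gives no privacy guarantee. The estimator subtracts 2 * k * sigma^2, which is the expected contribution of the two independent noise vectors to the squared distance.

```python
import math
import random

def make_projection(d, k, rng):
    """k x d sparse random matrix with E[entry^2] = 1/k (Achlioptas-style).

    This is a stand-in for the paper's sparse JL transform, not a copy of it.
    """
    scale = math.sqrt(3.0 / k)
    # choice over six values gives +1 w.p. 1/6, 0 w.p. 2/3, -1 w.p. 1/6.
    return [[scale * rng.choice((1, 0, 0, 0, 0, -1)) for _ in range(d)]
            for _ in range(k)]

def publish(bits, projection, sigma, rng):
    """Project a user's bit vector and add N(0, sigma^2) noise per entry."""
    return [sum(p * b for p, b in zip(row, bits)) + rng.gauss(0.0, sigma)
            for row in projection]

def estimate_sq_distance(y_u, y_v, sigma):
    """Debiased estimate of the squared Euclidean distance between two users.

    Each published vector carries independent noise, so the raw squared
    distance is inflated by 2 * k * sigma^2 in expectation.
    """
    k = len(y_u)
    raw = sum((a - b) ** 2 for a, b in zip(y_u, y_v))
    return raw - 2 * k * sigma ** 2

rng = random.Random(7)           # fixed seed only for repeatability
d, k, sigma = 1000, 200, 1.0     # illustrative sizes, not from the paper
proj = make_projection(d, k, rng)
user_u = [rng.randint(0, 1) for _ in range(d)]
user_v = [rng.randint(0, 1) for _ in range(d)]
true_sq = sum((a - b) ** 2 for a, b in zip(user_u, user_v))
est_sq = estimate_sq_distance(publish(user_u, proj, sigma, rng),
                              publish(user_v, proj, sigma, rng), sigma)
print(true_sq, est_sq)
```

For bit vectors the squared Euclidean distance equals the Hamming distance, so the estimate can be read as an approximate count of differing bits. The estimate is noisy and can even be negative for very close users; larger `k` and smaller `sigma` tighten it at the cost of dimensionality and privacy respectively.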
Differentially Private Mixture of Generative Neural Networks
Generative models are used in a wide range of applications building on large
amounts of contextually rich information. Due to possible privacy violations of
the individuals whose data is used to train these models, however, publishing
or sharing generative models is not always viable. In this paper, we present a
novel technique for privately releasing generative models and entire
high-dimensional datasets produced by these models. We model the generator
distribution of the training data with a mixture of generative neural
networks. These are trained together and collectively learn the generator
distribution of a dataset. Data is divided into clusters, using a novel
differentially private kernel k-means; each cluster is then given to a separate
generative neural network, such as a Restricted Boltzmann Machine or
Variational Autoencoder, which is trained only on its own cluster using
differentially private gradient descent. We evaluate our approach using the
MNIST dataset, as well as call detail records and transit datasets, showing
that it produces realistic synthetic samples, which can also be used to
accurately compute an arbitrary number of counting queries.
Comment: A shorter version of this paper appeared at the 17th IEEE
International Conference on Data Mining (ICDM 2017). This is the full
version, published in IEEE Transactions on Knowledge and Data Engineering
(TKDE
- …
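The abstract above mentions training each generator with differentially private gradient descent but does not spell out the update rule. A common recipe, sketched below in standard-library Python, is to clip each per-example gradient to a fixed norm, sum the clipped gradients, add Gaussian noise scaled to the clipping bound, and average. Whether the paper uses exactly this variant is an assumption; the function names and the `max_norm` and `noise_multiplier` parameters are hypothetical, and the sketch does no privacy accounting.

```python
import math
import random

def clip(grad, max_norm):
    """Scale a gradient down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    factor = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * factor for g in grad]

def private_gradient(per_example_grads, max_norm, noise_multiplier, rng):
    """Noisy average of clipped per-example gradients (clip-and-noise recipe).

    Noise with standard deviation noise_multiplier * max_norm is added to
    the SUM of clipped gradients before dividing by the batch size.
    """
    n = len(per_example_grads)
    dim = len(per_example_grads[0])
    clipped = [clip(g, max_norm) for g in per_example_grads]
    sigma = noise_multiplier * max_norm
    noisy_sum = [sum(g[i] for g in clipped) + rng.gauss(0.0, sigma)
                 for i in range(dim)]
    return [s / n for s in noisy_sum]

def sgd_step(params, grad, learning_rate):
    """Plain gradient descent update using the (already noisy) gradient."""
    return [p - learning_rate * g for p, g in zip(params, grad)]

rng = random.Random(0)                    # fixed seed only for repeatability
grads = [[3.0, 4.0], [0.1, -0.2], [-6.0, 8.0]]   # made-up example gradients
noisy = private_gradient(grads, max_norm=1.0, noise_multiplier=1.1, rng=rng)
params = sgd_step([0.0, 0.0], noisy, learning_rate=0.1)
print(params)
```

Clipping bounds how much any single training record can move the model, which is what lets the added noise mask that record's influence. In the setting described above, a step like this would run separately inside each cluster's generator.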