Search CORE

24,580 research outputs found

Privacy-preserving data publishing for cluster analysis

Author: Benjamin C.M. Fung
Beringer
Birant
Dave
Fung
Gelbard
Inan
Kaufman
Ke Wang
Lingyu Wang
Malin
McClean
Patrick C.K. Hung
Quinlan
Samarati
Sweeney
Sweeney
van Rijsbergen
Wang
Publication venue: 'Elsevier BV'
Publication date
Field of study

A Clustering-Anonymity Approach for Trajectory Data Publishing Considering both Distance and Direction

Author: Hu Ke-kun
Jiang Huo-wen
Publication venue: 'Faculty of Electrical Engineering and Computing, Univ. of Zagreb'
Publication date: 01/01/2021
Field of study

Trajectory data contains rich spatio-temporal information of moving objects. Directly publishing it for mining and analysis will result in severe privacy disclosure problems. Most existing clustering-anonymity methods cluster trajectories according to either distance- or direction-based similarities, leading to a high information loss. To bridge this gap, in this paper, we present a clustering-anonymity approach considering both these two types of similarities. As trajectories may not be synchronized, we first design a trajectory synchronization algorithm to synchronize them. Then, two similarity metrics between trajectories are quantitatively defined, followed by a comprehensive one. Furthermore, a clustering-anonymity algorithm for trajectory data publishing with privacy-preserving is proposed. It groups trajectories into clusters according to the comprehensive similarity metric. These clusters are finally anonymized. Experimental results show that our algorithm is effective in preserving privacy with low information loss

HRČAK - Portal of Croatian Scientific and Professional Journals

Hrčak - Portal of scientific journals of Croatia

Privacy via the Johnson-Lindenstrauss Transform

Author: Kenthapadi Krishnaram
Korolova Aleksandra
Mironov Ilya
Mishra Nina
Publication venue: 'Journal of Privacy and Confidentiality'
Publication date: 01/01/2012
Field of study

Suppose that party A collects private information about its users, where each user's data is represented as a bit vector. Suppose that party B has a proprietary data mining algorithm that requires estimating the distance between users, such as clustering or nearest neighbors. We ask if it is possible for party A to publish some information about each user so that B can estimate the distance between users without being able to infer any private bit of a user. Our method involves projecting each user's representation into a random, lower-dimensional space via a sparse Johnson-Lindenstrauss transform and then adding Gaussian noise to each entry of the lower-dimensional representation. We show that the method preserves differential privacy---where the more privacy is desired, the larger the variance of the Gaussian noise. Further, we show how to approximate the true distances between users via only the lower-dimensional, perturbed data. Finally, we consider other perturbation methods such as randomized response and draw comparisons to sketch-based methods. While the goal of releasing user-specific data to third parties is more broad than preserving distances, this work shows that distance computations with privacy is an achievable goal.Comment: 24 page

arXiv.org e-Print Archive

CiteSeerX

Crossref

Differentially Private Mixture of Generative Neural Networks

Author: Acs Gergely
Castelluccia Claude
De Cristofaro Emiliano
Melis Luca
Publication venue
Publication date: 18/11/2017
Field of study

Generative models are used in a wide range of applications building on large amounts of contextually rich information. Due to possible privacy violations of the individuals whose data is used to train these models, however, publishing or sharing generative models is not always viable. In this paper, we present a novel technique for privately releasing generative models and entire high-dimensional datasets produced by these models. We model the generator distribution of the training data with a mixture of

k

generative neural networks. These are trained together and collectively learn the generator distribution of a dataset. Data is divided into

k

clusters, using a novel differentially private kernel

k

-means, then each cluster is given to separate generative neural networks, such as Restricted Boltzmann Machines or Variational Autoencoders, which are trained only on their own cluster using differentially private gradient descent. We evaluate our approach using the MNIST dataset, as well as call detail records and transit datasets, showing that it produces realistic synthetic samples, which can also be used to accurately compute arbitrary number of counting queries.Comment: A shorter version of this paper appeared at the 17th IEEE International Conference on Data Mining (ICDM 2017). This is the full version, published in IEEE Transactions on Knowledge and Data Engineering (TKDE

arXiv.org e-Print Archive

Crossref

INRIA a CCSD electronic archive server