1 research outputs found
Privacy via the Johnson-Lindenstrauss Transform
Suppose that party A collects private information about its users, where each
user's data is represented as a bit vector. Suppose that party B has a
proprietary data mining algorithm that requires estimating the distance between
users, such as clustering or nearest neighbors. We ask if it is possible for
party A to publish some information about each user so that B can estimate the
distance between users without being able to infer any private bit of a user.
Our method involves projecting each user's representation into a random,
lower-dimensional space via a sparse Johnson-Lindenstrauss transform and then
adding Gaussian noise to each entry of the lower-dimensional representation. We
show that the method preserves differential privacy---where the more privacy is
desired, the larger the variance of the Gaussian noise. Further, we show how to
approximate the true distances between users via only the lower-dimensional,
perturbed data. Finally, we consider other perturbation methods such as
randomized response and draw comparisons to sketch-based methods. While the
goal of releasing user-specific data to third parties is more broad than
preserving distances, this work shows that distance computations with privacy
is an achievable goal.Comment: 24 page