2,712 research outputs found

    Distributed Private Heavy Hitters

    Full text link
    In this paper, we give efficient algorithms and lower bounds for solving the heavy hitters problem while preserving differential privacy in the fully distributed local model. In this model, there are n parties, each of which possesses a single element from a universe of size N. The heavy hitters problem is to find the identity of the most common element shared amongst the n parties. In the local model, there is no trusted database administrator, and so the algorithm must interact with each of the nn parties separately, using a differentially private protocol. We give tight information-theoretic upper and lower bounds on the accuracy to which this problem can be solved in the local model (giving a separation between the local model and the more common centralized model of privacy), as well as computationally efficient algorithms even in the case where the data universe N may be exponentially large

    Privacy via the Johnson-Lindenstrauss Transform

    Full text link
    Suppose that party A collects private information about its users, where each user's data is represented as a bit vector. Suppose that party B has a proprietary data mining algorithm that requires estimating the distance between users, such as clustering or nearest neighbors. We ask if it is possible for party A to publish some information about each user so that B can estimate the distance between users without being able to infer any private bit of a user. Our method involves projecting each user's representation into a random, lower-dimensional space via a sparse Johnson-Lindenstrauss transform and then adding Gaussian noise to each entry of the lower-dimensional representation. We show that the method preserves differential privacy---where the more privacy is desired, the larger the variance of the Gaussian noise. Further, we show how to approximate the true distances between users via only the lower-dimensional, perturbed data. Finally, we consider other perturbation methods such as randomized response and draw comparisons to sketch-based methods. While the goal of releasing user-specific data to third parties is more broad than preserving distances, this work shows that distance computations with privacy is an achievable goal.Comment: 24 page

    Near-Optimal Algorithms for Differentially-Private Principal Components

    Full text link
    Principal components analysis (PCA) is a standard tool for identifying good low-dimensional approximations to data in high dimension. Many data sets of interest contain private or sensitive information about individuals. Algorithms which operate on such data should be sensitive to the privacy risks in publishing their outputs. Differential privacy is a framework for developing tradeoffs between privacy and the utility of these outputs. In this paper we investigate the theory and empirical performance of differentially private approximations to PCA and propose a new method which explicitly optimizes the utility of the output. We show that the sample complexity of the proposed method differs from the existing procedure in the scaling with the data dimension, and that our method is nearly optimal in terms of this scaling. We furthermore illustrate our results, showing that on real data there is a large performance gap between the existing method and our method.Comment: 37 pages, 8 figures; final version to appear in the Journal of Machine Learning Research, preliminary version was at NIPS 201

    Privacy-Compatibility For General Utility Metrics

    Get PDF
    In this note, we present a complete characterization of the utility metrics that allow for non-trivial differential privacy guarantees

    Random projection to preserve patient privacy

    Get PDF
    With the availability of accessible and widely used cloud services, it is natural that large components of healthcare systems migrate to them; for example, patient databases can be stored and processed in the cloud. Such cloud services provide enhanced flexibility and additional gains, such as availability, ease of data share, and so on. This trend poses serious threats regarding the privacy of the patients and the trust that an individual must put into the healthcare system itself. Thus, there is a strong need of privacy preservation, achieved through a variety of different approaches. In this paper, we study the application of a random projection-based approach to patient data as a means to achieve two goals: (1) provably mask the identity of users under some adversarial-attack settings, (2) preserve enough information to allow for aggregate data analysis and application of machine-learning techniques. As far as we know, such approaches have not been applied and tested on medical data. We analyze the tradeoff between the loss of accuracy on the outcome of machine-learning algorithms and the resilience against an adversary. We show that random projections proved to be strong against known input/output attacks while offering high quality data, as long as the projected space is smaller than the original space, and as long as the amount of leaked data available to the adversary is limited
    • …
    corecore