
    Lower-Cost ε-Private Information Retrieval

    Private Information Retrieval (PIR), despite being well studied, is computationally costly and arduous to scale. We explore lower-cost relaxations of information-theoretic PIR, based on dummy queries, sparse vectors, and compositions with an anonymity system. We prove the security of each scheme using a flexible differentially private definition for private queries that can capture notions of imperfect privacy. We show that the basic schemes are weak on their own, but that some of them can be made arbitrarily safe by composing them with large anonymity systems.
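
    As a rough illustration of the dummy-query relaxation mentioned in the abstract, the sketch below hides the client's real index among uniformly random decoy indices; the function and parameter names are ours, not the paper's, and the paper's point is precisely that such a scheme is weak alone and needs composition with an anonymity system.

```python
import random

def dummy_query_indices(real_index, db_size, num_dummies, rng=random):
    """Hide the index of interest among uniformly chosen decoy indices.

    The server answers every index in the returned list; the client keeps only
    the answer for real_index. Privacy is imperfect: from the server's view the
    real index is roughly one of num_dummies + 1 equally plausible candidates.
    """
    decoys = rng.sample([i for i in range(db_size) if i != real_index], num_dummies)
    queries = decoys + [real_index]
    rng.shuffle(queries)
    return queries

# Example: fetch record 42 from a 1000-record database, hidden among 9 decoys.
print(dummy_query_indices(42, 1000, 9))
```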

    Principled Evaluation of Differentially Private Algorithms using DPBench

    Differential privacy has become the dominant standard in the research community for strong privacy protection. There has been a flood of research into query answering algorithms that meet this standard. Algorithms are becoming increasingly complex, and in particular, the performance of many emerging algorithms is data dependent, meaning the distribution of the noise added to query answers may change depending on the input data. Theoretical analysis typically only considers the worst case, making empirical study of average-case performance increasingly important. In this paper we propose a set of evaluation principles which we argue are essential for sound evaluation. Based on these principles we propose DPBench, a novel framework for the standardized evaluation of privacy algorithms. We then apply our benchmark to evaluate algorithms for answering 1- and 2-dimensional range queries. The result is a thorough empirical study of 15 published algorithms on a total of 27 datasets that offers new insights into algorithm behavior, in particular the influence of dataset scale and shape, and a more complete characterization of the state of the art. Our methodology is able to resolve inconsistencies in prior empirical studies and place algorithm performance in context through comparison to simple baselines. Finally, we pose open research questions which we hope will guide future algorithm design.
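
    One of the "simple baselines" such a benchmark would typically compare against is the identity workload with Laplace noise: perturb each histogram cell independently and answer range queries by summing noisy cells. The sketch below is our illustration of that baseline, not DPBench code, and assumes each record contributes to exactly one cell.

```python
import numpy as np

def laplace_identity_baseline(hist, epsilon, rng=None):
    """Add independent Laplace(1/epsilon) noise to each cell of a 1-D histogram.
    Assumes each record falls into exactly one cell, so the L1 sensitivity is 1."""
    rng = np.random.default_rng() if rng is None else rng
    hist = np.asarray(hist, dtype=float)
    return hist + rng.laplace(scale=1.0 / epsilon, size=hist.shape)

def range_query(noisy_hist, lo, hi):
    """Answer the range query [lo, hi) by summing noisy cells."""
    return noisy_hist[lo:hi].sum()

# Toy 256-bin histogram at epsilon = 0.1: longer ranges accumulate more noise.
rng = np.random.default_rng(0)
true_hist = rng.poisson(20, size=256).astype(float)
noisy_hist = laplace_identity_baseline(true_hist, epsilon=0.1, rng=rng)
print(abs(range_query(noisy_hist, 10, 200) - true_hist[10:200].sum()))
```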

    Privacy-Preserving Filtering for Event Streams

    Many large-scale information systems such as intelligent transportation systems, smart grids or smart buildings collect data about the activities of their users to optimize their operations. To encourage participation and adoption of these systems, it is becoming increasingly important that the design process take privacy issues into consideration. In a typical scenario, signals originate from many sensors capturing events involving the users, and several statistics of interest need to be continuously published in real time. This paper considers the problem of providing differential privacy guarantees for such multi-input multi-output systems processing event streams. We show how to construct and optimize various extensions of the zero-forcing equalization mechanism, which we previously proposed for single-input single-output systems. Some of these extensions can take a model of the input signals into account. We illustrate our privacy-preserving filter design methodology through the problem of privately monitoring and forecasting occupancy in a building equipped with multiple motion detection sensors.
    Comment: This version subsumes both the previous version and arXiv:1304.231
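
    The paper's zero-forcing equalization mechanism is not reproduced here; as a generic illustration of the same "perturb, then post-filter" pattern, the sketch below adds Laplace noise to per-interval event counts and then smooths the noisy stream, which costs no extra privacy because smoothing is post-processing. The function name and the choice of a simple exponential moving average are ours.

```python
import numpy as np

def private_smoothed_stream(counts, epsilon, alpha=0.3, rng=None):
    """Event-level DP release of a count stream: perturb each interval's count
    with Laplace(1/epsilon) noise (one event changes one count by 1), then
    smooth with an exponential moving average as post-processing."""
    rng = np.random.default_rng() if rng is None else rng
    counts = np.asarray(counts, dtype=float)
    noisy = counts + rng.laplace(scale=1.0 / epsilon, size=len(counts))
    smoothed, state = [], noisy[0]
    for x in noisy:
        state = alpha * x + (1 - alpha) * state
        smoothed.append(state)
    return np.array(smoothed)

# Toy occupancy stream: 100 intervals of motion-sensor event counts.
rng = np.random.default_rng(1)
stream = rng.poisson(5, size=100)
print(private_smoothed_stream(stream, epsilon=1.0, rng=rng)[:5])
```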

    Answering Range Queries Under Local Differential Privacy

    Counting the fraction of a population having an input within a specified interval, i.e. a range query, is a fundamental data analysis primitive. Range queries can also be used to compute other interesting statistics such as quantiles, and to build prediction models. However, the data is frequently subject to privacy concerns when it is drawn from individuals and relates, for example, to their financial, health, religious or political status. In this paper, we introduce and analyze methods to support range queries under the local variant of differential privacy, an emerging standard for privacy-preserving data analysis. The local model requires that each user releases a noisy view of her private data under a privacy guarantee. While many works address the problem of range queries in the trusted-aggregator setting, this problem has not been addressed specifically under the untrusted aggregation (local DP) model, even though many primitives have been developed recently for estimating a discrete distribution. We describe and analyze two classes of approaches for range queries, based on hierarchical histograms and the Haar wavelet transform. We show that both have strong theoretical accuracy guarantees on variance. In practice, both methods are fast and require minimal computation and communication resources. Our experiments show that the wavelet approach is most accurate in high-privacy settings, while the hierarchical approach dominates for weaker privacy requirements.
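
    For brevity, the sketch below shows the hierarchical-histogram idea in the trusted-aggregator (central DP) setting rather than the local model the paper targets: noisy counts are kept at every level of a dyadic tree, and a range query is answered from the few tree nodes that exactly tile the range. Function names and the even split of the budget across levels are our choices.

```python
import numpy as np

def noisy_dyadic_tree(hist, epsilon, rng=None):
    """Binary (dyadic) tree of counts over a power-of-two domain, with Laplace
    noise at every node. Each record appears in one node per level, so the L1
    sensitivity of all node counts together equals the number of levels."""
    rng = np.random.default_rng() if rng is None else rng
    levels = [np.asarray(hist, dtype=float)]
    while len(levels[-1]) > 1:
        levels.append(levels[-1].reshape(-1, 2).sum(axis=1))
    scale = len(levels) / epsilon
    return [lvl + rng.laplace(scale=scale, size=lvl.shape) for lvl in levels]

def range_query(tree, lo, hi):
    """Answer [lo, hi) using the coarsest noisy nodes that fit inside the range."""
    total, level = 0.0, 0
    while lo < hi:
        if lo % 2 == 1:            # node lo is a right child: take it directly
            total += tree[level][lo]
            lo += 1
        if hi % 2 == 1:            # node hi-1 is a left child: take it directly
            hi -= 1
            total += tree[level][hi]
        lo //= 2
        hi //= 2
        level += 1
    return total

# Toy 8-bin domain: the query [1, 7) touches only O(log n) noisy nodes.
tree = noisy_dyadic_tree([5, 3, 8, 1, 4, 4, 2, 6], epsilon=1.0,
                         rng=np.random.default_rng(2))
print(range_query(tree, 1, 7))
```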

    Release Connection Fingerprints in Social Networks Using Personalized Differential Privacy

    In social networks, different users may have different privacy preferences, and there are many users with public identities. Most work on differentially private social network data publication neglects this fact. We aim to release the number of public users that a private user connects to within n hops, called n-range Connection Fingerprints (CFPs), under user-level personalized privacy preferences. We propose two schemes, Distance-based exponential budget absorption (DEBA) and Distance-based uniform budget absorption using the Ladder function (DUBA-LF), for privacy-preserving publication of the CFPs based on personalized differential privacy (PDP), and we conduct a theoretical analysis of the privacy guarantees provided by the proposed schemes. The implementation shows that the proposed schemes achieve lower publication error on real datasets.
    Comment: A short version of this paper is accepted for publication in Chinese Journal of Electronic

    Optimizing Fitness-For-Use of Differentially Private Linear Queries

    In practice, differentially private data releases are designed to support a variety of applications. A data release is fit for use if it meets the target accuracy requirements of each application. In this paper, we consider the problem of answering linear queries under differential privacy subject to per-query accuracy constraints. Existing practical frameworks like the matrix mechanism do not provide such fine-grained control: they optimize total error, which allows some query answers to be more accurate than necessary at the expense of other queries that are no longer useful. Thus, we design a fitness-for-use strategy that adds privacy-preserving Gaussian noise to query answers. The covariance structure of the noise is optimized to meet the fine-grained accuracy requirements while minimizing the cost to privacy.
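
    A minimal sketch of the accuracy-constraint side of the problem, with the privacy-optimization side (the paper's core contribution) omitted: answer the linear queries W x with Gaussian noise whose per-query variances match the targets. Here the covariance is simply diagonal, which satisfies the accuracy constraints but is generally not the covariance the paper's optimization would choose; all names are illustrative.

```python
import numpy as np

def answer_with_target_variances(W, x, target_var, rng=None):
    """Answer the linear queries W @ x with zero-mean Gaussian noise whose
    per-query variance equals each query's accuracy target. A diagonal
    covariance meets the accuracy constraints; choosing the full covariance
    that minimizes privacy cost is the harder step addressed by the paper."""
    rng = np.random.default_rng() if rng is None else rng
    cov = np.diag(np.asarray(target_var, dtype=float))
    noise = rng.multivariate_normal(np.zeros(len(target_var)), cov)
    return W @ x + noise

# Toy workload: three linear queries over a length-5 data vector, with the
# third (total-count) query required to be the most accurate.
W = np.array([[1, 1, 0, 0, 0],
              [0, 0, 1, 1, 1],
              [1, 1, 1, 1, 1]], dtype=float)
x = np.array([10.0, 20.0, 5.0, 7.0, 3.0])
print(answer_with_target_variances(W, x, target_var=[4.0, 4.0, 1.0],
                                   rng=np.random.default_rng(3)))
```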

    Blowfish Privacy: Tuning Privacy-Utility Trade-offs using Policies

    Privacy definitions provide ways of trading off the privacy of individuals in a statistical database for the utility of downstream analysis of the data. In this paper, we present Blowfish, a class of privacy definitions inspired by the Pufferfish framework that provides a rich interface for this trade-off. In particular, we allow data publishers to extend differential privacy using a policy, which specifies (a) secrets, or information that must be kept secret, and (b) constraints that may be known about the data. While the secret specification allows increased utility by lessening protection for certain individual properties, the constraint specification provides added protection against an adversary who knows correlations in the data (arising from constraints). We formalize policies and present novel algorithms that can handle general specifications of sensitive information and certain count constraints. We show that there are reasonable policies under which our privacy mechanisms for k-means clustering, histograms and range queries introduce significantly less noise than their differentially private counterparts. We quantify the privacy-utility trade-offs for various policies analytically and empirically on real datasets.
    Comment: Full version of the paper at SIGMOD'14, Snowbird, Utah US

    Privacy-Preserving Collaborative Deep Learning with Unreliable Participants

    With powerful parallel-computing GPUs and massive user data, neural-network-based deep learning excels at problem modeling and solving, and has achieved great success in many applications such as image classification, speech recognition and machine translation. As deep learning has become increasingly popular, the problem of privacy leakage has grown more and more urgent. Given the fact that the training data may contain highly sensitive information, e.g., personal medical records, directly sharing them among the users (i.e., participants) or centrally storing them in one single location may pose a considerable threat to user privacy. In this paper, we present a practical privacy-preserving collaborative deep learning system that allows users to cooperatively build a collective deep learning model with the data of all participants, without direct data sharing or central data storage. In our system, each participant trains a local model with their own data and only shares model parameters with the others. To further avoid potential privacy leakage from sharing model parameters, we use the functional mechanism to perturb the objective function of the neural network in the training process to achieve ε-differential privacy. In particular, for the first time, we consider the existence of unreliable participants, i.e., participants with low-quality data, and propose a solution to reduce the impact of these participants while protecting their privacy. We evaluate the performance of our system on two well-known real-world datasets for regression and classification tasks. The results demonstrate that the proposed system is robust against unreliable participants, and achieves accuracy close to that of a model trained in the traditional centralized manner while ensuring rigorous privacy protection.
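
    The functional mechanism the abstract refers to is easiest to see in its original linear-regression form (the paper applies it to neural-network objectives): expand the least-squares loss into a polynomial in the weights, perturb the polynomial's data-dependent coefficients, and minimize the noisy objective. The sketch below is that textbook version under the assumption that features and labels are scaled into [-1, 1]; the sensitivity bound is our own conservative derivation, and the ridge-style regularization that keeps the noisy matrix invertible is a heuristic of ours.

```python
import numpy as np

def functional_mechanism_linreg(X, y, epsilon, rng=None):
    """Functional mechanism sketch for least-squares regression: perturb the
    coefficients of the quadratic objective, then minimize the noisy objective.

    With |y| <= 1 and each feature in [-1, 1], one record contributes at most
    1 to the constant term, 2d (in L1) to the linear coefficients, and d^2 to
    the quadratic coefficients, i.e. (d+1)^2 in total; replacing one record
    therefore changes the coefficient vector by at most 2*(d+1)^2 in L1.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, d = X.shape
    lam2 = X.T @ X            # coefficients of the quadratic term w^T lam2 w
    lam1 = -2.0 * X.T @ y     # coefficients of the linear term lam1^T w
    scale = 2.0 * (d + 1) ** 2 / epsilon
    lam2_noisy = lam2 + rng.laplace(scale=scale, size=lam2.shape)
    lam1_noisy = lam1 + rng.laplace(scale=scale, size=lam1.shape)
    # Symmetrize and lightly regularize so the noisy quadratic stays invertible
    # (post-processing, so it costs no additional privacy).
    lam2_noisy = 0.5 * (lam2_noisy + lam2_noisy.T) + 1e-3 * n * np.eye(d)
    # Minimizer of w^T lam2 w + lam1^T w satisfies 2 * lam2 * w + lam1 = 0.
    return np.linalg.solve(2.0 * lam2_noisy, -lam1_noisy)

# Toy example: 1000 records, 3 features, everything scaled into [-1, 1].
rng = np.random.default_rng(4)
X = rng.uniform(-1, 1, size=(1000, 3))
y = np.clip(X @ np.array([0.5, -0.3, 0.2]) + 0.05 * rng.normal(size=1000), -1, 1)
print(functional_mechanism_linreg(X, y, epsilon=1.0, rng=rng))
```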

    Continuous Release of Data Streams under both Centralized and Local Differential Privacy

    In this paper, we study the problem of publishing a stream of real-valued data satisfying differential privacy (DP). One major challenge is that the maximal possible value can be quite large; thus it is necessary to estimate a threshold so that values above it are truncated, which reduces the amount of noise that must be added to the data. The estimation must be done on the data in a private fashion. We develop such a method that uses the Exponential Mechanism with a quality function that approximates the utility goal well while maintaining a low sensitivity. Given the threshold, we then propose a novel online hierarchical method and several post-processing techniques. Building on these ideas, we formalize the steps into a framework for private publishing of stream data. Our framework consists of three components: a threshold optimizer that privately estimates the threshold, a perturber that adds calibrated noise to the stream, and a smoother that improves the result using post-processing. Within our framework, we design an algorithm satisfying the more stringent setting of DP called local DP (LDP). To our knowledge, this is the first LDP algorithm for publishing streaming data. Using four real-world datasets, we demonstrate that our mechanism outperforms the state of the art by 6-10 orders of magnitude in terms of utility (measured by the mean squared error of answering a random range query).
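
    As an illustration of the "threshold optimizer" component, the sketch below picks a truncation threshold with the Exponential Mechanism using a simple sensitivity-1 quality function: the (negated) distance between the number of values above a candidate and the number expected at a target quantile. This is our stand-in, not the paper's quality function, and the candidate grid and target quantile are arbitrary choices.

```python
import numpy as np

def exp_mech_threshold(values, candidates, epsilon, target_quantile=0.995, rng=None):
    """Choose a truncation threshold via the Exponential Mechanism.
    Quality of candidate t: minus the distance between #{values > t} and the
    count expected at the target quantile. Adding or removing one value changes
    this count by at most 1, so the quality function has sensitivity 1."""
    rng = np.random.default_rng() if rng is None else rng
    values = np.asarray(values, dtype=float)
    expected_above = (1.0 - target_quantile) * len(values)
    quality = np.array([-abs(np.sum(values > t) - expected_above) for t in candidates])
    weights = np.exp(epsilon * (quality - quality.max()) / 2.0)  # stabilized EM weights
    return candidates[rng.choice(len(candidates), p=weights / weights.sum())]

# Example: pick a power-of-two threshold for a heavy-tailed value stream.
rng = np.random.default_rng(5)
stream = rng.pareto(2.0, size=10_000) * 10
candidates = [2.0 ** k for k in range(1, 16)]
print(exp_mech_threshold(stream, candidates, epsilon=0.5, rng=rng))
```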

    Privately Connecting Mobility to Infectious Diseases via Applied Cryptography

    Human mobility is undisputedly one of the critical factors in infectious disease dynamics. Until a few years ago, researchers had to rely on static data to model human mobility, which was then combined with a transmission model of a particular disease, resulting in an epidemiological model. Recent works have consistently shown that substituting mobile phone data for the static mobility data leads to significantly more accurate models. While prior studies have exclusively relied on aggregated data from a mobile network operator's subscribers, it may be preferable to consider the aggregated mobility data of infected individuals only. Clearly, naively linking mobile phone data with infected individuals would massively intrude on privacy. This research aims to develop a solution that reports the aggregated mobile phone location data of infected individuals while still maintaining compliance with privacy expectations. To achieve privacy, we use homomorphic encryption, zero-knowledge proof techniques, and differential privacy. Our protocol's open-source implementation can process eight million subscribers in one and a half hours. Additionally, we provide a legal analysis of our solution with regard to the EU General Data Protection Regulation.
    Comment: Added differential privacy experiments and new benchmark
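
    The differential-privacy layer of such a pipeline ultimately amounts to noising the aggregated counts before release. The sketch below shows only that last step in the clear (in the actual protocol the aggregation is carried out under homomorphic encryption), assuming each infected individual is counted in a single region per release; the names and the rounding post-processing are ours.

```python
import numpy as np

def private_region_counts(counts, epsilon, rng=None):
    """Release per-region counts of infected subscribers with Laplace noise.
    Assumes each individual contributes to one region per release (L1
    sensitivity 1); rounding and clipping to non-negative integers is
    privacy-free post-processing."""
    rng = np.random.default_rng() if rng is None else rng
    noisy = np.asarray(counts, dtype=float) + rng.laplace(scale=1.0 / epsilon,
                                                          size=len(counts))
    return np.maximum(np.rint(noisy), 0).astype(int)

# Example: four regions' infected-subscriber counts at epsilon = 1.
print(private_region_counts([12, 0, 3, 57], epsilon=1.0,
                            rng=np.random.default_rng(6)))
```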