Lower-Cost ε-Private Information Retrieval
Private Information Retrieval (PIR), despite being well studied, is
computationally costly and arduous to scale. We explore lower-cost relaxations
of information-theoretic PIR, based on dummy queries, sparse vectors, and
compositions with an anonymity system. We prove the security of each scheme
using a flexible differentially private definition for private queries that can
capture notions of imperfect privacy. We show that basic schemes are weak, but
some of them can be made arbitrarily safe by composing them with large
anonymity systems.
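To make the dummy-query relaxation concrete, here is a minimal sketch (ours, not the paper's construction) in which a client hides its real index among uniformly random dummies; a single honest-but-curious server then guesses the real index with probability at most 1/(k+1) for k dummies:

```python
import secrets

def dummy_query_indices(real_index: int, db_size: int, num_dummies: int) -> list[int]:
    """Hide the real index among uniformly random dummy indices.

    The server sees num_dummies + 1 indistinguishable indices, so a
    single honest-but-curious server identifies the real one with
    probability 1 / (num_dummies + 1).
    """
    indices = {real_index}
    while len(indices) < num_dummies + 1:
        indices.add(secrets.randbelow(db_size))
    batch = list(indices)
    # Fisher-Yates shuffle so position leaks nothing about which index is real.
    for i in range(len(batch) - 1, 0, -1):
        j = secrets.randbelow(i + 1)
        batch[i], batch[j] = batch[j], batch[i]
    return batch
```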
Principled Evaluation of Differentially Private Algorithms using DPBench
Differential privacy has become the dominant standard in the research
community for strong privacy protection. There has been a flood of research
into query answering algorithms that meet this standard. Algorithms are
becoming increasingly complex, and in particular, the performance of many
emerging algorithms is data dependent, meaning the distribution of the
noise added to query answers may change depending on the input data.
Theoretical analysis typically only considers the worst case, making empirical
study of average case performance increasingly important.
In this paper we propose a set of evaluation principles which we argue are
essential for sound evaluation. Based on these principles we propose DPBench, a
novel evaluation framework for standardized evaluation of privacy algorithms.
We then apply our benchmark to evaluate algorithms for answering 1- and
2-dimensional range queries. The result is a thorough empirical study of 15
published algorithms on a total of 27 datasets that offers new insights into
algorithm behavior---in particular the influence of dataset scale and
shape---and a more complete characterization of the state of the art. Our
methodology is able to resolve inconsistencies in prior empirical studies and
place algorithm performance in context through comparison to simple baselines.
Finally, we pose open research questions which we hope will guide future
algorithm design.
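As an illustration of the kind of standardized comparison the benchmark calls for, the following sketch (our naming, not DPBench's API) computes a scale-normalized per-query error and a simple Laplace baseline for range queries:

```python
import numpy as np

def scaled_avg_error(true_answers, noisy_answers, scale):
    """Mean absolute per-query error, normalized by dataset scale so
    that results are comparable across datasets of different sizes."""
    diff = np.abs(np.asarray(true_answers) - np.asarray(noisy_answers))
    return float(np.mean(diff)) / scale

def laplace_baseline(true_answers, epsilon, sensitivity=1.0, rng=None):
    """Answer each range query independently with Laplace noise; a
    data-independent baseline for contextualizing complex algorithms."""
    rng = rng or np.random.default_rng()
    noise = rng.laplace(0.0, sensitivity / epsilon, size=len(true_answers))
    return np.asarray(true_answers) + noise
```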
Privacy-Preserving Filtering for Event Streams
Many large-scale information systems such as intelligent transportation
systems, smart grids or smart buildings collect data about the activities of
their users to optimize their operations. To encourage participation and
adoption of these systems, it is becoming increasingly important that the
design process take privacy issues into consideration. In a typical scenario,
signals originate from many sensors capturing events involving the users, and
several statistics of interest need to be published continuously in real time.
This paper considers the problem of providing differential privacy guarantees
for such multi-input multi-output systems processing event streams. We show how
to construct and optimize various extensions of the zero-forcing equalization
mechanism, which we previously proposed for single-input single-output systems.
Some of these extensions can take a model of the input signals into account. We
illustrate our privacy-preserving filter design methodology through the problem
of privately monitoring and forecasting occupancy in a building equipped with
multiple motion detection sensors.
Comment: This version subsumes both the previous version and arXiv:1304.231
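For intuition, the sketch below releases a linear filter of a binary event stream under event-level differential privacy using a generic Laplace perturbation; it is a baseline stand-in of our own, not the zero-forcing equalization mechanism the paper optimizes:

```python
import numpy as np

def private_filtered_stream(events, weights, epsilon, rng=None):
    """Release a moving linear filter of a binary event stream under
    event-level differential privacy (generic Laplace baseline).

    Changing one input event shifts the whole output sequence by at
    most the filter's l1 norm, so Laplace noise of scale
    sum(|weights|) / epsilon on every release gives epsilon
    event-level DP for the full output stream.
    """
    rng = rng or np.random.default_rng()
    sensitivity = float(np.sum(np.abs(weights)))
    window = np.zeros(len(weights))
    outputs = []
    for e in events:
        window = np.roll(window, 1)
        window[0] = e
        y = float(np.dot(weights, window))
        outputs.append(y + rng.laplace(0.0, sensitivity / epsilon))
    return outputs
```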
Answering Range Queries Under Local Differential Privacy
Counting the fraction of a population having an input within a specified
interval, i.e., a range query, is a fundamental data analysis primitive.
Range queries can also be used to compute other interesting statistics such as
quantiles, and to build prediction models. However, the data is frequently
subject to privacy concerns when it is drawn from individuals and relates,
for example, to their financial, health, religious, or political status. In this
paper, we introduce and analyze methods to support range queries under the
local variant of differential privacy, an emerging standard for
privacy-preserving data analysis.
The local model requires that each user releases a noisy view of her private
data under a privacy guarantee. While many works address the problem of range
queries in the trusted aggregator setting, this problem has not been addressed
under the untrusted aggregation (local DP) model, even though many primitives
for estimating a discrete distribution have been developed recently.
We describe and analyze two classes of approaches for range queries, based on
hierarchical histograms and the Haar wavelet transform. We show that both have
strong theoretical accuracy guarantees on variance. In practice, both methods
are fast and require minimal computation and communication resources. Our
experiments show that the wavelet approach is most accurate in high privacy
settings, while the hierarchical approach dominates for weaker privacy
requirements.
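To illustrate the hierarchical idea, here is a simplified central-model sketch of our own (the paper works in the local model, where each user randomizes her own report): a binary tree of noisy counts answers any range from O(log n) nodes instead of one noisy count per bucket:

```python
import numpy as np

def noisy_tree(counts, epsilon, rng=None):
    """Build a binary tree of Laplace-noised counts over a histogram.

    Central-model illustration only; assumes len(counts) is a power of
    two. Each user contributes to one node per level, so splitting the
    budget evenly across levels yields epsilon-DP overall.
    """
    rng = rng or np.random.default_rng()
    levels = [np.asarray(counts, dtype=float)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append(prev[0::2] + prev[1::2])  # pairwise parent sums
    eps_level = epsilon / len(levels)
    return [lvl + rng.laplace(0.0, 1.0 / eps_level, size=lvl.shape)
            for lvl in levels]

def _range_sum(tree, lo, hi, level, node):
    """Sum of leaves in [lo, hi) intersected with the span of `node`."""
    span = 1 << level                     # leaves covered at this level
    start, end = node * span, node * span + span
    if hi <= start or end <= lo:
        return 0.0                        # disjoint: contributes nothing
    if lo <= start and end <= hi:
        return float(tree[level][node])   # fully covered: one noisy count
    return (_range_sum(tree, lo, hi, level - 1, 2 * node)
            + _range_sum(tree, lo, hi, level - 1, 2 * node + 1))

def range_sum(tree, lo, hi):
    """Answer the range [lo, hi) from O(log n) noisy tree nodes."""
    return _range_sum(tree, lo, hi, len(tree) - 1, 0)
```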
Release Connection Fingerprints in Social Networks Using Personalized Differential Privacy
In social networks, different users may have different privacy preferences
and there are many users with public identities. Most work on differentially
private social network data publication neglects this fact. We aim to release
the number of public users that a private user connects to within n hops,
called n-range Connection Fingerprints (CFPs), under user-level personalized
privacy preferences. We propose two schemes, Distance-based Exponential Budget
Absorption (DEBA) and Distance-based Uniform Budget Absorption using the Ladder
Function (DUBA-LF), for privacy-preserving publication of the CFPs based on
personalized differential privacy (PDP), and we conduct a theoretical analysis
of the privacy guarantees the proposed schemes provide. Our implementation
shows that the proposed schemes achieve lower publication error on real datasets.
Comment: A short version of this paper is accepted for publication in Chinese
Journal of Electronics
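The DEBA and DUBA-LF constructions are specific to this paper; for a flavor of personalized differential privacy itself, here is the well-known sample mechanism of Jorgensen et al. (2015), a different technique, which converts each user's personal epsilon into an inclusion probability before running a standard DP mechanism:

```python
import math
import random

def pdp_sample(records, epsilons, eps_threshold, rng=random):
    """Sample mechanism for personalized DP (Jorgensen et al., 2015).

    Keep user i with probability (e^eps_i - 1) / (e^eps_t - 1) when
    eps_i < eps_t, and always otherwise; running any standard
    eps_t-DP mechanism on the sample then satisfies each user's
    personal guarantee eps_i.
    """
    sample = []
    for rec, eps_i in zip(records, epsilons):
        if eps_i >= eps_threshold:
            p = 1.0
        else:
            p = (math.exp(eps_i) - 1.0) / (math.exp(eps_threshold) - 1.0)
        if rng.random() < p:
            sample.append(rec)
    return sample
```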
Optimizing Fitness-For-Use of Differentially Private Linear Queries
In practice, differentially private data releases are designed to support a
variety of applications. A data release is fit for use if it meets target
accuracy requirements for each application. In this paper, we consider the
problem of answering linear queries under differential privacy subject to
per-query accuracy constraints. Existing practical frameworks like the matrix
mechanism do not provide such fine-grained control (they optimize total error,
which lets some query answers be more accurate than necessary at the expense
of others that are no longer useful). Thus, we design a
fitness-for-use strategy that adds privacy-preserving Gaussian noise to query
answers. The covariance structure of the noise is optimized to meet the
fine-grained accuracy requirements while minimizing the cost to privacy.
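Once a noise covariance has been optimized offline, answering the workload is mechanical; the sketch below (our naming, with the optimization of Sigma, the paper's actual contribution, elided) samples correlated Gaussian noise via a Cholesky factor, so query i is answered with variance Sigma[i, i]:

```python
import numpy as np

def answer_with_covariance(W, x, Sigma, rng=None):
    """Answer the linear queries W @ x with correlated Gaussian noise.

    Sigma is a pre-optimized positive-definite noise covariance chosen
    so that diag(Sigma) meets each query's accuracy target; sampling
    uses a Cholesky factor of Sigma.
    """
    rng = rng or np.random.default_rng()
    L = np.linalg.cholesky(Sigma)           # Sigma = L @ L.T
    noise = L @ rng.standard_normal(W.shape[0])
    return W @ x + noise
```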
Blowfish Privacy: Tuning Privacy-Utility Trade-offs using Policies
Privacy definitions provide ways of trading off the privacy of individuals
in a statistical database for the utility of downstream analysis of the data.
In this paper, we present Blowfish, a class of privacy definitions, inspired by
the Pufferfish framework, that provides a rich interface for this trade-off. In
particular, we allow data publishers to extend differential privacy using a
policy, which specifies (a) secrets, or information that must be kept secret,
and (b) constraints that may be known about the data. While the secret
specification allows increased utility by lessening protection for certain
individual properties, the constraint specification provides added protection
against an adversary who knows correlations in the data (arising from
constraints). We formalize policies and present novel algorithms that can
handle general specifications of sensitive information and certain count
constraints. We show that there are reasonable policies under which our privacy
mechanisms for k-means clustering, histograms and range queries introduce
significantly less noise than their differentially private counterparts. We
quantify the privacy-utility trade-offs for various policies analytically and
empirically on real datasets.
Comment: Full version of the paper at SIGMOD'14, Snowbird, Utah, USA
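As a simplified illustration of how a policy can shrink sensitivity (our example, not the paper's algorithms): for prefix-sum (cumulative histogram) queries over d bins, standard differential privacy must tolerate a record moving between arbitrary bins, while a "line" policy whose secrets only distinguish adjacent bins need only tolerate a record moving one bin over:

```python
import numpy as np

def private_cumulative_histogram(hist, epsilon, policy="line", rng=None):
    """Prefix-sum release under a Blowfish-style policy (illustrative).

    Under standard DP, moving one record from bin a to bin b changes
    the |a - b| prefix sums between them, so the l1 sensitivity is up
    to d - 1. Under a 'line' policy graph whose secrets distinguish
    only adjacent bins, neighbors move a record one bin over, changing
    exactly one prefix sum: sensitivity 1, hence far less noise.
    """
    rng = rng or np.random.default_rng()
    d = len(hist)
    sensitivity = 1.0 if policy == "line" else float(d - 1)
    prefix = np.cumsum(hist).astype(float)
    return prefix + rng.laplace(0.0, sensitivity / epsilon, size=d)
```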
Privacy-Preserving Collaborative Deep Learning with Unreliable Participants
With powerful parallel-computing GPUs and massive user data,
neural-network-based deep learning excels at problem modeling and solving,
and has achieved great success in many applications such as image
classification, speech recognition, and machine translation. While
deep learning has been increasingly popular, the problem of privacy leakage
becomes more and more urgent. Given the fact that the training data may contain
highly sensitive information, e.g., personal medical records, directly sharing
them among the users (i.e., participants) or centrally storing them in one
single location may pose a considerable threat to user privacy.
In this paper, we present a practical privacy-preserving collaborative deep
learning system that allows users to cooperatively build a collective deep
learning model with data of all participants, without direct data sharing and
central data storage. In our system, each participant trains a local model with
their own data and only shares model parameters with the others. To further
avoid potential privacy leakage from sharing model parameters, we use
functional mechanism to perturb the objective function of the neural network in
the training process to achieve ε-differential privacy. In particular,
for the first time, we consider the existence of unreliable
participants, i.e., participants with low-quality data, and propose a
solution to reduce the impact of these participants while protecting their
privacy. We evaluate the performance of our system on two well-known real-world
datasets for regression and classification tasks. The results demonstrate that
the proposed system is robust against unreliable participants, and achieves
high accuracy, close to that of a model trained in the traditional centralized
manner, while ensuring rigorous privacy protection.
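For a concrete instance of the functional mechanism the system builds on (here for plain linear regression rather than the paper's neural networks, with features and labels assumed scaled to [-1, 1]; the sensitivity constant depends on that normalization):

```python
import numpy as np

def functional_mechanism_linreg(X, y, epsilon, rng=None):
    """Linear regression via the functional mechanism (Zhang et al.):
    perturb the coefficients of the quadratic loss, then minimize the
    noisy objective rather than the data-dependent one.

    With each feature and label in [-1, 1], the l1 sensitivity of the
    coefficient vector is bounded by Delta = 2 * (d + 1) ** 2.
    """
    rng = rng or np.random.default_rng()
    n, d = X.shape
    delta = 2.0 * (d + 1) ** 2
    scale = delta / epsilon
    # Aggregate loss coefficients: sum_i (y_i - w.x_i)^2 =
    #   sum y^2  -  2 (sum y x) . w  +  w^T (sum x x^T) w
    linear = -2.0 * X.T @ y + rng.laplace(0.0, scale, size=d)
    quad = X.T @ X + rng.laplace(0.0, scale, size=(d, d))
    quad = (quad + quad.T) / 2.0          # keep the quadratic form symmetric
    # Minimizer of the perturbed objective solves 2*quad @ w = -linear;
    # a small ridge term guards against noise making quad indefinite.
    return np.linalg.solve(2.0 * quad + 1e-3 * np.eye(d), -linear)
```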
Continuous Release of Data Streams under both Centralized and Local Differential Privacy
In this paper, we study the problem of publishing a stream of real-valued
data satisfying differential privacy (DP). One major challenge is that the
maximal possible value can be quite large; thus it is necessary to estimate a
threshold so that numbers above it are truncated, reducing the amount of noise
that must be added to the data. The estimation must be done based on the data
in a private fashion. We develop such a method that uses the Exponential
Mechanism with a quality function that closely approximates the utility goal while
maintaining a low sensitivity. Given the threshold, we then propose a novel
online hierarchical method and several post-processing techniques.
Building on these ideas, we formalize the steps into a framework for private
publishing of stream data. Our framework consists of three components: a
threshold optimizer that privately estimates the threshold, a perturber that
adds calibrated noise to the stream, and a smoother that improves the result
using post-processing. Within our framework, we design an algorithm satisfying
the more stringent setting of DP called local DP (LDP). To our knowledge, this
is the first LDP algorithm for publishing streaming data. Using four real-world
datasets, we demonstrate that our mechanism outperforms the state-of-the-art by
6-10 orders of magnitude in terms of utility (measured by the mean
squared error of answering a random range query).
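A minimal sketch of exponential-mechanism threshold selection; the quality function here (distance of the clipped-point count from a hypothetical 1% target) is ours for illustration and is not the paper's quality function:

```python
import numpy as np

def choose_threshold(values, candidates, epsilon, rng=None):
    """Pick a truncation threshold with the Exponential Mechanism.

    Quality q(t) = -|#{x > t} - target|; one record changes any count
    by 1, so q has sensitivity about 1. Candidate t is selected with
    probability proportional to exp(epsilon * q(t) / (2 * sens)).
    """
    rng = rng or np.random.default_rng()
    values = np.asarray(values)
    target = 0.01 * len(values)     # hypothetical: clip about 1% of points
    q = np.array([-abs(float(np.sum(values > t)) - target)
                  for t in candidates])
    sens = 1.0
    logits = epsilon * q / (2.0 * sens)
    probs = np.exp(logits - logits.max())   # stabilized softmax
    probs /= probs.sum()
    return candidates[rng.choice(len(candidates), p=probs)]
```

Given the chosen threshold, values above it are clipped before the perturber adds noise calibrated to that (now smaller) range.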
Privately Connecting Mobility to Infectious Diseases via Applied Cryptography
Human mobility is undisputedly one of the critical factors in infectious
disease dynamics. Until a few years ago, researchers had to rely on static data
to model human mobility, which was then combined with a transmission model of a
particular disease resulting in an epidemiological model. Recent works have
consistently been showing that substituting the static mobility data with
mobile phone data leads to significantly more accurate models. While prior
studies have exclusively relied on a mobile network operator's subscribers'
aggregated data, it may be preferable to contemplate aggregated mobility data
of infected individuals only. Clearly, naively linking mobile phone data with
infected individuals would massively intrude privacy. This research aims to
develop a solution that reports the aggregated mobile phone location data of
infected individuals while still maintaining compliance with privacy
expectations. To achieve privacy, we use homomorphic encryption, zero-knowledge
proof techniques, and differential privacy. Our protocol's open-source
implementation can process eight million subscribers in one and a half hours.
Additionally, we provide a legal analysis of our solution with regard to the
EU General Data Protection Regulation.
Comment: Added differential privacy experiments and new benchmark
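The additive-aggregation core of such a protocol can be sketched with the python-paillier library (a hedged sketch of the homomorphic-summation idea only; the actual protocol additionally uses zero-knowledge proofs and differential privacy, and the data below is a toy example):

```python
# pip install phe  (python-paillier)
from phe import paillier

# Only the aggregating authority holds the decryption key.
public_key, private_key = paillier.generate_paillier_keypair(n_length=2048)

# Hypothetical per-subscriber presence flags for one location cell.
infected_flags = [1, 0, 1, 1, 0]
ciphertexts = [public_key.encrypt(f) for f in infected_flags]

# Paillier ciphertexts add homomorphically, so the per-cell count is
# computed without ever decrypting an individual's record.
encrypted_total = sum(ciphertexts[1:], ciphertexts[0])
print(private_key.decrypt(encrypted_total))   # -> 3
```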