Measuring Membership Privacy on Aggregate Location Time-Series
While location data is extremely valuable for various applications,
disclosing it prompts serious threats to individuals' privacy. To limit such
concerns, organizations often provide analysts with aggregate time-series that
indicate, e.g., how many people are in a location at a time interval, rather
than raw individual traces. In this paper, we perform a measurement study to
understand Membership Inference Attacks (MIAs) on aggregate location
time-series, where an adversary tries to infer whether a specific user
contributed to the aggregates.
We find that the volume of contributed data, as well as the regularity and
particularity of users' mobility patterns, play a crucial role in the attack's
success. We experiment with a wide range of defenses based on generalization,
hiding, and perturbation, and evaluate their ability to thwart the attack
vis-a-vis the utility loss they introduce for various mobility analytics tasks.
Our results show that some defenses fail across the board, while others work
for specific tasks on aggregate location time-series. For instance, suppressing
small counts can be used for ranking hotspots, data generalization for
forecasting traffic, hotspot discovery, and map inference, while sampling is
effective for location labeling and anomaly detection when the dataset is
sparse. Differentially private techniques provide reasonable accuracy only in
very specific settings, e.g., discovering hotspots and forecasting their
traffic, and more so when using weaker privacy notions like crowd-blending
privacy. Overall, our measurements show that there does not exist a unique
generic defense that can preserve the utility of the analytics for arbitrary
applications, and provide useful insights regarding the disclosure of sanitized
aggregate location time-series.
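One of the defenses named above, suppressing small counts, can be illustrated with a minimal sketch. The toy traces, the function names, and the threshold value are assumptions made for illustration and are not taken from the paper; the paper's actual aggregation and suppression procedures may differ.

```python
from collections import Counter


def aggregate_counts(traces):
    """Count distinct users per (location, time_slot) pair."""
    counts = Counter()
    for visits in traces.values():
        # set() ensures a user is counted once per (location, slot).
        for loc_slot in set(visits):
            counts[loc_slot] += 1
    return dict(counts)


def suppress_small_counts(counts, threshold):
    """Replace every count below `threshold` with 0 before release."""
    return {key: (c if c >= threshold else 0) for key, c in counts.items()}


# Hypothetical toy data: user -> list of (location, time_slot) visits.
traces = {
    "alice": [("station", 8), ("office", 9)],
    "bob": [("station", 8), ("cafe", 9)],
    "carol": [("station", 8), ("office", 9)],
}

counts = aggregate_counts(traces)
released = suppress_small_counts(counts, threshold=2)
print(released)
```

Here the single visit to the cafe is hidden in the released aggregates, while the hotspot ranking (station first, office second) is unchanged, which is consistent with the abstract's remark that suppression can suit hotspot ranking.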
Iterated filtering methods for Markov process epidemic models
Dynamic epidemic models have proven valuable for public health decision
makers as they provide useful insights into the understanding and prevention of
infectious diseases. However, inference for these types of models can be
difficult because the disease spread is typically only partially observed, e.g.,
in the form of reported incidences in given time periods. This chapter discusses
how to perform likelihood-based inference for partially observed Markov
epidemic models when it is relatively easy to generate samples from the Markov
transmission model while the likelihood function is intractable. The first part
of the chapter reviews the theoretical background of inference for partially
observed Markov processes (POMP) via iterated filtering. In the second part of
the chapter the performance of the method and associated practical difficulties
are illustrated on two examples. In the first example a simulated outbreak data
set consisting of the number of newly reported cases aggregated by week is
fitted to a POMP where the underlying disease transmission model is assumed to
be a simple Markovian SIR model. The second example illustrates possible model
extensions such as seasonal forcing and over-dispersion in both the
transmission and observation models, which can be used, e.g., when analysing
routinely collected rotavirus surveillance data. Both examples are implemented
using the R-package pomp (King et al., 2016) and the code is made available
online.
Comment: This manuscript is a preprint of a chapter to appear in the Handbook
of Infectious Disease Data Analysis, Held, L., Hens, N., O'Neill, P.D. and
Wallinga, J. (Eds.). Chapman & Hall/CRC, 2018. Please use the book for
possible citations. Corrected typo in the references and modified second
example
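The setting of the first example, a Markovian SIR model observed only through weekly case counts, can be sketched as a forward simulator. This is a hedged illustration in Python, not the chapter's pomp code: the function name, the parameter values, and the binomial under-reporting step are assumptions. It shows why such models are easy to simulate even though the likelihood of the weekly counts is intractable.

```python
import random


def simulate_sir_weekly(beta, gamma, n_pop, i0, n_weeks, report_prob, rng):
    """Gillespie simulation of a Markovian SIR model (rates per day).

    Returns the true and the reported number of new infections per week.
    """
    s, i, t = n_pop - i0, i0, 0.0
    horizon = 7.0 * n_weeks
    new_cases = [0] * n_weeks
    while i > 0:
        rate_inf = beta * s * i / n_pop  # S -> I transitions
        rate_rec = gamma * i             # I -> R transitions
        total = rate_inf + rate_rec
        t += rng.expovariate(total)      # waiting time to the next event
        if t >= horizon:
            break
        if rng.random() < rate_inf / total:
            s, i = s - 1, i + 1
            new_cases[int(t // 7)] += 1
        else:
            i -= 1
    # Assumed observation model: each case is reported independently.
    reported = [
        sum(rng.random() < report_prob for _ in range(c)) for c in new_cases
    ]
    return new_cases, reported


rng = random.Random(0)
new_cases, reported = simulate_sir_weekly(
    beta=0.5, gamma=0.25, n_pop=1000, i0=5, n_weeks=10,
    report_prob=0.6, rng=rng,
)
print(reported)
```

Simulation-based methods such as iterated filtering need only this kind of forward simulator together with an observation density; they never evaluate the transition density of the hidden epidemic process.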
Scalable Inference for Markov Processes with Intractable Likelihoods
Bayesian inference for Markov processes has become increasingly relevant in
recent years. Problems of this type often have intractable likelihoods and
prior knowledge about model rate parameters is often poor. Markov Chain Monte
Carlo (MCMC) techniques can lead to exact inference in such models but in
practice can suffer performance issues including long burn-in periods and poor
mixing. On the other hand, approximate Bayesian computation techniques can allow
rapid exploration of a large parameter space but yield only approximate
posterior distributions. Here we consider the combined use of approximate
Bayesian computation (ABC) and MCMC techniques for improved computational
efficiency while retaining exact inference on parallel hardware.
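The approximate side of this trade-off can be illustrated with a plain ABC rejection sampler. This is a minimal sketch and not the combined ABC and MCMC scheme the abstract describes; the pure death process, the uniform prior, the tolerance, and all names are assumptions chosen to keep the example small.

```python
import random


def simulate_survivors(theta, n0, t_end, rng):
    """Pure death process: each of n0 individuals dies at rate theta."""
    return sum(rng.expovariate(theta) > t_end for _ in range(n0))


def abc_rejection(observed, n0, t_end, n_draws, tol, rng):
    """Keep prior draws whose simulated data fall within tol of the data."""
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(0.01, 2.0)  # assumed uniform prior on the rate
        simulated = simulate_survivors(theta, n0, t_end, rng)
        if abs(simulated - observed) <= tol:
            accepted.append(theta)
    return accepted


rng = random.Random(1)
# Hypothetical observation: 37 of 100 individuals alive at time 1.
accepted = abc_rejection(
    observed=37, n0=100, t_end=1.0, n_draws=2000, tol=2, rng=rng
)
if accepted:
    posterior_mean = sum(accepted) / len(accepted)
    print(len(accepted), round(posterior_mean, 3))
```

Each draw is independent, so the loop parallelizes trivially, but the accepted values only approximate the posterior because of the nonzero tolerance. That gap is what motivates pairing ABC with an exact MCMC stage.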
Pseudospectral Model Predictive Control under Partially Learned Dynamics
Trajectory optimization of a controlled dynamical system is an essential part
of autonomy; however, many trajectory optimization techniques are limited by the
fidelity of the underlying parametric model. In the field of robotics, a lack
of model knowledge can be overcome with machine learning techniques, utilizing
measurements to build a dynamical model from the data. This paper aims to take
the middle ground between these two approaches by introducing a semi-parametric
representation of the underlying system dynamics. Our goal is to leverage the
considerable information contained in a traditional physics-based model and
combine it with a data-driven, non-parametric regression technique known as a
Gaussian Process. Integrating this semi-parametric model with model predictive
pseudospectral control, we demonstrate this technique on both a cart pole and
quadrotor simulation with unmodeled damping and parametric error. In order to
manage parametric uncertainty, we introduce an algorithm that utilizes Sparse
Spectrum Gaussian Processes (SSGP) for online learning after each rollout. We
implement this online learning technique on a cart pole and quadrotor, then
demonstrate the use of online learning and obstacle avoidance for the Dubins
vehicle dynamics.
Comment: Accepted but withdrawn from AIAA Scitech 201
Mathematical techniques for the protection of patient's privacy in medical databases
In modern society, balancing privacy against public access to information is an increasingly common problem. Valid data is crucial for many kinds of research, but the public good should not be achieved at the expense of individuals.
While creating a central database of patients, the CSIOZ wishes to provide statistical information for selected institutions. However, there are plans to extend access by providing the statistics to researchers or even to citizens. This might pose a significant risk of disclosing private, sensitive information about individuals. This report proposes some methods to prevent data leaks.
One category of suggestions is based on the idea of modifying statistics so that they remain useful to statisticians while guaranteeing the protection of patients' privacy.
Another group of proposed mechanisms, though sometimes difficult to implement, enables one to obtain precise statistics while restricting queries that might reveal sensitive information.
A Stein variational Newton method
Stein variational gradient descent (SVGD) was recently proposed as a general
purpose nonparametric variational inference algorithm [Liu & Wang, NIPS 2016]:
it minimizes the Kullback-Leibler divergence between the target distribution
and its approximation by implementing a form of functional gradient descent on
a reproducing kernel Hilbert space. In this paper, we accelerate and generalize
the SVGD algorithm by including second-order information, thereby approximating
a Newton-like iteration in function space. We also show how second-order
information can lead to more effective choices of kernel. We observe
significant computational gains over the original SVGD algorithm in multiple
test cases.
Comment: 18 pages, 7 figures
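For orientation, the first-order SVGD update that the paper builds on can be sketched in one dimension. This illustrates only the original algorithm of Liu & Wang, not the Newton variant proposed here; the Gaussian target, the fixed kernel bandwidth, the step size, and the function names are assumptions.

```python
import math


def svgd_step(particles, grad_log_p, step, bandwidth):
    """One SVGD update with an RBF kernel of fixed bandwidth (1-D)."""
    n = len(particles)
    updated = []
    for xi in particles:
        phi = 0.0
        for xj in particles:
            k = math.exp(-((xj - xi) ** 2) / (2 * bandwidth ** 2))
            grad_k = -(xj - xi) / bandwidth ** 2 * k  # d k(xj, xi) / d xj
            # Kernel-weighted gradient (attraction) plus repulsion term.
            phi += k * grad_log_p(xj) + grad_k
        updated.append(xi + step * phi / n)
    return updated


# Assumed target: a Gaussian with mean 2 and unit variance.
target_mean = 2.0


def grad_log_p(x):
    return -(x - target_mean)


# Particles start far from the target, evenly spaced on [-6, -4].
particles = [-6.0 + 2.0 * j / 19 for j in range(20)]
for _ in range(1000):
    particles = svgd_step(particles, grad_log_p, step=0.1, bandwidth=1.0)

mean = sum(particles) / len(particles)
print(round(mean, 3), round(min(particles), 3), round(max(particles), 3))
```

The first term moves particles toward high-density regions, while the kernel gradient pushes them apart so that they do not collapse onto the mode. The paper's contribution is to replace this plain gradient direction with a Newton-like direction that uses second-order information.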