Can local particle filters beat the curse of dimensionality?
The discovery of particle filtering methods has enabled the use of nonlinear
filtering in a wide array of applications. Unfortunately, the approximation
error of particle filters typically grows exponentially in the dimension of the
underlying model. This phenomenon has rendered particle filters of limited use
in complex data assimilation problems. In this paper, we argue that it is often
possible, at least in principle, to develop local particle filtering algorithms
whose approximation error is dimension-free. The key to such developments is
the decay of correlations property, which is a spatial counterpart of the much
better understood stability property of nonlinear filters. For the simplest
possible algorithm of this type, our results provide under suitable assumptions
an approximation error bound that is uniform both in time and in the model
dimension. More broadly, our results provide a framework for the investigation
of filtering problems and algorithms in high dimension.
Comment: Published at http://dx.doi.org/10.1214/14-AAP1061 in the Annals of Applied Probability (http://www.imstat.org/aap/) by the Institute of Mathematical Statistics (http://www.imstat.org).
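As a point of reference for the degeneracy this abstract describes, a minimal bootstrap particle filter, the classical baseline rather than the paper's local algorithm, can be sketched for a scalar linear-Gaussian model. All model choices, parameter names, and defaults below are illustrative assumptions:

```python
import math
import random

def bootstrap_particle_filter(observations, n_particles=500,
                              trans_std=1.0, obs_std=1.0, seed=0):
    """Classical bootstrap particle filter for the scalar model
    X_t = X_{t-1} + N(0, trans_std^2),  Y_t = X_t + N(0, obs_std^2).
    Returns the estimated filtering mean at each time step."""
    rng = random.Random(seed)
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    means = []
    for y in observations:
        # Propagate every particle through the transition kernel.
        particles = [x + rng.gauss(0.0, trans_std) for x in particles]
        # Reweight by the Gaussian observation likelihood.
        weights = [math.exp(-0.5 * ((y - x) / obs_std) ** 2) for x in particles]
        total = sum(weights)
        weights = [w / total for w in weights]
        means.append(sum(w * x for w, x in zip(weights, particles)))
        # Multinomial resampling to counteract weight degeneracy.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return means
```

In a d-dimensional model the likelihood becomes a product over coordinates, and the resulting collapse of the importance weights onto a few particles is precisely the exponential-in-dimension error the paper addresses by localizing the filter.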
Locality in Network Optimization
In probability theory and statistics notions of correlation among random
variables, decay of correlation, and bias-variance trade-off are fundamental.
In this work we introduce analogous notions in optimization, and we show their
usefulness in a concrete setting. We propose a general notion of correlation
among variables in optimization procedures that is based on the sensitivity of
optimal points upon (possibly finite) perturbations. We present a canonical
instance in network optimization (the min-cost network flow problem) that
exhibits locality, i.e., a setting where the correlation decays as a function
of the graph-theoretical distance in the network. In the case of warm-start
reoptimization, we develop a general approach to localize a given optimization
routine in order to exploit locality. We show that the localization mechanism
is responsible for introducing a bias in the original algorithm, and that the
bias-variance trade-off that emerges can be exploited to minimize the
computational complexity required to reach a prescribed level of error
accuracy. We provide numerical evidence to support our claims.
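The paper's canonical instance is min-cost network flow, which is not reproduced here. As a hedged illustration of the localization mechanism itself, the sketch below applies the same idea, re-optimizing only within a ball around a perturbation while freezing warm-start values outside it, to a simple quadratic objective on a path graph, where correlations provably decay with distance:

```python
def local_reoptimize(x, b, center, radius, coupling=0.5, sweeps=200):
    """Gauss-Seidel on f(x) = sum_i (x_i - b_i)^2 + coupling*sum_i (x_i - x_{i+1})^2
    over a path, restricted to coordinates within `radius` of `center`.
    Coordinates outside the ball keep their warm-start values: that freeze is
    the localisation bias, which shrinks with the radius when correlations decay."""
    x = list(x)
    n = len(x)
    lo, hi = max(0, center - radius), min(n, center + radius + 1)
    for _ in range(sweeps):
        for i in range(lo, hi):
            # Exact coordinate minimiser given the current neighbours.
            num, den = b[i], 1.0
            if i > 0:
                num += coupling * x[i - 1]
                den += coupling
            if i + 1 < n:
                num += coupling * x[i + 1]
                den += coupling
            x[i] = num / den
    return x
```

Shrinking the radius lowers the cost of the re-solve but freezes more of the warm start, which is the bias side of the bias-variance trade-off described above.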
Phase Transitions in Nonlinear Filtering
It has been established under very general conditions that the ergodic
properties of Markov processes are inherited by their conditional distributions
given partial information. While the existing theory provides a rather complete
picture of classical filtering models, many infinite-dimensional problems are
outside its scope. Far from being a technical issue, the infinite-dimensional
setting gives rise to surprising phenomena and new questions in filtering
theory. The aim of this paper is to discuss some elementary examples,
conjectures, and general theory that arise in this setting, and to highlight
connections with problems in statistical mechanics and ergodic theory. In
particular, we exhibit a simple example of a uniformly ergodic model in which
ergodicity of the filter undergoes a phase transition, and we develop some
qualitative understanding as to when such phenomena can and cannot occur. We
also discuss closely related problems in the setting of conditional Markov
random fields.
Comment: 51 pages.
Comparison Theorems for Gibbs Measures
The Dobrushin comparison theorem is a powerful tool to bound the difference
between the marginals of high-dimensional probability distributions in terms of
their local specifications. Originally introduced to prove uniqueness and decay
of correlations of Gibbs measures, it has been widely used in statistical
mechanics as well as in the analysis of algorithms on random fields and
interacting Markov chains. However, the classical comparison theorem requires
validity of the Dobrushin uniqueness criterion, essentially restricting its
applicability in most models to a small subset of the natural parameter space.
In this paper we develop generalized Dobrushin comparison theorems in terms of
influences between blocks of sites, in the spirit of Dobrushin-Shlosman and
Weitz, that substantially extend the range of applicability of the classical
comparison theorem. Our proofs are based on the analysis of an associated
family of Markov chains. We develop in detail an application of our main
results to the analysis of sequential Monte Carlo algorithms for filtering in
high dimension.
Comment: 55 pages.
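The classical single-site criterion that the paper generalises can be checked mechanically. The sketch below does so for an Ising model with uniform coupling, using the standard influence bound tanh(beta) between neighbouring sites; this is a toy check of the classical condition, not the paper's block construction:

```python
import math

def ising_influence_matrix(adj, beta):
    """Single-site Dobrushin influence bounds for an Ising model on the graph
    `adj` (dict: site -> set of neighbours) with uniform coupling `beta`,
    using the standard estimate C_ij <= tanh(beta) for neighbouring sites."""
    return {i: {j: (math.tanh(beta) if j in adj[i] else 0.0) for j in adj}
            for i in adj}

def dobrushin_uniqueness(adj, beta):
    """Classical Dobrushin criterion: max_i sum_j C_ij < 1 guarantees a unique
    Gibbs measure and decay of correlations, which is the regime the classical
    comparison theorem requires."""
    C = ising_influence_matrix(adj, beta)
    return max(sum(row.values()) for row in C.values()) < 1.0
```

On a grid of maximal degree 4 this criterion holds only when 4*tanh(beta) < 1, which is the "small subset of the natural parameter space" restriction that the block-level theorems of the paper relax.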
Accelerated Consensus via Min-Sum Splitting
We apply the Min-Sum message-passing protocol to solve the consensus problem
in distributed optimization. We show that while the ordinary Min-Sum algorithm
does not converge, a modified version of it known as Splitting yields
convergence to the problem solution. We prove that a proper choice of the
tuning parameters allows Min-Sum Splitting to yield subdiffusive accelerated
convergence rates, matching the rates obtained by shift-register methods. The
acceleration scheme embodied by Min-Sum Splitting for the consensus problem
bears similarities with lifted Markov chains techniques and with multi-step
first-order methods in convex optimization.
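Min-Sum Splitting itself is not reproduced here. As a hedged illustration of the rate comparison in the abstract, the sketch below contrasts plain gossip averaging on a ring with a classical two-register (shift-register-style) iteration whose tuning parameter is set from the spectral gap; the ring topology and uniform weights are assumptions of this toy setup:

```python
import math

def plain_step(x):
    """One round of uniform averaging with both ring neighbours (x <- W x)."""
    n = len(x)
    return [(x[i - 1] + x[i] + x[(i + 1) % n]) / 3.0 for i in range(n)]

def plain_consensus(x0, tol=1e-6, max_iter=100000):
    """Iterate x <- W x until every value is within tol of the average."""
    avg = sum(x0) / len(x0)
    x = list(x0)
    for t in range(1, max_iter + 1):
        x = plain_step(x)
        if max(abs(v - avg) for v in x) < tol:
            return x, t
    return x, max_iter

def accelerated_consensus(x0, omega, tol=1e-6, max_iter=100000):
    """Two-register iteration x_{t+1} = omega*W*x_t + (1-omega)*x_{t-1},
    a classical shift-register-style acceleration of the gossip step."""
    avg = sum(x0) / len(x0)
    prev, cur = list(x0), plain_step(x0)
    for t in range(2, max_iter + 1):
        nxt = [omega * s + (1.0 - omega) * p
               for s, p in zip(plain_step(cur), prev)]
        prev, cur = cur, nxt
        if max(abs(v - avg) for v in cur) < tol:
            return cur, t
    return cur, max_iter
```

With omega = 2 / (1 + sqrt(1 - lam2^2)), where lam2 is the second-largest eigenvalue of W, the two-register scheme contracts at a square-root-improved rate, which is the subdiffusive speed-up that the abstract attributes to Min-Sum Splitting and shift-register methods alike.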
Optimal Statistical Rates for Decentralised Non-Parametric Regression with Linear Speed-Up
We analyse the learning performance of Distributed Gradient Descent in the
context of multi-agent decentralised non-parametric regression with the square
loss function when i.i.d. samples are assigned to agents. We show that if
agents hold sufficiently many samples with respect to the network size, then
Distributed Gradient Descent achieves optimal statistical rates with a number
of iterations that scales, up to a threshold, with the inverse of the spectral
gap of the gossip matrix divided by the number of samples owned by each agent
raised to a problem-dependent power. The threshold is statistical in origin: it
encodes the existence of a "big data" regime where the number of required
iterations does not depend on the network topology. In this regime,
Distributed Gradient Descent achieves optimal statistical rates with the same
order of iterations as gradient descent run with all the samples in the
network. Provided the communication delay is sufficiently small, the
distributed protocol yields a linear speed-up in runtime compared to the
single-machine protocol. This is in contrast to decentralised optimisation
algorithms that do not exploit statistics and only yield a linear speed-up in
graphs where the spectral gap is bounded away from zero. Our results exploit
the statistical concentration of quantities held by agents and shed new light
on the interplay between statistics and communication in decentralised methods.
Bounds are given in the standard non-parametric setting with source/capacity
assumptions.
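A minimal sketch of the protocol analysed here, assuming scalar least-squares agents on a ring with a uniform gossip matrix, all of which are drastic simplifications of the paper's non-parametric setting:

```python
import random

def distributed_gd(data, rounds=300, step=0.05):
    """Distributed Gradient Descent for scalar least squares y ~ w*x on a ring:
    each agent first averages its parameter with its two ring neighbours
    (the gossip step), then takes a gradient step on its own local samples."""
    n = len(data)
    w = [0.0] * n
    for _ in range(rounds):
        # Gossip step: multiply by the (uniform, ring) gossip matrix.
        mixed = [(w[i - 1] + w[i] + w[(i + 1) % n]) / 3.0 for i in range(n)]
        w = []
        for i, wi in enumerate(mixed):
            # Local gradient of the empirical square loss on agent i's samples.
            grad = sum(2.0 * (wi * x - y) * x for x, y in data[i]) / len(data[i])
            w.append(wi - step * grad)
    return w
```

The interplay the abstract highlights is visible here: the gossip step propagates information at a speed governed by the spectral gap, while each agent's local samples already concentrate around the population objective, so fewer mixing rounds are needed when local sample sizes are large.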
Decentralized Cooperative Stochastic Bandits
We study a decentralized cooperative stochastic multi-armed bandit problem
with a common set of arms played over a network of agents. In our model, the reward distribution
of each arm is the same for each agent and rewards are drawn independently
across agents and time steps. In each round, each agent chooses an arm to play
and subsequently sends a message to her neighbors. The goal is to minimize the
overall regret of the entire network. We design a fully decentralized algorithm
that uses an accelerated consensus procedure to compute (delayed) estimates of
the average of rewards obtained by all the agents for each arm, and then uses
an upper confidence bound (UCB) algorithm that accounts for the delay and error
of the estimates. We analyze the regret of our algorithm and also provide a
lower bound. The regret is bounded by the optimal centralized regret plus a
natural and simple term depending on the spectral gap of the communication
matrix. Our algorithm is simpler to analyze than those proposed in prior work
and it achieves better regret bounds, while requiring less information about
the underlying network. It also performs better empirically.
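A toy version of the scheme described above can be sketched as follows. The plain ring gossip here stands in for the paper's accelerated consensus procedure, and the delay and error corrections of the actual algorithm are omitted; arm means, noise level, and topology are all assumptions of this sketch:

```python
import math
import random

def decentralized_ucb(means, n_agents, horizon, noise_std=0.1, seed=0):
    """Toy decentralized UCB on a ring of agents. Each agent keeps per-arm
    reward sums and pull counts, plays the arm maximising a UCB index on its
    (approximate, delayed) statistics, then averages its statistics with its
    two ring neighbours via a plain gossip step."""
    rng = random.Random(seed)
    k = len(means)
    sums = [[0.0] * k for _ in range(n_agents)]
    counts = [[0.0] * k for _ in range(n_agents)]
    pulls = [0] * k  # network-wide pull totals, for inspection
    for t in range(1, horizon + 1):
        for a in range(n_agents):
            if t <= k:
                arm = t - 1  # play each arm once to initialise the estimates
            else:
                total = sum(counts[a])
                arm = max(range(k),
                          key=lambda i: sums[a][i] / counts[a][i]
                          + noise_std * math.sqrt(2 * math.log(total) / counts[a][i]))
            sums[a][arm] += means[arm] + rng.gauss(0.0, noise_std)
            counts[a][arm] += 1.0
            pulls[arm] += 1
        # Gossip step: average statistics with the two ring neighbours.
        sums = [[(sums[a - 1][i] + sums[a][i] + sums[(a + 1) % n_agents][i]) / 3.0
                 for i in range(k)] for a in range(n_agents)]
        counts = [[(counts[a - 1][i] + counts[a][i] + counts[(a + 1) % n_agents][i]) / 3.0
                   for i in range(k)] for a in range(n_agents)]
    return pulls
```

Gossip averaging lets each agent's statistics track the network-wide averages, so every agent effectively learns from all rewards in the network; how quickly, and hence the extra regret term, is governed by the spectral gap of the communication matrix, as in the bound described above.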