
### Efficient Bayesian Inference for Generalized Bradley-Terry Models

The Bradley-Terry model is a popular approach to describe probabilities of
the possible outcomes when elements of a set are repeatedly compared with one
another in pairs. It has found many applications including animal behaviour,
chess ranking and multiclass classification. Numerous extensions of the basic
model have also been proposed in the literature including models with ties,
multiple comparisons, group comparisons and random graphs. From a computational
point of view, Hunter (2004) has proposed efficient iterative MM
(minorization-maximization) algorithms to perform maximum likelihood estimation
for these generalized Bradley-Terry models whereas Bayesian inference is
typically performed using MCMC (Markov chain Monte Carlo) algorithms based on
tailored Metropolis-Hastings (M-H) proposals. We show here that these MM
algorithms can be reinterpreted as special instances of
Expectation-Maximization (EM) algorithms associated with suitable sets of latent
variables and propose some original extensions. These latent variables allow us
to derive simple Gibbs samplers for Bayesian inference. We demonstrate
experimentally the efficiency of these algorithms on a variety of applications.
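
As a rough illustration of the MM iteration mentioned above for the basic Bradley-Terry model (no ties or group comparisons), here is a minimal sketch. The function name `bradley_terry_mm`, the `wins` matrix layout and the fixed iteration count are assumptions made for this example, not details taken from the paper. It also assumes every item has at least one win and one recorded comparison, so that the maximum likelihood estimate exists.

```python
def bradley_terry_mm(wins, iters=200):
    """Estimate Bradley-Terry skills with the classical MM update.

    wins[i][j] is the number of times item i beat item j
    (hypothetical input format chosen for this sketch).
    """
    n = len(wins)
    skill = [1.0] * n
    for _ in range(iters):
        new = []
        for i in range(n):
            # Total number of wins of item i.
            w_i = sum(wins[i][j] for j in range(n) if j != i)
            # Comparisons involving i, weighted by the current skills.
            denom = sum(
                (wins[i][j] + wins[j][i]) / (skill[i] + skill[j])
                for j in range(n)
                if j != i
            )
            new.append(w_i / denom)
        # Skills are only identified up to scale, so normalize to sum to 1.
        total = sum(new)
        skill = [s / total for s in new]
    return skill
```

For two items where the first wins 3 of 4 comparisons, `bradley_terry_mm([[0, 3], [1, 0]])` settles at skills 0.75 and 0.25, matching the empirical win ratio.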

### Autoregressive Kernels For Time Series

We propose in this work a new family of kernels for variable-length time
series. Our work builds upon the vector autoregressive (VAR) model for
multivariate stochastic processes: given a multivariate time series x, we
consider the likelihood function p_{\theta}(x) of different parameters \theta
in the VAR model as features to describe x. To compare two time series x and
x', we form the product of their features p_{\theta}(x) p_{\theta}(x') which is
integrated out with respect to \theta using a matrix normal-inverse Wishart prior. Among
other properties, this kernel can be easily computed when the dimension d of
the time series is much larger than the lengths of the considered time series x
and x'. It can also be generalized to time series taking values in arbitrary
state spaces, as long as the state space itself is endowed with a kernel
\kappa. In that case, the kernel between x and x' is a function of the Gram
matrices produced by \kappa on observations and subsequences of observations
enumerated in x and x'. We describe a computationally efficient implementation
of this generalization that uses low-rank matrix factorization techniques.
These kernels are compared to other known kernels using a set of benchmark
classification tasks carried out with support vector machines.
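
In symbols, the construction described in this abstract can be written as below. The names k (for the kernel) and \omega (for the matrix normal-inverse Wishart prior on \theta) are notation assumed here for illustration; the abstract itself does not name them.

```latex
k(x, x') = \int p_{\theta}(x)\, p_{\theta}(x')\, \omega(d\theta)
```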

### Asymptotic Bias of Stochastic Gradient Search

The asymptotic behavior of the stochastic gradient algorithm with a biased
gradient estimator is analyzed. Relying on arguments based on dynamical
systems theory (chain recurrence) and differential geometry (Yomdin theorem
and Lojasiewicz inequality), tight bounds on the asymptotic bias of the
iterates generated by such an algorithm are derived. The obtained results hold
under mild conditions and cover a broad class of high-dimensional nonlinear
algorithms. Using these results, the asymptotic properties of
policy-gradient (reinforcement) learning and adaptive population Monte Carlo
sampling are studied. Relying on the same results, the asymptotic behavior of
the recursive maximum split-likelihood estimation in hidden Markov models is
analyzed, too.

Comment: arXiv admin note: text overlap with arXiv:0907.102
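
To make the notion of asymptotic bias concrete, here is a toy sketch on a one-dimensional quadratic. It is not the setting analyzed in the paper. The objective f(x) = x^2 / 2, the constant gradient bias, the 1/n step size and the function name `biased_sgd` are all assumptions chosen for illustration. With a constant bias b added to the gradient estimate, the iterates settle near -b rather than at the true minimizer 0.

```python
import random


def biased_sgd(bias, steps=20000, seed=0):
    """Run stochastic gradient descent on f(x) = x**2 / 2 with a biased estimator."""
    rng = random.Random(seed)
    x = 5.0
    for n in range(1, steps + 1):
        # The true gradient is x; the estimator adds a constant bias plus noise.
        grad_estimate = x + bias + rng.gauss(0.0, 1.0)
        # Decreasing step size 1/n.
        x -= grad_estimate / n
    return x
```

In this toy case the limit point is offset from the minimizer by exactly the size of the gradient bias, which is the kind of relationship the bounds in the abstract quantify in far more general settings.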

### Replica Conditional Sequential Monte Carlo

We propose a Markov chain Monte Carlo (MCMC) scheme to perform state
inference in non-linear non-Gaussian state-space models. Current
state-of-the-art methods to address this problem rely on particle MCMC
techniques and their variants, such as the iterated conditional Sequential Monte
Carlo (cSMC) scheme, which uses a Sequential Monte Carlo (SMC) type proposal
within MCMC. A deficiency of standard SMC proposals is that they only use
observations up to time $t$ to propose states at time $t$ when an entire
observation sequence is available. More sophisticated SMC schemes based on lookahead
techniques could be used, but they can be difficult to put into practice. We
propose here replica cSMC where we build SMC proposals for one replica using
information from the entire observation sequence by conditioning on the states
of the other replicas. This approach is easily parallelizable and we
demonstrate its excellent empirical performance when compared to the standard
iterated cSMC scheme at fixed computational complexity.

Comment: To appear in Proceedings of ICML '1

### Interacting Markov chain Monte Carlo methods for solving nonlinear measure-valued equations

We present a new class of interacting Markov chain Monte Carlo algorithms for
solving numerically discrete-time measure-valued equations. The associated
stochastic processes belong to the class of self-interacting Markov chains. In
contrast to traditional Markov chains, their time evolutions depend on the
occupation measure of their past values. This general methodology allows us to
provide a natural way to sample from a sequence of target probability measures
of increasing complexity. We develop an original theoretical analysis of
the behavior of these iterative algorithms, which relies on
measure-valued processes and semigroup techniques. We establish a variety of
convergence results including exponential estimates and a uniform convergence
theorem with respect to the number of target distributions. We also illustrate
these algorithms in the context of Feynman-Kac distribution flows.

Comment: Published at http://dx.doi.org/10.1214/09-AAP628 in the Annals of
Applied Probability (http://www.imstat.org/aap/) by the Institute of
Mathematical Statistics (http://www.imstat.org

### Analyticity of Entropy Rates of Continuous-State Hidden Markov Models

The analyticity of the entropy and relative entropy rates of continuous-state
hidden Markov models is studied here. Using the analytic continuation principle
and the stability properties of the optimal filter, the analyticity of these
rates is shown for analytically parameterized models. The obtained results hold
under relatively mild conditions and cover several classes of hidden Markov
models met in practice. These results are relevant for several (theoretically
and practically) important problems arising in statistical inference, system
identification and information theory.

### Gibbs flow for approximate transport with applications to Bayesian computation

Let $\pi_{0}$ and $\pi_{1}$ be two distributions on the Borel space
$(\mathbb{R}^{d},\mathcal{B}(\mathbb{R}^{d}))$. Any measurable function
$T:\mathbb{R}^{d}\rightarrow\mathbb{R}^{d}$ such that $Y=T(X)\sim\pi_{1}$ if
$X\sim\pi_{0}$ is called a transport map from $\pi_{0}$ to $\pi_{1}$. For any
$\pi_{0}$ and $\pi_{1}$, if one could obtain an analytical expression for a
transport map from $\pi_{0}$ to $\pi_{1}$, then this could be straightforwardly
applied to sample from any distribution. One would map draws from an
easy-to-sample distribution $\pi_{0}$ to the target distribution $\pi_{1}$
using this transport map. Although it is usually impossible to obtain an
explicit transport map for complex target distributions, we show here how to
build a tractable approximation of a novel transport map. This is achieved by
moving samples from $\pi_{0}$ using an ordinary differential equation with a
velocity field that depends on the full conditional distributions of the
target. Even when this ordinary differential equation is time-discretized and
the full conditional distributions are numerically approximated, the resulting
distribution of mapped samples can be efficiently evaluated and used as a
proposal within sequential Monte Carlo samplers. We demonstrate significant
gains over state-of-the-art sequential Monte Carlo samplers at a fixed
computational complexity on a variety of applications.

Comment: Significantly revised with new methodology and numerical example
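
As a small illustration of the transport map definition above, and not of the Gibbs flow itself, here is a sketch for one dimension, where an exact map is available in closed form as the composition of the source CDF with the target quantile function. The choice of a standard normal $\pi_{0}$, an Exponential(1) $\pi_{1}$, and the function name `transport_normal_to_exponential` are assumptions made for this example.

```python
import math
import random


def transport_normal_to_exponential(x):
    """Map a standard normal draw x to an Exponential(1) draw.

    Uses T = F1^{-1} o F0, with F0 the standard normal CDF and
    F1^{-1}(u) = -log(1 - u). The survival function 1 - F0(x) is
    computed with erfc to stay accurate for large x.
    """
    survival = 0.5 * math.erfc(x / math.sqrt(2.0))
    return -math.log(survival)


def sample_target(n, seed=0):
    """Draw n samples from the target by pushing normal draws through T."""
    rng = random.Random(seed)
    return [transport_normal_to_exponential(rng.gauss(0.0, 1.0)) for _ in range(n)]
```

Such closed-form maps are rarely available in higher dimensions, which is the situation the approximate, ODE-based construction in the abstract is meant to address.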
