    Efficient Bayesian Inference for Generalized Bradley-Terry Models

    The Bradley-Terry model is a popular approach to describe probabilities of the possible outcomes when elements of a set are repeatedly compared with one another in pairs. It has found many applications including animal behaviour, chess ranking and multiclass classification. Numerous extensions of the basic model have also been proposed in the literature including models with ties, multiple comparisons, group comparisons and random graphs. From a computational point of view, Hunter (2004) has proposed efficient iterative MM (minorization-maximization) algorithms to perform maximum likelihood estimation for these generalized Bradley-Terry models, whereas Bayesian inference is typically performed using MCMC (Markov chain Monte Carlo) algorithms based on tailored Metropolis-Hastings (M-H) proposals. We show here that these MM algorithms can be reinterpreted as special instances of Expectation-Maximization (EM) algorithms associated with suitable sets of latent variables and propose some original extensions. These latent variables allow us to derive simple Gibbs samplers for Bayesian inference. We demonstrate experimentally the efficiency of these algorithms on a variety of applications.
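As a rough illustration of the MM iteration mentioned in the abstract above, here is a minimal sketch for the basic (non-generalized) Bradley-Terry model only. The function name `bradley_terry_mm`, the `wins` matrix layout, and the toy data are assumptions made for this example and are not taken from the paper. The sketch also assumes every item has won and lost at least one comparison, so that the maximum likelihood estimate exists.

```python
import numpy as np

def bradley_terry_mm(wins, n_iter=1000, tol=1e-10):
    """MM iteration for the basic Bradley-Terry model (illustrative sketch).

    wins[i, j] is the number of times item i beat item j (zero diagonal).
    Returns skill parameters p, normalized to sum to 1, such that
    P(i beats j) = p[i] / (p[i] + p[j]).
    """
    wins = np.asarray(wins, dtype=float)
    k = wins.shape[0]
    n = wins + wins.T              # number of comparisons for each pair
    w = wins.sum(axis=1)           # total number of wins for each item
    p = np.full(k, 1.0 / k)        # uniform starting point
    for _ in range(n_iter):
        # Update: p_i <- w_i / sum_{j != i} n_ij / (p_i + p_j)
        denom = n / (p[:, None] + p[None, :])
        np.fill_diagonal(denom, 0.0)
        p_new = w / denom.sum(axis=1)
        p_new /= p_new.sum()       # fix the scale, which is not identifiable
        converged = np.max(np.abs(p_new - p)) < tol
        p = p_new
        if converged:
            break
    return p

# Hypothetical toy data: three items, each pair compared a few times.
toy_wins = np.array([[0, 3, 2],
                     [1, 0, 3],
                     [1, 1, 0]])
skills = bradley_terry_mm(toy_wins)
print(skills)
```

At convergence, each item's observed win count matches its expected win count under the fitted model, which is the stationarity condition of the Bradley-Terry likelihood.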

    Autoregressive Kernels For Time Series

    We propose in this work a new family of kernels for variable-length time series. Our work builds upon the vector autoregressive (VAR) model for multivariate stochastic processes: given a multivariate time series x, we consider the likelihood function p_{\theta}(x) of different parameters \theta in the VAR model as features to describe x. To compare two time series x and x', we form the product of their features p_{\theta}(x) p_{\theta}(x'), which is integrated out with respect to \theta using a matrix normal-inverse Wishart prior. Among other properties, this kernel can be easily computed when the dimension d of the time series is much larger than the lengths of the considered time series x and x'. It can also be generalized to time series taking values in arbitrary state spaces, as long as the state space itself is endowed with a kernel \kappa. In that case, the kernel between x and x' is a function of the Gram matrices produced by \kappa on observations and subsequences of observations enumerated in x and x'. We describe a computationally efficient implementation of this generalization that uses low-rank matrix factorization techniques. These kernels are compared to other known kernels using a set of benchmark classification tasks carried out with support vector machines.

    Asymptotic Bias of Stochastic Gradient Search

    The asymptotic behavior of the stochastic gradient algorithm with a biased gradient estimator is analyzed. Relying on arguments based on dynamical systems theory (chain-recurrence) and differential geometry (Yomdin theorem and Lojasiewicz inequality), tight bounds on the asymptotic bias of the iterates generated by such an algorithm are derived. The obtained results hold under mild conditions and cover a broad class of high-dimensional nonlinear algorithms. Using these results, the asymptotic properties of policy-gradient (reinforcement) learning and adaptive population Monte Carlo sampling are studied. Relying on the same results, the asymptotic behavior of recursive maximum split-likelihood estimation in hidden Markov models is analyzed, too. Comment: arXiv admin note: text overlap with arXiv:0907.102
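To make the notion of asymptotic bias concrete, the following toy sketch runs stochastic gradient search on f(x) = x^2 / 2 with a gradient estimator that carries a constant bias. This one-dimensional setup, the function name `biased_sgd`, and all parameter values are assumptions chosen for illustration. It is not the setting or the analysis of the paper, which treats a much broader class of algorithms.

```python
import numpy as np

def biased_sgd(bias, n_steps=20000, x0=5.0, seed=0):
    """Stochastic gradient search on f(x) = x**2 / 2 with a biased estimator.

    The true gradient is x, but the estimator returns x + bias + noise.
    Step sizes 1/n decrease to zero, so the noise averages out while the
    constant bias does not.
    """
    rng = np.random.default_rng(seed)
    x = x0
    for n in range(1, n_steps + 1):
        grad_estimate = x + bias + rng.normal()  # biased, noisy gradient
        x -= grad_estimate / n                   # decreasing step size 1/n
    return x

# The minimizer of f is 0, yet the iterates settle near -bias.
x_final = biased_sgd(bias=0.5)
print(x_final)
```

In this toy case the iterates converge to the root of the biased mean field, x = -bias, so the distance to the true minimizer equals the size of the estimator bias.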

    Replica Conditional Sequential Monte Carlo

    We propose a Markov chain Monte Carlo (MCMC) scheme to perform state inference in non-linear non-Gaussian state-space models. Current state-of-the-art methods to address this problem rely on particle MCMC techniques and their variants, such as the iterated conditional Sequential Monte Carlo (cSMC) scheme, which uses a Sequential Monte Carlo (SMC) type proposal within MCMC. A deficiency of standard SMC proposals is that they only use observations up to time t to propose states at time t, even when an entire observation sequence is available. More sophisticated SMC based on lookahead techniques could be used, but they can be difficult to put into practice. We propose here replica cSMC, where we build SMC proposals for one replica using information from the entire observation sequence by conditioning on the states of the other replicas. This approach is easily parallelizable and we demonstrate its excellent empirical performance when compared to the standard iterated cSMC scheme at fixed computational complexity. Comment: To appear in Proceedings of ICML '1

    Interacting Markov chain Monte Carlo methods for solving nonlinear measure-valued equations

    We present a new class of interacting Markov chain Monte Carlo algorithms for numerically solving discrete-time measure-valued equations. The associated stochastic processes belong to the class of self-interacting Markov chains. In contrast to traditional Markov chains, their time evolutions depend on the occupation measure of their past values. This general methodology allows us to provide a natural way to sample from a sequence of target probability measures of increasing complexity. We develop an original theoretical analysis of the behavior of these iterative algorithms, which relies on measure-valued processes and semigroup techniques. We establish a variety of convergence results including exponential estimates and a uniform convergence theorem with respect to the number of target distributions. We also illustrate these algorithms in the context of Feynman-Kac distribution flows. Comment: Published at http://dx.doi.org/10.1214/09-AAP628 in the Annals of Applied Probability (http://www.imstat.org/aap/) by the Institute of Mathematical Statistics (http://www.imstat.org

    Analyticity of Entropy Rates of Continuous-State Hidden Markov Models

    The analyticity of the entropy and relative entropy rates of continuous-state hidden Markov models is studied here. Using the analytic continuation principle and the stability properties of the optimal filter, the analyticity of these rates is shown for analytically parameterized models. The obtained results hold under relatively mild conditions and cover several classes of hidden Markov models met in practice. These results are relevant for several (theoretically and practically) important problems arising in statistical inference, system identification and information theory.

    Bias of Particle Approximations to Optimal Filter Derivative

    In many applications, a state-space model depends on a parameter which needs to be inferred from a data set. Quite often, it is necessary to perform the parameter inference online. In the maximum likelihood approach, this can be done using stochastic gradient search and the optimal filter derivative. However, the optimal filter and its derivative are not analytically tractable for a non-linear state-space model and need to be approximated numerically. In [Poyiadjis, Doucet and Singh, Biometrika 2011], a particle approximation to the optimal filter derivative has been proposed, while the corresponding L_{p} error bounds and the central limit theorem have been provided in [Del Moral, Doucet and Singh, SIAM Journal on Control and Optimization 2015]. Here, the bias of this particle approximation is analyzed. We derive (relatively) tight bounds on the bias in terms of the number of particles. Under (strong) mixing conditions, the bounds are uniform in time and inversely proportional to the number of particles. The obtained results apply to a (relatively) broad class of state-space models met in practice.