94 research outputs found

    On the first k moments of the random count of a pattern in a multi-states sequence generated by a Markov source

    In this paper, we develop an explicit formula for computing the first k moments of the random count of a pattern in a multi-states sequence generated by a Markov source. We derive efficient algorithms that handle both low- and high-complexity patterns and both homogeneous and heterogeneous Markov models. We then apply these results to the distribution of DNA patterns in genomic sequences, where we show that moment-based developments (namely Edgeworth's expansion and Gram-Charlier type B series) improve the reliability of common asymptotic approximations such as the Gaussian or Poisson approximations.

    Fixed effects selection in the linear mixed-effects model using adaptive ridge procedure for L0 penalty performance

    This paper is concerned with the selection of fixed effects, along with the estimation of fixed effects, random effects and variance components, in the linear mixed-effects model. We introduce a selection procedure based on an adaptive ridge (AR) penalty of the profiled likelihood, where the covariance matrix of the random effects is Cholesky factorized. This selection procedure is intended for both low- and high-dimensional settings, where the number of fixed effects is allowed to grow exponentially with the total sample size, which raises technical difficulties because of the non-convex optimization problem induced by L0 penalties. In extensive simulation studies, the procedure is compared with LASSO selection and appears to enjoy both model selection consistency and estimation consistency.
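
    The adaptive ridge idea can be illustrated on an ordinary linear model, which is a simplification: the abstract applies the penalty to the profiled likelihood of a mixed-effects model, and that is not reproduced here. The sketch below assumes a commonly used reweighting scheme, w_j = 1 / (beta_j^2 + delta^2), so that the weighted ridge penalty approaches an L0 penalty over the iterations. The function names, the value of delta and the toy data are assumptions made for illustration.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def adaptive_ridge(X, y, lam, delta=1e-5, n_iter=50):
    """Iteratively reweighted ridge regression approximating an L0 penalty.

    Each pass solves (X'X + lam * diag(w)) beta = X'y, then updates
    w_j = 1 / (beta_j^2 + delta^2) (assumed reweighting scheme).
    """
    p = len(X[0])
    XtX = [[sum(row[j] * row[k] for row in X) for k in range(p)]
           for j in range(p)]
    Xty = [sum(row[j] * yi for row, yi in zip(X, y)) for j in range(p)]
    w = [1.0] * p        # first pass is a plain ridge fit
    beta = [0.0] * p
    for _ in range(n_iter):
        A = [[XtX[j][k] + (lam * w[j] if j == k else 0.0) for k in range(p)]
             for j in range(p)]
        beta = solve(A, Xty)
        w = [1.0 / (b * b + delta * delta) for b in beta]
    return beta


# Toy data (hypothetical): y depends on the first column only.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0]]
y = [2.0 * row[0] for row in X]
beta = adaptive_ridge(X, y, lam=0.1)
```

    On this toy design the coefficient of the irrelevant column is driven to a negligible value while the relevant one stays close to 2, which is the selection behaviour the penalty is meant to produce.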

    Profiled deviance for the multivariate linear mixed-effects model fitting

    This paper focuses on the multivariate linear mixed-effects model, including all the correlations between the random effects, when the marginal residual terms are assumed uncorrelated and homoscedastic with possibly different standard deviations. The random effects covariance matrix is Cholesky factorized to directly estimate the variance components of these random effects. This strategy yields a consistent estimate of the random effects covariance matrix, which is generally poorly estimated when it is estimated grossly (or directly) with methods such as the EM algorithm. Using simulated data sets, we compare the estimates based on the present method with the EM algorithm-based estimates. We illustrate the method with real-life data from a study of the child's immunity against malaria in Benin (West Africa).

    Pattern statistics on Markov chains and sensitivity to parameter estimation

    BACKGROUND: In computational biology, a Markov model is commonly used to take the sequence composition into account when computing pattern statistics. Usually its parameters must be estimated. The aim of this paper is to determine how sensitive these statistics are to parameter estimation, and what the consequences of this variability are for pattern studies (finding the most over-represented words in a genome, the most significant words common to a set of sequences, ...). RESULTS: In the particular case where pattern statistics (overlap counting only) are computed through binomial approximations, we use the delta method to give an explicit expression of σ, the standard deviation of a pattern statistic. This result is validated using simulations, and a simple pattern study is also considered. CONCLUSION: We establish that the use of high-order Markov models could easily lead to major mistakes because of the high sensitivity of pattern statistics to parameter estimation.
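
    As a reminder of what the delta method does in general, here is a minimal sketch for a statistic g(p̂) of a single estimated Bernoulli probability: sd(g(p̂)) ≈ |g'(p̂)| · sqrt(p̂(1 − p̂)/n). This is not the expression of σ derived in the paper, which the abstract does not give. The function names and the log-odds example are assumptions chosen for illustration.

```python
from math import log, sqrt


def delta_method_sd(g, p_hat, n, h=1e-6):
    """First-order delta-method standard deviation of g(p_hat).

    p_hat is a proportion estimated from n independent Bernoulli draws,
    so Var(p_hat) is approximated by p_hat * (1 - p_hat) / n.
    The derivative of g is taken by central finite differences.
    """
    deriv = (g(p_hat + h) - g(p_hat - h)) / (2.0 * h)
    return abs(deriv) * sqrt(p_hat * (1.0 - p_hat) / n)


def log_odds(p):
    """Example statistic (hypothetical choice): the log-odds of p."""
    return log(p / (1.0 - p))


sd = delta_method_sd(log_odds, p_hat=0.2, n=1000)
```

    For the log-odds, the derivative is 1 / (p(1 − p)), so the approximation reduces to 1 / sqrt(n p (1 − p)), which gives a closed form to check the numerical result against.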

    A sum-product algorithm with polynomials for computing exact derivatives of the likelihood in Bayesian networks

    We consider a Bayesian network with a parameter θ. It is well known that the probability of the evidence conditional on θ (the likelihood) can be computed through a sum-product of potentials. In this work we propose a polynomial version of the sum-product algorithm, based on generating functions, for computing both the likelihood function and all its exact derivatives. For a unidimensional parameter we obtain the derivatives up to order d with a complexity O(C × d²), where C is the complexity of computing the likelihood alone. For a parameter of p dimensions we obtain the likelihood, the gradient and the Hessian with a complexity O(C × p²). These complexities are similar to those of the numerical method, with the main advantage that the algorithm computes exact derivatives instead of approximations.
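
    The principle can be sketched on a toy model, under stated assumptions: a two-state hidden Markov chain with uniform start, probability θ of staying in the same state, and a fixed emission accuracy of 0.9. Each potential that depends on θ is replaced by its polynomial in h (θ + h or 1 − θ − h), the forward sum-product recursion is run with polynomial products truncated at degree d, and the k-th derivative of the likelihood is k! times the coefficient of h^k. The model, the function names and the parameter values are illustrative assumptions, not the paper's implementation.

```python
from math import factorial


def linear(c0, c1, d):
    """Coefficients (lowest degree first) of c0 + c1*h, truncated to degree d."""
    p = [0.0] * (d + 1)
    p[0] = c0
    if d >= 1:
        p[1] = c1
    return p


def poly_mul(a, b, d):
    """Product of two polynomials truncated to degree d: O(d^2) operations."""
    out = [0.0] * (d + 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j > d:
                break
            out[i + j] += ai * bj
    return out


def likelihood_derivatives(theta, obs, d, emit_ok=0.9):
    """Return [L(theta), L'(theta), ..., L^(d)(theta)] for the toy chain."""
    stay = linear(theta, 1.0, d)             # theta + h
    switch = linear(1.0 - theta, -1.0, d)    # 1 - theta - h

    def emit(state, y):
        return emit_ok if state == y else 1.0 - emit_ok

    # Forward messages are polynomials in h instead of plain numbers.
    alpha = [linear(0.5 * emit(s, obs[0]), 0.0, d) for s in (0, 1)]
    for y in obs[1:]:
        new_alpha = []
        for s_next in (0, 1):
            total = [0.0] * (d + 1)
            for s in (0, 1):
                trans = stay if s == s_next else switch
                term = poly_mul(alpha[s], trans, d)
                total = [t + u for t, u in zip(total, term)]
            new_alpha.append([emit(s_next, y) * c for c in total])
        alpha = new_alpha
    coeffs = [a0 + a1 for a0, a1 in zip(alpha[0], alpha[1])]
    # Taylor coefficient of h^k equals L^(k)(theta) / k!.
    return [factorial(k) * c for k, c in enumerate(coeffs)]


derivs = likelihood_derivatives(0.7, [0, 0], d=2)
```

    For the two observations used here the likelihood is 0.09 + 0.32 θ, so the values can be verified by hand. Each message update costs one truncated polynomial product, which is where the d² factor in the stated complexity comes from.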