    Variable length Markov chains and dynamical sources

    Infinite random sequences of letters can be viewed as stochastic chains or as strings produced by a source, in the sense of information theory. The relationship between Variable Length Markov Chains (VLMC) and probabilistic dynamical sources is studied. We establish a probabilistic frame for context trees and VLMC and we prove that any VLMC is a dynamical source for which we explicitly build the mapping. On two examples, the "comb" and the "bamboo blossom", we find a necessary and sufficient condition for the existence and uniqueness of a stationary probability measure for the VLMC. These two examples are detailed in order to provide the associated Dirichlet series as well as the generating functions of word occurrences. Comment: 45 pages, 15 figures

    On the first k moments of the random count of a pattern in a multi-states sequence generated by a Markov source

    In this paper, we develop an explicit formula for computing the first k moments of the random count of a pattern in a multi-state sequence generated by a Markov source. We derive efficient algorithms that handle both low- and high-complexity patterns and both homogeneous and heterogeneous Markov models. We then apply these results to the distribution of DNA patterns in genomic sequences, where we show that moment-based developments (namely Edgeworth's expansion and Gram-Charlier type B series) improve the reliability of common asymptotic approximations such as the Gaussian or Poisson approximations.

    Spike trains statistics in Integrate and Fire Models: exact results

    We briefly review and highlight the consequences of rigorous and exact results obtained in \cite{cessac:10}, characterizing the statistics of spike trains in a network of leaky Integrate-and-Fire neurons, where time is discrete and where neurons are subject to noise, without restriction on the synaptic weights connectivity. The main result is that spike train statistics are characterized by a Gibbs distribution, whose potential is explicitly computable. This establishes, on the one hand, a rigorous ground for the current investigations attempting to characterize real spike train data with Gibbs distributions, such as the Ising-like distribution, using the maximal entropy principle. However, it transpires from the present analysis that the Ising model might be a rather weak approximation. Indeed, the Gibbs potential (the formal "Hamiltonian") is the log of the so-called "conditional intensity" (the probability that a neuron fires given the past of the whole network). But, in the present example, this probability has an infinite memory, and the corresponding process is non-Markovian (resp. the Gibbs potential has infinite range). Moreover, causality implies that the conditional intensity does not depend on the state of the neurons at the same time, ruling out the Ising model as a candidate for an exact characterization of spike train statistics. However, Markovian approximations can be proposed whose degree of approximation can be rigorously controlled. In this setting, the Ising model appears as the "next step" after the Bernoulli model (independent neurons) since it introduces spatial pairwise correlations, but not time correlations. The range of validity of this approximation is discussed together with possible approaches for introducing time correlations, with algorithmic extensions. Comment: 6 pages, submitted to conference NeuroComp2010 http://2010.neurocomp.fr/; Bruno Cessac http://www-sop.inria.fr/neuromathcomp

    Sparse approaches for the exact distribution of patterns in long state sequences generated by a Markov source

    We present two novel approaches for the computation of the exact distribution of a pattern in a long sequence. Both approaches take into account the sparse structure of the problem and are two-part algorithms. The first approach relies on a partial recursion after a fast computation of the second largest eigenvalue of the transition matrix of a Markov chain embedding. The second approach uses fast Taylor expansions of an exact bivariate rational reconstruction of the distribution. We illustrate the interest of both approaches on a simple toy example and two biological applications: the transcription factors of the Human Chromosome 5 and the PROSITE signatures of functional motifs in proteins. On these examples, our methods demonstrate their complementarity and their ability to extend the domain of feasibility for exact computations in pattern problems to a new level.

    The impact of temporal sampling resolution on parameter inference for biological transport models

    Imaging data has become widely available to study biological systems at various scales, for example the motile behaviour of bacteria or the transport of mRNA, and it has the potential to transform our understanding of key transport mechanisms. Often these imaging studies require us to compare biological species or mutants, and to do this we need to quantitatively characterise their behaviour. Mathematical models offer a quantitative description of a system that enables us to perform this comparison, but to relate these mechanistic mathematical models to imaging data, we need to estimate the parameters of the models. In this work, we study the impact of collecting data at different temporal resolutions on parameter inference for biological transport models by performing exact inference for simple velocity jump process models in a Bayesian framework. This issue is prominent in a host of studies because the majority of imaging technologies place constraints on the frequency with which images can be collected, and the discrete nature of observations can introduce errors into parameter estimates. In this work, we avoid such errors by formulating the velocity jump process model within a hidden states framework. This allows us to obtain estimates of the reorientation rate and noise amplitude for noisy observations of a simple velocity jump process. We demonstrate the sensitivity of these estimates to temporal variations in the sampling resolution and extent of measurement noise. We use our methodology to provide experimental guidelines for researchers aiming to characterise motile behaviour that can be described by a velocity jump process. In particular, we consider how experimental constraints resulting in a trade-off between temporal sampling resolution and observation noise may affect parameter estimates. Comment: Published in PLOS Computational Biology

    Distinguishing Hidden Markov Chains

    Hidden Markov Chains (HMCs) are commonly used mathematical models of probabilistic systems. They are employed in various fields such as speech recognition, signal processing, and biological sequence analysis. We consider the problem of distinguishing two given HMCs based on an observation sequence that one of the HMCs generates. More precisely, given two HMCs and an observation sequence, a distinguishing algorithm is expected to identify the HMC that generates the observation sequence. Two HMCs are called distinguishable if for every ε > 0 there is a distinguishing algorithm whose error probability is less than ε. We show that one can decide in polynomial time whether two HMCs are distinguishable. Further, we present and analyze two distinguishing algorithms for distinguishable HMCs. The first algorithm makes a decision after processing a fixed number of observations, and it exhibits two-sided error. The second algorithm processes an unbounded number of observations, but the algorithm has only one-sided error. The error probability, for both algorithms, decays exponentially with the number of processed observations. We also provide an algorithm for distinguishing multiple HMCs. Finally, we discuss an application in stochastic runtime verification. Comment: This is the full version of a LICS'16 paper
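
To make the distinguishing problem stated in this abstract concrete, here is a minimal sketch of one simple decision rule: compute the likelihood of the observation sequence under each HMC with the forward algorithm and pick the larger. This is an illustrative assumption, not the algorithms analyzed in the paper. The names `log_likelihood` and `distinguish`, and the representation of an HMC as a triple `(init, trans, emit)` of plain Python lists with integer symbols, are hypothetical choices made for this example.

```python
import math


def log_likelihood(init, trans, emit, observations):
    """Log-probability of `observations` under an HMC (scaled forward algorithm).

    init[i]     : probability of starting in hidden state i
    trans[i][j] : probability of moving from state i to state j
    emit[i][s]  : probability that state i emits symbol s
    """
    n = len(init)
    belief = list(init)  # distribution over hidden states before the next emission
    log_prob = 0.0
    for symbol in observations:
        # Weight each state by how likely it is to emit the observed symbol.
        weighted = [belief[i] * emit[i][symbol] for i in range(n)]
        total = sum(weighted)
        if total == 0.0:
            return float("-inf")  # the sequence is impossible under this HMC
        log_prob += math.log(total)
        # Renormalise to avoid underflow, then advance one transition step.
        posterior = [w / total for w in weighted]
        belief = [sum(posterior[i] * trans[i][j] for i in range(n)) for j in range(n)]
    return log_prob


def distinguish(hmc_a, hmc_b, observations):
    """Return "A" or "B" for the HMC with the higher likelihood (ties go to "A")."""
    score_a = log_likelihood(*hmc_a, observations)
    score_b = log_likelihood(*hmc_b, observations)
    return "A" if score_a >= score_b else "B"


# Two hypothetical one-state HMCs over symbols {0, 1} with opposite emission biases.
hmc_a = ([1.0], [[1.0]], [[0.9, 0.1]])
hmc_b = ([1.0], [[1.0]], [[0.1, 0.9]])
guess = distinguish(hmc_a, hmc_b, [0, 0, 1, 0])
```

This rule decides after a fixed number of observations, so it can err in either direction. The sketch gives no error bound of its own.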

    Entropy-based parametric estimation of spike train statistics

    We consider the evolution of a network of neurons, focusing on the asymptotic behavior of spike dynamics instead of membrane potential dynamics. The spike response is not sought as a deterministic response in this context, but as a conditional probability: "reading out the code" consists of inferring such a probability. This probability is computed from empirical raster plots, by using the framework of thermodynamic formalism in ergodic theory. This gives us a parametric statistical model where the probability has the form of a Gibbs distribution. In this respect, this approach generalizes the seminal and profound work of Schneidman and collaborators. A minimal presentation of the formalism is reviewed here, while a general algorithmic estimation method is proposed, yielding fast convergent implementations. It is also made explicit how several spike observables (entropy, rate, synchronizations, correlations) are given in closed form from the parametric estimation. This paradigm not only allows us to estimate the spike statistics, given a design choice, but also to compare different models, thus answering comparative questions about the neural code such as: "are correlations (or time synchrony or a given set of spike patterns, ...) significant with respect to rate coding only?" A numerical validation of the method is proposed and the perspectives regarding spike-train code analysis are also discussed. Comment: 37 pages, 8 figures, submitted