Variable length Markov chains and dynamical sources
Infinite random sequences of letters can be viewed as stochastic chains or as
strings produced by a source, in the sense of information theory. The
relationship between Variable Length Markov Chains (VLMC) and probabilistic
dynamical sources is studied. We establish a probabilistic framework for
context trees and VLMCs, and we prove that any VLMC is a dynamical source for which we
explicitly build the mapping. On two examples, the ``comb'' and the ``bamboo
blossom'', we find a necessary and sufficient condition for the existence and
the uniqueness of a stationary probability measure for the VLMC. These two
examples are detailed in order to provide the associated Dirichlet series as
well as the generating functions of word occurrences.
Comment: 45 pages, 15 figures
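The context-tree mechanism behind a VLMC can be illustrated with a minimal sketch: the law of the next letter depends only on the longest suffix of the past that appears in the context tree. Everything below (the function name, the binary alphabet, the toy tree encoded as a dict) is an illustrative assumption, not the paper's construction:

```python
import random

def sample_vlmc(context_probs, default, n, seed=0):
    """Sample n letters from a toy Variable Length Markov Chain over {0, 1}.

    context_probs maps a context (a tuple, read as a suffix of the past)
    to P(next letter = 1); `default` is used when no context matches.
    """
    rng = random.Random(seed)
    past, out = [], []
    for _ in range(n):
        # find the longest suffix of the past present in the context tree
        p = default
        for k in range(len(past), 0, -1):
            suffix = tuple(past[-k:])
            if suffix in context_probs:
                p = context_probs[suffix]
                break
        x = 1 if rng.random() < p else 0
        out.append(x)
        past.append(x)
    return out
```

For instance, a tree with contexts (1,) and (0,) reduces to an ordinary order-1 Markov chain; deeper, unbalanced trees (like the "comb" of the abstract) make the memory length depend on the realization.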
On the first k moments of the random count of a pattern in a multi-states sequence generated by a Markov source
In this paper, we develop an explicit formula for computing the first k
moments of the random count of a pattern in a multi-state sequence generated
by a Markov source. We derive efficient algorithms that handle both low- and
high-complexity patterns and both homogeneous and heterogeneous Markov
models. We then apply these results to the distribution of DNA patterns in
genomic sequences, where we show that moment-based developments (namely,
Edgeworth's expansion and Gram-Charlier type B series) improve the
reliability of common asymptotic approximations such as the Gaussian and
Poisson approximations.
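The first of the k moments is easy to sketch for the simplest case, a Markov source whose states are the letters themselves: the expected count is the sum, over all start positions, of the probability that the pattern occurs there. This toy computation (function name and setup are assumptions, and it covers only k = 1, not the paper's general formula) looks like:

```python
import numpy as np

def expected_pattern_count(pattern, P, pi0, n):
    """First moment of the count of `pattern` (a list of letters) in a
    length-n sequence from a Markov source with transition matrix P and
    initial law pi0, by summing occurrence probabilities over positions."""
    m = len(pattern)
    total = 0.0
    dist = pi0.copy()  # law of the letter at the current start position
    for start in range(n - m + 1):
        # P(pattern occurs at `start`) = P(first letter) * transition products
        p = dist[pattern[0]]
        for k in range(m - 1):
            p *= P[pattern[k], pattern[k + 1]]
        total += p
        dist = dist @ P  # advance one step for the next start position
    return total
```

Higher moments require joint occurrence probabilities over pairs (and tuples) of positions, which is where overlap structure and the efficiency concerns of the abstract enter.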
Spike trains statistics in Integrate and Fire Models: exact results
We briefly review and highlight the consequences of rigorous and exact
results obtained in \cite{cessac:10}, characterizing the statistics of spike
trains in a network of leaky Integrate-and-Fire neurons, where time is discrete
and where neurons are subject to noise, without restriction on the synaptic
connectivity weights. The main result is that spike train statistics are
characterized by a Gibbs distribution, whose potential is explicitly
computable. This establishes, on one hand, a rigorous ground for the current
investigations attempting to characterize real spike trains data with Gibbs
distributions, such as the Ising-like distribution, using the maximal entropy
principle. However, it transpires from the present analysis that the Ising
model might be a rather weak approximation. Indeed, the Gibbs potential (the
formal "Hamiltonian") is the log of the so-called "conditional intensity" (the
probability that a neuron fires given the past of the whole network). But, in
the present example, this probability has an infinite memory, and the
corresponding process is non-Markovian (equivalently, the Gibbs potential has
infinite range). Moreover, causality implies that the conditional intensity does not
depend on the state of the neurons at the \textit{same time}, ruling out the
Ising model as a candidate for an exact characterization of spike trains
statistics. However, Markovian approximations can be proposed whose degree of
approximation can be rigorously controlled. In this setting, the Ising model
appears as the "next step" after the Bernoulli model (independent neurons),
since it introduces spatial pairwise correlations, but not time correlations.
The range of validity of this approximation is discussed, together with
possible approaches for introducing time correlations and with algorithmic
extensions.
Comment: 6 pages, submitted to the conference NeuroComp2010
(http://2010.neurocomp.fr/); Bruno Cessac,
http://www-sop.inria.fr/neuromathcomp
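The object at the center of the abstract, the conditional intensity whose log gives the Gibbs potential, can be sketched for a single discrete-time leaky integrate-and-fire neuron with Gaussian noise. This is a toy illustration under assumed dynamics, not the exact formula of \cite{cessac:10}:

```python
import math

def conditional_intensity(spike_past, weights, leak, threshold, sigma):
    """Toy conditional intensity: probability that one discrete-time LIF
    neuron fires now, given the past spikes of its inputs, with Gaussian
    synaptic noise of standard deviation `sigma`.

    spike_past[k][j] = 1 if input j spiked k+1 steps ago.
    """
    # deterministic part of the membrane potential: leaky sum of past input
    V = 0.0
    for k, spikes in enumerate(spike_past):
        decay = leak ** (k + 1)
        V += decay * sum(w * s for w, s in zip(weights, spikes))
    # firing probability under Gaussian noise: P(V + noise >= threshold)
    z = (threshold - V) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def gibbs_potential(*args):
    """The (formal) Gibbs potential is the log of the conditional intensity."""
    return math.log(conditional_intensity(*args))
```

Because V depends on the entire spike history (the leaky sum never truncates exactly), this probability has infinite memory, which is the source of the non-Markovian behavior discussed in the abstract.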
Sparse approaches for the exact distribution of patterns in long state sequences generated by a Markov source
We present two novel approaches for the computation of the exact distribution
of a pattern in a long sequence. Both approaches take into account the sparse
structure of the problem and are two-part algorithms. The first approach relies
on a partial recursion after a fast computation of the second largest
eigenvalue of the transition matrix of a Markov chain embedding. The second
approach uses fast Taylor expansions of an exact bivariate rational
reconstruction of the distribution. We illustrate the interest of both
approaches on a simple toy-example and two biological applications: the
transcription factors of the Human Chromosome 5 and the PROSITE signatures of
functional motifs in proteins. On these examples, our methods demonstrate
their complementarity and their ability to extend the domain of feasibility of
exact computations in pattern problems to a new level.
The impact of temporal sampling resolution on parameter inference for biological transport models
Imaging data has become widely available to study biological systems at
various scales, for example the motile behaviour of bacteria or the transport
of mRNA, and it has the potential to transform our understanding of key
transport mechanisms. Often these imaging studies require us to compare
biological species or mutants, and to do this we need to quantitatively
characterise their behaviour. Mathematical models offer a quantitative
description of a system that enables us to perform this comparison, but to
relate these mechanistic mathematical models to imaging data, we need to
estimate the parameters of the models. In this work, we study the impact of
collecting data at different temporal resolutions on parameter inference for
biological transport models by performing exact inference for simple velocity
jump process models in a Bayesian framework. This issue is prominent in a host
of studies because the majority of imaging technologies place constraints on
the frequency with which images can be collected, and the discrete nature of
observations can introduce errors into parameter estimates. In this work, we
avoid such errors by formulating the velocity jump process model within a
hidden states framework. This allows us to obtain estimates of the
reorientation rate and noise amplitude for noisy observations of a simple
velocity jump process. We demonstrate the sensitivity of these estimates to
temporal variations in the sampling resolution and extent of measurement noise.
We use our methodology to provide experimental guidelines for researchers
aiming to characterise motile behaviour that can be described by a velocity
jump process. In particular, we consider how experimental constraints resulting
in a trade-off between temporal sampling resolution and observation noise may
affect parameter estimates.
Comment: Published in PLOS Computational Biology
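The forward model in this setting is easy to sketch: simulate a one-dimensional velocity jump process (constant speed, direction reversed at an exponential reorientation rate) and record positions at a fixed sampling interval with additive Gaussian measurement noise. All names and the 1-D reversal dynamics are illustrative assumptions, not the paper's exact model:

```python
import random

def simulate_vjp(speed, turn_rate, noise_sd, dt, n_obs, seed=0):
    """Simulate a 1-D velocity jump process observed at discrete times.

    The particle moves at velocity +/- `speed` and reverses direction at
    exponential rate `turn_rate`; its position is recorded every `dt` time
    units with Gaussian measurement noise of std `noise_sd`.
    """
    rng = random.Random(seed)
    x, v = 0.0, speed
    t_next_turn = rng.expovariate(turn_rate)
    obs = []
    for _ in range(n_obs):
        t_remaining = dt
        # play out all reorientation events inside this sampling interval
        while t_next_turn < t_remaining:
            x += v * t_next_turn
            t_remaining -= t_next_turn
            v = -v  # reorientation event
            t_next_turn = rng.expovariate(turn_rate)
        x += v * t_remaining
        t_next_turn -= t_remaining
        obs.append(x + rng.gauss(0.0, noise_sd))
    return obs
```

The inference problem of the abstract arises because reorientations occurring between two observation times are hidden, which is what the hidden-states formulation accounts for.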
Distinguishing Hidden Markov Chains
Hidden Markov Chains (HMCs) are commonly used mathematical models of
probabilistic systems. They are employed in various fields such as speech
recognition, signal processing, and biological sequence analysis. We consider
the problem of distinguishing two given HMCs based on an observation sequence
that one of the HMCs generates. More precisely, given two HMCs and an
observation sequence, a distinguishing algorithm is expected to identify the
HMC that generates the observation sequence. Two HMCs are called
distinguishable if for every ε > 0 there is a distinguishing
algorithm whose error probability is less than ε. We show that one
can decide in polynomial time whether two HMCs are distinguishable. Further, we
present and analyze two distinguishing algorithms for distinguishable HMCs. The
first algorithm makes a decision after processing a fixed number of
observations, and it exhibits two-sided error. The second algorithm processes
an unbounded number of observations, but the algorithm has only one-sided
error. The error probability, for both algorithms, decays exponentially with
the number of processed observations. We also provide an algorithm for
distinguishing multiple HMCs. Finally, we discuss an application in stochastic
runtime verification.
Comment: This is the full version of a LICS'16 paper
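A natural fixed-sample distinguishing rule, sketched here as an assumption (the paper's algorithms and their error analysis may differ), is to compute the likelihood of the observation sequence under each HMC with the forward algorithm and pick the larger one:

```python
import math

def forward_loglik(init, trans, emit, obs):
    """Log-likelihood of `obs` under an HMC (initial law, transition
    matrix, emission matrix), via the forward algorithm with per-step
    normalization to avoid underflow."""
    n = len(init)
    alpha = [init[s] * emit[s][obs[0]] for s in range(n)]
    ll = 0.0
    for o in obs[1:]:
        norm = sum(alpha)
        ll += math.log(norm)
        alpha = [a / norm for a in alpha]
        alpha = [sum(alpha[s] * trans[s][t] for s in range(n)) * emit[t][o]
                 for t in range(n)]
    return ll + math.log(sum(alpha))

def distinguish(hmc1, hmc2, obs):
    """Fixed-sample rule: return 1 or 2 for the HMC with the higher
    likelihood of `obs` (a two-sided-error decision, as in the
    abstract's first algorithm)."""
    return 1 if forward_loglik(*hmc1, obs) >= forward_loglik(*hmc2, obs) else 2
```

Each HMC is passed as a tuple `(init, trans, emit)`; for distinguishable chains the log-likelihood ratio drifts linearly in the number of observations, which is consistent with the exponentially decaying error probability stated above.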
Entropy-based parametric estimation of spike train statistics
We consider the evolution of a network of neurons, focusing on the asymptotic
behavior of spikes dynamics instead of membrane potential dynamics. The spike
response is not sought as a deterministic response in this context, but as a
conditional probability: "Reading out the code" consists of inferring such a
probability. This probability is computed from empirical raster plots, by using
the framework of thermodynamic formalism in ergodic theory. This gives us a
parametric statistical model where the probability has the form of a Gibbs
distribution. In this respect, this approach generalizes the seminal and
profound work of Schneidman and collaborators. A minimal presentation of the
formalism is reviewed here, while a general algorithmic estimation method is
proposed yielding fast convergent implementations. It is also made explicit how
several spike observables (entropy, rate, synchronizations, correlations) are
given in closed form from the parametric estimation. This paradigm not only
allows us to estimate the spike statistics, given a design choice, but also
to compare different models, thus answering comparative questions about the
neural code such as: "are correlations (or time synchrony, or a given set of
spike patterns, ...) significant with respect to rate coding only?" A numerical
validation of the method is proposed and the perspectives regarding spike-train
code analysis are also discussed.
Comment: 37 pages, 8 figures, submitted
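The simplest instance of such a parametric Gibbs fit, a purely spatial Ising-like model matched to empirical firing rates and pairwise correlations, can be sketched by exact moment matching for a handful of neurons. This toy (all names assumed, feasible only for small networks, and not the thermodynamic-formalism estimator of the abstract) illustrates the idea:

```python
import itertools
import math

def fit_ising(raster, n_iter=500, lr=0.1):
    """Toy maximum-entropy fit of P(x) ~ exp(sum_i h_i x_i + sum_ij J_ij x_i x_j)
    to a spike raster (list of binary tuples, one per time bin), by exact
    gradient ascent on the log-likelihood (enumerates all 2^n states)."""
    n = len(raster[0])
    pairs = list(itertools.combinations(range(n), 2))
    h = [0.0] * n
    J = {p: 0.0 for p in pairs}
    # empirical moments: firing rates and pairwise correlations
    T = len(raster)
    m_emp = [sum(x[i] for x in raster) / T for i in range(n)]
    c_emp = {(i, j): sum(x[i] * x[j] for x in raster) / T for i, j in pairs}
    states = list(itertools.product([0, 1], repeat=n))
    for _ in range(n_iter):
        # model moments via exact enumeration
        w = [math.exp(sum(h[i] * x[i] for i in range(n))
                      + sum(J[i, j] * x[i] * x[j] for i, j in pairs))
             for x in states]
        Z = sum(w)
        m_mod = [sum(w[k] * x[i] for k, x in enumerate(states)) / Z
                 for i in range(n)]
        c_mod = {(i, j): sum(w[k] * x[i] * x[j] for k, x in enumerate(states)) / Z
                 for i, j in pairs}
        # moment-matching gradient step (log-likelihood gradient)
        for i in range(n):
            h[i] += lr * (m_emp[i] - m_mod[i])
        for p in pairs:
            J[p] += lr * (c_emp[p] - c_mod[p])
    return h, J
```

The abstract's point is that such purely spatial models ignore time correlations; the Gibbs-distribution framework above generalizes this fit to potentials with memory.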