162 research outputs found

    Approximating the distributions of runs and patterns

    Effective p-value computations using Finite Markov Chain Imbedding (FMCI): application to local score and to pattern statistics

    Finite Markov Chain Imbedding (FMCI) is a classical approach to complex combinatorial problems on sequences. To yield efficient algorithms, such approaches are known to require a prior rewriting as recursive relations. We propose general recursive algorithms that compute the exact Cumulative Distribution Function (CDF) or complementary CDF (CCDF) in a numerically stable manner. These algorithms are then applied to two particular cases: the local score of one sequence and pattern statistics. In both cases, asymptotic developments are derived. For the local score, our new approach allows, for the very first time, the computation of exact p-values in a practical study (finding hydrophobic segments in a protein database) where only approximations were previously available. In this study, the asymptotic approximations appear to be completely unreliable for 99.5% of the considered sequences. For pattern statistics, the new FMCI algorithms dramatically outperform the previous ones: they are more reliable, easier to implement, faster, and have lower memory requirements.
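
As a rough illustration of the embedding idea described in this abstract, the sketch below computes the exact distribution of the number of (overlapping) occurrences of a word in a random sequence by running a recursion over the states of a pattern-matching automaton. It is a minimal sketch under stated assumptions, not the paper's algorithm: it assumes an i.i.d. source rather than a general Markov source, and the function names `build_automaton` and `pattern_count_distribution` are hypothetical.

```python
from collections import defaultdict


def build_automaton(pattern, alphabet):
    """Build a KMP-style DFA for `pattern`.

    State k means: the longest prefix of `pattern` that is a suffix of the
    text read so far has length k. State len(pattern) marks an occurrence.
    """
    m = len(pattern)
    delta = [dict() for _ in range(m + 1)]
    for state in range(m + 1):
        for letter in alphabet:
            text = pattern[:state] + letter
            k = min(m, len(text))
            # Shrink k until pattern[:k] is a suffix of the text read.
            while k > 0 and text[-k:] != pattern[:k]:
                k -= 1
            delta[state][letter] = k
    return delta


def pattern_count_distribution(pattern, probs, n):
    """Exact pmf of the overlapping occurrence count in an i.i.d. sequence.

    `probs` maps each letter to its probability (assumed i.i.d. source);
    `n` is the sequence length. Returns {count: probability}.
    """
    delta = build_automaton(pattern, list(probs))
    m = len(pattern)
    # Embedded chain on (automaton state, occurrences seen so far).
    dist = {(0, 0): 1.0}
    for _ in range(n):
        nxt = defaultdict(float)
        for (state, count), p in dist.items():
            for letter, p_letter in probs.items():
                target = delta[state][letter]
                hit = 1 if target == m else 0
                nxt[(target, count + hit)] += p * p_letter
        dist = nxt
    # Marginalise over the automaton state to get the count distribution.
    pmf = defaultdict(float)
    for (_, count), p in dist.items():
        pmf[count] += p
    return dict(pmf)


def ccdf(pmf, threshold):
    """P(N >= threshold), i.e. a p-value for over-representation."""
    return sum(p for count, p in pmf.items() if count >= threshold)
```

For the word `"aa"` over a uniform two-letter alphabet and `n = 3`, the eight equally likely sequences give counts 0, 1 and 2 with probabilities 5/8, 2/8 and 1/8, so `ccdf(pmf, 1)` is 3/8.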

    On the first k moments of the random count of a pattern in a multi-states sequence generated by a Markov source

    In this paper, we develop an explicit formula for computing the first k moments of the random count of a pattern in a multi-state sequence generated by a Markov source. We derive efficient algorithms that handle both low and high complexity patterns, under either homogeneous or heterogeneous Markov models. We then apply these results to the distribution of DNA patterns in genomic sequences, where we show that moment-based developments (namely, Edgeworth's expansion and Gram-Charlier type B series) improve the reliability of common asymptotic approximations such as the Gaussian or Poisson approximations.

    Open markov type population models: From discrete to continuous time

    Funding: For the second author, this work was done under partial financial support of RFBR (Grant n. 19-01-00451). For the first and third authors, this work was partially supported through the project of the Centro de Matemática e Aplicações, UID/MAT/00297/2020, financed by the Fundação para a Ciência e a Tecnologia (Portuguese Foundation for Science and Technology). The APC was funded by the insurance company Fidelidade.
    We address the problem of finding a natural continuous time Markov type process, in open populations, that best captures the information provided by an open Markov chain in discrete time, which is usually the only observation available from data. Given the open discrete time Markov chain, we single out two main approaches. In the first, we consider a calibration procedure for a continuous time Markov process using the transition matrix of a discrete time Markov chain, and we show that, when the discrete time transition matrix is embeddable in a continuous time one, the calibration problem has optimal solutions. In the second, we consider semi-Markov processes (and open Markov schemes) and propose a direct extension from the discrete time theory to the continuous time one. This extension uses a known structure representation result for semi-Markov processes, which decomposes the process as a sum of terms given by the products of the random variables of a discrete time Markov chain with time functions built from an adequate increasing sequence of stopping times.

    Stationary and regenerative multivariate point processes

    Sparse approaches for the exact distribution of patterns in long state sequences generated by a Markov source

    We present two novel approaches for computing the exact distribution of a pattern in a long sequence. Both approaches take into account the sparse structure of the problem and are two-part algorithms. The first approach relies on a partial recursion after a fast computation of the second largest eigenvalue of the transition matrix of a Markov chain embedding. The second approach uses fast Taylor expansions of an exact bivariate rational reconstruction of the distribution. We illustrate both approaches on a simple toy example and two biological applications: the transcription factors of Human Chromosome 5 and the PROSITE signatures of functional motifs in proteins. On these examples, our methods demonstrate their complementarity and their ability to extend the domain of feasibility for exact computations in pattern problems to a new level.

    Three-state Markov chain based reliability analysis of complex traction power supply systems

    Exact distribution of a pattern in a set of random sequences generated by a Markov source: applications to biological data

    Background: In bioinformatics it is common to search for a pattern of interest in a potentially large set of rather short sequences (upstream gene regions, proteins, exons, etc.). Although many methodological approaches allow practitioners to compute the distribution of a pattern count in a random sequence generated by a Markov source, no specific developments have taken into account the counting of occurrences in a set of independent sequences. We aim to address this problem by deriving efficient approaches and algorithms to perform these computations for both low and high complexity patterns, in the framework of homogeneous or heterogeneous Markov models.
    Results: The latest advances in the field allowed us to use a technique of optimal Markov chain embedding based on deterministic finite automata to introduce three innovative algorithms. Algorithm 1 is the only one able to deal with heterogeneous models. It also avoids any convolution product of the pattern distributions in individual sequences. When working with homogeneous models, Algorithm 2 yields a dramatic reduction in complexity by taking advantage of previous computations to obtain moment generating functions efficiently. In the particular case of low or moderate complexity patterns, Algorithm 3 exploits power computation and binary decomposition to further reduce the time complexity to a logarithmic scale. All these algorithms, and their relative merits in comparison with existing ones, were then tested and discussed on a toy example and three biological data sets: structural patterns in protein loop structures, PROSITE signatures in a bacterial proteome, and transcription factors in upstream gene regions. On these data sets, we also compared our exact approaches to the tempting approximation that consists in concatenating the sequences in the data set into a single sequence.
    Conclusions: Our algorithms prove to be effective and able to handle real data sets with multiple sequences, as well as biological patterns of interest, even when the latter display a high complexity (PROSITE signatures, for example). In addition, these exact algorithms allow us to avoid the edge effect observed under the single sequence approximation, which leads to erroneous results, especially when the marginal distribution of the model converges slowly toward the stationary distribution. We conclude with a discussion of our method and its potential improvements.

    Markov and Semi-markov Chains, Processes, Systems and Emerging Related Fields

    This book covers a broad range of research results in the field of Markov and semi-Markov chains, processes, systems, and related emerging fields. The authors of the included research papers are well-known researchers in their fields. The book presents the state of the art and ideas for further research for theorists in these fields. It also provides readily applicable results for practitioners in diverse areas.