
    Global Alignment of Molecular Sequences via Ancestral State Reconstruction

    Molecular phylogenetic techniques do not generally account for such common evolutionary events as site insertions and deletions (known as indels). Instead, tree-building algorithms and ancestral state inference procedures typically rely on substitution-only models of sequence evolution. In practice these methods are extended beyond this simplified setting with the use of heuristics that produce global alignments of the input sequences, an important problem that has no rigorous model-based solution. In this paper we consider a new version of the multiple sequence alignment problem in the context of stochastic indel models. More precisely, we introduce the following trace reconstruction problem on a tree (TRPT): a binary sequence is broadcast through a tree channel where we allow substitutions, deletions, and insertions; we seek to reconstruct the original sequence from the sequences received at the leaves of the tree. We give a recursive procedure for this problem with strong reconstruction guarantees at low mutation rates, providing also an alignment of the sequences at the leaves of the tree. The TRPT problem without indels has been studied in previous work (Mossel 2004, Daskalakis et al. 2006) as a bootstrapping step towards obtaining optimal phylogenetic reconstruction methods. The present work sets up a framework for extending these works to evolutionary models with indels.
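The tree channel in the TRPT can be illustrated with a small forward simulation. This is only a sketch of the broadcast step, not the paper's reconstruction procedure: the complete binary tree, the independent per-site events, and the names `mutate`, `broadcast`, `p_sub`, `p_del`, and `p_ins` are all assumptions made for illustration.

```python
import random


def mutate(seq, p_sub, p_del, p_ins, rng):
    """Pass a binary sequence along one edge (hypothetical per-site model)."""
    out = []
    for bit in seq:
        # Insertion: a uniformly random bit appears before the current site.
        if rng.random() < p_ins:
            out.append(rng.randint(0, 1))
        # Deletion: the current site is dropped.
        if rng.random() < p_del:
            continue
        # Substitution: the current site is flipped.
        if rng.random() < p_sub:
            bit ^= 1
        out.append(bit)
    return out


def broadcast(root, depth, p_sub, p_del, p_ins, seed=0):
    """Broadcast `root` down a complete binary tree; return the leaf sequences."""
    rng = random.Random(seed)
    level = [list(root)]
    for _ in range(depth):
        # Each node passes an independently mutated copy to each of its two children.
        level = [mutate(parent, p_sub, p_del, p_ins, rng)
                 for parent in level for _child in range(2)]
    return level


if __name__ == "__main__":
    for leaf in broadcast([0, 1, 1, 0, 1, 0, 0, 1], depth=2,
                          p_sub=0.05, p_del=0.05, p_ins=0.05):
        print(leaf)
```

With indels switched on, the leaves generally have different lengths, which is why recovering the root also calls for an alignment of the leaf sequences.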

    Randomness on computable probability spaces - A dynamical point of view

    We extend the notion of randomness (in the version introduced by Schnorr) to computable probability spaces and compare it to a dynamical notion of randomness: typicality. Roughly, a point is typical for some dynamics if it follows the statistical behavior of the system (Birkhoff’s pointwise ergodic theorem). We prove that a point is Schnorr random if and only if it is typical for every mixing computable dynamics. To prove the result we develop some tools for the theory of computable probability spaces (for example, morphisms) that are expected to have other applications.
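The informal description of typicality can be written out as follows. This is one common formalization, given as an assumption; the paper's exact definition (for instance, the class of test functions) may differ. Here $T$ is a measure-preserving map on the probability space $(X, \mu)$ and $f$ ranges over bounded continuous functions:

```latex
% x is typical for (X, \mu, T) if time averages along its orbit
% equal the space average, for every bounded continuous f:
\[
  \lim_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} f\bigl(T^{k} x\bigr)
  \;=\; \int_{X} f \, d\mu .
\]
```

Birkhoff's pointwise ergodic theorem guarantees that, when $T$ is ergodic, this equality holds for $\mu$-almost every $x$; the abstract's result identifies which points satisfy it for every mixing computable dynamics.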

    Schwinger-Dyson equations in large-N quantum field theories and nonlinear random processes

    We propose a stochastic method for solving Schwinger-Dyson equations in large-N quantum field theories. Expectation values of single-trace operators are sampled by stationary probability distributions of the so-called nonlinear random processes. The set of all histories of such processes corresponds to the set of all planar diagrams in the perturbative expansions of the expectation values of singlet operators. We illustrate the method on the examples of the matrix-valued scalar field theory and the Weingarten model of random planar surfaces on the lattice. For theories with compact field variables, such as sigma-models or non-Abelian lattice gauge theories, the method does not converge in the physically most interesting weak-coupling limit. In this case one can absorb the divergences into a self-consistent redefinition of expansion parameters. Stochastic solution of the self-consistency conditions can be implemented as a "memory" of the random process, so that some parameters of the process are estimated from its previous history. We illustrate this idea on the example of the two-dimensional O(N) sigma-model. Extension to non-Abelian lattice gauge theories is discussed. Comment: 16 pages RevTeX, 14 figures; v2: Algorithm for the Weingarten model corrected; v3: published version

    Navigating in a sea of repeats in RNA-seq without drowning

    The main challenge in de novo assembly of NGS data is certainly to deal with repeats that are longer than the reads. This is particularly true for RNA-seq data, since coverage information cannot be used to flag repeated sequences, of which transposable elements are one of the main examples. Most transcriptome assemblers are based on de Bruijn graphs and have no clear and explicit model for repeats in RNA-seq data, relying instead on heuristics to deal with them. The results of this work are twofold. First, we introduce a formal model for representing high copy number repeats in RNA-seq data and exploit its properties for inferring a combinatorial characteristic of repeat-associated subgraphs. We show that the problem of identifying in a de Bruijn graph a subgraph with this characteristic is NP-complete. In a second step, we show that in the specific case of a local assembly of alternative splicing (AS) events, we can implicitly avoid such subgraphs. In particular, we designed and implemented an algorithm to efficiently identify AS events that are not included in repeated regions. Finally, we validate our results using synthetic data. We also give an indication of the usefulness of our method on real data.
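For readers unfamiliar with the data structure, a de Bruijn graph over reads can be sketched in a few lines. This is a generic illustration, not the authors' model or algorithm: the function names `de_bruijn` and `branching_nodes` are hypothetical, and counting branching nodes is only a crude stand-in for the combinatorial characteristic the abstract refers to.

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Build a de Bruijn graph: nodes are k-mers, with an edge between
    k-mers that occur consecutively in a read (overlap of k - 1 characters)."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k):
            graph[read[i:i + k]].add(read[i + 1:i + k + 1])
    return graph


def branching_nodes(graph):
    """Return k-mers with more than one successor or more than one predecessor."""
    indegree = defaultdict(int)
    for successors in graph.values():
        for node in successors:
            indegree[node] += 1
    nodes = set(graph) | set(indegree)
    return {n for n in nodes
            if len(graph.get(n, ())) > 1 or indegree[n] > 1}


if __name__ == "__main__":
    g = de_bruijn(["ACGTA", "CGTC"], k=3)
    print(sorted(branching_nodes(g)))
```

Repeats and alternative splicing events both create branching in such a graph, which is part of why telling them apart is hard without an explicit model.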