Global Alignment of Molecular Sequences via Ancestral State Reconstruction
Molecular phylogenetic techniques do not generally account for such common
evolutionary events as site insertions and deletions (known as indels). Instead,
tree-building algorithms and ancestral state inference procedures typically
rely on substitution-only models of sequence evolution. In practice, these
methods are extended beyond this simplified setting with the use of heuristics
that produce global alignments of the input sequences, an important problem
that has no rigorous model-based solution. In this paper we consider a new
version of the multiple sequence alignment problem in the context of stochastic
indel models. More precisely, we introduce the following trace reconstruction
problem on a tree (TRPT): a binary sequence is broadcast through a tree
channel where we allow substitutions, deletions, and insertions; we seek to
reconstruct the original sequence from the sequences received at the leaves of
the tree. We give a recursive procedure for this problem with strong
reconstruction guarantees at low mutation rates that also provides an alignment
of the sequences at the leaves of the tree. The TRPT without indels has
been studied in previous work (Mossel 2004, Daskalakis et al. 2006) as a
bootstrapping step towards obtaining optimal phylogenetic reconstruction
methods. The present work sets up a framework for extending these works to
evolutionary models with indels.
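The broadcast model behind the TRPT can be simulated in a few lines. This is a minimal sketch under stated assumptions, not the paper's model or its reconstruction procedure: it assumes a complete binary tree of a given depth, the same hypothetical per-edge rates `p_sub`, `p_del` and `p_ins` on every edge, and a simple convention in which an insertion places one uniformly random bit after a parent position. The function names `mutate` and `broadcast` are illustrative only.

```python
import random
from typing import List


def mutate(parent: List[int], p_sub: float, p_del: float, p_ins: float,
           rng: random.Random) -> List[int]:
    """Copy a binary sequence along one edge of the tree (assumed channel).

    Each parent bit is deleted with probability p_del; a surviving bit is
    flipped with probability p_sub. After each parent position (deleted or
    not), one uniformly random bit is inserted with probability p_ins."""
    child = []
    for bit in parent:
        if rng.random() >= p_del:  # bit survives the deletion step
            child.append(bit ^ 1 if rng.random() < p_sub else bit)
        if rng.random() < p_ins:  # hypothetical insertion convention
            child.append(rng.randint(0, 1))
    return child


def broadcast(root: List[int], depth: int, p_sub: float, p_del: float,
              p_ins: float, seed: int = 0) -> List[List[int]]:
    """Broadcast `root` down a complete binary tree of the given depth and
    return the 2**depth leaf sequences, left to right."""
    rng = random.Random(seed)
    level = [root]
    for _ in range(depth):
        # each node gets two children, each mutated independently
        level = [mutate(s, p_sub, p_del, p_ins, rng) for s in level for _ in range(2)]
    return level


if __name__ == "__main__":
    rng = random.Random(1)
    root = [rng.randint(0, 1) for _ in range(20)]
    for leaf in broadcast(root, depth=3, p_sub=0.02, p_del=0.02, p_ins=0.02):
        print("".join(map(str, leaf)))
```

The leaves generally have different lengths once indels occur, which is exactly why the reconstruction step must also produce an alignment; without indels every leaf keeps the root's length and positions line up trivially.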
Randomness on computable probability spaces - A dynamical point of view
We extend the notion of randomness (in the version introduced by Schnorr) to computable probability spaces and compare it to a dynamical notion of randomness: typicality. Roughly, a point is typical for a given dynamics if it follows the statistical behavior of the system (Birkhoff's pointwise ergodic theorem). We prove that a point is Schnorr random if and only if it is typical for every mixing computable dynamics. To prove the result we develop some tools for the theory of computable probability spaces (for example, morphisms) that are expected to have other applications.
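Typicality can be illustrated with Birkhoff averages on the full binary shift under the uniform Bernoulli measure, a standard mixing system (assumed here to qualify as a mixing computable dynamics in the paper's sense). A typical sequence must have time averages (1/n) Σ f(T^k x) converging to the space average of f for every suitable observable f. The sketch below only evaluates finite prefixes, so it can suggest but never certify typicality; the names `birkhoff_average`, `first_bit` and `is_11` are hypothetical.

```python
import random
from typing import Callable, Sequence, Tuple


def birkhoff_average(x: Sequence[int], f: Callable[[Tuple[int, ...]], float],
                     window: int, n: int) -> float:
    """Return (1/n) * sum_{k=0}^{n-1} f(T^k x), where T is the left shift.

    f is assumed to depend only on the first `window` symbols of its
    argument, so only the prefix x[0 : n + window - 1] is read."""
    if len(x) < n + window - 1:
        raise ValueError("prefix too short for n shifts")
    return sum(f(tuple(x[k:k + window])) for k in range(n)) / n


def first_bit(w: Tuple[int, ...]) -> float:
    return float(w[0])  # space average 1/2 under Bernoulli(1/2)


def is_11(w: Tuple[int, ...]) -> float:
    return 1.0 if w == (1, 1) else 0.0  # space average 1/4


if __name__ == "__main__":
    rng = random.Random(42)
    coin = [rng.randint(0, 1) for _ in range(100_001)]
    print(birkhoff_average(coin, first_bit, 1, 100_000))  # close to 1/2
    print(birkhoff_average(coin, is_11, 2, 100_000))      # close to 1/4
    periodic = [0, 1] * 50_001
    print(birkhoff_average(periodic, first_bit, 1, 100_000))  # exactly 0.5
    print(birkhoff_average(periodic, is_11, 2, 100_000))      # 0.0, not 1/4
```

The periodic sequence 0101… passes the first-bit test yet fails the pair test, so it is not typical; typicality requires agreement for all observables at once.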
Schwinger-Dyson equations in large-N quantum field theories and nonlinear random processes
We propose a stochastic method for solving Schwinger-Dyson equations in
large-N quantum field theories. Expectation values of single-trace operators
are sampled by stationary probability distributions of the so-called nonlinear
random processes. The set of all histories of such processes corresponds to the
set of all planar diagrams in the perturbative expansions of the expectation
values of singlet operators. We illustrate the method on the examples of the
matrix-valued scalar field theory and the Weingarten model of random planar
surfaces on the lattice. For theories with compact field variables, such as
sigma-models or non-Abelian lattice gauge theories, the method does not
converge in the physically most interesting weak-coupling limit. In this case
one can absorb the divergences into a self-consistent redefinition of expansion
parameters. Stochastic solution of the self-consistency conditions can be
implemented as a "memory" of the random process, so that some parameters of the
process are estimated from its previous history. We illustrate this idea on the
example of two-dimensional O(N) sigma-model. Extension to non-Abelian lattice
gauge theories is discussed.
Comment: 16 pages RevTeX, 14 figures; v2: Algorithm for the Weingarten model
corrected; v3: published version
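As a toy illustration of sampling a Schwinger-Dyson solution with a random process whose histories are planar diagrams: for the Gaussian one-matrix model with unit propagator, the large-N moments ⟨tr M^{2n}⟩/N are the Catalan numbers C_n, and the generating function G(λ) = Σ_n C_n λ^n obeys G = 1 + λG². The sketch below is a plain branching-process Monte Carlo chosen for brevity; it is not the paper's nonlinear random process (and has no "memory" of past history). The stopping probability `q` and the names `sample_G`, `estimate_G` and `exact_G` are assumptions for illustration.

```python
import math
import random


def sample_G(lam: float, q: float, rng: random.Random) -> float:
    """One draw of an unbiased estimator of G(lam) = sum_n C_n lam**n,
    the small root of G = 1 + lam * G**2.

    With probability q the branch stops and returns 1/q; otherwise it splits
    into two independent branches and is reweighted by lam / (1 - q).
    Every finished history is a rooted planar binary tree."""
    if rng.random() < q:
        return 1.0 / q
    return lam / (1.0 - q) * sample_G(lam, q, rng) * sample_G(lam, q, rng)


def estimate_G(lam: float, q: float = 0.6, n_samples: int = 50_000,
               seed: int = 1) -> float:
    # q > 1/2 keeps the branching subcritical, so histories are finite;
    # the variance is finite when lam**2 < q * (1 - q) / 4.
    rng = random.Random(seed)
    return sum(sample_G(lam, q, rng) for _ in range(n_samples)) / n_samples


def exact_G(lam: float) -> float:
    """Closed-form small root, valid for 0 < lam <= 1/4."""
    return (1.0 - math.sqrt(1.0 - 4.0 * lam)) / (2.0 * lam)


if __name__ == "__main__":
    print(estimate_G(0.2), exact_G(0.2))  # should agree to about two decimals
```

Why it is unbiased: a tree with n internal nodes occurs with probability q^(n+1)(1-q)^n and carries weight q^-(n+1)(λ/(1-q))^n, so it contributes exactly λ^n; summing over the C_n planar binary trees with n internal nodes reproduces G(λ).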
Navigating in a sea of repeats in RNA-seq without drowning
The main challenge in de novo assembly of NGS data is certainly to deal with
repeats that are longer than the reads. This is particularly true for RNA-seq
data, since coverage information cannot be used to flag repeated sequences, of
which transposable elements are one of the main examples. Most transcriptome
assemblers are based on de Bruijn graphs and have no clear and explicit model
for repeats in RNA-seq data, relying instead on heuristics to deal with them.
The results of this work are twofold. First, we introduce a formal model for
representing high-copy-number repeats in RNA-seq data and exploit its
properties for inferring a combinatorial characteristic of repeat-associated
subgraphs. We show that the problem of identifying a subgraph with this
characteristic in a de Bruijn graph is NP-complete. In a second step, we show
that in the specific case of a local assembly of alternative splicing (AS)
events, we can implicitly avoid such subgraphs. In particular, we designed and
implemented an algorithm to efficiently identify AS events that are not
included in repeated regions. Finally, we validate our results using synthetic
data. We also give an indication of the usefulness of our method on real data …
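Why repeats are troublesome in de Bruijn graphs can be seen on a toy example: a repeat longer than the (k-1)-mer node length collapses into shared nodes, creating branches that the graph alone cannot resolve. This sketch is not the paper's formal repeat model or its alternative-splicing algorithm; the two transcripts, the shared repeat `GATTACA`, and the helper names `de_bruijn` and `branching_nodes` are hypothetical, and whole transcripts stand in for reads.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Graph = Dict[str, Set[str]]


def de_bruijn(reads: Iterable[str], k: int) -> Tuple[Graph, Graph]:
    """Build a de Bruijn graph: nodes are (k-1)-mers and each k-mer seen in
    a read adds an edge prefix -> suffix. Returns successor and predecessor maps."""
    succ: Graph = defaultdict(set)
    pred: Graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            u, v = kmer[:-1], kmer[1:]
            succ[u].add(v)
            pred[v].add(u)
    return succ, pred


def branching_nodes(succ: Graph, pred: Graph) -> List[str]:
    """Nodes with more than one successor or predecessor, in sorted order."""
    nodes = set(succ) | set(pred)
    return sorted(n for n in nodes
                  if len(succ.get(n, ())) > 1 or len(pred.get(n, ())) > 1)


if __name__ == "__main__":
    # Two hypothetical transcripts sharing the repeat GATTACA in different contexts.
    t1 = "CCT" + "GATTACA" + "GGC"
    t2 = "TGC" + "GATTACA" + "TCG"
    print(branching_nodes(*de_bruijn([t1, t2], k=4)))  # ['ACA', 'GAT']
    print(branching_nodes(*de_bruijn([t1, t2], k=9)))  # []
```

With k = 4 the repeat is entered from two contexts at `GAT` and left into two contexts at `ACA`, so chimeric paths such as CCT…ACA→CAT… are indistinguishable from real transcripts; with k = 9 the nodes are longer than the repeat and the two transcripts stay separate.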