Fractals from genomes: exact solutions of a biology-inspired problem
This is a review of a set of recent papers with some new data added. After a
brief biological introduction a visualization scheme of the string composition
of long DNA sequences, in particular, of bacterial complete genomes, will be
described. This scheme leads to a class of self-similar and self-overlapping
fractals in the limit of infinitely long constituent strings. The calculation
of their exact dimensions and the counting of true and redundant avoided
strings at different string lengths turn out to be one and the same problem. We
give an exact solution of the problem using two independent methods: the
Goulden-Jackson cluster method in combinatorics and the method of formal
language theory. Comment: 24 pages, LaTeX, 5 PostScript figures (two in color), psfi
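The counting problem named in this abstract can be illustrated with a small sketch. It is a plain dynamic program over string suffixes, not the Goulden-Jackson cluster method or the formal-language construction used in the paper; the function name `count_avoiding` and the forbidden word "ctag" are hypothetical choices made only for illustration.

```python
ALPHABET = "acgt"


def count_avoiding(forbidden: set[str], n: int) -> int:
    """Count strings of length n over ALPHABET that contain no forbidden word.

    Assumption: `forbidden` is a non-empty set of non-empty words.
    State of the dynamic program: the last (m - 1) characters written so far,
    where m is the length of the longest forbidden word.
    """
    m = max(len(word) for word in forbidden)
    counts = {"": 1}  # one empty string, with empty suffix
    for _ in range(n):
        nxt: dict[str, int] = {}
        for suffix, ways in counts.items():
            for ch in ALPHABET:
                s = suffix + ch
                # A forbidden word can only be completed by the new character,
                # so it must appear as a suffix of s.
                if any(s.endswith(word) for word in forbidden):
                    continue
                key = s[-(m - 1):] if m > 1 else ""
                nxt[key] = nxt.get(key, 0) + ways
        counts = nxt
    return sum(counts.values())


# Hypothetical example: a single avoided string of length 4.
allowed_5 = count_avoiding({"ctag"}, 5)
redundant_5 = 4**5 - allowed_5  # length-5 strings that contain "ctag"
```

In this toy setting, the length-5 strings that contain the shorter avoided word are the "redundant" avoided strings at that length; any other length-5 string missing from a genome would count as a "true" avoided string.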
A Coverage Criterion for Spaced Seeds and its Applications to Support Vector Machine String Kernels and k-Mer Distances
Spaced seeds have been recently shown to not only detect more alignments, but
also to give a more accurate measure of phylogenetic distances (Boden et al.,
2013, Horwege et al., 2014, Leimeister et al., 2014), and to provide a lower
misclassification rate when used with Support Vector Machines (SVMs) (Onodera
and Shibuya, 2013). We confirm these two results by independent experiments,
and propose in this article to use a coverage criterion (Benson and Mak, 2008,
Martin, 2013, Martin and Noé, 2014) to measure the seed efficiency in both
cases in order to design better seed patterns. We show first how this coverage
criterion can be directly measured by a full automaton-based approach. We then
illustrate how this criterion performs when compared with two other criteria
frequently used, namely the single-hit and multiple-hit criteria, through
correlation coefficients with the correct classification/the true distance. At
the end, for alignment-free distances, we propose an extension by adopting the
coverage criterion, show how it performs, and indicate how it can be
efficiently computed. Comment: http://online.liebertpub.com/doi/abs/10.1089/cmb.2014.017
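The three criteria compared in this abstract can be sketched on a single alignment. The sketch below assumes the usual encoding (seed as a string of '1' for must-match and '0' for don't-care positions, alignment as a string of '1' for match and '0' for mismatch) and takes coverage to mean the number of alignment positions covered by a must-match symbol of at least one seed hit. It is a direct scan, not the automaton-based computation proposed in the paper, and the function names are hypothetical.

```python
def seed_hits(seed: str, alignment: str) -> list[int]:
    """Start positions where every must-match ('1') seed position faces a match."""
    must_match = [i for i, c in enumerate(seed) if c == "1"]
    hits = []
    for start in range(len(alignment) - len(seed) + 1):
        if all(alignment[start + i] == "1" for i in must_match):
            hits.append(start)
    return hits


def single_hit(seed: str, alignment: str) -> bool:
    """Single-hit criterion: the seed hits the alignment at least once."""
    return len(seed_hits(seed, alignment)) > 0


def multiple_hit(seed: str, alignment: str) -> int:
    """Multiple-hit criterion: number of seed hits on the alignment."""
    return len(seed_hits(seed, alignment))


def coverage(seed: str, alignment: str) -> int:
    """Coverage (as assumed here): alignment positions covered by a
    must-match symbol of at least one seed hit."""
    must_match = [i for i, c in enumerate(seed) if c == "1"]
    covered = set()
    for start in seed_hits(seed, alignment):
        covered.update(start + i for i in must_match)
    return len(covered)


# Hypothetical example: seed 1101 on a 10-column alignment.
example_seed = "1101"
example_alignment = "1111011101"
```

On this example the seed hits at positions 0, 2 and 6, so the single-hit criterion is satisfied, the multiple-hit count is 3, and the coverage is 8 of the 10 alignment columns.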
On the Computational Power of DNA Annealing and Ligation
In [20] it was shown that the DNA primitives Separate, Merge, and Amplify
are not sufficiently powerful to invert functions defined by circuits in
linear time. Dan Boneh et al. [4] show that the addition of a ligation
primitive, Append, provides the missing power. The question becomes, "How
powerful is ligation? Are Separate, Merge, and Amplify necessary at all?"
This paper informally explores the power of annealing and ligation for DNA
computation. We conclude, in fact, that annealing and ligation alone are
theoretically capable of universal computation.
Sequential and asynchronous processes driven by stochastic or quantum grammars and their application to genomics: a survey
We present the formalism of sequential and asynchronous processes defined in
terms of random or quantum grammars and argue that these processes have
relevance in genomics. To make the article accessible to
non-mathematicians, we keep the mathematical exposition as elementary as
possible, focusing on some general ideas behind the formalism and stating the
implications of the known mathematical results. We close with a set of
challenging open problems. Comment: Presented at the European Congress on Mathematical and Theoretical
Biology, Dresden 18--22 July 200
An Ansatz for undecidable computation in RNA-world automata
In this Ansatz we consider theoretical constructions of RNA polymers into
automata, a form of computational structure. The basis for transitions in our
automata is plausible RNA-world enzymes that may perform ligation or cleavage.
Limited to these operations, we construct RNA automata of increasing
complexity, from the Finite Automaton (RNA-FA) to the Turing Machine equivalent
2-stack PDA (RNA-2PDA) and the universal RNA-UPDA. For each automaton we show
how the enzymatic reactions match the logical operations of the RNA automaton,
and describe how biological exploration of the corresponding evolutionary space
is facilitated by the efficient arrangement of RNA polymers into a
computational structure. A critical theme of the Ansatz is the self-reference
in RNA automata configurations which exploits the program-data duality but
results in undecidable computation. We describe how undecidable computation is
exemplified in the self-referential Liar paradox that places a boundary on a
logical system, and by construction, any RNA automaton. We argue that an
expansion of the evolutionary space for RNA-2PDA automata can be interpreted as
a hierarchical resolution of the undecidable computation by a meta-system (akin
to Turing's oracle), in a continual process analogous to Turing's ordinal
logics and Post's extensible recursively generated logics. On this basis, we
put forward the hypothesis that the resolution of undecidable configurations in
RNA-world automata represents a mechanism for novelty generation in the
evolutionary space, and propose avenues for future investigation of biological
automata.