1,293 research outputs found

    Fractals from genomes: exact solutions of a biology-inspired problem

    This is a review of a set of recent papers with some new data added. After a brief biological introduction, a visualization scheme for the string composition of long DNA sequences, in particular of complete bacterial genomes, is described. This scheme leads to a class of self-similar and self-overlapping fractals in the limit of infinitely long constituent strings. The calculation of their exact dimensions and the counting of true and redundant avoided strings at different string lengths turn out to be one and the same problem. We give an exact solution of the problem using two independent methods: the Goulden-Jackson cluster method in combinatorics and the method of formal language theory. Comment: 24 pages, LaTeX, 5 PostScript figures (two in color), psfi
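
    As a rough illustration of the counting problem mentioned in this abstract, the sketch below lists the length-k strings that never occur in a sequence and splits them into "true" and "redundant" ones. The split rests on an assumption not spelled out in the abstract: an avoided string is taken to be redundant when it contains a shorter avoided string, and true otherwise. The function names are hypothetical, and this brute-force enumeration is not the Goulden-Jackson or formal-language method used in the paper.

```python
from itertools import product


def kmers_present(sequence, k):
    """Set of all length-k substrings that occur in the sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def avoided_strings(sequence, k, alphabet="ACGT"):
    """Length-k strings over the alphabet that never occur in the sequence."""
    present = kmers_present(sequence, k)
    candidates = ("".join(p) for p in product(alphabet, repeat=k))
    return [s for s in candidates if s not in present]


def split_true_redundant(sequence, k, alphabet="ACGT"):
    """Split avoided k-strings into (true, redundant).

    Assumption: a string is 'redundant' if it contains a shorter avoided
    string. It is enough to check its two substrings of length k-1,
    because every shorter substring lies inside one of them.
    """
    avoided = avoided_strings(sequence, k, alphabet)
    if k == 1:
        return avoided, []
    shorter = kmers_present(sequence, k - 1)
    true = [s for s in avoided if s[:-1] in shorter and s[1:] in shorter]
    true_set = set(true)
    redundant = [s for s in avoided if s not in true_set]
    return true, redundant
```

    For example, `split_true_redundant("AAAA", 2, alphabet="AC")` reports no true avoided strings, because every missing 2-string already contains the missing letter `C`.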

    A Coverage Criterion for Spaced Seeds and its Applications to Support Vector Machine String Kernels and k-Mer Distances

    Spaced seeds have recently been shown not only to detect more alignments, but also to give a more accurate measure of phylogenetic distances (Boden et al., 2013, Horwege et al., 2014, Leimeister et al., 2014), and to provide a lower misclassification rate when used with Support Vector Machines (SVMs) (Onodera and Shibuya, 2013). We confirm these two results by independent experiments, and propose in this article to use a coverage criterion (Benson and Mak, 2008, Martin, 2013, Martin and Noé, 2014) to measure the seed efficiency in both cases in order to design better seed patterns. We show first how this coverage criterion can be directly measured by a full automaton-based approach. We then illustrate how this criterion performs when compared with two other frequently used criteria, namely the single-hit and multiple-hit criteria, through correlation coefficients with the correct classification/the true distance. At the end, for alignment-free distances, we propose an extension by adopting the coverage criterion, show how it performs, and indicate how it can be efficiently computed. Comment: http://online.liebertpub.com/doi/abs/10.1089/cmb.2014.017
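
    The three criteria named in this abstract can be contrasted on a toy example. The sketch below assumes a common encoding that the abstract does not state: a seed is a string of `1` (position must match) and `0` (don't care), and an alignment is a string of `1` (match) and `0` (mismatch). Coverage is taken here to be the number of alignment positions covered by a must-match position of at least one hit. The function names are hypothetical, and this direct scan is not the automaton-based computation the paper describes.

```python
def seed_hits(seed, alignment):
    """Start positions where every must-match seed position sees a match."""
    offsets = [j for j, c in enumerate(seed) if c == "1"]
    last_start = len(alignment) - len(seed)
    return [
        i for i in range(last_start + 1)
        if all(alignment[i + j] == "1" for j in offsets)
    ]


def single_hit(seed, alignment):
    """Single-hit criterion: does the seed hit the alignment at least once?"""
    return len(seed_hits(seed, alignment)) > 0


def multiple_hit(seed, alignment):
    """Multiple-hit criterion: number of seed hits."""
    return len(seed_hits(seed, alignment))


def coverage(seed, alignment):
    """Coverage (as assumed above): distinct alignment positions covered
    by a must-match position of at least one hit."""
    offsets = [j for j, c in enumerate(seed) if c == "1"]
    covered = {i + j for i in seed_hits(seed, alignment) for j in offsets}
    return len(covered)
```

    On the alignment `1110111`, the spaced seed `101` hits three times but covers only four positions, while the contiguous seed `111` hits twice and covers six, which shows that hit count and coverage can rank seeds differently.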

    On the Computational Power of DNA Annealing and Ligation

    In [20] it was shown that the DNA primitives of Separate, Merge, and Amplify were not sufficiently powerful to invert functions defined by circuits in linear time. Dan Boneh et al. [4] show that the addition of a ligation primitive, Append, provides the missing power. The question becomes, "How powerful is ligation? Are Separate, Merge, and Amplify necessary at all?" This paper informally explores the power of annealing and ligation for DNA computation. We conclude, in fact, that annealing and ligation alone are theoretically capable of universal computation.

    Sequential and asynchronous processes driven by stochastic or quantum grammars and their application to genomics: a survey

    We present the formalism of sequential and asynchronous processes defined in terms of random or quantum grammars and argue that these processes have relevance in genomics. To make the article accessible to non-mathematicians, we keep the mathematical exposition as elementary as possible, focusing on some general ideas behind the formalism and stating the implications of the known mathematical results. We close with a set of challenging open problems. Comment: Presented at the European Congress on Mathematical and Theoretical Biology, Dresden 18--22 July 200

    An Ansatz for undecidable computation in RNA-world automata

    In this Ansatz we consider theoretical constructions of RNA polymers into automata, a form of computational structure. The basis for transitions in our automata is a set of plausible RNA-world enzymes that may perform ligation or cleavage. Limited to these operations, we construct RNA automata of increasing complexity, from the Finite Automaton (RNA-FA) to the Turing Machine equivalent 2-stack PDA (RNA-2PDA) and the universal RNA-UPDA. For each automaton we show how the enzymatic reactions match the logical operations of the RNA automaton, and describe how biological exploration of the corresponding evolutionary space is facilitated by the efficient arrangement of RNA polymers into a computational structure. A critical theme of the Ansatz is the self-reference in RNA automata configurations, which exploits the program-data duality but results in undecidable computation. We describe how undecidable computation is exemplified in the self-referential Liar paradox that places a boundary on a logical system and, by construction, on any RNA automaton. We argue that an expansion of the evolutionary space for RNA-2PDA automata can be interpreted as a hierarchical resolution of the undecidable computation by a meta-system (akin to Turing's oracle), in a continual process analogous to Turing's ordinal logics and Post's extensible recursively generated logics. On this basis, we put forward the hypothesis that the resolution of undecidable configurations in RNA-world automata represents a mechanism for novelty generation in the evolutionary space, and propose avenues for future investigation of biological automata.