6 research outputs found

    Training-free Measures Based on Algorithmic Probability Identify High Nucleosome Occupancy in DNA Sequences

    We introduce and study a set of training-free information-theoretic and algorithmic-complexity measures applied to DNA sequences, and assess their potential to identify nucleosomal binding sites. We test our measures on well-studied genomic sequences of different sizes drawn from different sources. The measures reveal the known in vivo versus in vitro predictive discrepancies and show potential to pinpoint (high) nucleosome occupancy. We explore different possible signals within and beyond the nucleosome length and find that complexity indices are informative of nucleosome occupancy. We compare against the gold standard (Kaplan model) and find similar and complementary results, with the main difference that our sequence complexity approach is training-free. For example, for high occupancy, complexity-based scores outperform the Kaplan model for predicting binding, representing a significant advance in predicting the highest nucleosome occupancy with a training-free approach. Comment: 8 pages main text (4 figures), 12 total with Supplementary (1 figure
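    As a rough illustration of what a training-free complexity index over a DNA sequence can look like, the sketch below computes Shannon entropy of k-mer frequencies in sliding windows. It is not the authors' method: the abstract does not specify its measures at this level of detail, and the function names, the k-mer formulation, and the 147 bp default window (the commonly cited nucleosome core length) are all assumptions made for this example.

```python
import math
from collections import Counter


def shannon_entropy(seq, k=1):
    """Shannon entropy (in bits) of the k-mer frequency distribution of seq."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        return 0.0
    counts = Counter(kmers)
    total = len(kmers)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def sliding_entropy(seq, window=147, step=1, k=1):
    """Entropy of every full window along seq.

    The default window of 147 bp is an assumption (approximate
    nucleosome core length), not a value taken from the abstract.
    """
    return [
        shannon_entropy(seq[i:i + window], k)
        for i in range(0, len(seq) - window + 1, step)
    ]


if __name__ == "__main__":
    # Hypothetical toy sequence: a low-complexity run followed by mixed bases.
    toy = "AAAAAAAAACGTACGGTCA"
    print(sliding_entropy(toy, window=8, step=4))
```

    No model is fitted here: each window is scored directly from its own symbol statistics, which is the sense in which such an index is training-free.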

    An Ansatz for undecidable computation in RNA-world automata

    In this Ansatz we consider theoretical constructions of RNA polymers into automata, a form of computational structure. The basis for transitions in our automata is a set of plausible RNA-world enzymes that may perform ligation or cleavage. Limited to these operations, we construct RNA automata of increasing complexity, from the Finite Automaton (RNA-FA) to the Turing-machine-equivalent 2-stack PDA (RNA-2PDA) and the universal RNA-UPDA. For each automaton we show how the enzymatic reactions match the logical operations of the RNA automaton, and describe how biological exploration of the corresponding evolutionary space is facilitated by the efficient arrangement of RNA polymers into a computational structure. A critical theme of the Ansatz is the self-reference in RNA automata configurations, which exploits the program-data duality but results in undecidable computation. We describe how undecidable computation is exemplified in the self-referential Liar paradox, which places a boundary on a logical system and, by construction, on any RNA automaton. We argue that an expansion of the evolutionary space for RNA-2PDA automata can be interpreted as a hierarchical resolution of the undecidable computation by a meta-system (akin to Turing's oracle), in a continual process analogous to Turing's ordinal logics and Post's extensible recursively generated logics. On this basis, we put forward the hypothesis that the resolution of undecidable configurations in RNA-world automata represents a mechanism for novelty generation in the evolutionary space, and propose avenues for future investigation of biological automata.
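    The standard reason a 2-stack PDA is Turing-machine equivalent is that two stacks can emulate an unbounded tape: one stack holds the cells to the left of the head, the other holds the cell under the head and everything to its right. The sketch below shows only that generic idea. The class name, the blank symbol, and the use of nucleotide letters as tape symbols are assumptions for illustration and do not reproduce the paper's enzymatic RNA construction.

```python
class TwoStackTape:
    """A Turing-machine tape emulated with two stacks (Python lists)."""

    def __init__(self, contents, blank="_"):
        self.blank = blank
        # Cells left of the head; top of stack is the nearest cell.
        self.left = []
        # Cell under the head plus everything to its right; top is under the head.
        self.right = list(reversed(contents)) or [blank]

    def read(self):
        return self.right[-1]

    def write(self, symbol):
        self.right[-1] = symbol

    def move_right(self):
        # Pop the current cell onto the left stack; extend with blank if needed.
        self.left.append(self.right.pop())
        if not self.right:
            self.right.append(self.blank)

    def move_left(self):
        # Pop from the left stack (or create a blank) onto the right stack.
        self.right.append(self.left.pop() if self.left else self.blank)

    def contents(self):
        cells = self.left + list(reversed(self.right))
        return "".join(cells).strip(self.blank)


if __name__ == "__main__":
    tape = TwoStackTape("ACG")
    tape.move_right()
    tape.write("U")  # overwrite the second cell
    print(tape.contents())
```

    Every head move is a single pop from one stack and a single push onto the other, so a finite control with two stacks can carry out any step that a single-tape machine can.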