    Training-free Measures Based on Algorithmic Probability Identify High Nucleosome Occupancy in DNA Sequences

    We introduce and study a set of training-free methods of information-theoretic and algorithmic complexity nature applied to DNA sequences to identify their potential capabilities to determine nucleosomal binding sites. We test our measures on well-studied genomic sequences of different sizes drawn from different sources. The measures reveal the known in vivo versus in vitro predictive discrepancies and uncover their potential to pinpoint (high) nucleosome occupancy. We explore different possible signals within and beyond the nucleosome length and find that complexity indices are informative of nucleosome occupancy. We compare against the gold standard (Kaplan model) and find similar and complementary results with the main difference that our sequence complexity approach. For example, for high occupancy, complexity-based scores outperform the Kaplan model for predicting binding, representing a significant advancement in predicting the highest nucleosome occupancy following a training-free approach. Comment: 8 pages main text (4 figures), 12 total with Supplementary (1 figure)
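To make the idea of a training-free complexity score concrete, here is a minimal sketch that slides a window along a DNA string and scores each window with two simple measures: the Shannon entropy of its k-mer distribution and its zlib-compressed length. These are stand-ins chosen for illustration and are not claimed to be the measures used in the paper. The function names, the 147 bp default window (roughly one nucleosome length), and the k-mer size are all assumptions.

```python
import math
import zlib
from collections import Counter


def shannon_entropy(seq: str, k: int = 1) -> float:
    """Shannon entropy in bits of the k-mer distribution of seq."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        return 0.0
    counts = Counter(kmers)
    total = len(kmers)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def compressed_length(seq: str) -> int:
    """Bytes needed by zlib for seq: a crude, computable proxy for complexity."""
    return len(zlib.compress(seq.encode("ascii"), 9))


def sliding_scores(seq: str, window: int = 147, step: int = 1, k: int = 2):
    """Yield (start, entropy, compressed_length) for each window of seq.

    The default window of 147 bp is an illustrative assumption.
    """
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        yield start, shannon_entropy(chunk, k), compressed_length(chunk)
```

No model is fitted at any point: every score is computed directly from the sequence, which is what "training-free" means here. Relating such scores to measured occupancy would still require experimental data for comparison.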

    Algorithmic and Statistical Perspectives on Large-Scale Data Analysis

    In recent years, ideas from statistics and scientific computing have begun to interact in increasingly sophisticated and fruitful ways with ideas from computer science and the theory of algorithms to aid in the development of improved worst-case algorithms that are useful for large-scale scientific and Internet data analysis problems. In this chapter, I will describe two recent examples---one having to do with selecting good columns or features from a (DNA Single Nucleotide Polymorphism) data matrix, and the other having to do with selecting good clusters or communities from a data graph (representing a social or information network)---that drew on ideas from both areas and that may serve as a model for exploiting complementary algorithmic and statistical perspectives in order to solve applied large-scale data analysis problems. Comment: 33 pages. To appear in Uwe Naumann and Olaf Schenk, editors, "Combinatorial Scientific Computing," Chapman and Hall/CRC Press, 201

    Toward reliable algorithmic self-assembly of DNA tiles: A fixed-width cellular automaton pattern

    Bottom-up fabrication of nanoscale structures relies on chemical processes to direct self-assembly. The complexity, precision, and yield achievable by a one-pot reaction are limited by our ability to encode assembly instructions into the molecules themselves. Nucleic acids provide a platform for investigating these issues, as molecular structure and intramolecular interactions can encode growth rules. Here, we use DNA tiles and DNA origami to grow crystals containing a cellular automaton pattern. In a one-pot annealing reaction, 250 DNA strands first assemble into a set of 10 free tile types and a seed structure, then the free tiles grow algorithmically from the seed according to the automaton rules. In our experiments, crystals grew to ~300 nm long, containing ~300 tiles with an initial assembly error rate of ~1.4% per tile. This work provides evidence that programmable molecular self-assembly may be sufficient to create a wide range of complex objects in one-pot reactions.
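As a rough illustration of growing a fixed-width cellular automaton pattern layer by layer from a seed, the sketch below computes each new layer from the previous one. The abstract does not state which automaton rule the tiles implement, so the XOR-of-neighbours rule, the zero boundary condition, and the name `grow_pattern` are assumptions made only for this example. Assembly errors are not modelled.

```python
def grow_pattern(seed_row, n_layers):
    """Grow n_layers rows on top of seed_row under an assumed XOR rule.

    Each new cell is the XOR of the left and right neighbours in the
    previous row. Cells outside the fixed width are treated as 0.
    """
    width = len(seed_row)
    rows = [list(seed_row)]
    for _ in range(n_layers):
        prev = rows[-1]
        new_row = []
        for i in range(width):
            left = prev[i - 1] if i > 0 else 0
            right = prev[i + 1] if i < width - 1 else 0
            new_row.append(left ^ right)
        rows.append(new_row)
    return rows


if __name__ == "__main__":
    # Print a small pattern grown from a seed with a single 1 in the middle.
    for row in grow_pattern([0, 0, 0, 1, 0, 0, 0], 6):
        print("".join("#" if cell else "." for cell in row))
```

The seed row plays the role of the information-bearing seed structure, and each computed row corresponds to one layer of attached tiles in this simplified picture.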

    Programmable Control of Nucleation for Algorithmic Self-Assembly

    Algorithmic self-assembly, a generalization of crystal growth processes, has been proposed as a mechanism for autonomous DNA computation and for bottom-up fabrication of complex nanostructures. A 'program' for growing a desired structure consists of a set of molecular 'tiles' designed to have specific binding interactions. A key challenge to making algorithmic self-assembly practical is designing tile set programs that make assembly robust to errors that occur during initiation and growth. One method for the controlled initiation of assembly, often seen in biology, is the use of a seed or catalyst molecule that reduces an otherwise large kinetic barrier to nucleation. Here we show how to program algorithmic self-assembly similarly, such that seeded assembly proceeds quickly but there is an arbitrarily large kinetic barrier to unseeded growth. We demonstrate this technique by introducing a family of tile sets for which we rigorously prove that, under the right physical conditions, linearly increasing the size of the tile set exponentially reduces the rate of spurious nucleation. Simulations of these 'zig-zag' tile sets suggest that under plausible experimental conditions, it is possible to grow large seeded crystals in just a few hours such that less than 1 percent of crystals are spuriously nucleated. Simulation results also suggest that zig-zag tile sets could be used for detection of single DNA strands. Together with prior work showing that tile sets can be made robust to errors during properly initiated growth, this work demonstrates that growth of objects via algorithmic self-assembly can proceed both efficiently and with an arbitrarily low error rate, even in a model where local growth rules are probabilistic. Comment: 37 pages, 14 figures

    An information-bearing seed for nucleating algorithmic self-assembly

    Self-assembly creates natural mineral, chemical, and biological structures of great complexity. Often, the same starting materials have the potential to form an infinite variety of distinct structures; information in a seed molecule can determine which form is grown as well as where and when. These phenomena can be exploited to program the growth of complex supramolecular structures, as demonstrated by the algorithmic self-assembly of DNA tiles. However, the lack of effective seeds has limited the reliability and yield of algorithmic crystals. Here, we present a programmable DNA origami seed that can display up to 32 distinct binding sites and demonstrate the use of seeds to nucleate three types of algorithmic crystals. In the simplest case, the starting materials are a set of tiles that can form crystalline ribbons of any width; the seed directs assembly of a chosen width with >90% yield. Increased structural diversity is obtained by using tiles that copy a binary string from layer to layer; the seed specifies the initial string and triggers growth under near-optimal conditions where the bit copying error rate is [...] 17 kb of sequence information. In sum, this work demonstrates how DNA origami seeds enable the easy, high-yield, low-error-rate growth of algorithmic crystals as a route toward programmable bottom-up fabrication.

    Reducing facet nucleation during algorithmic self-assembly

    Algorithmic self-assembly, a generalization of crystal growth, has been proposed as a mechanism for bottom-up fabrication of complex nanostructures and autonomous DNA computation. In principle, growth can be programmed by designing a set of molecular tiles with binding interactions that enforce assembly rules. In practice, however, errors during assembly cause undesired products, drastically reducing yields. Here we provide experimental evidence that assembly can be made more robust to errors by adding redundant tiles that "proofread" assembly. We construct DNA tile sets for two methods, uniform and snaked proofreading. While both tile sets are predicted to reduce errors during growth, the snaked proofreading tile set is also designed to reduce nucleation errors on crystal facets. Using atomic force microscopy to image growth of proofreading tiles on ribbon-like crystals presenting long facets, we show that under the physical conditions we studied the rate of facet nucleation is 4-fold smaller for snaked proofreading tile sets than for uniform proofreading tile sets.