Training-free Measures Based on Algorithmic Probability Identify High Nucleosome Occupancy in DNA Sequences
We introduce and study a set of training-free methods of
information-theoretic and algorithmic complexity nature applied to DNA
sequences to identify their potential capabilities to determine nucleosomal
binding sites. We test our measures on well-studied genomic sequences of
different sizes drawn from different sources. The measures reveal the known in
vivo versus in vitro predictive discrepancies and uncover their potential to
pinpoint (high) nucleosome occupancy. We explore different possible signals
within and beyond the nucleosome length and find that complexity indices are
informative of nucleosome occupancy. We compare against the gold standard
(Kaplan model) and find similar and complementary results with the main
difference that our sequence complexity approach is training-free. For example, for high
occupancy, complexity-based scores outperform the Kaplan model for predicting
binding, representing a significant advancement in predicting the highest
nucleosome occupancy following a training-free approach.
Comment: 8 pages main text (4 figures), 12 total with Supplementary (1 figure)
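As a rough illustration of what a training-free, information-theoretic score over a DNA sequence can look like, the sketch below computes the Shannon entropy of the k-mer distribution in sliding windows. This is an assumed stand-in, not the measures used in the paper: the function names, the choice of k = 2, and the 147-nucleotide window (a commonly cited nucleosome length) are all illustrative assumptions.

```python
import math
from collections import Counter


def kmer_entropy(seq, k=2):
    """Shannon entropy (in bits) of the k-mer distribution of seq.

    Illustrative only: a simple information-theoretic index that needs
    no training data. Not the paper's own complexity measure.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        return 0.0
    counts = Counter(kmers)
    total = len(kmers)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def sliding_entropy(seq, window=147, step=1, k=2):
    """Entropy score for each window of length `window` along seq.

    The default window of 147 is an assumption (approximate nucleosome
    length), chosen only to make the example concrete.
    """
    return [
        kmer_entropy(seq[start:start + window], k)
        for start in range(0, len(seq) - window + 1, step)
    ]


if __name__ == "__main__":
    toy = "ACGT" * 50  # hypothetical 200-nt toy sequence
    scores = sliding_entropy(toy)
    print(len(scores), min(scores), max(scores))
```

Each window gets a single score, so the output can be compared position by position against an occupancy track; how such scores relate to occupancy is the question the paper itself investigates.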
Algorithmic and Statistical Perspectives on Large-Scale Data Analysis
In recent years, ideas from statistics and scientific computing have begun to
interact in increasingly sophisticated and fruitful ways with ideas from
computer science and the theory of algorithms to aid in the development of
improved worst-case algorithms that are useful for large-scale scientific and
Internet data analysis problems. In this chapter, I will describe two recent
examples---one having to do with selecting good columns or features from a (DNA
Single Nucleotide Polymorphism) data matrix, and the other having to do with
selecting good clusters or communities from a data graph (representing a social
or information network)---that drew on ideas from both areas and that may serve
as a model for exploiting complementary algorithmic and statistical
perspectives in order to solve applied large-scale data analysis problems.
Comment: 33 pages. To appear in Uwe Naumann and Olaf Schenk, editors,
"Combinatorial Scientific Computing," Chapman and Hall/CRC Press, 201
Toward reliable algorithmic self-assembly of DNA tiles: A fixed-width cellular automaton pattern
Bottom-up fabrication of nanoscale structures relies on chemical processes to direct self-assembly. The complexity, precision, and yield achievable by a one-pot reaction are limited by our ability to encode assembly instructions into the molecules themselves. Nucleic acids provide a platform for investigating these issues, as molecular structure and intramolecular interactions can encode growth rules. Here, we use DNA tiles and DNA origami to grow crystals containing a cellular automaton pattern. In a one-pot annealing reaction, 250 DNA strands first assemble into a set of 10 free tile types and a seed structure, then the free tiles grow algorithmically from the seed according to the automaton rules. In our experiments, crystals grew to ~300 nm long, containing ~300 tiles with an initial assembly error rate of ~1.4% per tile. This work provides evidence that programmable molecular self-assembly may be sufficient to create a wide range of complex objects in one-pot reactions.
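To put the reported figures in perspective, the following back-of-the-envelope sketch combines the ~1.4% per-tile error rate with the ~300 tiles per crystal. It assumes that tile errors are independent and identically distributed, which is a simplification introduced here and not a claim made in the abstract; the function names are hypothetical.

```python
def expected_errors(per_tile_error, n_tiles):
    """Expected number of erroneous tiles, assuming independent errors."""
    return per_tile_error * n_tiles


def error_free_probability(per_tile_error, n_tiles):
    """Probability that no tile is erroneous, assuming independent errors."""
    return (1.0 - per_tile_error) ** n_tiles


if __name__ == "__main__":
    # Figures quoted in the abstract: ~1.4% per tile, ~300 tiles per crystal.
    rate, tiles = 0.014, 300
    print(f"expected errors per crystal: {expected_errors(rate, tiles):.1f}")
    print(f"error-free probability: {error_free_probability(rate, tiles):.3f}")
```

Under this independence assumption, a 300-tile crystal would carry about four errors on average, and only roughly 1.5% of crystals would be entirely error-free.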
Programmable Control of Nucleation for Algorithmic Self-Assembly
Algorithmic self-assembly, a generalization of crystal growth processes, has
been proposed as a mechanism for autonomous DNA computation and for bottom-up
fabrication of complex nanostructures. A `program' for growing a desired
structure consists of a set of molecular `tiles' designed to have specific
binding interactions. A key challenge to making algorithmic self-assembly
practical is designing tile set programs that make assembly robust to errors
that occur during initiation and growth. One method for the controlled
initiation of assembly, often seen in biology, is the use of a seed or catalyst
molecule that reduces an otherwise large kinetic barrier to nucleation. Here we
show how to program algorithmic self-assembly similarly, such that seeded
assembly proceeds quickly but there is an arbitrarily large kinetic barrier to
unseeded growth. We demonstrate this technique by introducing a family of tile
sets for which we rigorously prove that, under the right physical conditions,
linearly increasing the size of the tile set exponentially reduces the rate of
spurious nucleation. Simulations of these `zig-zag' tile sets suggest that
under plausible experimental conditions, it is possible to grow large seeded
crystals in just a few hours such that less than 1 percent of crystals are
spuriously nucleated. Simulation results also suggest that zig-zag tile sets
could be used for detection of single DNA strands. Together with prior work
showing that tile sets can be made robust to errors during properly initiated
growth, this work demonstrates that growth of objects via algorithmic
self-assembly can proceed both efficiently and with an arbitrarily low error
rate, even in a model where local growth rules are probabilistic.
Comment: 37 pages, 14 figures
An information-bearing seed for nucleating algorithmic self-assembly
Self-assembly creates natural mineral, chemical, and biological structures of great complexity. Often, the same starting materials have the potential to form an infinite variety of distinct structures; information in a seed molecule can determine which form is grown as well as where and when. These phenomena can be exploited to program the growth of complex supramolecular structures, as demonstrated by the algorithmic self-assembly of DNA tiles. However, the lack of effective seeds has limited the reliability and yield of algorithmic crystals. Here, we present a programmable DNA origami seed that can display up to 32 distinct binding sites and demonstrate the use of seeds to nucleate three types of algorithmic crystals. In the simplest case, the starting materials are a set of tiles that can form crystalline ribbons of any width; the seed directs assembly of a chosen width with >90% yield. Increased structural diversity is obtained by using tiles that copy a binary string from layer to layer; the seed specifies the initial string and triggers growth under near-optimal conditions where the bit copying error rate is [...] 17 kb of sequence information. In sum, this work demonstrates how DNA origami seeds enable the easy, high-yield, low-error-rate growth of algorithmic crystals as a route toward programmable bottom-up fabrication.
Reducing facet nucleation during algorithmic self-assembly
Algorithmic self-assembly, a generalization of crystal growth, has been proposed as a mechanism for bottom-up fabrication of complex
nanostructures and autonomous DNA computation. In principle, growth can be programmed by designing a set of molecular tiles with binding
interactions that enforce assembly rules. In practice, however, errors during assembly cause undesired products, drastically reducing yields.
Here we provide experimental evidence that assembly can be made more robust to errors by adding redundant tiles that "proofread" assembly.
We construct DNA tile sets for two methods, uniform and snaked proofreading. While both tile sets are predicted to reduce errors during
growth, the snaked proofreading tile set is also designed to reduce nucleation errors on crystal facets. Using atomic force microscopy to
image growth of proofreading tiles on ribbon-like crystals presenting long facets, we show that under the physical conditions we studied the
rate of facet nucleation is 4-fold smaller for snaked proofreading tile sets than for uniform proofreading tile sets.