16 research outputs found

    Demystifying our Grandparent's De Bruijn Sequences with Concatenation Trees

    Some of the most interesting de Bruijn sequences can be constructed in seemingly unrelated ways. In particular, the "Granddaddy" and "Grandmama" can be understood by joining necklace cycles into a tree using simple parent rules, or by concatenating smaller strings (e.g., Lyndon words) in lexicographic orders. These constructions are elegant, but their equivalences seem to come out of thin air, and the community has had limited success in finding others of the same ilk. We aim to demystify the connection between cycle-joining trees and concatenation schemes by introducing "concatenation trees". These structures combine binary trees and ordered trees, and traversals yield concatenation schemes for their sequences. In this work, we focus on the four simplest cycle-joining trees using the pure cycling register (PCR): "Granddaddy" (PCR1), "Grandmama" (PCR2), "Granny" (PCR3), and "Grandpa" (PCR4). In particular, we formally prove a previously observed correspondence for PCR3 and we unravel the mystery behind PCR4. More broadly, this work lays the foundation for translating cycle-joining trees to known concatenation constructions for a variety of underlying feedback functions including the complementing cycling register (CCR), pure summing register (PSR), complementing summing register (CSR), and pure run-length register (PRR)
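    The Lyndon-word concatenation mentioned in this abstract can be illustrated with a minimal Python sketch. It assumes the standard formulation of the "Granddaddy": list the Lyndon words over the alphabet in lexicographic order, keep those whose length divides n, and concatenate them. The function names `lyndon_words` and `granddaddy` are illustrative and do not come from the paper, and the sketch does not implement the paper's concatenation trees.

```python
def lyndon_words(n, k=2):
    """Yield every Lyndon word of length <= n over {0, ..., k-1},
    in lexicographic order (Duval-style generation)."""
    w = [-1]
    while w:
        w[-1] += 1                 # move to the next candidate word
        yield list(w)
        m = len(w)
        while len(w) < n:          # extend periodically up to length n
            w.append(w[-m])
        while w and w[-1] == k - 1:
            w.pop()                # strip trailing maximal symbols


def granddaddy(n, k=2):
    """Concatenate, in lexicographic order, the Lyndon words whose
    length divides n. Assumes k <= 10 so each symbol is one digit."""
    return "".join(
        "".join(str(symbol) for symbol in word)
        for word in lyndon_words(n, k)
        if n % len(word) == 0
    )


if __name__ == "__main__":
    print(granddaddy(3))  # Lyndon words 0, 001, 011, 1 joined together
    print(granddaddy(4))
```

    For n = 3 the retained Lyndon words are 0, 001, 011 and 1, so the result is the cyclic sequence 00010111.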

    MetaPalette: a k-mer Painting Approach for Metagenomic Taxonomic Profiling and Quantification of Novel Strain Variation

    Metagenomic profiling is challenging in part because of the highly uneven sampling of the tree of life by genome sequencing projects and the limitations imposed by performing phylogenetic inference at fixed taxonomic ranks. We present the algorithm MetaPalette, which uses long k-mer sizes (k = 30, 50) to fit a k-mer “palette” of a given sample to the k-mer palette of reference organisms. By modeling the k-mer palettes of unknown organisms, the method also gives an indication of the presence, abundance, and evolutionary relatedness of novel organisms present in the sample. The method returns a traditional, fixed-rank taxonomic profile which is shown on independently simulated data to be one of the most accurate to date. Tree figures are also returned that quantify the relatedness of novel organisms to reference sequences, and the accuracy of such figures is demonstrated on simulated spike-ins and a metagenomic soil sample. The software implementing MetaPalette is available at: https://github.com/dkoslicki/MetaPalette. Pretrained databases are included for Archaea, Bacteria, Eukaryota, and viruses

    Computation and programmability at the nano-bio interface

    PhD Thesis. The manipulation of physical reality on the molecular level and construction of devices operating on the nanoscale has been the focal point of nanotechnology. In particular, nanotechnology based on DNA and RNA has a potential to find applications in the field of Synthetic Biology thanks to the inherent compatibility of nucleic acids with biological systems. Scaffolded DNA origami, proposed by P. Rothemund, is one of the leading and most successful methods in which nanostructures are realised through rational programming of short 'staple' oligomers which fold a long single-stranded DNA called the 'scaffold' strand into a variety of desired shapes. DNA origami already has many applications, including intelligent drug delivery, miniaturisation of logic circuits and computation in vivo. However, one of the factors that are limiting the complexity, applicability and scalability of this approach is the source of the scaffold which commonly originates from viruses or phages. Furthermore, developing a robust and orthogonal interface between DNA nanotechnology and biological parts remains a significant challenge. The first part of this thesis tackles these issues by challenging the fundamental assumption in the field, namely that a viral sequence is to be used as the DNA origami scaffold. A method is introduced for de novo generation of long synthetic sequences based on De Bruijn sequence, which has been previously proposed in combinatorics. The thesis presents a collection of algorithms which allow the construction of custom-made sequences that are uniquely addressable and biologically orthogonal (i.e. they do not code for any known biological function). Synthetic scaffolds generated by these algorithms are computationally analysed and compared with their natural counterparts with respect to: repetition in sequence, secondary structure and thermodynamic addressability. This also aids the design of wet lab experiments pursuing justification and verification of this novel approach by empirical evidence. The second part of this thesis discusses the possibility of applying evolutionary optimisation to synthetic DNA sequences under constraints dictated by the biological interface. A multi-strand system is introduced based on an alternative approach to DNA self-assembly, which relies on strand-displacement cascades, for molecular data storage. The thesis demonstrates how a genetic algorithm can be used to generate viable solutions to this sequence optimisation problem which favours the target self-assembly configuration. Additionally, the kinetics of strand-displacement reactions are analysed with existing coarse-grained DNA models (oxDNA). This thesis is motivated by the application of scientific computing to problems which lie on the boundary of Computer Science and the fields of DNA Nanotechnology, DNA Computing and Synthetic Biology, and thus I endeavour to the best of my ability to establish this work within the context of these disciplines

    Optimizations and Hardware Implementations for Composited de Bruijn Sequence Generators

    A binary de Bruijn sequence with period 2^n is a sequence in which every length-n sub-sequence occurs exactly once. de Bruijn sequences have randomness properties that make them attractive for pseudorandom number generators. Unfortunately, it is very difficult to find de Bruijn sequence generators with large periods (e.g., 2^{64}) and most known de Bruijn sequence construction techniques are computationally quite expensive. In this thesis we present a set of optimizations that reduces the computational complexity of the de Bruijn sequence generators constructed by the composited construction technique, which is the most effective one we know. We call optimized composited de Bruijn sequence generators "OcDeb". An original (k, n)-composited de Bruijn sequence generator generates a sequence with period 2^{n+k} and uses O(k^2 + nk) bit operations. Our optimizations reduce this to O(k log(k) + log(n)) operations, allow retiming, and enable parallel implementations that produce multiple bits per clock cycle while reusing some combinational hardware. Our optimizations are formulated in lemmas and theorems with proofs. The benefits of OcDeb-k-n over (k, n)-composited de Bruijn sequence generators are demonstrated by comprehensive results in a 65nm CMOS ASIC library. For example, before place-and-route, an instance of OcDeb-32-32 has a period of 2^{64}, an area of 656 GE and a maximum performance of 1.67 Gbps, representing 1.7X and 29.4X improvement on area and performance respectively over the previous implementation method presented by Mandal and Gong; with parallelization, this instance can achieve 8.30 Gbps with an area of 1229 GE. An instance of OcDeb-512-32 has a period of 2^{544}, an area of 7949 GE, and a maximum performance of 1.43 Gbps
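    The defining property stated at the start of this abstract can be checked directly. The Python sketch below is only an illustration of that definition. The helper name `is_de_bruijn` is hypothetical, and the code is a brute-force check that has nothing to do with the composited construction or the OcDeb hardware design. It treats the input string as one period of a cyclic sequence and tests that every length-n window is distinct.

```python
def is_de_bruijn(seq, n, alphabet_size=2):
    """Return True if `seq`, read cyclically, contains every length-n
    word over the alphabet exactly once."""
    period = alphabet_size ** n
    if len(seq) != period:
        return False
    # Append the first n-1 symbols so windows can wrap around the end.
    wrapped = seq + seq[: n - 1]
    windows = {wrapped[i:i + n] for i in range(period)}
    # `period` distinct windows of length n means each word occurs once.
    return len(windows) == period


if __name__ == "__main__":
    print(is_de_bruijn("00010111", 3))  # a known binary example for n = 3
    print(is_de_bruijn("00001111", 3))  # repeats the window 000
```

    Storing every window takes memory proportional to the period, so a check like this is only practical for small n and not for periods such as 2^{64}.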

    Foundations of Software Science and Computation Structures

    This open access book constitutes the proceedings of the 25th International Conference on Foundations of Software Science and Computational Structures, FOSSACS 2022, which was held during April 4-6, 2022, in Munich, Germany, as part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2022. The 23 regular papers presented in this volume were carefully reviewed and selected from 77 submissions. They deal with research on theories and methods to support the analysis, integration, synthesis, transformation, and verification of programs and software systems