Demystifying our Grandparent's De Bruijn Sequences with Concatenation Trees
Some of the most interesting de Bruijn sequences can be constructed in
seemingly unrelated ways. In particular, the "Granddaddy" and "Grandmama" can
be understood by joining necklace cycles into a tree using simple parent rules,
or by concatenating smaller strings (e.g., Lyndon words) in lexicographic
orders. These constructions are elegant, but their equivalences seem to come
out of thin air, and the community has had limited success in finding others of
the same ilk. We aim to demystify the connection between cycle-joining trees
and concatenation schemes by introducing "concatenation trees". These
structures combine binary trees and ordered trees, and traversals yield
concatenation schemes for their sequences.
In this work, we focus on the four simplest cycle-joining trees using the
pure cycling register (PCR): "Granddaddy" (PCR1), "Grandmama" (PCR2), "Granny"
(PCR3), and "Grandpa" (PCR4). In particular, we formally prove a previously
observed correspondence for PCR3 and we unravel the mystery behind PCR4. More
broadly, this work lays the foundation for translating cycle-joining trees to
known concatenation constructions for a variety of underlying feedback
functions including the complementing cycling register (CCR), pure summing
register (PSR), complementing summing register (CSR), and pure run-length
register (PRR).
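As a concrete illustration of the concatenation view mentioned in the abstract above, the following minimal sketch builds the "Granddaddy" sequence by concatenating, in lexicographic order, the Lyndon words whose length divides n. The function names (`lyndon_words_dividing`, `granddaddy`, `is_de_bruijn`) are hypothetical and not taken from the paper, and the sketch does not implement the paper's concatenation trees; it only shows the classical Lyndon-word concatenation and checks the de Bruijn property.

```python
def lyndon_words_dividing(n, k=2):
    """Yield, in lexicographic order, the Lyndon words over {0, ..., k-1}
    whose length divides n (standard recursive prenecklace generation)."""
    a = [0] * (n + 1)  # a[1..n] is the current prenecklace; a[0] is a sentinel

    def gen(t, p):
        if t > n:
            # a[1..p] is a Lyndon word; keep it only if its length divides n
            if n % p == 0:
                yield a[1:p + 1]
        else:
            a[t] = a[t - p]
            yield from gen(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                yield from gen(t + 1, t)

    yield from gen(1, 1)


def granddaddy(n, k=2):
    """Concatenate the Lyndon words above into one string (assumes k <= 10)."""
    return "".join(str(s) for word in lyndon_words_dividing(n, k) for s in word)


def is_de_bruijn(seq, n, k=2):
    """Check that every length-n window of the cyclic string seq is distinct
    and that there are exactly k**n of them."""
    wrapped = seq + seq[:n - 1]
    windows = {wrapped[i:i + n] for i in range(len(seq))}
    return len(seq) == k ** n and len(windows) == k ** n


if __name__ == "__main__":
    print(granddaddy(4))                    # the order-4 binary sequence
    print(is_de_bruijn(granddaddy(4), 4))   # property check
```

For n = 3 the Lyndon words used are 0, 001, 011, and 1, which concatenate to 00010111.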
MetaPalette: a k-mer Painting Approach for Metagenomic Taxonomic Profiling and Quantification of Novel Strain Variation
Metagenomic profiling is challenging in part because of the highly uneven sampling of the tree of life by genome sequencing projects and the limitations imposed by performing phylogenetic inference at fixed taxonomic ranks. We present the algorithm MetaPalette, which uses long k-mer sizes (k = 30, 50) to fit a k-mer “palette” of a given sample to the k-mer palette of reference organisms. By modeling the k-mer palettes of unknown organisms, the method also gives an indication of the presence, abundance, and evolutionary relatedness of novel organisms present in the sample. The method returns a traditional, fixed-rank taxonomic profile which is shown on independently simulated data to be one of the most accurate to date. Tree figures are also returned that quantify the relatedness of novel organisms to reference sequences, and the accuracy of such figures is demonstrated on simulated spike-ins and a metagenomic soil sample. The software implementing MetaPalette is available at: https://github.com/dkoslicki/MetaPalette. Pretrained databases are included for Archaea, Bacteria, Eukaryota, and viruses.
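To make the idea of comparing k-mer content concrete, here is a toy sketch that extracts the set of k-mers from a sequence and measures what fraction of a reference's k-mers also occur in a sample. This is not the MetaPalette algorithm or its API; the function names `kmers` and `shared_fraction` are hypothetical, and the short k used in the usage example is chosen only so the strings stay readable.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_fraction(sample, reference, k):
    """Fraction of the reference's distinct k-mers that also occur in the sample.
    Returns 0.0 when the reference is shorter than k."""
    ref_kmers = kmers(reference, k)
    if not ref_kmers:
        return 0.0
    return len(kmers(sample, k) & ref_kmers) / len(ref_kmers)


if __name__ == "__main__":
    sample = "ACGTACGTTTGACC"
    for name, ref in {"ref_a": "ACGTACGT", "ref_b": "GGGGCCCC"}.items():
        print(name, shared_fraction(sample, ref, k=4))
```

A real profiler would work on reads rather than whole strings and would account for sequencing error and abundance, none of which this sketch attempts.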
Computation and programmability at the nano-bio interface
PhD Thesis
The manipulation of physical reality on the molecular level and construction of devices
operating on the nanoscale has been the focal point of nanotechnology. In particular,
nanotechnology based on DNA and RNA has the potential to find applications in the
field of Synthetic Biology thanks to the inherent compatibility of nucleic acids with
biological systems. Scaffolded DNA origami, proposed by P. Rothemund, is one of
the leading and most successful methods in which nanostructures are realised through
rational programming of short 'staple' oligomers which fold a long single-stranded
DNA called the 'scaffold' strand into a variety of desired shapes. DNA origami already
has many applications, including intelligent drug delivery, miniaturisation of logic
circuits and computation in vivo. However, one of the factors limiting the
complexity, applicability and scalability of this approach is the source of the scaffold,
which commonly originates from viruses or phages. Furthermore, developing a robust
and orthogonal interface between DNA nanotechnology and biological parts remains
a significant challenge.
The first part of this thesis tackles these issues by challenging the fundamental
assumption in the field, namely that a viral sequence is to be used as the DNA origami
scaffold. A method is introduced for de novo generation of long synthetic sequences
based on the De Bruijn sequence, which has been previously proposed in combinatorics.
The thesis presents a collection of algorithms which allow the construction of
custom-made sequences that are uniquely addressable and biologically orthogonal (i.e. they
do not code for any known biological function). Synthetic scaffolds generated by these
algorithms are computationally analysed and compared with their natural
counterparts with respect to: repetition in sequence, secondary structure and thermodynamic
addressability. This also aids the design of wet lab experiments pursuing justification
and verification of this novel approach by empirical evidence.
The second part of this thesis discusses the possibility of applying evolutionary
optimisation to synthetic DNA sequences under constraints dictated by the biological
interface. A multi-strand system is introduced based on an alternative approach to
DNA self-assembly, which relies on strand-displacement cascades, for molecular data
storage. The thesis demonstrates how a genetic algorithm can be used to generate
viable solutions to this sequence optimisation problem which favours the target
self-assembly configuration. Additionally, the kinetics of strand-displacement reactions
are analysed with existing coarse-grained DNA models (oxDNA).
This thesis is motivated by the application of scientific computing to problems which
lie on the boundary of Computer Science and the fields of DNA Nanotechnology, DNA
Computing and Synthetic Biology, and thus I endeavour to the best of my ability to
establish this work within the context of these disciplines.
Optimizations and Hardware Implementations for Composited de Bruijn Sequence Generators
A binary de Bruijn sequence with period 2^n is a sequence in which every length-n sub-sequence occurs exactly once. De Bruijn sequences have randomness properties that make them attractive for pseudorandom number generators. Unfortunately, it is very difficult to find de Bruijn sequence generators with large periods (e.g., 2^{64}) and most known de Bruijn sequence construction techniques are computationally quite expensive. In this thesis we present a set of optimizations that reduces the computational complexity of the de Bruijn sequence generators constructed by the composited construction technique, which is the most effective one we know. We call optimized composited de Bruijn sequence generators "OcDeb". An original (k, n)-composited de Bruijn sequence generator generates a sequence with period 2^{n+k} and uses O(k^2 + nk) bit operations. Our optimizations reduce this to O(k log(k) + log(n)) operations, allow retiming, and enable parallel implementations that produce multiple bits per clock cycle while reusing some combinational hardware. Our optimizations are formulated in lemmas and theorems with proofs. The benefits of OcDeb-k-n over (k, n)-composited de Bruijn sequence generators are demonstrated by comprehensive results in a 65nm CMOS ASIC library. For example, before place-and-route, an instance of OcDeb-32-32 has a period of 2^{64}, an area of 656 GE and a maximum performance of 1.67 Gbps, representing 1.7X and 29.4X improvement on area and performance respectively over the previous implementation method presented by Mandal and Gong; with parallelization, this instance can achieve 8.30 Gbps with an area of 1229 GE. An instance of OcDeb-512-32 has a period of 2^{544}, an area of 7949 GE, and a maximum performance of 1.43 Gbps.
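For readers unfamiliar with shift-register generators, the sketch below shows a much simpler classical technique than the composited construction described above: a maximal-length linear feedback shift register whose feedback is flipped whenever the last n-1 state bits are all zero, which splices the all-zero state into the cycle and yields a period of 2^n. It is not OcDeb and makes no claim about that design. The function name `de_bruijn_from_lfsr` and the tap encoding are assumptions made for this illustration, and the caller is assumed to supply taps that correspond to a primitive polynomial (here x^4 + x + 1 for n = 4).

```python
def de_bruijn_from_lfsr(n, taps):
    """Generate one period (2**n bits) of a binary de Bruijn sequence.

    taps: indices i such that the linear recurrence is
          s[t + n] = XOR of s[t + i] for i in taps.
    Assumption: the taps describe a primitive polynomial and include 0.
    """
    state = [0] * n
    out = []
    for _ in range(2 ** n):
        out.append(state[0])
        feedback = 0
        for i in taps:
            feedback ^= state[i]
        # Nonlinear correction: flip the feedback when the last n-1 bits are
        # all zero, so the all-zero state is inserted into the LFSR cycle.
        if not any(state[1:]):
            feedback ^= 1
        state = state[1:] + [feedback]
    return out


def has_all_windows(bits, n):
    """Check that the cyclic sequence contains 2**n distinct length-n windows."""
    wrapped = bits + bits[:n - 1]
    windows = {tuple(wrapped[i:i + n]) for i in range(len(bits))}
    return len(bits) == 2 ** n and len(windows) == 2 ** n


if __name__ == "__main__":
    seq = de_bruijn_from_lfsr(4, taps=(0, 1))
    print("".join(map(str, seq)), has_all_windows(seq, 4))
```

Each output bit here costs one pass over the taps plus an all-zero test on n-1 bits.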
Foundations of Software Science and Computation Structures
This open access book constitutes the proceedings of the 25th International Conference on Foundations of Software Science and Computational Structures, FOSSACS 2022, which was held during April 4-6, 2022, in Munich, Germany, as part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2022. The 23 regular papers presented in this volume were carefully reviewed and selected from 77 submissions. They deal with research on theories and methods to support the analysis, integration, synthesis, transformation, and verification of programs and software systems.