Linear constructions for DNA codes
In this paper we translate into the language of coding theory the constraints used in designing DNA codes for DNA computing or as bar-codes in chemical libraries. We propose new constructions for DNA codes satisfying either a reverse-complement constraint, a GC-content constraint, or both, that are derived from additive and linear codes over four-letter alphabets. We focus in particular on codes over GF(4), and we construct new DNA codes that are in many cases better (sometimes far better) than previously known codes. We provide updated tables up to length 20 that include these codes as well as new codes constructed using a combination of lexicographic techniques and stochastic search.
Linear and nonlinear constructions of DNA codes with Hamming distance d, constant GC-content and a reverse-complement constraint
In a previous paper, the authors used cyclic and extended cyclic constructions to obtain codes over an alphabet {A,C,G,T} satisfying a Hamming distance constraint and a GC-content constraint. These codes are applicable to the design of synthetic DNA strands used in DNA microarrays, as DNA tags in chemical libraries and in DNA computing. The GC-content constraint specifies that a fixed number of positions are G or C in each codeword, which ensures uniform melting temperatures. The Hamming distance constraint is a step towards avoiding unwanted hybridizations. This approach extended the pioneering work of Gaborit and King. In the current paper, another constraint known as a reverse-complement constraint is added to further prevent unwanted hybridizations. Many new best codes are obtained, and are reproducible from the information presented here. The reverse-complement constraint is handled by searching for an involution with 0 or 1 fixed points, as first done by Gaborit and King. Linear codes and additive codes over GF(4) and their cosets are considered, as well as shortenings of these codes. In the additive case, codes obtained from two different mappings from GF(4) to {A,C,G,T} are considered.
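The three constraints named in this abstract can be checked directly on a small candidate code. The following is a minimal sketch, not the authors' construction: the function names and the parameters d (minimum distance) and w (GC-content) are chosen here for illustration, and the reverse-complement constraint is taken in the form stated in the "Bounds for DNA codes with constant GC-content" abstract in this listing (every codeword at distance at least d from the reverse-complement of every codeword, itself included).

```python
# Watson-Crick complement of each base.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def hamming(x, y):
    """Number of positions in which two equal-length words differ."""
    if len(x) != len(y):
        raise ValueError("words must have equal length")
    return sum(a != b for a, b in zip(x, y))


def gc_content(x):
    """Number of positions holding G or C."""
    return sum(ch in "GC" for ch in x)


def reverse_complement(x):
    """Reverse the word, then complement each base."""
    return "".join(COMPLEMENT[ch] for ch in reversed(x))


def satisfies_constraints(code, d, w):
    """Check GC-content w, Hamming distance d, and the reverse-complement constraint."""
    words = list(code)
    # GC-content constraint: every codeword has exactly w positions in {G, C}.
    if any(gc_content(x) != w for x in words):
        return False
    # Hamming constraint: distinct codewords differ in at least d positions.
    for i, x in enumerate(words):
        for y in words[i + 1:]:
            if hamming(x, y) < d:
                return False
    # Reverse-complement constraint: includes the case x == y.
    for x in words:
        for y in words:
            if hamming(x, reverse_complement(y)) < d:
                return False
    return True


print(satisfies_constraints(["ACAC", "AGAG"], d=2, w=2))
print(satisfies_constraints(["ACGT"], d=1, w=2))
```

The word ACGT equals its own reverse-complement, so a code containing it fails the reverse-complement constraint for any d of at least 1.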
Mutually Uncorrelated Primers for DNA-Based Data Storage
We introduce the notion of weakly mutually uncorrelated (WMU) sequences,
motivated by applications in DNA-based data storage systems and for
synchronization of communication devices. WMU sequences are characterized by
the property that no sufficiently long suffix of one sequence is the prefix of
the same or another sequence. WMU sequences used for primer design in DNA-based
data storage systems are also required to be at large mutual Hamming distance
from each other, have balanced compositions of symbols, and avoid primer-dimer
byproducts. We derive bounds on the size of WMU and various constrained WMU
codes and present a number of constructions for balanced, error-correcting,
primer-dimer free WMU codes using Dyck paths, prefix-synchronized and cyclic
codes.
Comment: 14 pages, 3 figures, 1 Table. arXiv admin note: text overlap with
arXiv:1601.0817
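The defining property of WMU sequences described above (no sufficiently long suffix of one sequence is the prefix of the same or another sequence) can be tested by brute force. This is a minimal sketch under stated assumptions: all sequences have the same length, "sufficiently long" is modelled by a hypothetical threshold parameter min_len, and only proper overlaps (shorter than the full length) are checked. It does not cover the additional Hamming-distance, balance, or primer-dimer requirements.

```python
def is_wmu(sequences, min_len):
    """Return True if no proper suffix of length >= min_len of any sequence
    equals a prefix of the same or another sequence.

    Assumes all sequences share one length; min_len is an illustrative
    threshold standing in for "sufficiently long".
    """
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    n = len(sequences[0])
    if any(len(s) != n for s in sequences):
        raise ValueError("all sequences must have the same length")
    for x in sequences:
        for y in sequences:  # y may be x itself (self-overlap)
            for k in range(min_len, n):  # proper overlaps only
                if x[-k:] == y[:k]:
                    return False
    return True


print(is_wmu(["AACG", "AATC"], min_len=1))
print(is_wmu(["ACGA"], min_len=1))
```

In the second call the one-letter suffix "A" of ACGA matches its own prefix, so the check fails at threshold 1 but passes at threshold 2.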
Asymmetric Lee Distance Codes for DNA-Based Storage
We consider a new family of codes, termed asymmetric Lee distance codes, that
arise in the design and implementation of DNA-based storage systems and systems
with parallel string transmission protocols. The codewords are defined over a
quaternary alphabet, although the results carry over to other alphabet sizes;
furthermore, symbol confusability is dictated by their underlying binary
representation. Our contributions are two-fold. First, we demonstrate that the
new distance represents a linear combination of the Lee and Hamming distance
and derive upper bounds on the size of the codes under this metric based on
linear programming techniques. Second, we propose a number of code
constructions which imply lower bounds.
Bounds for DNA codes with constant GC-content
We derive theoretical upper and lower bounds on the maximum size of DNA codes
of length n with constant GC-content w and minimum Hamming distance d, both
with and without the additional constraint that the minimum Hamming distance
between any codeword and the reverse-complement of any codeword be at least d.
We also explicitly construct codes that are larger than the best
previously-published codes for many choices of the parameters n, d and w.
Comment: 13 pages, no figures; a few references added and typos corrected
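As a point of reference for the parameters n and w, the total number of words of length n with GC-content exactly w is C(n, w) * 2^n: choose the w positions that hold G or C, then pick one of two letters at every position. This is only the trivial ceiling on code size (the case d = 1 with no reverse-complement constraint), not one of the bounds derived in the paper. The sketch below, with function names chosen here for illustration, checks the formula against brute-force enumeration for a small case.

```python
from itertools import product
from math import comb


def count_gc_words(n, w):
    """Closed-form count of length-n words over {A,C,G,T} with GC-content w."""
    return comb(n, w) * 2 ** n


def count_gc_words_brute(n, w):
    """Same count by enumerating all 4**n words (small n only)."""
    return sum(
        1
        for word in product("ACGT", repeat=n)
        if sum(ch in "GC" for ch in word) == w
    )


print(count_gc_words(4, 2), count_gc_words_brute(4, 2))
```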
Improved Lower Bounds for Constant GC-Content DNA Codes
The design of large libraries of oligonucleotides having constant GC-content
and satisfying Hamming distance constraints between oligonucleotides and their
Watson-Crick complements is important in reducing hybridization errors in DNA
computing, DNA microarray technologies, and molecular bar coding. Various
techniques have been studied for the construction of such oligonucleotide
libraries, ranging from algorithmic constructions via stochastic local search
to theoretical constructions via coding theory. We introduce a new stochastic
local search method which yields improvements up to more than one third of the
benchmark lower bounds of Gaborit and King (2005) for n-mer oligonucleotide
libraries when n <= 14. We also found several optimal libraries by computing
maximum cliques on certain graphs.
Comment: 4 pages
Efficient Two-Stage Group Testing Algorithms for Genetic Screening
Efficient two-stage group testing algorithms that are particularly suited for
rapid and less-expensive DNA library screening and other large scale biological
group testing efforts are investigated in this paper. The main focus is on
novel combinatorial constructions in order to minimize the number of individual
tests at the second stage of a two-stage disjunctive testing procedure.
Building on recent work by Levenshtein (2003) and Tonchev (2008), several new
infinite classes of such combinatorial designs are presented.
Comment: 14 pages; to appear in "Algorithmica". Part of this work has been
presented at the ICALP 2011 Group Testing Workshop; arXiv:1106.368
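The two-stage disjunctive procedure mentioned in this abstract can be simulated in a few lines. This is a minimal sketch of the generic procedure only: a pool tests positive if and only if it contains at least one positive item, items appearing in any negative pool are cleared, and every remaining candidate is tested individually in the second stage. The pools in the example are arbitrary illustrations, not the combinatorial designs constructed in the paper, and the function name and arguments are hypothetical.

```python
def two_stage_test(pools, positives, n_items):
    """Simulate two-stage disjunctive group testing.

    pools     -- list of sets of item indices tested together in stage 1
    positives -- set of truly positive items (known only to the simulation)
    n_items   -- total number of items, indexed 0 .. n_items - 1

    Returns (confirmed positives, number of individual stage-2 tests).
    """
    # Stage 1: a pool is positive iff it contains at least one positive item.
    outcomes = [bool(pool & positives) for pool in pools]

    # Any item that appears in a negative pool is certainly negative.
    cleared = set()
    for pool, outcome in zip(pools, outcomes):
        if not outcome:
            cleared |= pool

    # Stage 2: test each remaining candidate individually.
    candidates = set(range(n_items)) - cleared
    confirmed = {i for i in candidates if i in positives}
    return confirmed, len(candidates)


pools = [{0, 1, 2}, {3, 4, 5}, {0, 3}, {1, 4}, {2, 5}]
print(two_stage_test(pools, positives={4}, n_items=6))
```

The number of second-stage tests equals the number of items not cleared in the first stage, which is the quantity the abstract says the designs aim to minimize.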