Improved Lower Bounds for Constant GC-Content DNA Codes
The design of large libraries of oligonucleotides having constant GC-content
and satisfying Hamming distance constraints between oligonucleotides and their
Watson-Crick complements is important in reducing hybridization errors in DNA
computing, DNA microarray technologies, and molecular bar coding. Various
techniques have been studied for the construction of such oligonucleotide
libraries, ranging from algorithmic constructions via stochastic local search
to theoretical constructions via coding theory. We introduce a new stochastic
local search method which yields improvements up to more than one third of the
benchmark lower bounds of Gaborit and King (2005) for n-mer oligonucleotide
libraries when n <= 14. We also found several optimal libraries by computing
maximum cliques on certain graphs. Comment: 4 pages
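
The constraints named in this abstract can be made concrete with a small checker. The following is a minimal sketch, not the authors' search method: it only tests whether a given library satisfies a constant GC-content and the two Hamming distance constraints. The function names and the parameters `gc` and `d` are illustrative, and the Watson-Crick complement is assumed here to mean the reverse complement.

```python
from itertools import combinations

# Base-pairing table used to build the Watson-Crick complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_content(word):
    """Count the G and C bases in a word."""
    return sum(base in "GC" for base in word)


def hamming(a, b):
    """Number of positions at which two equal-length words differ."""
    if len(a) != len(b):
        raise ValueError("words must have equal length")
    return sum(x != y for x, y in zip(a, b))


def reverse_complement(word):
    """Complement every base, then reverse (assumed reading of 'complement')."""
    return word.translate(COMPLEMENT)[::-1]


def is_valid_library(words, gc, d):
    """Check constant GC-content gc and minimum distance d for a library."""
    if any(gc_content(w) != gc for w in words):
        return False
    # Distance between every pair of distinct words.
    if any(hamming(a, b) < d for a, b in combinations(words, 2)):
        return False
    # Distance between every word and every reverse complement,
    # including a word's own reverse complement.
    return all(
        hamming(a, reverse_complement(b)) >= d for a in words for b in words
    )
```

For example, `is_valid_library(["AACC", "CCAA"], gc=2, d=4)` holds, whereas a library containing `ACGT` fails for any `d >= 1` because that word equals its own reverse complement.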
Deterministic Polynomial-Time Algorithms for Designing Short DNA Words
Designing short DNA words is a problem of constructing a set (i.e., code) of
n DNA strings (i.e., words) with the minimum length such that the Hamming
distance between each pair of words is at least k and the n words satisfy a set
of additional constraints. This problem has applications in, e.g., DNA
self-assembly and DNA arrays. Previous works include those that extended
results from coding theory to obtain bounds on code and word sizes for
biologically motivated constraints and those that applied heuristic local
searches, genetic algorithms, and randomized algorithms. In particular, Kao,
Sanghi, and Schweller (2009) developed polynomial-time randomized algorithms to
construct n DNA words of length within a multiplicative constant of the
smallest possible word length (e.g., 9 max{log n, k}) that satisfy various sets
of constraints with high probability. In this paper, we give deterministic
polynomial-time algorithms to construct DNA words based on derandomization
techniques. Our algorithms can construct n DNA words of shorter length (e.g.,
2.1 log n + 6.28 k) and satisfy the same sets of constraints as the words
constructed by the algorithms of Kao et al. Furthermore, we extend these new
algorithms to construct words that satisfy a larger set of constraints for
which the algorithms of Kao et al. do not work. Comment: 27 pages
A practical index for approximate dictionary matching with few mismatches
Approximate dictionary matching is a classic string matching problem
(checking if a query string occurs in a collection of strings) with
applications in, e.g., spellchecking, online catalogs, geolocation, and web
searchers. We present a surprisingly simple solution called a split index,
which is based on the Dirichlet principle, for matching a keyword with few
mismatches, and experimentally show that it offers competitive space-time
tradeoffs. Our implementation in the C++ language is focused mostly on data
compaction, which is beneficial for the search speed (e.g., by being cache
friendly). We compare our solution with other algorithms and we show that it
performs better for the Hamming distance. Query times on the order of 1
microsecond were reported for one mismatch for a dictionary size of a few
megabytes on a medium-end PC. We also demonstrate that a basic compression
technique consisting of q-gram substitution can significantly reduce the
index size (up to 50% of the input text size for DNA), while still keeping
the query time relatively low.
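
The idea behind the split index can be sketched in a few lines. The Dirichlet (pigeonhole) principle says that if a word is split into two parts and the query differs from it in at most one position, then at least one part matches exactly. The sketch below is a simplified illustration for one mismatch only, not the authors' compact C++ implementation; the class name `SplitIndex` and its methods are hypothetical.

```python
from collections import defaultdict


def hamming(a, b):
    """Number of positions at which two equal-length words differ."""
    return sum(x != y for x, y in zip(a, b))


class SplitIndex:
    """Toy index for dictionary matching with at most one mismatch."""

    def __init__(self, words):
        # Keys combine word length and one half, so only words of the
        # same length as the query are ever retrieved.
        self.left = defaultdict(list)
        self.right = defaultdict(list)
        for w in words:
            mid = len(w) // 2
            self.left[(len(w), w[:mid])].append(w)
            self.right[(len(w), w[mid:])].append(w)

    def query(self, q):
        """Return all indexed words within Hamming distance 1 of q."""
        mid = len(q) // 2
        # Pigeonhole: one mismatch cannot fall into both halves, so every
        # match shares an exact left half or an exact right half with q.
        candidates = set(self.left.get((len(q), q[:mid]), []))
        candidates |= set(self.right.get((len(q), q[mid:]), []))
        # Verify the candidates, since a shared half does not bound the
        # number of mismatches in the other half.
        return sorted(w for w in candidates if hamming(w, q) <= 1)
```

With `idx = SplitIndex(["hello", "help", "jello", "world"])`, the call `idx.query("hello")` returns both `hello` and `jello`. Only the candidates sharing a half with the query are verified, which is what keeps lookups cheap.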
Dagstuhl Reports : Volume 1, Issue 2, February 2011
Online Privacy: Towards Informational Self-Determination on the Internet (Dagstuhl Perspectives Workshop 11061): Simone Fischer-Hübner, Chris Hoofnagle, Kai Rannenberg, Michael Waidner, Ioannis Krontiris and Michael Marhöfer
Self-Repairing Programs (Dagstuhl Seminar 11062): Mauro Pezzé, Martin C. Rinard, Westley Weimer and Andreas Zeller
Theory and Applications of Graph Searching Problems (Dagstuhl Seminar 11071): Fedor V. Fomin, Pierre Fraigniaud, Stephan Kreutzer and Dimitrios M. Thilikos
Combinatorial and Algorithmic Aspects of Sequence Processing (Dagstuhl Seminar 11081): Maxime Crochemore, Lila Kari, Mehryar Mohri and Dirk Nowotka
Packing and Scheduling Algorithms for Information and Communication Services (Dagstuhl Seminar 11091): Klaus Jansen, Claire Mathieu, Hadas Shachnai and Neal E. Young
Bounds for DNA codes with constant GC-content
We derive theoretical upper and lower bounds on the maximum size of DNA codes
of length n with constant GC-content w and minimum Hamming distance d, both
with and without the additional constraint that the minimum Hamming distance
between any codeword and the reverse-complement of any codeword be at least d.
We also explicitly construct codes that are larger than the best
previously-published codes for many choices of the parameters n, d and w. Comment: 13 pages, no figures; a few references added and typos corrected
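
A lower bound of the kind discussed in this abstract can be illustrated by exhibiting any valid code. The sketch below is a naive greedy construction under stated assumptions, not the construction used in the paper: it scans all words of length n in lexicographic order and keeps each word that is compatible with those already chosen. The function name `greedy_gc_code` is illustrative, and the enumeration is only feasible for small n.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_content(word):
    """Count the G and C bases in a word."""
    return sum(base in "GC" for base in word)


def hamming(a, b):
    """Number of positions at which two equal-length words differ."""
    return sum(x != y for x, y in zip(a, b))


def reverse_complement(word):
    """Complement every base, then reverse the word."""
    return word.translate(COMPLEMENT)[::-1]


def greedy_gc_code(n, w, d, rc_constraint=True):
    """Greedily build a code of length n, GC-content w and distance d."""
    code = []
    for letters in product("ACGT", repeat=n):
        word = "".join(letters)
        if gc_content(word) != w:
            continue
        if any(hamming(word, c) < d for c in code):
            continue
        if rc_constraint:
            rc = reverse_complement(word)
            # A word must also be far from its own reverse complement.
            if hamming(word, rc) < d:
                continue
            # Reverse complementation preserves Hamming distance, so this
            # one test covers both directions for each chosen codeword.
            if any(hamming(rc, c) < d for c in code):
                continue
        code.append(word)
    return code
```

The length of the returned list is a valid lower bound on the maximum code size for the given n, w and d, but greedy selection is generally far from optimal.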