Non-asymptotic Upper Bounds for Deletion Correcting Codes
Explicit non-asymptotic upper bounds on the sizes of multiple-deletion
correcting codes are presented. In particular, the largest single-deletion
correcting code for a $q$-ary alphabet and string length $n$ is shown to be of
size at most $\frac{q^n - q}{(q-1)(n-1)}$. An improved bound on the asymptotic
rate function is obtained as a corollary. Upper bounds are also derived on
sizes of codes for a constrained source that does not necessarily consist of
all strings of a particular length, and this idea is demonstrated by
application to sets of run-length limited strings.
The problem of finding the largest deletion correcting code is modeled as a
matching problem on a hypergraph. This problem is formulated as an integer
linear program. The upper bound is obtained by the construction of a feasible
point for the dual of the linear programming relaxation of this integer linear
program.
The non-asymptotic bounds derived imply the known asymptotic bounds of
Levenshtein and Tenengolts and improve on known non-asymptotic bounds.
Numerical results support the conjecture that in the binary case, the
Varshamov-Tenengolts codes are the largest single-deletion correcting codes.
Comment: 18 pages, 4 figures
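As an illustration of the binary case mentioned above, the following is a minimal sketch (not taken from the paper) that enumerates a Varshamov-Tenengolts code by brute force and checks that no two codewords share a single-deletion descendant. The function names `vt_code`, `deletion_ball`, and `is_single_deletion_correcting` are hypothetical, and the exhaustive enumeration is only practical for small lengths.

```python
from itertools import product

def vt_code(n, a=0):
    """Binary words x_1..x_n with sum(i * x_i) congruent to a modulo n + 1."""
    return [
        bits
        for bits in product((0, 1), repeat=n)
        if sum(i * b for i, b in enumerate(bits, start=1)) % (n + 1) == a
    ]

def deletion_ball(word):
    """All distinct words obtained by deleting exactly one symbol."""
    return {word[:i] + word[i + 1:] for i in range(len(word))}

def is_single_deletion_correcting(code):
    """True if the single-deletion balls of the codewords are pairwise disjoint."""
    seen = set()
    for word in code:
        ball = deletion_ball(word)
        if seen & ball:
            return False
        seen |= ball
    return True

code = vt_code(5)
print(len(code), is_single_deletion_correcting(code))
```

The disjointness check is the direct definition of single-deletion correction: a received word of length n - 1 must identify at most one codeword.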
Computing phonological generalization over real speech exemplars
Though it has attracted growing attention from phonologists and phoneticians, Exemplar Theory (e.g. Bybee 2001) has hitherto lacked an explicit production model that can apply to speech signals. An adequate model must be able to generalize, but this presents the problem of how to generate an output that generalizes over a collection of unique, variable-length signals. Rather than resorting to a priori phonological units such as phones, we adopt a dynamic programming approach using an optimization criterion that is sensitive to the frequency of similar subsequences within other exemplars: the Phonological Exemplar-Based Learning System (PEBLS). We show that PEBLS displays pattern-entrenchment behaviour central to Exemplar Theory's account of phonologization. (C) 2010 Elsevier Ltd. All rights reserved.
Global transposable characteristics in the yeast complete DNA sequence
Global transposable characteristics in the complete DNA sequence of the
yeast Saccharomyces cerevisiae are determined by using the metric representation
and recurrence plot methods. In the form of the correlation distance of
nucleotide strings, 16 chromosome sequences of the yeast, which are divided
into 5 groups, display 4 kinds of the fundamental transposable characteristics:
a short period increasing, a long quasi-period increasing, a long major value
and hardly relevant.
Comment: 19 pages, 5 figures, 5 tables
Computational Performance Evaluation of Two Integer Linear Programming Models for the Minimum Common String Partition Problem
In the minimum common string partition (MCSP) problem two related input
strings are given. "Related" refers to the property that both strings consist
of the same set of letters appearing the same number of times in each of the
two strings. The MCSP seeks a minimum cardinality partitioning of one string
into non-overlapping substrings that is also a valid partitioning for the
second string. This problem has applications in bioinformatics, e.g. in
analyzing related DNA or protein sequences. For strings with lengths less than
about 1000 letters, a previously published integer linear programming (ILP)
formulation yields, when solved with a state-of-the-art solver such as CPLEX,
satisfactory results. In this work, we propose a new, alternative ILP model
that is compared to the former one. While a polyhedral study shows the linear
programming relaxations of the two models to be equally strong, a comprehensive
experimental comparison using real-world as well as artificially created
benchmark instances indicates substantial computational advantages of the new
formulation.
Comment: arXiv admin note: text overlap with arXiv:1405.5646. This paper
version replaces the one submitted on January 10, 2015, due to a detected error
in the calculation of the variables involved in the ILP model.
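To make the MCSP definition above concrete, here is a small brute-force sketch. It is not either of the ILP models from the paper: it simply tries every way of cutting the first string into k blocks, for increasing k, and checks whether the blocks can be reordered to spell the second string. The function names are hypothetical, and the exponential search is only meant for very short strings.

```python
from collections import Counter
from itertools import combinations

def are_related(s1, s2):
    """Related strings contain the same letters with the same multiplicities."""
    return Counter(s1) == Counter(s2)

def can_tile(target, blocks):
    """Backtracking check: can the multiset of blocks be ordered to spell target?

    Assumes the total length of the blocks equals len(target).
    """
    if not target:
        return True
    for block in list(blocks):
        if blocks[block] > 0 and target.startswith(block):
            blocks[block] -= 1
            ok = can_tile(target[len(block):], blocks)
            blocks[block] += 1
            if ok:
                return True
    return False

def min_common_partition(s1, s2):
    """Return a minimum-size list of blocks of s1 that also partitions s2."""
    if not are_related(s1, s2):
        raise ValueError("input strings are not related")
    n = len(s1)
    for k in range(1, n + 1):
        # Choose k - 1 cut positions inside s1, giving k non-overlapping blocks.
        for cuts in combinations(range(1, n), k - 1):
            bounds = (0, *cuts, n)
            blocks = [s1[i:j] for i, j in zip(bounds, bounds[1:])]
            if can_tile(s2, Counter(blocks)):
                return blocks
    return []

print(min_common_partition("ababcab", "abcabab"))
```

For the example strings, cutting the first string into "ab" and "abcab" suffices, since the second string is "abcab" followed by "ab".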
On a Speculated Relation Between Chvátal-Sankoff Constants of Several Sequences
It is well known that, when normalized by n, the expected length of a longest
common subsequence of d sequences of length n over an alphabet of size sigma
converges to a constant gamma_{sigma,d}. We disprove a speculation by Steele
regarding a possible relation between gamma_{2,d} and gamma_{2,2}. In order to
do that we also obtain new lower bounds for gamma_{sigma,d}, when both sigma
and d are small integers.
Comment: 13 pages. To appear in Combinatorics, Probability and Computing
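The normalized quantity described above can be explored numerically. The sketch below is an illustration only, not the paper's method for deriving bounds: it covers just the case d = 2, computes the longest common subsequence length with the standard dynamic program, and averages it over random sequences. The names `lcs_length` and `estimate_normalized_lcs`, and the default trial count and seed, are assumptions made for this example. For finite n the result is only a rough estimate of the limiting constant.

```python
import random

def lcs_length(a, b):
    """Length of a longest common subsequence of a and b (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def estimate_normalized_lcs(n, sigma=2, trials=20, seed=0):
    """Monte Carlo average of LCS length divided by n for two random sequences."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        a = [rng.randrange(sigma) for _ in range(n)]
        b = [rng.randrange(sigma) for _ in range(n)]
        total += lcs_length(a, b)
    return total / (trials * n)

print(estimate_normalized_lcs(200))
```

Increasing n and the number of trials gives a smoother estimate, at a cost quadratic in n per trial.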