Almost Linear Time Computation of Maximal Repetitions in Run Length Encoded Strings
We consider the problem of computing all maximal repetitions contained in a string that is given in run-length encoding.
Given a run-length encoding of a string, we show that the maximum number of maximal repetitions contained in the string is at most m+k-1, where m is the size of the run-length encoding, and k is the number of run-length factors whose exponent is at least 2.
We also show an algorithm for computing all maximal repetitions in O(m alpha(m)) time and O(m) space, where alpha denotes the inverse Ackermann function.
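As a rough illustration of the quantities in this abstract, the sketch below computes a run-length encoding and evaluates the stated bound m+k-1 for a non-empty string. The function names are hypothetical and chosen for this example; it does not implement the paper's algorithm for finding the repetitions themselves.

```python
from itertools import groupby


def run_length_encode(text):
    """Return the run-length encoding as a list of (symbol, exponent) pairs."""
    return [(symbol, len(list(run))) for symbol, run in groupby(text)]


def repetition_bound(text):
    """Evaluate the m + k - 1 bound quoted in the abstract (non-empty text)."""
    rle = run_length_encode(text)
    m = len(rle)                                        # size of the encoding
    k = sum(1 for _, exponent in rle if exponent >= 2)  # factors with exponent >= 2
    return m + k - 1


encoding = run_length_encode("aabbbcab")
bound = repetition_bound("aabbbcab")
```

Here "aabbbcab" is encoded as a^2 b^3 c^1 a^1 b^1, so m = 5, k = 2, and the bound evaluates to 6.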
Re-Use Dynamic Programming for Sequence Alignment: An Algorithmic Toolkit
The problem of comparing two sequences S and T to determine their similarity is one of the fundamental problems in pattern matching. In this manuscript we will be primarily concerned with sequences as our objects and with various string comparison metrics. Our goal is to survey a methodology for utilizing repetitions in sequences in order to speed up the comparison process. Within this framework we consider various methods of parsing the sequences in order to frame their repetitions, and present a toolkit of various solutions whose time complexity depends on both the chosen parsing method and the string-comparison metric used for the alignment.
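For context, the baseline that such speed-ups start from is the classic quadratic dynamic program. The sketch below computes unit-cost edit distance, one of many possible string-comparison metrics. It is a plain textbook version with a hypothetical function name, and it does not exploit repetitions in any way.

```python
def edit_distance(s, t):
    """Unit-cost edit distance via the standard O(|s| * |t|) dynamic program."""
    # previous[j] holds the distance between the current prefix of s and t[:j].
    previous = list(range(len(t) + 1))
    for i, a in enumerate(s, start=1):
        current = [i]  # distance between s[:i] and the empty prefix of t
        for j, b in enumerate(t, start=1):
            substitution = previous[j - 1] + (a != b)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


distance = edit_distance("kitten", "sitting")
```

Only two rows of the table are kept at a time, so the sketch uses space linear in the length of T while still visiting every cell.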
Adaptive Protocols for Interactive Communication
How much adversarial noise can protocols for interactive communication
tolerate? This question was examined by Braverman and Rao (IEEE Trans. Inf.
Theory, 2014) for the case of "robust" protocols, where each party sends
messages only in fixed and predetermined rounds. We consider a new class of
non-robust protocols for Interactive Communication, which we call adaptive
protocols. Such protocols adapt structurally to the noise induced by the
channel in the sense that both the order of speaking, and the length of the
protocol may vary depending on observed noise.
We define models that capture adaptive protocols and study upper and lower
bounds on the permissible noise rate in these models. When the length of the
protocol may adaptively change according to the noise, we demonstrate a
protocol that tolerates noise rates up to . When the order of speaking may
adaptively change as well, we demonstrate a protocol that tolerates noise rates
up to . Hence, adaptivity circumvents an impossibility result of on
the fraction of tolerable noise (Braverman and Rao, 2014).
Comment: Content is similar to the previous version, with an improved
presentation
Detecting One-variable Patterns
Given a pattern such that
, where is a
variable and its reversal, and
are strings that contain no variables, we describe an
algorithm that constructs in time a compact representation of all
instances of in an input string of length over a polynomially bounded
integer alphabet, so that one can report those instances in time.
Comment: 16 pages (+13 pages of Appendix), 4 figures, accepted to SPIRE 201
Recompression: a simple and powerful technique for word equations
In this paper we present an application of a simple technique of local
recompression, previously developed by the author in the context of compressed
membership problems and compressed pattern matching, to word equations. The
technique is based on local modification of variables (replacing X by aX or Xa)
and iterative replacement of pairs of letters appearing in the equation by a
`fresh' letter, which can be seen as a bottom-up compression of the solution of
the given word equation, to be more specific, building an SLP (Straight-Line
Programme) for the solution of the word equation.
Using this technique we give new, independent and self-contained proofs of
most of the known results for word equations. To be more specific, the
presented (nondeterministic) algorithm runs in O(n log n) space and in time
polynomial in log N, where N is the size of the length-minimal solution of the
word equation. The presented algorithm can be easily generalised to a generator
of all solutions of the given word equation (without increasing the space
usage). Moreover, further analysis of the algorithm yields a doubly
exponential upper bound on the size of the length-minimal solution. The
presented algorithm does not use the exponential bound on the exponent of
periodicity. Conversely, the analysis of the algorithm yields an independent
proof of the exponential bound on the exponent of periodicity.
We believe that the presented algorithm, its idea and analysis are far
simpler than all those previously applied. Furthermore, thanks to it we can
obtain a unified and simple approach to most of the known results for word
equations.
As a small additional result we show that for O(1) variables (with arbitrarily
many appearances in the equation) word equations can be solved in linear space,
i.e. they are context-sensitive.
Comment: Submitted to a journal. Since the previous version the proofs were
simplified and the overall presentation improved
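The pair-replacement step described in this abstract can be pictured on a plain word, leaving variables aside. The sketch below replaces every occurrence of a pair of two different letters by a fresh letter and records the rule, so that the rules form a small straight-line program. The names `compress_pair` and `expand` are hypothetical, and the handling of variables and of blocks of a repeated letter in the actual technique is not modelled.

```python
def compress_pair(word, first, second, fresh):
    """Replace each occurrence of the pair (first, second) by `fresh`.

    Assumes first != second, so occurrences of the pair cannot overlap.
    """
    result = []
    i = 0
    while i < len(word):
        if i + 1 < len(word) and word[i] == first and word[i + 1] == second:
            result.append(fresh)
            i += 2
        else:
            result.append(word[i])
            i += 1
    return result


def expand(word, rules):
    """Undo the compression by unfolding the recorded rules recursively."""
    out = []
    for symbol in word:
        if symbol in rules:
            out.extend(expand(list(rules[symbol]), rules))
        else:
            out.append(symbol)
    return out


rules = {"Z": ("a", "b")}            # fresh letter Z stands for the pair ab
compressed = compress_pair(list("abcabab"), "a", "b", "Z")
restored = "".join(expand(compressed, rules))
```

Repeating such rounds with new fresh letters shortens the word step by step, and the accumulated rules describe the original word bottom-up.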
Synchronization Strings: Explicit Constructions, Local Decoding, and Applications
This paper gives new results for synchronization strings, a powerful
combinatorial object that allows one to deal efficiently with insertions and
deletions in various communication settings:
We give a deterministic, linear time synchronization string
construction, improving over an time randomized construction.
Independently of this work, a deterministic time
construction was just put on arXiv by Cheng, Li, and Wu. We also give a
deterministic linear time construction of an infinite synchronization string,
which was not known to be computable before. Both constructions are highly
explicit, i.e., the symbol can be computed in time.
This paper also introduces a generalized notion we call
long-distance synchronization strings that allow for local and very fast
decoding. In particular, only time and access to logarithmically
many symbols is required to decode any index.
We give several applications for these results:
For any we provide an insdel correcting
code with rate which can correct any fraction
of insdel errors in time. This near linear computational
efficiency is surprising given that we do not even know how to compute the
(edit) distance between the decoding input and output in sub-quadratic time. We
show that such codes can not only efficiently recover from fraction of
insdel errors but, similar to [Schulman, Zuckerman; TransInf'99], also from any
fraction of block transpositions and replications.
We show that high explicitness and local decoding allow for
infinite channel simulations with exponentially smaller memory and decoding
time requirements. These simulations can be used to give the first near linear
time interactive coding scheme for insdel errors.
Prospects and limitations of full-text index structures in genome analysis
The combination of incessant advances in sequencing technology producing large amounts of data and innovative bioinformatics approaches, designed to cope with this data flood, has led to new interesting results in the life sciences. Given the magnitude of sequence data to be processed, many bioinformatics tools rely on efficient solutions to a variety of complex string problems. These solutions include fast heuristic algorithms and advanced data structures, generally referred to as index structures. Although the importance of index structures is generally known to the bioinformatics community, the design and potency of these data structures, as well as their properties and limitations, are less understood. Moreover, the last decade has seen a boom in the number of variant index structures featuring complex and diverse memory-time trade-offs. This article provides a comprehensive state-of-the-art overview of the most popular index structures and their recently developed variants. Their features, interrelationships, the trade-offs they impose, but also their practical limitations, are explained and compared.
A linearly computable measure of string complexity
We present a measure of string complexity, called I-complexity, computable in linear time and space. It counts the number of different substrings in a given string. The least complex strings are the runs of a single symbol, the most complex are the de Bruijn strings. Although the I-complexity of a string is not the length of any minimal description of the string, it satisfies many basic properties of classical description complexity. In particular, the number of strings with I-complexity up to a given value is bounded, and most strings of each length have high I-complexity.
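To make the underlying quantity concrete, the sketch below counts the different substrings of a string by brute force. It is only an illustration with a hypothetical function name: it is far slower than linear time, and it is not claimed to be the paper's exact definition of I-complexity or its algorithm.

```python
def count_distinct_substrings(text):
    """Count the different non-empty substrings of `text` by enumeration."""
    seen = set()
    for start in range(len(text)):
        for end in range(start + 1, len(text) + 1):
            seen.add(text[start:end])
    return len(seen)


run_count = count_distinct_substrings("aaaa")     # a run of a single symbol
mixed_count = count_distinct_substrings("abab")
```

A run of one symbol of length n has only n different substrings, which matches the abstract's remark that such runs are the least complex strings.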
Shannon Information and Kolmogorov Complexity
We compare the elementary theories of Shannon information and Kolmogorov
complexity, the extent to which they have a common purpose, and where they are
fundamentally different. We discuss and relate the basic notions of both
theories: Shannon entropy versus Kolmogorov complexity, the relation of both to
universal coding, Shannon mutual information versus Kolmogorov (`algorithmic')
mutual information, probabilistic sufficient statistic versus algorithmic
sufficient statistic (related to lossy compression in the Shannon theory versus
meaningful information in the Kolmogorov theory), and rate distortion theory
versus Kolmogorov's structure function. Part of the material has appeared in
print before, scattered through various publications, but this is the first
comprehensive systematic comparison. The last mentioned relations are new.
Comment: Survey, LaTeX 54 pages, 3 figures, Submitted to IEEE Trans
Information Theory
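The contrast between the two notions can be hinted at with a small experiment. The sketch below computes the Shannon entropy of a string's empirical symbol distribution, and uses the length of a zlib-compressed encoding as a crude, computable stand-in for description length. The use of zlib is an assumption made for this illustration only; Kolmogorov complexity itself is not computable, and a compressor gives at best an upper-bound style proxy.

```python
import math
import zlib
from collections import Counter


def empirical_entropy(text):
    """Shannon entropy, in bits per symbol, of the symbol frequencies in `text`."""
    counts = Counter(text)
    n = len(text)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def compressed_length(text):
    """Length in bytes of the zlib-compressed UTF-8 encoding (a rough proxy)."""
    return len(zlib.compress(text.encode("utf-8"), 9))


periodic = "ab" * 500
entropy_bits = empirical_entropy(periodic)
proxy_bytes = compressed_length(periodic)
```

The periodic string has an empirical entropy of 1 bit per symbol, which ignores its ordering, while the compressed form is much shorter than the 1000-byte input because the structure of the individual string is easy to describe.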