Search CORE


Almost Linear Time Computation of Maximal Repetitions in Run Length Encoded Strings

Author: Bannai Hideo
Fujishige Yuta
Inenaga Shunsuke
Nakashima Yuto
Takeda Masayuki
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 28th International Symposium on Algorithms and Computation (ISAAC 2017)
Publication date: 01/01/2017

We consider the problem of computing all maximal repetitions contained in a string that is given in run-length encoding. Given a run-length encoding of a string, we show that the maximum number of maximal repetitions contained in the string is at most m+k-1, where m is the size of the run-length encoding and k is the number of run-length factors whose exponent is at least 2. We also present an algorithm for computing all maximal repetitions in O(m α(m)) time and O(m) space, where α denotes the inverse Ackermann function.

Dagstuhl Research Online Publication Server
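To make the m+k-1 bound above concrete, here is a brute-force Python sketch. It is not the authors' O(m α(m)) algorithm: it works on the decoded string, finds maximal repetitions (taken here as maximal substrings whose length is at least twice their smallest period) in roughly quadratic time, and compares their number with m+k-1 computed from a run-length encoding. All function names are hypothetical.

```python
from typing import Dict, List, Tuple


def maximal_repetitions(s: str) -> List[Tuple[int, int, int]]:
    """Return every maximal repetition of s as (start, end, period), end exclusive.

    Brute force: for each candidate period p, scan for maximal stretches
    where s[i] == s[i + p].
    """
    n = len(s)
    found: Dict[Tuple[int, int], int] = {}
    for p in range(1, n // 2 + 1):
        i = 0
        while i < n - p:
            if s[i] != s[i + p]:
                i += 1
                continue
            start = i
            while i < n - p and s[i] == s[i + p]:
                i += 1
            end = i + p  # s[start:end] has period p and cannot be extended
            # Needs length >= 2 * smallest period. Periods are tried in
            # increasing order, so the first p stored for an interval is its
            # smallest period.
            if end - start >= 2 * p and (start, end) not in found:
                found[(start, end)] = p
    return sorted((a, b, p) for (a, b), p in found.items())


def run_length_encode(s: str) -> List[Tuple[str, int]]:
    """Encode s as (character, exponent) factors, e.g. 'aab' -> [('a', 2), ('b', 1)]."""
    factors: List[Tuple[str, int]] = []
    for ch in s:
        if factors and factors[-1][0] == ch:
            factors[-1] = (ch, factors[-1][1] + 1)
        else:
            factors.append((ch, 1))
    return factors


def bound_m_plus_k_minus_1(s: str) -> int:
    """m = number of RLE factors, k = factors with exponent >= 2 (s non-empty)."""
    factors = run_length_encode(s)
    m = len(factors)
    k = sum(1 for _, exponent in factors if exponent >= 2)
    return m + k - 1


for word in ["aaaa", "abab", "aabb", "aabaabaab"]:
    runs = maximal_repetitions(word)
    print(word, runs, len(runs) <= bound_m_plus_k_minus_1(word))
```

A slow version like this is mainly useful as a test oracle when checking a faster implementation.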

Re-Use Dynamic Programming for Sequence Alignment: An Algorithmic Toolkit

Author: Crochemore Maxime
Landau Gad M.
Schieber Baruch
Ziv-Ukelson Michal
Publication venue: King's College London Publications
Publication date: 01/01/2005

The problem of comparing two sequences S and T to determine their similarity is one of the fundamental problems in pattern matching. In this manuscript we are primarily concerned with sequences as our objects and with various string comparison metrics. Our goal is to survey a methodology for utilizing repetitions in sequences in order to speed up the comparison process. Within this framework we consider various methods of parsing the sequences in order to frame their repetitions, and present a toolkit of solutions whose time complexity depends both on the chosen parsing method and on the string-comparison metric used for the alignment.

HAL-Ecole des Ponts ParisTech

HAL - UPEC / UPEM
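As one concrete example of parsing a sequence so that its repetitions become explicit, the Python sketch below computes an LZ78 parse, in which every phrase extends an earlier phrase by one character. LZ78 is chosen here only for illustration; the abstract does not say which parsings the toolkit covers, and `lz78_parse` is a hypothetical name. The intuition, as the title suggests, is that dynamic-programming work done for an earlier phrase can be re-used for later phrases that extend it.

```python
from typing import Dict, List


def lz78_parse(s: str) -> List[str]:
    """Split s into LZ78 phrases: each phrase is a previously seen phrase plus
    one new character (the last phrase may repeat an earlier phrase)."""
    dictionary: Dict[str, int] = {"": 0}  # phrase -> phrase number
    phrases: List[str] = []
    i, n = 0, len(s)
    while i < n:
        j = i
        # Extend the candidate while s[i:j+1] is already a known phrase.
        while j < n and s[i:j + 1] in dictionary:
            j += 1
        if j < n:
            phrase = s[i:j + 1]  # longest known phrase plus one fresh character
            dictionary[phrase] = len(dictionary)
            phrases.append(phrase)
            i = j + 1
        else:
            phrases.append(s[i:j])  # input ended inside a known phrase
            i = j
    return phrases


print(lz78_parse("abababa"))  # ['a', 'b', 'ab', 'aba']
```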

Adaptive Protocols for Interactive Communication

Author: Agrawal Shweta
Gelles Ran
Sahai Amit
Publication date: 07/08/2015

How much adversarial noise can protocols for interactive communication tolerate? This question was examined by Braverman and Rao (IEEE Trans. Inf. Theory, 2014) for the case of "robust" protocols, where each party sends messages only in fixed and predetermined rounds. We consider a new class of non-robust protocols for interactive communication, which we call adaptive protocols. Such protocols adapt structurally to the noise induced by the channel, in the sense that both the order of speaking and the length of the protocol may vary depending on the observed noise. We define models that capture adaptive protocols and study upper and lower bounds on the permissible noise rate in these models. When the length of the protocol may adaptively change according to the noise, we demonstrate a protocol that tolerates noise rates up to $1/3$. When the order of speaking may adaptively change as well, we demonstrate a protocol that tolerates noise rates up to $2/3$. Hence, adaptivity circumvents an impossibility result of $1/4$ on the fraction of tolerable noise (Braverman and Rao, 2014). Comment: Content is similar to the previous version, yet with an improved presentation.

arXiv.org e-Print Archive

CiteSeerX

Detecting One-variable Patterns

Author: A Amir
A Ehrenfeucht
D Angluin
D Kosolobov
E Czeizler
F Manea
G Manacher
J Kärkkäinen
JEF Friedl
M Crochemore
M Lothaire
M Rubinchik
ML Schmid
P Gawrychowski
Z Galil
Z Xu
Publication date: 01/01/2017

Given a pattern $p = s_1x_1s_2x_2\cdots s_{r-1}x_{r-1}s_r$ such that $x_1,x_2,\ldots,x_{r-1}\in\{x,\overleftarrow{x}\}$, where $x$ is a variable and $\overleftarrow{x}$ its reversal, and $s_1,s_2,\ldots,s_r$ are strings that contain no variables, we describe an algorithm that constructs in $O(rn)$ time a compact representation of all $P$ instances of $p$ in an input string of length $n$ over a polynomially bounded integer alphabet, so that one can report those instances in $O(P)$ time. Comment: 16 pages (+13 pages of Appendix), 4 figures, accepted to SPIRE 201

arXiv.org e-Print Archive

Crossref

Institutional repository of Ural Federal University named after the first President of Russia B.N.Yeltsin
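The matching problem above can be pinned down with a naive reference matcher. This Python sketch tries every start position and every length of x, so it runs in polynomial time far worse than O(rn) and builds no compact representation. It assumes x is replaced by a non-empty string; the function name and the "x"/"rev" encoding of each x_i are hypothetical.

```python
from typing import List, Tuple


def one_variable_instances(text: str, consts: List[str], kinds: List[str]) -> List[Tuple[int, str]]:
    """Brute-force search for instances of p = s_1 x_1 s_2 ... x_{r-1} s_r in text.

    consts = [s_1, ..., s_r]; kinds[i] is "x" (use x) or "rev" (use reversed x).
    Returns (start, x) pairs. Assumes x is replaced by a non-empty string.
    """
    if not kinds or len(consts) != len(kinds) + 1:
        raise ValueError("expected r constant strings and r - 1 >= 1 variable slots")
    n = len(text)
    fixed = sum(len(c) for c in consts)
    found: List[Tuple[int, str]] = []
    for start in range(n + 1):
        if not text.startswith(consts[0], start):
            continue
        # All r - 1 slots have length |x|, which bounds |x| by the remaining text.
        for length in range(1, (n - start - fixed) // len(kinds) + 1):
            pos = start + len(consts[0])
            first = text[pos:pos + length]
            # Recover x from the first slot, undoing the reversal if needed.
            x = first if kinds[0] == "x" else first[::-1]
            ok = True
            for kind, const in zip(kinds, consts[1:]):
                piece = x if kind == "x" else x[::-1]
                if not text.startswith(piece, pos) or not text.startswith(const, pos + length):
                    ok = False
                    break
                pos += length + len(const)
            if ok:
                found.append((start, x))
    return found


# Pattern x followed by reversed x, i.e. even-length palindromes:
print(one_variable_instances("abccba", ["", "", ""], ["x", "rev"]))
```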

Recompression: a simple and powerful technique for word equations

Author: Jeż Artur
Publication date: 01/01/2013

In this paper we present an application of a simple technique of local recompression, previously developed by the author in the context of compressed membership problems and compressed pattern matching, to word equations. The technique is based on local modification of variables (replacing X by aX or Xa) and iterative replacement of pairs of letters appearing in the equation by a 'fresh' letter, which can be seen as a bottom-up compression of the solution of the given word equation; more specifically, it builds an SLP (Straight-Line Programme) for the solution of the word equation. Using this technique we give new, independent and self-contained proofs of most of the known results for word equations. To be more specific, the presented (nondeterministic) algorithm runs in O(n log n) space and in time polynomial in log N, where N is the size of the length-minimal solution of the word equation. The presented algorithm can easily be generalised to a generator of all solutions of the given word equation (without increasing the space usage). Moreover, further analysis of the algorithm yields a doubly exponential upper bound on the size of the length-minimal solution. The presented algorithm does not use the exponential bound on the exponent of periodicity. Conversely, the analysis of the algorithm yields an independent proof of the exponential bound on the exponent of periodicity. We believe that the presented algorithm, its idea and its analysis are far simpler than all previously applied ones. Furthermore, thanks to it we obtain a unified and simple approach to most of the known results for word equations. As a small additional result we show that for O(1) variables (with arbitrarily many appearances in the equation) word equations can be solved in linear space, i.e. they are context-sensitive. Comment: Submitted to a journal. Since the previous version, the proofs were simplified and the overall presentation improved.

arXiv.org e-Print Archive

CiteSeerX

Dagstuhl Research Online Publication Server

MPG.PuRe
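The letter-level compression steps can be illustrated on an explicit string. This Python sketch ignores variables entirely (so the "replace X by aX or Xa" step is not shown) and represents fresh letters as tuples; both are assumptions of the sketch. Besides the pair replacement named in the abstract, it includes a block step that turns a maximal run a^k into one letter, because occurrences of a pair aa overlap and cannot be replaced unambiguously; treat that step as an assumption rather than a quote of the paper. Function names are hypothetical.

```python
from typing import List, Tuple, Union

# A letter is either an original character or a fresh letter (a tuple).
Letter = Union[str, Tuple]


def block_compress(word: List[Letter]) -> List[Letter]:
    """Replace each maximal block a^k with k >= 2 by one fresh letter ('blk', a, k)."""
    out: List[Letter] = []
    i = 0
    while i < len(word):
        j = i
        while j < len(word) and word[j] == word[i]:
            j += 1
        k = j - i
        out.append(word[i] if k == 1 else ("blk", word[i], k))
        i = j
    return out


def pair_compress(word: List[Letter], a: Letter, b: Letter) -> List[Letter]:
    """Replace every occurrence of the pair ab (a != b) by one fresh letter ('pair', a, b).

    Because a != b, occurrences of ab cannot overlap, so one left-to-right
    scan replaces all of them.
    """
    if a == b:
        raise ValueError("pairs aa are handled by block_compress")
    out: List[Letter] = []
    i = 0
    while i < len(word):
        if i + 1 < len(word) and word[i] == a and word[i + 1] == b:
            out.append(("pair", a, b))
            i += 2
        else:
            out.append(word[i])
            i += 1
    return out


w = block_compress(list("aabcabcc"))
w = pair_compress(w, "b", "c")
print(w)
```

Note how the second "bc" is not compressed: after the block step its "c" is part of the fresh letter for "cc", which is why the order of the steps matters.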

Synchronization Strings: Explicit Constructions, Local Decoding, and Applications

Author: Guruswami Venkatesan
Haeupler Bernhard
Hemenway Brett
Sherstov Alexander A
Publication date: 09/11/2017

This paper gives new results for synchronization strings, a powerful combinatorial object that allows one to efficiently deal with insertions and deletions in various communication settings:

• We give a deterministic, linear time synchronization string construction, improving over an $O(n^5)$ time randomized construction. Independently of this work, a deterministic $O(n\log^2\log n)$ time construction was just put on arXiv by Cheng, Li, and Wu. We also give a deterministic linear time construction of an infinite synchronization string, which was not known to be computable before. Both constructions are highly explicit, i.e., the $i^{th}$ symbol can be computed in $O(\log i)$ time.

• This paper also introduces a generalized notion we call long-distance synchronization strings, which allow for local and very fast decoding. In particular, only $O(\log^3 n)$ time and access to logarithmically many symbols are required to decode any index. We give several applications for these results:

• For any $\delta > 0$ we provide an insdel correcting code with rate $1-\delta-\epsilon$ which can correct any $O(\delta)$ fraction of insdel errors in $O(n\log^3 n)$ time. This near linear computational efficiency is surprising given that we do not even know how to compute the (edit) distance between the decoding input and output in sub-quadratic time. We show that such codes can efficiently recover not only from a $\delta$ fraction of insdel errors but, similar to [Schulman, Zuckerman; TransInf'99], also from any $O(\delta/\log n)$ fraction of block transpositions and replications.

• We show that high explicitness and local decoding allow for infinite channel simulations with exponentially smaller memory and decoding time requirements. These simulations can be used to give the first near linear time interactive coding scheme for insdel errors.

arXiv.org e-Print Archive

Crossref
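The abstract does not restate what a synchronization string is. The Python sketch below assumes the definition from Haeupler and Shahrasbi's earlier work: a string S of length n is an ε-synchronization string if ED(S[i,j), S[j,k)) > (1-ε)(k-i) for all 1 ≤ i < j < k ≤ n+1, where ED counts insertions and deletions only. It is a brute-force checker (O(n^5) as written), useful only for tiny strings, and the function names are hypothetical.

```python
def insdel_distance(a: str, b: str) -> int:
    """Insertion/deletion distance: |a| + |b| - 2 * LCS(a, b)."""
    m, n = len(a), len(b)
    prev = [0] * (n + 1)  # LCS table row for a[:i-1]
    for i in range(1, m + 1):
        cur = [0] * (n + 1)
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
            else:
                cur[j] = max(prev[j], cur[j - 1])
        prev = cur
    return m + n - 2 * prev[n]


def is_synchronization_string(s: str, eps: float) -> bool:
    """Check ED(s[i:j], s[j:k]) > (1 - eps) * (k - i) for all 0 <= i < j < k <= len(s)."""
    n = len(s)
    for i in range(n):
        for j in range(i + 1, n + 1):
            for k in range(j + 1, n + 1):
                if insdel_distance(s[i:j], s[j:k]) <= (1 - eps) * (k - i):
                    return False
    return True


print(is_synchronization_string("abc", 0.5), is_synchronization_string("aba", 0.5))
```

Any repeated adjacent pattern such as "aa" fails immediately, which is the property that lets a decoder re-locate positions after insertions and deletions.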

Prospects and limitations of full-text index structures in genome analysis

Author: Dawyndt Peter
De Baets Bernard
Fack Veerle
Vyverman Michaël
Publication venue: Oxford University Press (OUP)
Publication date: 01/01/2012

The combination of incessant advances in sequencing technology producing large amounts of data and innovative bioinformatics approaches designed to cope with this data flood has led to interesting new results in the life sciences. Given the magnitude of sequence data to be processed, many bioinformatics tools rely on efficient solutions to a variety of complex string problems. These solutions include fast heuristic algorithms and advanced data structures, generally referred to as index structures. Although the importance of index structures is generally known to the bioinformatics community, the design and potency of these data structures, as well as their properties and limitations, are less well understood. Moreover, the last decade has seen a boom in the number of variant index structures featuring complex and diverse memory-time trade-offs. This article gives a comprehensive, state-of-the-art overview of the most popular index structures and their recently developed variants. Their features, interrelationships and the trade-offs they impose, as well as their practical limitations, are explained and compared.

Ghent University Academic Bibliography

PubMed Central
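The suffix array is one of the popular full-text index structures of the kind surveyed above. The Python sketch below builds it naively by sorting suffixes (O(n^2 log n) in the worst case, unlike the linear-time constructions used in practice) and answers a pattern query with two binary searches; the function names are hypothetical.

```python
from typing import List


def suffix_array(text: str) -> List[int]:
    """Start positions of all suffixes, sorted lexicographically (naive construction)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: List[int], pattern: str) -> List[int]:
    """All start positions of pattern in text, via binary search on the suffix array."""
    m = len(pattern)
    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


text = "banana"
sa = suffix_array(text)
print(sa, find_occurrences(text, sa, "ana"))
```

The index stores only n start positions, and all matches of a pattern occupy one contiguous block of the array, which is why two binary searches suffice.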

A linearly computable measure of string complexity

Author: Becher Verónica
Heiber Pablo Ariel
Publication venue: Elsevier B.V.
Publication date: 01/01/2012

We present a measure of string complexity, called I-complexity, computable in linear time and space. It counts the number of different substrings in a given string. The least complex strings are the runs of a single symbol; the most complex are the de Bruijn strings. Although the I-complexity of a string is not the length of any minimal description of the string, it satisfies many basic properties of classical description complexity. In particular, the number of strings with I-complexity up to a given value is bounded, and most strings of each length have high I-complexity.

CiteSeerX

Elsevier - Publisher Connector

CONICET Digital
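The abstract ties I-complexity to the number of different substrings but does not give its formula, so the Python sketch below computes only the plain count of distinct non-empty substrings, not I-complexity itself. It uses a suffix automaton, a standard linear-time technique for a fixed alphabet and not necessarily the authors' construction; the function name is hypothetical.

```python
from typing import Dict, List


def count_distinct_substrings(s: str) -> int:
    """Count distinct non-empty substrings of s with a suffix automaton.

    Each state v stands for length[v] - length[link[v]] distinct substrings,
    so summing over all non-initial states gives the total.
    """
    length: List[int] = [0]
    link: List[int] = [-1]
    trans: List[Dict[str, int]] = [{}]
    last = 0
    for ch in s:
        cur = len(length)
        length.append(length[last] + 1)
        link.append(-1)
        trans.append({})
        p = last
        # Add ch-transitions from suffixes of the old string that lack one.
        while p != -1 and ch not in trans[p]:
            trans[p][ch] = cur
            p = link[p]
        if p == -1:
            link[cur] = 0
        else:
            q = trans[p][ch]
            if length[p] + 1 == length[q]:
                link[cur] = q
            else:
                # Split q by cloning it with the shorter length.
                clone = len(length)
                length.append(length[p] + 1)
                link.append(link[q])
                trans.append(dict(trans[q]))
                while p != -1 and trans[p].get(ch) == q:
                    trans[p][ch] = clone
                    p = link[p]
                link[q] = clone
                link[cur] = clone
        last = cur
    return sum(length[v] - length[link[v]] for v in range(1, len(length)))


for word in ["aaaa", "abab", "00110"]:
    print(word, count_distinct_substrings(word))
```

A unary string such as "aaaa" has only n distinct substrings, while "00110", which contains every binary pair, has more for nearly the same length, in line with the ordering the abstract describes.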

Shannon Information and Kolmogorov Complexity

Author: Grunwald Peter
Vitanyi Paul
Publication date: 01/01/2004

We compare the elementary theories of Shannon information and Kolmogorov complexity, the extent to which they have a common purpose, and where they are fundamentally different. We discuss and relate the basic notions of both theories: Shannon entropy versus Kolmogorov complexity, the relation of both to universal coding, Shannon mutual information versus Kolmogorov ('algorithmic') mutual information, probabilistic sufficient statistic versus algorithmic sufficient statistic (related to lossy compression in the Shannon theory versus meaningful information in the Kolmogorov theory), and rate distortion theory versus Kolmogorov's structure function. Part of the material has appeared in print before, scattered through various publications, but this is the first comprehensive systematic comparison. The last-mentioned relations are new. Comment: Survey, LaTeX 54 pages, 3 figures, submitted to IEEE Trans Information Theory

arXiv.org e-Print Archive

CiteSeerX
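A small numeric contrast between the two notions: the Python sketch computes the empirical Shannon entropy of a string's symbol frequencies and, as a stand-in for Kolmogorov complexity (which is not computable), the length of a zlib-compressed encoding. Compressed length is only an upper bound on K(s), up to an additive constant; using zlib for this is an assumption of the sketch, not something the survey prescribes, and the function names are hypothetical.

```python
import math
import zlib
from collections import Counter


def empirical_entropy(s: str) -> float:
    """Shannon entropy, in bits per symbol, of the symbol frequencies of s."""
    counts = Counter(s)
    n = len(s)
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())


def compressed_bits(s: str) -> int:
    """Bits in zlib's output: a computable upper-bound proxy for K(s)
    (up to an additive constant), not Kolmogorov complexity itself."""
    return 8 * len(zlib.compress(s.encode("utf-8"), 9))


s = "ab" * 1000
print(empirical_entropy(s) * len(s), compressed_bits(s))
```

For this periodic string the frequency-based entropy is 1 bit per symbol (2000 bits in total), while the compressed length should be far smaller, since the compressor exploits the repetition that a symbol-frequency model ignores.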