317 research outputs found
Identifying all abelian periods of a string in quadratic time and relevant problems
Abelian periodicity of strings has been studied extensively over the last
years. In 2006 Constantinescu and Ilie defined the abelian period of a string
and several algorithms for the computation of all abelian periods of a string
were given. In contrast to the classical period of a word, its abelian version
is more flexible, factors of the word are considered the same under any
internal permutation of their letters. We show two O(|y|^2) algorithms for the
computation of all abelian periods of a string y. The first one maps each
letter to a suitable number such that each factor of the string can be
identified by the unique sum of the numbers corresponding to its letters and
hence abelian periods can be identified easily. The other one maps each letter
to a prime number such that each factor of the string can be identified by the
unique product of the numbers corresponding to its letters and so abelian
periods can be identified easily. We also define weak abelian periods on
strings and give an O(|y|log(|y|)) algorithm for their computation, together
with some other algorithms for more basic problems.Comment: Accepted in the "International Journal of foundations of Computer
Science
Truly Subquadratic-Time Extension Queries and Periodicity Detection in Strings with Uncertainties
Strings with don\u27t care symbols, also called partial words, and more general indeterminate strings are a natural representation of strings containing uncertain symbols. A considerable effort has been made to obtain efficient algorithms for pattern matching and periodicity detection in such strings. Among those, a number of algorithms have been proposed that behave well on random data, but still their worst-case running time is Theta(n^2). We present the first truly subquadratic-time solutions for a number of such problems on partial words that can also be adapted to indeterminate strings over a constant-sized alphabet. We show that longest common compatible prefix queries (which correspond to longest common extension queries in regular strings) can be answered on-line in O(n * sqrt(n * log(n)) time after O(n * sqrt(n * log(n))-time preprocessing. We also present O(n * sqrt(n * log(n))-time algorithms for computing the prefix array and two types of border array of a partial word
Algorithms for Longest Common Abelian Factors
In this paper we consider the problem of computing the longest common abelian
factor (LCAF) between two given strings. We present a simple
time algorithm, where is the length of the strings and is the
alphabet size, and a sub-quadratic running time solution for the binary string
case, both having linear space requirement. Furthermore, we present a modified
algorithm applying some interesting tricks and experimentally show that the
resulting algorithm runs faster.Comment: 13 pages, 4 figure
Linear Algorithm for Conservative Degenerate Pattern Matching
A degenerate symbol x* over an alphabet A is a non-empty subset of A, and a
sequence of such symbols is a degenerate string. A degenerate string is said to
be conservative if its number of non-solid symbols is upper-bounded by a fixed
positive constant k. We consider here the matching problem of conservative
degenerate strings and present the first linear-time algorithm that can find,
for given degenerate strings P* and T* of total length n containing k non-solid
symbols in total, the occurrences of P* in T* in O(nk) time
Two-dimensional prefix string matching and covering on square matrices
International audienceTwo linear time algorithms are presented. One for determining, for every position in a given square matrix, the longest prefix of a given pattern (also a square matrix) that occurs at that position and one for computing all square covers of a given two-dimensional square matrix
The Swap Matching Problem Revisited
In this paper, we revisit the much studied problem of Pattern Matching with
Swaps (Swap Matching problem, for short). We first present a graph-theoretic
model, which opens a new and so far unexplored avenue to solve the problem.
Then, using the model, we devise two efficient algorithms to solve the swap
matching problem. The resulting algorithms are adaptations of the classic
shift-and algorithm. For patterns having length similar to the word-size of the
target machine, both the algorithms run in linear time considering a fixed
alphabet.Comment: 23 pages, 3 Figures and 17 Table
Editorial: Repetitive Structures in Biological Sequences: Algorithms and Applications
Repetitive structures in biological sequences are emerging as an active focus of research and the unifying concept of ?repeatome? (the ensemble of knowledge associated with repeating structures in genomic/proteomic data) has been recently proposed in order to highlight several converging trends
- âŠ