On Maximal Unbordered Factors
Given a string $S$ of length $n$, its maximal unbordered factor is the
longest factor which does not have a border. In this work we investigate the
relationship between $n$ and the length of the maximal unbordered factor of
$S$. We prove that for an alphabet of size $\sigma \ge 5$ the expected length
of the maximal unbordered factor of a string of length $n$ is at least
$0.99n$ (for sufficiently large values of $n$). As an application of this result, we
propose a new algorithm for computing the maximal unbordered factor of a
string.
Comment: Accepted to the 26th Annual Symposium on Combinatorial Pattern
Matching (CPM 2015)
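
To make the definition concrete, here is a minimal quadratic-time sketch that finds a maximal unbordered factor by brute force. It is not the algorithm proposed in the paper; the function names are hypothetical, and it assumes the standard definition of a border (a non-empty proper prefix that is also a suffix), tested via the classical prefix function.

```python
def prefix_function(s):
    """pi[i] = length of the longest border of s[:i + 1]."""
    pi = [0] * len(s)
    for i in range(1, len(s)):
        k = pi[i - 1]
        while k > 0 and s[i] != s[k]:
            k = pi[k - 1]
        if s[i] == s[k]:
            k += 1
        pi[i] = k
    return pi


def maximal_unbordered_factor(s):
    """Return one longest factor of s that has no border (naive O(n^2))."""
    best = ""
    for start in range(len(s)):
        suffix = s[start:]
        if len(suffix) <= len(best):
            break  # no remaining suffix can contain a longer factor
        pi = prefix_function(suffix)
        # A prefix of the suffix is unbordered exactly when its pi value is 0.
        # Scan from the longest prefix down to ones just longer than best.
        for end in range(len(suffix) - 1, len(best) - 1, -1):
            if pi[end] == 0:
                best = suffix[:end + 1]
                break
    return best
```

For example, every factor of "aaaa" longer than one letter is bordered, so the sketch returns a single letter, while "abc" is itself unbordered.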
Average-case analysis of perfect sorting by reversals (Journal Version)
Perfect sorting by reversals, a problem originating in computational
genomics, is the process of sorting a signed permutation to either the identity
or to the reversed identity permutation, by a sequence of reversals that do not
break any common interval. Bérard et al. (2007) make use of strong interval
trees to describe an algorithm for sorting signed permutations by reversals.
Combinatorial properties of this family of trees are essential to the algorithm
analysis. Here, we use the expected value of certain tree parameters to prove
that the average run-time of the algorithm is at worst polynomial, and
additionally, for sufficiently long permutations, the sorting algorithm runs in
polynomial time with probability one. Furthermore, our analysis of the subclass
of commuting scenarios yields precise results on the average length of a
reversal, and the average number of reversals.
Comment: A preliminary version of this work appeared in the proceedings of
Combinatorial Pattern Matching (CPM) 2009. See arXiv:0901.2847; Discrete
Mathematics, Algorithms and Applications, vol. 3(3), 201
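
As a small illustration of the basic operation only, the sketch below applies a single reversal to a signed permutation: the chosen segment is reversed and every sign in it is flipped. The function name and the 0-based inclusive indices are assumptions made for this example; it does not check the common-interval condition that makes a sorting scenario perfect, and it is not the algorithm of Bérard et al.

```python
def apply_reversal(perm, i, j):
    """Reverse perm[i..j] (inclusive, 0-based) and flip the sign of each element."""
    flipped = [-x for x in reversed(perm[i:j + 1])]
    return perm[:i] + flipped + perm[j + 1:]


# One reversal sorts this signed permutation to the identity.
sorted_perm = apply_reversal([1, -3, -2, 4], 1, 2)
```

Applying the same reversal twice restores the original permutation, which is a quick sanity check on any implementation.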
Faster Longest Common Extension Queries in Strings over General Alphabets
Longest common extension queries (often called longest common prefix queries)
constitute a fundamental building block in multiple string algorithms, for
example computing runs and approximate pattern matching. We show that a
sequence of $q$ LCE queries for a string of size $n$ over a general ordered
alphabet can be realized in $O(q \log \log n + n \log^* n)$ time making only
$O(q + n)$ symbol comparisons. Consequently, all runs in a string over a general
ordered alphabet can be computed in $O(n \log \log n)$ time making $O(n)$
symbol comparisons. Our results improve upon a solution by Kosolobov
(Information Processing Letters, 2016), who gave an algorithm with
$O(n \log^{2/3} n)$ running time and conjectured that $O(n)$ time is possible. We
make significant progress towards resolving this conjecture. Our techniques
extend to the case of general unordered alphabets, when the time increases to
$O(q \log n + n \log^* n)$. The main tools are difference covers and the
disjoint-sets data structure.
Comment: Accepted to CPM 201
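
For readers unfamiliar with the query itself, the following is a naive baseline sketch of a longest common extension query, not the technique from the paper. The function name `lce` is hypothetical. It uses only equality tests between symbols, so it works over a general alphabet, but a single query can take time linear in the string length.

```python
def lce(s, i, j):
    """Length of the longest common prefix of s[i:] and s[j:] (naive scan)."""
    k = 0
    while i + k < len(s) and j + k < len(s) and s[i + k] == s[j + k]:
        k += 1
    return k


text = "abracadabra"
common = lce(text, 0, 7)  # compares "abracadabra" with "abra"
```

Here `common` is 4, because the suffix starting at position 7 is "abra", which matches the first four symbols of the string.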
Computing Lempel-Ziv Factorization Online
We present an algorithm which computes the Lempel-Ziv factorization of a word
$W$ of length $n$ on an alphabet $\Sigma$ of size $\sigma$ online in the
following sense: it reads $W$ starting from the left, and, after reading each
$r = O(\log_{\sigma} n)$ characters of $W$, updates the Lempel-Ziv
factorization. The algorithm requires $O(n \log \sigma)$ bits of space and $O(n
\log^2 n)$ time. The basis of the algorithm is a sparse suffix tree combined
with wavelet trees.
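
The sketch below shows what a Lempel-Ziv factorization looks like on a small input. It is a brute-force offline version, not the online algorithm described in the abstract, and it assumes one common variant of the definition: each factor is either a letter not seen before or the longest prefix of the remaining text that also starts at an earlier position (overlaps allowed). The function name is hypothetical.

```python
def lz_factorize(w):
    """Naive Lempel-Ziv factorization (self-referential variant), O(n^2) or worse."""
    factors = []
    n = len(w)
    i = 0
    while i < n:
        longest = 0
        # Try every earlier starting position j and extend the match greedily.
        for j in range(i):
            k = 0
            while i + k < n and w[j + k] == w[i + k]:
                k += 1
            longest = max(longest, k)
        step = max(longest, 1)  # a fresh letter forms a factor of length 1
        factors.append(w[i:i + step])
        i += step
    return factors
```

On "abababab" this yields the factors "a", "b", "ababab": the third factor overlaps its own earlier occurrence, which this variant of the definition permits.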