1,853 research outputs found
Minimal Suffix and Rotation of a Substring in Optimal Time
For a text given in advance, the substring minimal suffix queries ask to
determine the lexicographically minimal non-empty suffix of a substring
specified by the location of its occurrence in the text. We develop a data
structure answering such queries optimally: in constant time after linear-time
preprocessing. This improves upon the results of Babenko et al. (CPM 2014),
whose trade-off solution is characterized by product of these
time complexities. Next, we extend our queries to support concatenations of
substrings, for which the construction and query time is preserved. We
apply these generalized queries to compute lexicographically minimal and
maximal rotations of a given substring in constant time after linear-time
preprocessing.
Our data structures mainly rely on properties of Lyndon words and Lyndon
factorizations. We combine them with further algorithmic and combinatorial
tools, such as fusion trees and the notion of order isomorphism of strings
Document Retrieval on Repetitive Collections
Document retrieval aims at finding the most important documents where a
pattern appears in a collection of strings. Traditional pattern-matching
techniques yield brute-force document retrieval solutions, which has motivated
the research on tailored indexes that offer near-optimal performance. However,
an experimental study establishing which alternatives are actually better than
brute force, and which perform best depending on the collection
characteristics, has not been carried out. In this paper we address this
shortcoming by exploring the relationship between the nature of the underlying
collection and the performance of current methods. Via extensive experiments we
show that established solutions are often beaten in practice by brute-force
alternatives. We also design new methods that offer superior time/space
trade-offs, particularly on repetitive collections.Comment: Accepted to ESA 2014. Implementation and experiments at
http://www.cs.helsinki.fi/group/suds/rlcsa
Wavelet Trees Meet Suffix Trees
We present an improved wavelet tree construction algorithm and discuss its
applications to a number of rank/select problems for integer keys and strings.
Given a string of length n over an alphabet of size , our
method builds the wavelet tree in time,
improving upon the state-of-the-art algorithm by a factor of .
As a consequence, given an array of n integers we can construct in time a data structure consisting of machine words and
capable of answering rank/select queries for the subranges of the array in
time. This is a -factor improvement in
query time compared to Chan and P\u{a}tra\c{s}cu and a -factor
improvement in construction time compared to Brodal et al.
Next, we switch to stringological context and propose a novel notion of
wavelet suffix trees. For a string w of length n, this data structure occupies
words, takes time to construct, and simultaneously
captures the combinatorial structure of substrings of w while enabling
efficient top-down traversal and binary search. In particular, with a wavelet
suffix tree we are able to answer in time the following two
natural analogues of rank/select queries for suffixes of substrings: for
substrings x and y of w count the number of suffixes of x that are
lexicographically smaller than y, and for a substring x of w and an integer k,
find the k-th lexicographically smallest suffix of x.
We further show that wavelet suffix trees allow to compute a
run-length-encoded Burrows-Wheeler transform of a substring x of w in time, where s denotes the length of the resulting run-length encoding.
This answers a question by Cormode and Muthukrishnan, who considered an
analogous problem for Lempel-Ziv compression.Comment: 33 pages, 5 figures; preliminary version published at SODA 201
Linear-Space Data Structures for Range Mode Query in Arrays
A mode of a multiset is an element of maximum multiplicity;
that is, occurs at least as frequently as any other element in . Given a
list of items, we consider the problem of constructing a data
structure that efficiently answers range mode queries on . Each query
consists of an input pair of indices for which a mode of must
be returned. We present an -space static data structure
that supports range mode queries in time in the worst case, for
any fixed . When , this corresponds to
the first linear-space data structure to guarantee query time. We
then describe three additional linear-space data structures that provide
, , and query time, respectively, where denotes the
number of distinct elements in and denotes the frequency of the mode of
. Finally, we examine generalizing our data structures to higher dimensions.Comment: 13 pages, 2 figure
Duel and sweep algorithm for order-preserving pattern matching
Given a text and a pattern over alphabet , the classic exact
matching problem searches for all occurrences of pattern in text .
Unlike exact matching problem, order-preserving pattern matching (OPPM)
considers the relative order of elements, rather than their real values. In
this paper, we propose an efficient algorithm for OPPM problem using the
"duel-and-sweep" paradigm. Our algorithm runs in time in
general and time under an assumption that the characters in a string
can be sorted in linear time with respect to the string size. We also perform
experiments and show that our algorithm is faster that KMP-based algorithm.
Last, we introduce the two-dimensional order preserved pattern matching and
give a duel and sweep algorithm that runs in time for duel stage and
time for sweeping time with preprocessing time.Comment: 13 pages, 5 figure
- …