10,562 research outputs found
Wavelet Trees Meet Suffix Trees
We present an improved wavelet tree construction algorithm and discuss its
applications to a number of rank/select problems for integer keys and strings.
Given a string of length n over an alphabet of size , our
method builds the wavelet tree in time,
improving upon the state-of-the-art algorithm by a factor of .
As a consequence, given an array of n integers we can construct in time a data structure consisting of machine words and
capable of answering rank/select queries for the subranges of the array in
time. This is a -factor improvement in
query time compared to Chan and P\u{a}tra\c{s}cu and a -factor
improvement in construction time compared to Brodal et al.
Next, we switch to stringological context and propose a novel notion of
wavelet suffix trees. For a string w of length n, this data structure occupies
words, takes time to construct, and simultaneously
captures the combinatorial structure of substrings of w while enabling
efficient top-down traversal and binary search. In particular, with a wavelet
suffix tree we are able to answer in time the following two
natural analogues of rank/select queries for suffixes of substrings: for
substrings x and y of w count the number of suffixes of x that are
lexicographically smaller than y, and for a substring x of w and an integer k,
find the k-th lexicographically smallest suffix of x.
We further show that wavelet suffix trees allow to compute a
run-length-encoded Burrows-Wheeler transform of a substring x of w in time, where s denotes the length of the resulting run-length encoding.
This answers a question by Cormode and Muthukrishnan, who considered an
analogous problem for Lempel-Ziv compression.Comment: 33 pages, 5 figures; preliminary version published at SODA 201
Syntactic Complexity of Prefix-, Suffix-, Bifix-, and Factor-Free Regular Languages
The syntactic complexity of a regular language is the cardinality of its
syntactic semigroup. The syntactic complexity of a subclass of the class of
regular languages is the maximal syntactic complexity of languages in that
class, taken as a function of the state complexity of these languages. We
study the syntactic complexity of prefix-, suffix-, bifix-, and factor-free
regular languages. We prove that is a tight upper bound for
prefix-free regular languages. We present properties of the syntactic
semigroups of suffix-, bifix-, and factor-free regular languages, conjecture
tight upper bounds on their size to be , , and ,
respectively, and exhibit languages with these syntactic complexities.Comment: 28 pages, 6 figures, 3 tables. An earlier version of this paper was
presented in: M. Holzer, M. Kutrib, G. Pighizzini, eds., 13th Int. Workshop
on Descriptional Complexity of Formal Systems, DCFS 2011, Vol. 6808 of LNCS,
Springer, 2011, pp. 93-106. The current version contains improved bounds for
suffix-free languages, new results about factor-free languages, and new
results about reversa
Managing Unbounded-Length Keys in Comparison-Driven Data Structures with Applications to On-Line Indexing
This paper presents a general technique for optimally transforming any
dynamic data structure that operates on atomic and indivisible keys by
constant-time comparisons, into a data structure that handles unbounded-length
keys whose comparison cost is not a constant. Examples of these keys are
strings, multi-dimensional points, multiple-precision numbers, multi-key data
(e.g.~records), XML paths, URL addresses, etc. The technique is more general
than what has been done in previous work as no particular exploitation of the
underlying structure of is required. The only requirement is that the insertion
of a key must identify its predecessor or its successor.
Using the proposed technique, online suffix tree can be constructed in worst
case time per input symbol (as opposed to amortized
time per symbol, achieved by previously known algorithms). To our knowledge,
our algorithm is the first that achieves worst case time per input
symbol. Searching for a pattern of length in the resulting suffix tree
takes time, where is the
number of occurrences of the pattern. The paper also describes more
applications and show how to obtain alternative methods for dealing with suffix
sorting, dynamic lowest common ancestors and order maintenance
New Algorithms for Position Heaps
We present several results about position heaps, a relatively new alternative
to suffix trees and suffix arrays. First, we show that, if we limit the maximum
length of patterns to be sought, then we can also limit the height of the heap
and reduce the worst-case cost of insertions and deletions. Second, we show how
to build a position heap in linear time independent of the size of the
alphabet. Third, we show how to augment a position heap such that it supports
access to the corresponding suffix array, and vice versa. Fourth, we introduce
a variant of a position heap that can be simulated efficiently by a compressed
suffix array with a linear number of extra bits
Streaming Property Testing of Visibly Pushdown Languages
In the context of language recognition, we demonstrate the superiority of
streaming property testers against streaming algorithms and property testers,
when they are not combined. Initiated by Feigenbaum et al., a streaming
property tester is a streaming algorithm recognizing a language under the
property testing approximation: it must distinguish inputs of the language from
those that are -far from it, while using the smallest possible
memory (rather than limiting its number of input queries).
Our main result is a streaming -property tester for visibly
pushdown languages (VPL) with one-sided error using memory space
.
This constructions relies on a (non-streaming) property tester for weighted
regular languages based on a previous tester by Alon et al. We provide a simple
application of this tester for streaming testing special cases of instances of
VPL that are already hard for both streaming algorithms and property testers.
Our main algorithm is a combination of an original simulation of visibly
pushdown automata using a stack with small height but possible items of linear
size. In a second step, those items are replaced by small sketches. Those
sketches relies on a notion of suffix-sampling we introduce. This sampling is
the key idea connecting our streaming tester algorithm to property testers.Comment: 23 pages. Major modifications in the presentatio
- …