Fully-Functional Suffix Trees and Optimal Text Searching in BWT-runs Bounded Space
Indexing highly repetitive texts - such as genomic databases, software
repositories and versioned text collections - has become an important problem
since the turn of the millennium. A relevant compressibility measure for
repetitive texts is r, the number of runs in their Burrows-Wheeler Transforms
(BWTs). One of the earliest indexes for repetitive collections, the Run-Length
FM-index, used O(r) space and was able to efficiently count the number of
occurrences of a pattern of length m in the text (in loglogarithmic time per
pattern symbol, with current techniques). However, it was unable to locate the
positions of those occurrences efficiently within a space bounded in terms of
r. In this paper we close this long-standing problem, showing how to extend the
Run-Length FM-index so that it can locate the occ occurrences efficiently
within O(r) space (in loglogarithmic time each), and reaching optimal time, O(m + occ),
within O(r log log_w(σ + n/r)) space, for a text of length n
over an alphabet of size σ on a RAM machine with words of w =
Ω(log n) bits. Within that space, our index can also count in optimal
time, O(m). Multiplying the space by O(w / log σ), we support count and
locate in O(⌈m log(σ)/w⌉) and O(⌈m log(σ)/w⌉ + occ) time, which
is optimal in the packed setting and had not been obtained before in compressed
space. We also describe a structure using O(r log(n/r)) space that replaces the
text and extracts any text substring of length ℓ in almost-optimal time
O(log(n/r) + ℓ log(σ)/w). Within that space, we similarly provide direct
access to suffix array, inverse suffix array, and longest common prefix array
cells, and extend these capabilities to full suffix tree functionality,
typically in O(log(n/r)) time per operation.
Comment: submitted version; optimal count and locate in smaller space: O(r log log_w(n/r + sigma)
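To make the measure r concrete, here is a minimal Python sketch, assuming a naive rotation-sorting BWT with a unique sentinel `$` that sorts before the text symbols (lowercase letters are assumed). It counts the runs of the BWT and performs the classical FM-index backward search to count pattern occurrences. It is a plain, uncompressed illustration of counting on the BWT, not the Run-Length FM-index or the paper's O(r)-space structure; the names `bwt`, `bwt_runs` and `count_occurrences` are hypothetical.

```python
# Illustrative sketch only (hypothetical names); not the paper's index.


def bwt(text: str) -> str:
    """Naive Burrows-Wheeler Transform: last column of the sorted rotations.

    Assumes '$' does not occur in text and sorts before its symbols
    (true for lowercase ASCII letters). O(n^2 log n); for illustration only.
    """
    if "$" in text:
        raise ValueError("text must not contain the sentinel '$'")
    t = text + "$"
    rotations = sorted(t[i:] + t[:i] for i in range(len(t)))
    return "".join(rot[-1] for rot in rotations)


def bwt_runs(b: str) -> int:
    """Number r of maximal runs of equal symbols in b."""
    if not b:
        return 0
    return 1 + sum(1 for prev, cur in zip(b, b[1:]) if prev != cur)


def count_occurrences(b: str, pattern: str) -> int:
    """Classical FM-index backward search over the BWT b.

    Uses an uncompressed C table and linear-time rank; a run-length
    FM-index answers the same rank queries within O(r) space.
    """
    # C[c] = number of symbols in b strictly smaller than c
    C = {}
    total = 0
    for c in sorted(set(b)):
        C[c] = total
        total += b.count(c)

    def rank(c: str, i: int) -> int:
        # occurrences of c in b[:i]
        return b.count(c, 0, i)

    # [lo, hi) = range of sorted rotations prefixed by the processed pattern suffix
    lo, hi = 0, len(b)
    for c in reversed(pattern):
        if c not in C:
            return 0
        lo = C[c] + rank(c, lo)
        hi = C[c] + rank(c, hi)
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    b = bwt("abracadabra")
    print(b, bwt_runs(b))                # ard$rcaaaabb 8
    print(count_occurrences(b, "abra"))  # 2
```

On "abracadabra" the BWT has r = 8 runs for n + 1 = 12 symbols; on highly repetitive collections r is typically much smaller than n, which is what makes O(r)-space indexes attractive.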
Trees with Convex Faces and Optimal Angles
We consider drawings of trees in which all edges incident to leaves can be
extended to infinite rays without crossing, partitioning the plane into
infinite convex polygons. Among all such drawings we seek the one maximizing
the angular resolution of the drawing. We find linear time algorithms for
solving this problem, both for plane trees and for trees without a fixed
embedding. In any such drawing, the edge lengths may be set independently of
the angles, without crossing; we describe multiple strategies for setting these
lengths.
Comment: 12 pages, 10 figures. To appear at 14th Int. Symp. Graph Drawing, 200
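The angular resolution being maximized is, in the usual sense, the smallest angle formed by two cyclically consecutive edges around any vertex. The Python sketch below only measures it for a given straight-line drawing (a hypothetical `pos` map of vertex coordinates plus an edge list); it is not the paper's linear-time optimization algorithm, and `angular_resolution` is a hypothetical name.

```python
import math
from collections import defaultdict


def angular_resolution(pos, edges):
    """Smallest angle (radians) between consecutive edges around any vertex.

    Minimal measuring sketch (hypothetical helper, not the paper's algorithm).
    pos: dict vertex -> (x, y); edges: iterable of (u, v) pairs.
    Vertices of degree < 2 contribute no angle; returns math.inf if none do.
    """
    directions = defaultdict(list)
    for u, v in edges:
        (x1, y1), (x2, y2) = pos[u], pos[v]
        directions[u].append(math.atan2(y2 - y1, x2 - x1))  # direction u -> v
        directions[v].append(math.atan2(y1 - y2, x1 - x2))  # direction v -> u
    best = math.inf
    for angles in directions.values():
        if len(angles) < 2:
            continue
        angles.sort()
        # gaps between cyclically consecutive edge directions
        gaps = [b - a for a, b in zip(angles, angles[1:])]
        gaps.append(2 * math.pi - (angles[-1] - angles[0]))
        best = min(best, min(gaps))
    return best
```

For example, a star whose three leaves sit at 0°, 120° and 240° around the center has angular resolution 2π/3, the best possible for a degree-3 vertex.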
Minimizing the stabbing number of matchings, trees, and triangulations
The (axis-parallel) stabbing number of a given set of line segments is the
maximum number of segments that can be intersected by any one (axis-parallel)
line. This paper deals with finding perfect matchings, spanning trees, or
triangulations of minimum stabbing number for a given set of points. The
complexity of these problems has been a long-standing open question; in fact,
it is one of the original 30 outstanding open problems in computational
geometry on the list by Demaine, Mitchell, and O'Rourke. We answer it negatively
for a number of minimum stabbing problems, showing them NP-hard
by means of a general proof technique. This also implies non-trivial lower bounds on
their approximability. On the positive side, we propose a cut-based integer
programming formulation for minimizing the stabbing number of matchings and
spanning trees. We obtain lower bounds (in polynomial time) from the
corresponding linear programming relaxations, and show that an optimal
fractional solution always contains an edge of at least constant weight. This
result constitutes a crucial step towards a constant-factor approximation via
an iterated rounding scheme. In computational experiments we demonstrate that
our approach can actually solve instances with up to several hundred
points optimally or near-optimally.
Comment: 25 pages, 12 figures, LaTeX. To appear in "Discrete and Computational
Geometry". Previous version (extended abstract) appears in SODA 2004, pp. 430-43
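Following the definition above, the axis-parallel stabbing number of a fixed set of segments can be computed directly: a vertical line x = c meets a segment exactly when c lies in the segment's x-projection, so the task reduces to the maximum depth of a set of closed intervals, which is attained at some interval's left endpoint (likewise for horizontal lines and y-projections). The quadratic Python sketch below evaluates a given matching, tree or triangulation; it does not address the NP-hard problem of choosing one of minimum stabbing number, and the function names are hypothetical.

```python
from __future__ import annotations


def _max_depth(intervals: list[tuple[float, float]]) -> int:
    """Largest number of closed intervals sharing a common point.

    The maximum is attained at some interval's left endpoint, so only
    those candidates are checked (O(k^2) for k intervals).
    """
    return max(
        (sum(1 for a, b in intervals if a <= lo <= b) for lo, _ in intervals),
        default=0,
    )


def axis_parallel_stabbing_number(segments) -> int:
    """Max number of segments met by one vertical or horizontal line.

    Hypothetical helper; segments: iterable of ((x1, y1), (x2, y2)) pairs.
    """
    segs = list(segments)
    x_proj = [(min(p[0], q[0]), max(p[0], q[0])) for p, q in segs]
    y_proj = [(min(p[1], q[1]), max(p[1], q[1])) for p, q in segs]
    return max(_max_depth(x_proj), _max_depth(y_proj))
```

Three stacked horizontal segments spanning the same x-range, for instance, have stabbing number 3, because one vertical line crosses all of them.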
The Wavelet Trie: Maintaining an Indexed Sequence of Strings in Compressed Space
An indexed sequence of strings is a data structure for storing a string
sequence that supports random access, searching, range counting and analytics
operations, both for exact matches and prefix search. String sequences lie at
the core of column-oriented databases, log processing, and other storage and
query tasks. In these applications each string can appear several times and the
order of the strings in the sequence is relevant. The prefix structure of the
strings is relevant as well: common prefixes are sought in strings to extract
interesting features from the sequence. Moreover, space-efficiency is highly
desirable as it translates directly into higher performance, since more data
can fit in fast memory.
We introduce and study the problem of compressed indexed sequence of strings,
representing indexed sequences of strings in nearly-optimal compressed space,
both in the static and dynamic settings, while preserving provably good
performance for the supported operations.
We present a new data structure for this problem, the Wavelet Trie, which
combines the classical Patricia Trie with the Wavelet Tree, a succinct data
structure for storing a compressed sequence. The resulting Wavelet Trie
smoothly adapts to a sequence of strings that changes over time. It improves on
the state-of-the-art compressed data structures by supporting a dynamic
alphabet (i.e., the set of distinct strings) and prefix queries, both crucial
requirements in the aforementioned applications, and on traditional indexes by
reducing space occupancy to close to the entropy of the sequence
- …
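How a Patricia trie can be combined with wavelet-tree bitvectors may be easier to see in a small example. The Python code below is a minimal static sketch under stated assumptions: strings are given as '0'/'1' strings whose distinct values form a prefix-free set (this sketch needs that), each node stores the longest common prefix of its strings plus one branching bit per string in sequence order, and rank is answered from plain prefix-sum lists instead of compressed bitvectors. It supports access, exact-match rank and prefix rank; it is neither compressed nor dynamic, and `WaveletTrieNode` and its methods are hypothetical names.

```python
from __future__ import annotations

from os.path import commonprefix


class WaveletTrieNode:
    """Static, uncompressed sketch of a Wavelet Trie node over binary strings.

    Hypothetical illustration; assumes the distinct strings form a prefix-free set.
    """

    def __init__(self, seq: list[str]):
        self.alpha = commonprefix(seq)  # longest common prefix at this node
        self.children: tuple[WaveletTrieNode, WaveletTrieNode] | None = None
        if len(set(seq)) <= 1:
            return  # leaf: every string here equals alpha
        k = len(self.alpha)
        if any(len(s) == k for s in seq):
            raise ValueError("strings must be prefix-free")
        # branching bit of each string, in sequence order (wavelet-tree bitvector)
        self.bits = [int(s[k]) for s in seq]
        # ones[i] = number of 1 bits in self.bits[:i], giving O(1) rank
        self.ones = [0]
        for b in self.bits:
            self.ones.append(self.ones[-1] + b)
        # children receive the remaining suffixes, preserving relative order
        self.children = (
            WaveletTrieNode([s[k + 1:] for s in seq if s[k] == "0"]),
            WaveletTrieNode([s[k + 1:] for s in seq if s[k] == "1"]),
        )

    def _rank_bit(self, b: int, i: int) -> int:
        ones = self.ones[i]
        return ones if b == 1 else i - ones

    def access(self, i: int) -> str:
        """Return the i-th string of the sequence."""
        if self.children is None:
            return self.alpha
        b = self.bits[i]
        return self.alpha + str(b) + self.children[b].access(self._rank_bit(b, i))

    def rank(self, s: str, i: int) -> int:
        """Occurrences of s among the first i strings."""
        k = len(self.alpha)
        if not s.startswith(self.alpha):
            return 0
        if self.children is None:
            return i if s == self.alpha else 0
        if len(s) == k:
            return 0  # s is a proper prefix of the strings below, so not stored
        b = int(s[k])
        return self.children[b].rank(s[k + 1:], self._rank_bit(b, i))

    def rank_prefix(self, p: str, i: int) -> int:
        """Number of the first i strings that start with p."""
        k = len(self.alpha)
        if len(p) <= k:
            return i if self.alpha.startswith(p) else 0
        if not p.startswith(self.alpha) or self.children is None:
            return 0
        b = int(p[k])
        return self.children[b].rank_prefix(p[k + 1:], self._rank_bit(b, i))
```

Each query walks one root-to-leaf path, doing one bitvector rank per node, which is the same access pattern a compressed implementation would follow.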