104,356 research outputs found

    Fully-Functional Suffix Trees and Optimal Text Searching in BWT-runs Bounded Space

    Get PDF
    Indexing highly repetitive texts - such as genomic databases, software repositories and versioned text collections - has become an important problem since the turn of the millennium. A relevant compressibility measure for repetitive texts is r, the number of runs in their Burrows-Wheeler Transforms (BWTs). One of the earliest indexes for repetitive collections, the Run-Length FM-index, used O(r) space and was able to efficiently count the number of occurrences of a pattern of length m in the text (in loglogarithmic time per pattern symbol, with current techniques). However, it was unable to locate the positions of those occurrences efficiently within a space bounded in terms of r. In this paper we close this long-standing problem, showing how to extend the Run-Length FM-index so that it can locate the occ occurrences efficiently within O(r) space (in loglogarithmic time each), and reaching optimal time, O(m + occ), within O(r log log w ({\sigma} + n/r)) space, for a text of length n over an alphabet of size {\sigma} on a RAM machine with words of w = {\Omega}(log n) bits. Within that space, our index can also count in optimal time, O(m). Multiplying the space by O(w/ log {\sigma}), we support count and locate in O(dm log({\sigma})/we) and O(dm log({\sigma})/we + occ) time, which is optimal in the packed setting and had not been obtained before in compressed space. We also describe a structure using O(r log(n/r)) space that replaces the text and extracts any text substring of length ` in almost-optimal time O(log(n/r) + ` log({\sigma})/w). Within that space, we similarly provide direct access to suffix array, inverse suffix array, and longest common prefix array cells, and extend these capabilities to full suffix tree functionality, typically in O(log(n/r)) time per operation.Comment: submitted version; optimal count and locate in smaller space: O(r log log_w(n/r + sigma)

    Trees with Convex Faces and Optimal Angles

    Full text link
    We consider drawings of trees in which all edges incident to leaves can be extended to infinite rays without crossing, partitioning the plane into infinite convex polygons. Among all such drawings we seek the one maximizing the angular resolution of the drawing. We find linear time algorithms for solving this problem, both for plane trees and for trees without a fixed embedding. In any such drawing, the edge lengths may be set independently of the angles, without crossing; we describe multiple strategies for setting these lengths.Comment: 12 pages, 10 figures. To appear at 14th Int. Symp. Graph Drawing, 200

    Minimizing the stabbing number of matchings, trees, and triangulations

    Full text link
    The (axis-parallel) stabbing number of a given set of line segments is the maximum number of segments that can be intersected by any one (axis-parallel) line. This paper deals with finding perfect matchings, spanning trees, or triangulations of minimum stabbing number for a given set of points. The complexity of these problems has been a long-standing open question; in fact, it is one of the original 30 outstanding open problems in computational geometry on the list by Demaine, Mitchell, and O'Rourke. The answer we provide is negative for a number of minimum stabbing problems by showing them NP-hard by means of a general proof technique. It implies non-trivial lower bounds on the approximability. On the positive side we propose a cut-based integer programming formulation for minimizing the stabbing number of matchings and spanning trees. We obtain lower bounds (in polynomial time) from the corresponding linear programming relaxations, and show that an optimal fractional solution always contains an edge of at least constant weight. This result constitutes a crucial step towards a constant-factor approximation via an iterated rounding scheme. In computational experiments we demonstrate that our approach allows for actually solving problems with up to several hundred points optimally or near-optimally.Comment: 25 pages, 12 figures, Latex. To appear in "Discrete and Computational Geometry". Previous version (extended abstract) appears in SODA 2004, pp. 430-43

    The Wavelet Trie: Maintaining an Indexed Sequence of Strings in Compressed Space

    Full text link
    An indexed sequence of strings is a data structure for storing a string sequence that supports random access, searching, range counting and analytics operations, both for exact matches and prefix search. String sequences lie at the core of column-oriented databases, log processing, and other storage and query tasks. In these applications each string can appear several times and the order of the strings in the sequence is relevant. The prefix structure of the strings is relevant as well: common prefixes are sought in strings to extract interesting features from the sequence. Moreover, space-efficiency is highly desirable as it translates directly into higher performance, since more data can fit in fast memory. We introduce and study the problem of compressed indexed sequence of strings, representing indexed sequences of strings in nearly-optimal compressed space, both in the static and dynamic settings, while preserving provably good performance for the supported operations. We present a new data structure for this problem, the Wavelet Trie, which combines the classical Patricia Trie with the Wavelet Tree, a succinct data structure for storing a compressed sequence. The resulting Wavelet Trie smoothly adapts to a sequence of strings that changes over time. It improves on the state-of-the-art compressed data structures by supporting a dynamic alphabet (i.e. the set of distinct strings) and prefix queries, both crucial requirements in the aforementioned applications, and on traditional indexes by reducing space occupancy to close to the entropy of the sequence
    • …
    corecore