2,454 research outputs found

    Linear-Space Data Structures for Range Mode Query in Arrays

    Full text link
    A mode of a multiset SS is an element aSa \in S of maximum multiplicity; that is, aa occurs at least as frequently as any other element in SS. Given a list A[1:n]A[1:n] of nn items, we consider the problem of constructing a data structure that efficiently answers range mode queries on AA. Each query consists of an input pair of indices (i,j)(i, j) for which a mode of A[i:j]A[i:j] must be returned. We present an O(n22ϵ)O(n^{2-2\epsilon})-space static data structure that supports range mode queries in O(nϵ)O(n^\epsilon) time in the worst case, for any fixed ϵ[0,1/2]\epsilon \in [0,1/2]. When ϵ=1/2\epsilon = 1/2, this corresponds to the first linear-space data structure to guarantee O(n)O(\sqrt{n}) query time. We then describe three additional linear-space data structures that provide O(k)O(k), O(m)O(m), and O(ji)O(|j-i|) query time, respectively, where kk denotes the number of distinct elements in AA and mm denotes the frequency of the mode of AA. Finally, we examine generalizing our data structures to higher dimensions.Comment: 13 pages, 2 figure

    Low Space External Memory Construction of the Succinct Permuted Longest Common Prefix Array

    Full text link
    The longest common prefix (LCP) array is a versatile auxiliary data structure in indexed string matching. It can be used to speed up searching using the suffix array (SA) and provides an implicit representation of the topology of an underlying suffix tree. The LCP array of a string of length nn can be represented as an array of length nn words, or, in the presence of the SA, as a bit vector of 2n2n bits plus asymptotically negligible support data structures. External memory construction algorithms for the LCP array have been proposed, but those proposed so far have a space requirement of O(n)O(n) words (i.e. O(nlogn)O(n \log n) bits) in external memory. This space requirement is in some practical cases prohibitively expensive. We present an external memory algorithm for constructing the 2n2n bit version of the LCP array which uses O(nlogσ)O(n \log \sigma) bits of additional space in external memory when given a (compressed) BWT with alphabet size σ\sigma and a sampled inverse suffix array at sampling rate O(logn)O(\log n). This is often a significant space gain in practice where σ\sigma is usually much smaller than nn or even constant. We also consider the case of computing succinct LCP arrays for circular strings

    Managing Unbounded-Length Keys in Comparison-Driven Data Structures with Applications to On-Line Indexing

    Full text link
    This paper presents a general technique for optimally transforming any dynamic data structure that operates on atomic and indivisible keys by constant-time comparisons, into a data structure that handles unbounded-length keys whose comparison cost is not a constant. Examples of these keys are strings, multi-dimensional points, multiple-precision numbers, multi-key data (e.g.~records), XML paths, URL addresses, etc. The technique is more general than what has been done in previous work as no particular exploitation of the underlying structure of is required. The only requirement is that the insertion of a key must identify its predecessor or its successor. Using the proposed technique, online suffix tree can be constructed in worst case time O(logn)O(\log n) per input symbol (as opposed to amortized O(logn)O(\log n) time per symbol, achieved by previously known algorithms). To our knowledge, our algorithm is the first that achieves O(logn)O(\log n) worst case time per input symbol. Searching for a pattern of length mm in the resulting suffix tree takes O(min(mlogΣ,m+logn)+tocc)O(\min(m\log |\Sigma|, m + \log n) + tocc) time, where tocctocc is the number of occurrences of the pattern. The paper also describes more applications and show how to obtain alternative methods for dealing with suffix sorting, dynamic lowest common ancestors and order maintenance

    Sublinear Space Algorithms for the Longest Common Substring Problem

    Full text link
    Given mm documents of total length nn, we consider the problem of finding a longest string common to at least d2d \geq 2 of the documents. This problem is known as the \emph{longest common substring (LCS) problem} and has a classic O(n)O(n) space and O(n)O(n) time solution (Weiner [FOCS'73], Hui [CPM'92]). However, the use of linear space is impractical in many applications. In this paper we show that for any trade-off parameter 1τn1 \leq \tau \leq n, the LCS problem can be solved in O(τ)O(\tau) space and O(n2/τ)O(n^2/\tau) time, thus providing the first smooth deterministic time-space trade-off from constant to linear space. The result uses a new and very simple algorithm, which computes a τ\tau-additive approximation to the LCS in O(n2/τ)O(n^2/\tau) time and O(1)O(1) space. We also show a time-space trade-off lower bound for deterministic branching programs, which implies that any deterministic RAM algorithm solving the LCS problem on documents from a sufficiently large alphabet in O(τ)O(\tau) space must use Ω(nlog(n/(τlogn))/loglog(n/(τlogn))\Omega(n\sqrt{\log(n/(\tau\log n))/\log\log(n/(\tau\log n)}) time.Comment: Accepted to 22nd European Symposium on Algorithm

    Parallel Wavelet Tree Construction

    Full text link
    We present parallel algorithms for wavelet tree construction with polylogarithmic depth, improving upon the linear depth of the recent parallel algorithms by Fuentes-Sepulveda et al. We experimentally show on a 40-core machine with two-way hyper-threading that we outperform the existing parallel algorithms by 1.3--5.6x and achieve up to 27x speedup over the sequential algorithm on a variety of real-world and artificial inputs. Our algorithms show good scalability with increasing thread count, input size and alphabet size. We also discuss extensions to variants of the standard wavelet tree.Comment: This is a longer version of the paper that appears in the Proceedings of the IEEE Data Compression Conference, 201
    corecore