2 research outputs found

    Range selection queries in data aware space and time

    Given a vector X = (x1, x2, ..., xn) of integers, the range selection query (i, j, k) asks for the k-th smallest integer in (xi, xi+1, ..., xj), for any (i, j, k) such that 1 ≤ i ≤ j ≤ n and 1 ≤ k ≤ j-i+1. Previous studies on the problem kept X intact and proposed data structures that occupy O(n · log n) additional bits of space on top of X itself and answer queries in logarithmic time. In this study, we replace X and encode all of its integers in a single wavelet tree using S = n · log u + Σ log xi + o(n · log u + Σ log xi) bits, where u is the number of distinct log xi values observed in X. Notice that u is at most 32 (64) for 32-bit (64-bit) integers, and when xi > u, the space used for xi in the proposed data structure is less than the Elias-? coding of xi. Besides the data-aware coding of X, range selection is performed in O(log u + log x') time, where x' is the k-th smallest integer in the queried range. This somewhat adaptive bound is independent of the size of X and depends only on the actual answer to the query. In summary, to the best of our knowledge, we present the first algorithm using data-aware space and time for the general range selection problem.
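To make the query semantics concrete, here is a minimal sketch of range selection on a plain pointer-based wavelet tree. It is not the data-aware encoding described in the abstract: it splits on the value range rather than on bit-length classes, stores uncompressed prefix counts, and answers a query in time logarithmic in the value range. The class name `WaveletTree` and the method `select` are illustrative names, not taken from the paper.

```python
class WaveletTree:
    """Plain wavelet tree over the value range [lo, hi] (illustrative sketch)."""

    def __init__(self, values, lo=None, hi=None):
        values = list(values)
        if lo is None:
            lo, hi = min(values), max(values)
        self.n = len(values)
        self.lo, self.hi = lo, hi
        self.left = self.right = None
        if lo == hi or not values:
            return  # leaf (single value) or empty node
        mid = (lo + hi) // 2
        # prefix[t] = how many of the first t values are routed to the left child
        self.prefix = [0]
        for v in values:
            self.prefix.append(self.prefix[-1] + (v <= mid))
        self.left = WaveletTree([v for v in values if v <= mid], lo, mid)
        self.right = WaveletTree([v for v in values if v > mid], mid + 1, hi)

    def select(self, i, j, k):
        """Return the k-th smallest value among positions i..j (1-based, inclusive)."""
        if not (1 <= i <= j <= self.n and 1 <= k <= j - i + 1):
            raise ValueError("require 1 <= i <= j <= n and 1 <= k <= j-i+1")
        node, a, b = self, i - 1, j  # half-open 0-based range [a, b)
        while node.lo != node.hi:
            la, lb = node.prefix[a], node.prefix[b]
            in_left = lb - la  # elements of the range that went left
            if k <= in_left:
                node, a, b = node.left, la, lb
            else:
                k -= in_left
                node, a, b = node.right, a - la, b - lb
        return node.lo


X = [5, 1, 4, 2, 3, 1]
wt = WaveletTree(X)
print(wt.select(2, 5, 3))  # 3rd smallest of (1, 4, 2, 3)
```

Each level of the tree halves the candidate value range, so a query touches one node per level and never scans the queried positions themselves.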

    A simple yet time-optimal and linear-space algorithm for shortest unique substring queries

    WOS: 000347602000043
    We revisit the problem of finding the shortest unique substring (SUS), proposed recently by Pei et al. (2013) [12]. We propose an optimal O(n) time and space algorithm that finds an SUS for every location of a string of size n, significantly improving on their O(n²) time complexity. Our method also supports finding all the SUSes covering every location, whereas theirs can find only one SUS per location. Further, our solution is simpler, easier to implement, and more space-efficient in practice, since we use only the inverse suffix array and the longest common prefix array of the string, while their algorithm uses the suffix tree of the string and other auxiliary data structures. Our theoretical results are validated by an empirical study with real-world data, which shows that our method is at least 8 times faster and uses at least 20 times less memory. Because their time complexity is quadratic, the speedup of our method over Pei et al.'s can become even more significant as the string size increases. We have also compared our method with the recent proposal of Tsuruta et al. (2014) [14], another independent O(n) time and space algorithm for SUS finding. The empirical study shows that both methods have nearly the same processing speed. However, ours uses at least 4 times less memory for finding one SUS and at least 2 times less memory for finding all SUSes, both covering every string location.
    Supported in part by EWU's Faculty Grants for Research and Creative Works.
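As a rough illustration of how the inverse suffix array and the LCP array relate to unique substrings, the sketch below computes one shortest unique substring covering each position. It is not the linear-time algorithm from the abstract: the suffix array is built by naive sorting and the final step is a quadratic scan, chosen for clarity. The function name `shortest_unique_substrings` is an illustrative choice. The key observation used is that the shortest unique substring starting at position i has length one more than the larger of the two LCP values adjacent to suffix i in suffix-array order.

```python
def shortest_unique_substrings(s):
    """For each position p, return (start, end), 0-based inclusive, of one
    shortest unique substring of s covering p. Quadratic-time sketch."""
    n = len(s)
    sa = sorted(range(n), key=lambda i: s[i:])  # naive suffix array
    rank = [0] * n                              # inverse suffix array
    for r, i in enumerate(sa):
        rank[i] = r
    # Kasai's algorithm: lcp[r] = LCP of suffixes sa[r-1] and sa[r].
    # lcp[0] and the sentinel lcp[n] stay 0.
    lcp = [0] * (n + 1)
    h = 0
    for i in range(n):
        if rank[i] > 0:
            j = sa[rank[i] - 1]
            while i + h < n and j + h < n and s[i + h] == s[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h > 0:
                h -= 1
        else:
            h = 0
    # uniq[i] = length of the shortest unique substring starting at i
    # (it only exists if i + uniq[i] <= n).
    uniq = [max(lcp[rank[i]], lcp[rank[i] + 1]) + 1 for i in range(n)]
    result = []
    for p in range(n):
        best = None
        for i in range(p + 1):
            if i + uniq[i] > n:
                continue  # no unique substring starts at i
            end = max(p, i + uniq[i] - 1)  # extend far enough to cover p
            if best is None or end - i < best[1] - best[0]:
                best = (i, end)
        result.append(best)
    return result


print(shortest_unique_substrings("abab"))
```

The whole string is always unique, so every position has at least one candidate starting at index 0.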