4,849 research outputs found

    Succinct Indexable Dictionaries with Applications to Encoding kk-ary Trees, Prefix Sums and Multisets

    Full text link
    We consider the {\it indexable dictionary} problem, which consists of storing a set S{0,...,m1}S \subseteq \{0,...,m-1\} for some integer mm, while supporting the operations of \Rank(x), which returns the number of elements in SS that are less than xx if xSx \in S, and -1 otherwise; and \Select(i) which returns the ii-th smallest element in SS. We give a data structure that supports both operations in O(1) time on the RAM model and requires B(n,m)+o(n)+O(lglgm){\cal B}(n,m) + o(n) + O(\lg \lg m) bits to store a set of size nn, where {\cal B}(n,m) = \ceil{\lg {m \choose n}} is the minimum number of bits required to store any nn-element subset from a universe of size mm. Previous dictionaries taking this space only supported (yes/no) membership queries in O(1) time. In the cell probe model we can remove the O(lglgm)O(\lg \lg m) additive term in the space bound, answering a question raised by Fich and Miltersen, and Pagh. We present extensions and applications of our indexable dictionary data structure, including: An information-theoretically optimal representation of a kk-ary cardinal tree that supports standard operations in constant time, A representation of a multiset of size nn from {0,...,m1}\{0,...,m-1\} in B(n,m+n)+o(n){\cal B}(n,m+n) + o(n) bits that supports (appropriate generalizations of) \Rank and \Select operations in constant time, and A representation of a sequence of nn non-negative integers summing up to mm in B(n,m+n)+o(n){\cal B}(n,m+n) + o(n) bits that supports prefix sum queries in constant time.Comment: Final version of SODA 2002 paper; supersedes Leicester Tech report 2002/1

    Optimistic Parallelization of Floating-Point Accumulation

    Get PDF
    Floating-point arithmetic is notoriously non-associative due to the limited precision representation which demands intermediate values be rounded to fit in the available precision. The resulting cyclic dependency in floating-point accumulation inhibits parallelization of the computation, including efficient use of pipelining. In practice, however, we observe that floating-point operations are "mostly" associative. This observation can be exploited to parallelize floating-point accumulation using a form of optimistic concurrency. In this scheme, we first compute an optimistic associative approximation to the sum and then relax the computation by iteratively propagating errors until the correct sum is obtained. We map this computation to a network of 16 statically-scheduled, pipelined, double-precision floating-point adders on the Virtex-4 LX160 (-12) device where each floating-point adder runs at 296 MHz and has a pipeline depth of 10. On this 16 PE design, we demonstrate an average speedup of 6× with randomly generated data and 3-7× with summations extracted from Conjugate Gradient benchmarks

    CSR5: An Efficient Storage Format for Cross-Platform Sparse Matrix-Vector Multiplication

    Full text link
    Sparse matrix-vector multiplication (SpMV) is a fundamental building block for numerous applications. In this paper, we propose CSR5 (Compressed Sparse Row 5), a new storage format, which offers high-throughput SpMV on various platforms including CPUs, GPUs and Xeon Phi. First, the CSR5 format is insensitive to the sparsity structure of the input matrix. Thus the single format can support an SpMV algorithm that is efficient both for regular matrices and for irregular matrices. Furthermore, we show that the overhead of the format conversion from the CSR to the CSR5 can be as low as the cost of a few SpMV operations. We compare the CSR5-based SpMV algorithm with 11 state-of-the-art formats and algorithms on four mainstream processors using 14 regular and 10 irregular matrices as a benchmark suite. For the 14 regular matrices in the suite, we achieve comparable or better performance over the previous work. For the 10 irregular matrices, the CSR5 obtains average performance improvement of 17.6\%, 28.5\%, 173.0\% and 293.3\% (up to 213.3\%, 153.6\%, 405.1\% and 943.3\%) over the best existing work on dual-socket Intel CPUs, an nVidia GPU, an AMD GPU and an Intel Xeon Phi, respectively. For real-world applications such as a solver with only tens of iterations, the CSR5 format can be more practical because of its low-overhead for format conversion. The source code of this work is downloadable at https://github.com/bhSPARSE/Benchmark_SpMV_using_CSR5Comment: 12 pages, 10 figures, In Proceedings of the 29th ACM International Conference on Supercomputing (ICS '15

    RLZAP: Relative Lempel-Ziv with Adaptive Pointers

    Full text link
    Relative Lempel-Ziv (RLZ) is a popular algorithm for compressing databases of genomes from individuals of the same species when fast random access is desired. With Kuruppu et al.'s (SPIRE 2010) original implementation, a reference genome is selected and then the other genomes are greedily parsed into phrases exactly matching substrings of the reference. Deorowicz and Grabowski (Bioinformatics, 2011) pointed out that letting each phrase end with a mismatch character usually gives better compression because many of the differences between individuals' genomes are single-nucleotide substitutions. Ferrada et al. (SPIRE 2014) then pointed out that also using relative pointers and run-length compressing them usually gives even better compression. In this paper we generalize Ferrada et al.'s idea to handle well also short insertions, deletions and multi-character substitutions. We show experimentally that our generalization achieves better compression than Ferrada et al.'s implementation with comparable random-access times

    Linear-Space Data Structures for Range Mode Query in Arrays

    Full text link
    A mode of a multiset SS is an element aSa \in S of maximum multiplicity; that is, aa occurs at least as frequently as any other element in SS. Given a list A[1:n]A[1:n] of nn items, we consider the problem of constructing a data structure that efficiently answers range mode queries on AA. Each query consists of an input pair of indices (i,j)(i, j) for which a mode of A[i:j]A[i:j] must be returned. We present an O(n22ϵ)O(n^{2-2\epsilon})-space static data structure that supports range mode queries in O(nϵ)O(n^\epsilon) time in the worst case, for any fixed ϵ[0,1/2]\epsilon \in [0,1/2]. When ϵ=1/2\epsilon = 1/2, this corresponds to the first linear-space data structure to guarantee O(n)O(\sqrt{n}) query time. We then describe three additional linear-space data structures that provide O(k)O(k), O(m)O(m), and O(ji)O(|j-i|) query time, respectively, where kk denotes the number of distinct elements in AA and mm denotes the frequency of the mode of AA. Finally, we examine generalizing our data structures to higher dimensions.Comment: 13 pages, 2 figure
    corecore