4,849 research outputs found

Succinct Indexable Dictionaries with Applications to Encoding $k$-ary Trees, Prefix Sums and Multisets

Author: Fich F. E.
Grossi R.
Hagerup T.
Jansson J.
Munro J. I.
Paul W. J.
Rajeev Raman
Raman R.
Raman V.
Srinivasa Rao Satti
Venkatesh Raman
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 04/05/2007

We consider the indexable dictionary problem, which consists of storing a set $S \subseteq \{0,\ldots,m-1\}$ for some integer $m$, while supporting the operations of $\mathrm{Rank}(x)$, which returns the number of elements in $S$ that are less than $x$ if $x \in S$, and $-1$ otherwise; and $\mathrm{Select}(i)$, which returns the $i$-th smallest element in $S$. We give a data structure that supports both operations in $O(1)$ time on the RAM model and requires $\mathcal{B}(n,m) + o(n) + O(\lg \lg m)$ bits to store a set of size $n$, where $\mathcal{B}(n,m) = \lceil \lg \binom{m}{n} \rceil$ is the minimum number of bits required to store any $n$-element subset from a universe of size $m$. Previous dictionaries taking this space only supported (yes/no) membership queries in $O(1)$ time. In the cell probe model we can remove the $O(\lg \lg m)$ additive term in the space bound, answering a question raised by Fich and Miltersen, and Pagh. We present extensions and applications of our indexable dictionary data structure, including: an information-theoretically optimal representation of a $k$-ary cardinal tree that supports standard operations in constant time; a representation of a multiset of size $n$ from $\{0,\ldots,m-1\}$ in $\mathcal{B}(n,m+n) + o(n)$ bits that supports (appropriate generalizations of) $\mathrm{Rank}$ and $\mathrm{Select}$ operations in constant time; and a representation of a sequence of $n$ non-negative integers summing up to $m$ in $\mathcal{B}(n,m+n) + o(n)$ bits that supports prefix sum queries in constant time.

Comment: Final version of SODA 2002 paper; supersedes Leicester Tech report 2002/1
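
The Rank and Select semantics described in this abstract can be illustrated with a small reference sketch. This is not the succinct constant-time structure from the paper: it stores a plain sorted list and answers Rank by binary search. The class name `NaiveIndexableDictionary`, the helper `information_theoretic_bits`, and the 0-based indexing of `select` are assumptions made for illustration only.

```python
from bisect import bisect_left
from math import comb


class NaiveIndexableDictionary:
    """Reference semantics only: a sorted list, O(log n) rank, not succinct."""

    def __init__(self, elements, m):
        self.m = m
        self.items = sorted(set(elements))
        if self.items and not (0 <= self.items[0] and self.items[-1] < m):
            raise ValueError("elements must lie in {0, ..., m-1}")

    def rank(self, x):
        # Number of elements of S smaller than x if x is in S, otherwise -1.
        i = bisect_left(self.items, x)
        if i < len(self.items) and self.items[i] == x:
            return i
        return -1

    def select(self, i):
        # i-th smallest element of S (assumption: i is 0-based here).
        return self.items[i]


def information_theoretic_bits(n, m):
    # B(n, m) = ceil(lg C(m, n)); for an integer c >= 1,
    # ceil(log2(c)) equals (c - 1).bit_length().
    return (comb(m, n) - 1).bit_length()
```

For example, with `d = NaiveIndexableDictionary([3, 7, 10], 16)`, calling `d.rank(7)` counts the single element smaller than 7, while `d.rank(5)` reports non-membership with -1. The helper evaluates the space lower bound that the paper's structure matches up to lower-order terms.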

arXiv.org e-Print Archive

Crossref

Optimistic Parallelization of Floating-Point Accumulation

Author: DeHon André
Kapre Nachiket
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2007

Floating-point arithmetic is notoriously non-associative because its limited-precision representation demands that intermediate values be rounded to fit in the available precision. The resulting cyclic dependency in floating-point accumulation inhibits parallelization of the computation, including efficient use of pipelining. In practice, however, we observe that floating-point operations are "mostly" associative. This observation can be exploited to parallelize floating-point accumulation using a form of optimistic concurrency. In this scheme, we first compute an optimistic associative approximation to the sum and then relax the computation by iteratively propagating errors until the correct sum is obtained. We map this computation to a network of 16 statically-scheduled, pipelined, double-precision floating-point adders on the Virtex-4 LX160 (-12) device, where each floating-point adder runs at 296 MHz and has a pipeline depth of 10. On this 16 PE design, we demonstrate an average speedup of 6× with randomly generated data and 3-7× with summations extracted from Conjugate Gradient benchmarks.
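
The non-associativity this abstract refers to can be seen by summing the same values in two different orders. The sketch below only contrasts a sequential accumulation with a tree-shaped (reassociated) reduction; it does not implement the optimistic error-propagation scheme from the paper. The function names and the sample data are assumptions chosen for illustration.

```python
import math


def sequential_sum(values):
    # Left-to-right accumulation: each add depends on the previous result.
    total = 0.0
    for v in values:
        total += v
    return total


def pairwise_sum(values):
    # Tree-shaped reduction: adjacent pairs are added level by level,
    # the kind of reassociated order a parallel adder tree would use.
    values = list(values)
    if not values:
        return 0.0
    while len(values) > 1:
        paired = [values[i] + values[i + 1] for i in range(0, len(values) - 1, 2)]
        if len(values) % 2:
            paired.append(values[-1])  # carry the unpaired last element up
        values = paired
    return values[0]


# Sample data where rounding makes the two orders disagree.
data = [1e16, 1.0, -1e16, 1.0]
in_order = sequential_sum(data)
reassociated = pairwise_sum(data)
exact = math.fsum(data)  # correctly rounded reference sum
```

On this input the two orders give different results, and neither equals the correctly rounded sum from `math.fsum`, which is why a parallel scheme that reassociates additions needs some form of correction to reproduce the sequential result.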

CiteSeerX

Crossref

Caltech Authors

ScholarlyCommons@Penn

CSR5: An Efficient Storage Format for Cross-Platform Sparse Matrix-Vector Multiplication

Author: Liu Weifeng
Vinter Brian
Publication date: 09/04/2015

Sparse matrix-vector multiplication (SpMV) is a fundamental building block for numerous applications. In this paper, we propose CSR5 (Compressed Sparse Row 5), a new storage format, which offers high-throughput SpMV on various platforms including CPUs, GPUs and Xeon Phi. First, the CSR5 format is insensitive to the sparsity structure of the input matrix. Thus the single format can support an SpMV algorithm that is efficient both for regular matrices and for irregular matrices. Furthermore, we show that the overhead of the format conversion from the CSR to the CSR5 can be as low as the cost of a few SpMV operations. We compare the CSR5-based SpMV algorithm with 11 state-of-the-art formats and algorithms on four mainstream processors using 14 regular and 10 irregular matrices as a benchmark suite. For the 14 regular matrices in the suite, we achieve comparable or better performance over the previous work. For the 10 irregular matrices, the CSR5 obtains average performance improvement of 17.6%, 28.5%, 173.0% and 293.3% (up to 213.3%, 153.6%, 405.1% and 943.3%) over the best existing work on dual-socket Intel CPUs, an nVidia GPU, an AMD GPU and an Intel Xeon Phi, respectively. For real-world applications such as a solver with only tens of iterations, the CSR5 format can be more practical because of its low overhead for format conversion. The source code of this work is downloadable at https://github.com/bhSPARSE/Benchmark_SpMV_using_CSR5

Comment: 12 pages, 10 figures, in Proceedings of the 29th ACM International Conference on Supercomputing (ICS '15)
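
For readers unfamiliar with the baseline, here is a minimal sketch of SpMV over the plain CSR format that CSR5 starts from. It is not the CSR5 layout or algorithm from the paper. The function name `csr_spmv` and the array names `row_ptr`, `col_idx` and `values` are the conventional CSR names and are assumed here for illustration.

```python
def csr_spmv(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a matrix A stored in plain CSR form.

    row_ptr[r] .. row_ptr[r + 1] delimits the nonzeros of row r;
    col_idx[k] and values[k] give the column and value of nonzero k.
    """
    num_rows = len(row_ptr) - 1
    y = [0.0] * num_rows
    for row in range(num_rows):
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


# 3x3 example matrix with an empty middle row:
# [[1, 0, 2],
#  [0, 0, 0],
#  [0, 3, 4]]
row_ptr = [0, 2, 2, 4]
col_idx = [0, 2, 1, 2]
values = [1.0, 2.0, 3.0, 4.0]
y = csr_spmv(row_ptr, col_idx, values, [1.0, 2.0, 3.0])
```

The inner loop length depends on how many nonzeros each row has, so work per row is uneven for irregular matrices. That sensitivity to sparsity structure is the issue the abstract says the CSR5 format is designed to avoid.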

arXiv.org e-Print Archive

Copenhagen University Research Information System

RLZAP: Relative Lempel-Ziv with Adaptive Pointers

Author: A Farruggia
C Boucher
C Hoobin
D Belazzougui
H Ferrada
J Ziv
M Léonard
P Ferragina
R Raman
S Deorowicz
S Kuruppu
Publication date: 01/01/2016

Relative Lempel-Ziv (RLZ) is a popular algorithm for compressing databases of genomes from individuals of the same species when fast random access is desired. With Kuruppu et al.'s (SPIRE 2010) original implementation, a reference genome is selected and then the other genomes are greedily parsed into phrases exactly matching substrings of the reference. Deorowicz and Grabowski (Bioinformatics, 2011) pointed out that letting each phrase end with a mismatch character usually gives better compression because many of the differences between individuals' genomes are single-nucleotide substitutions. Ferrada et al. (SPIRE 2014) then pointed out that also using relative pointers and run-length compressing them usually gives even better compression. In this paper we generalize Ferrada et al.'s idea to also handle short insertions, deletions and multi-character substitutions well. We show experimentally that our generalization achieves better compression than Ferrada et al.'s implementation with comparable random-access times.
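
The greedy parsing step of the original RLZ scheme described above can be sketched as follows. This is a naive quadratic-time illustration, not the implementation of Kuruppu et al. and not the adaptive-pointer variant proposed in the paper. The function names `rlz_parse` and `rlz_decode`, the `(position, length)` phrase encoding, and the choice to raise an error on characters absent from the reference are assumptions made for illustration.

```python
def rlz_parse(reference, target):
    """Greedily split target into phrases that each copy a reference substring.

    Returns a list of (position, length) pairs into the reference.
    """
    phrases = []
    i = 0
    while i < len(target):
        best_pos, best_len = -1, 0
        # Naive search: try every start in the reference, keep the longest match.
        for start in range(len(reference)):
            length = 0
            while (start + length < len(reference)
                   and i + length < len(target)
                   and reference[start + length] == target[i + length]):
                length += 1
            if length > best_len:
                best_pos, best_len = start, length
        if best_len == 0:
            # Assumption: this sketch does not encode literal characters.
            raise ValueError(f"character {target[i]!r} does not occur in the reference")
        phrases.append((best_pos, best_len))
        i += best_len
    return phrases


def rlz_decode(reference, phrases):
    # Rebuild the target by concatenating the referenced substrings.
    return "".join(reference[pos:pos + length] for pos, length in phrases)


reference = "ACGTACGT"
target = "ACGTTACG"
phrases = rlz_parse(reference, target)
restored = rlz_decode(reference, phrases)
```

Because every phrase is a direct pointer into the reference, any part of the target can be decoded without decompressing what precedes it, which is the random-access property the abstract mentions. A practical implementation would replace the inner search with an index over the reference, such as a suffix array.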

arXiv.org e-Print Archive

Crossref

Archivio della Ricerca - Università di Pisa