Search CORE

12,399 research outputs found

GPUs as Storage System Accelerators

Author: Al-Kiswany Samer
Gharaibeh Abdullah
Ripeanu Matei
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 16/05/2012
Field of study

Massively multicore processors, such as Graphics Processing Units (GPUs), provide, at a comparable price, a one order of magnitude higher peak performance than traditional CPUs. This drop in the cost of computation, as any order-of-magnitude drop in the cost per unit of performance for a class of system components, triggers the opportunity to redesign systems and to explore new ways to engineer them to recalibrate the cost-to-performance relation. This project explores the feasibility of harnessing GPUs' computational power to improve the performance, reliability, or security of distributed storage systems. In this context, we present the design of a storage system prototype that uses GPU offloading to accelerate a number of computationally intensive primitives based on hashing, and introduce techniques to efficiently leverage the processing power of GPUs. We evaluate the performance of this prototype under two configurations: as a content addressable storage system that facilitates online similarity detection between successive versions of the same file and as a traditional system that uses hashing to preserve data integrity. Further, we evaluate the impact of offloading to the GPU on competing applications' performance. Our results show that this technique can bring tangible performance gains without negatively impacting the performance of concurrently running applications.Comment: IEEE Transactions on Parallel and Distributed Systems, 201

arXiv.org e-Print Archive

Crossref

Strongly universal string hashing is fast

Author: Kaser Owen
Lemire Daniel
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/11/2014
Field of study

We present fast strongly universal string hashing families: they can process data at a rate of 0.2 CPU cycle per byte. Maybe surprisingly, we find that these families---though they require a large buffer of random numbers---are often faster than popular hash functions with weaker theoretical guarantees. Moreover, conventional wisdom is that hash functions with fewer multiplications are faster. Yet we find that they may fail to be faster due to operation pipelining. We present experimental results on several processors including low-powered processors. Our tests include hash functions designed for processors with the Carry-Less Multiplication (CLMUL) instruction set. We also prove, using accessible proofs, the strong universality of our families.Comment: Software is available at http://code.google.com/p/variablelengthstringhashing/ and https://github.com/lemire/StronglyUniversalStringHashin

arXiv.org e-Print Archive

R-libre

Crossref

Fast Locality-Sensitive Hashing Frameworks for Approximate Near Neighbor Search

Author: A Andoni
AL Zobrist
AZ Broder
JL Carter
JL Carter
K Terasawa
M Dubiner
MH Overmars
ML Fredman
N Sundaram
P Li
S Har-Peled
T Hagerup
Publication venue
Publication date: 16/02/2018
Field of study

The Indyk-Motwani Locality-Sensitive Hashing (LSH) framework (STOC 1998) is a general technique for constructing a data structure to answer approximate near neighbor queries by using a distribution

\mathcal{H}

over locality-sensitive hash functions that partition space. For a collection of

n

points, after preprocessing, the query time is dominated by

O(n^{\rho} \log n)

evaluations of hash functions from

\mathcal{H}

and

O(n^{\rho})

hash table lookups and distance computations where

\rho \in (0,1)

is determined by the locality-sensitivity properties of

\mathcal{H}

. It follows from a recent result by Dahlgaard et al. (FOCS 2017) that the number of locality-sensitive hash functions can be reduced to

O(\log^2 n)

, leaving the query time to be dominated by

O(n^{\rho})

distance computations and

O(n^{\rho} \log n)

additional word-RAM operations. We state this result as a general framework and provide a simpler analysis showing that the number of lookups and distance computations closely match the Indyk-Motwani framework, making it a viable replacement in practice. Using ideas from another locality-sensitive hashing framework by Andoni and Indyk (SODA 2006) we are able to reduce the number of additional word-RAM operations to

O(n^\rho)

.Comment: 15 pages, 3 figure

arXiv.org e-Print Archive

Crossref

Wear Minimization for Cuckoo Hashing: How Not to Throw a Lot of Eggs into One Basket

Author: A. Ben-Aroya
A. Kirsch
A.M. Frieze
A.M. Frieze
D. Fotakis
E. Lehman
H.-S.P. Wong
J. Schmidt-Pruzan
L. Devroye
M. Dietzfelbinger
M. Karoński
P. Pavan
R. Bez
R. Pagh
S. Irani
Y. Arbitman
Y. Azar
Y.-H. Chang
Publication venue
Publication date: 01/01/2014
Field of study

We study wear-leveling techniques for cuckoo hashing, showing that it is possible to achieve a memory wear bound of

\log\log n+O(1)

after the insertion of

n

items into a table of size

Cn

for a suitable constant

C

using cuckoo hashing. Moreover, we study our cuckoo hashing method empirically, showing that it significantly improves on the memory wear performance for classic cuckoo hashing and linear probing in practice.Comment: 13 pages, 1 table, 7 figures; to appear at the 13th Symposium on Experimental Algorithms (SEA 2014

arXiv.org e-Print Archive

Crossref

Scalable and Sustainable Deep Learning via Randomized Hashing

Author: Chen Wenlin
Gionis Aristides
Indyk Piotr
Loosli Gaëlle
Lv Qin
McMahan H. Brendan
Recht Benjamin
Shrivastava Anshumali
Shrivastava Anshumali
Publication venue
Publication date: 04/12/2016
Field of study

Current deep learning architectures are growing larger in order to learn from complex datasets. These architectures require giant matrix multiplication operations to train millions of parameters. Conversely, there is another growing trend to bring deep learning to low-power, embedded devices. The matrix operations, associated with both training and testing of deep networks, are very expensive from a computational and energy standpoint. We present a novel hashing based technique to drastically reduce the amount of computation needed to train and test deep networks. Our approach combines recent ideas from adaptive dropouts and randomized hashing for maximum inner product search to select the nodes with the highest activation efficiently. Our new algorithm for deep learning reduces the overall computational cost of forward and back-propagation by operating on significantly fewer (sparse) nodes. As a consequence, our algorithm uses only 5% of the total multiplications, while keeping on average within 1% of the accuracy of the original model. A unique property of the proposed hashing based back-propagation is that the updates are always sparse. Due to the sparse gradient updates, our algorithm is ideally suited for asynchronous and parallel training leading to near linear speedup with increasing number of cores. We demonstrate the scalability and sustainability (energy efficiency) of our proposed algorithm via rigorous experimental evaluations on several real datasets

arXiv.org e-Print Archive

Crossref