156 research outputs found
A Survey of Hashing Techniques for High Performance Computing
Hashing is a well-known and widely used technique for providing O(1) access to large files on secondary storage and tables in memory. Hashing techniques were introduced in the early 60s. The term hash function has historically denoted a function that compresses an input string of arbitrary length to a string of fixed length. Hashing finds applications in other fields such as fuzzy matching, error checking, authentication, cryptography, and networking. As routing tables have grown in size, hashing techniques have been applied to provide faster access to them. More recently, hashing has found applications in transactional memory in hardware. Motivated by these newly emerged applications of hashing, in this paper we present a survey of hashing techniques, starting from traditional hashing methods, with greater emphasis on recent developments. We provide a brief explanation of hardware hashing and a brief introduction to transactional memory.
Taking the Shortcut: Actively Incorporating the Virtual Memory Index of the OS to Hardware-Accelerate Database Indexing
Index structures often materialize one or multiple levels of explicit
indirections (aka pointers) to allow for a quick traversal to the data of
interest. Unfortunately, dereferencing a pointer to go from one level to the
other is costly since, in addition to following the address, it involves two
address translations from virtual memory to physical memory under the hood. In
the worst case, such an address translation is resolved by an index access
itself, namely by a lookup into the page table, a central hardware-accelerated
index structure of the OS. However, if the page table is constantly
queried, it raises the question whether we can actively incorporate it into our
database indexes and make it work for us. Precisely, instead of materializing
indirections in the form of pointers, we propose to express these indirections
directly in the page table wherever possible. By introducing such shortcuts, we
(a) effectively reduce the height of traversal during lookups and (b) exploit
the hardware-acceleration of lookups in the page table. In this work, we
analyze the strengths and considerations of this approach and showcase its
effectiveness using the real-world indexing scheme extendible hashing.
Leveraging Emerging Hardware to Improve the Performance of Data Analytics Frameworks
Department of Computer Science and Engineering
The data analytics frameworks have evolved along with the growing amount of data. There
have been numerous efforts to improve the performance of the data analytics frameworks, including
MapReduce frameworks and NoSQL and NewSQL databases. These frameworks have
various target workloads and their own characteristics; however, there is common ground as a
data analytics framework. Emerging hardware such as graphics processing units and persistent
memory is expected to open up new opportunities for such commonality. The goal of this
dissertation is to leverage emerging hardware to improve the performance of the data analytics
frameworks.
First, we design and implement EclipseMR, a novel MapReduce framework that efficiently
leverages an extensive amount of memory space distributed among the machines in a cluster.
EclipseMR consists of a decentralized DHT-based file system layer and an in-memory cache layer.
The in-memory cache layer is designed to store both local and remote data while balancing the
load between the servers with the proposed Locality-Aware Fair (LAF) job scheduler. The design
of EclipseMR is easily extensible with emerging hardware; it can adopt persistent memory as a
primary storage layer or cache layer, or it can adopt GPUs to improve the performance of map
and reduce functions. Our evaluation shows that EclipseMR outperforms Hadoop and Spark for
various applications.
Second, we propose B³-tree and Cache-Conscious Extendible Hashing (CCEH) for persistent
memory. The fundamental challenge in designing a data structure for persistent memory is
to guarantee consistent transitions using fine-grained 8-byte atomic writes at minimum cost.
B³-tree is a fully persistent hybrid indexing structure of binary tree and B+-tree that benefits
from the strength of both in-memory index and block-based index, and CCEH is a variant of
extendible hashing that introduces an intermediate layer between directory and buckets to fully
benefit from a cache-sized bucket while minimizing the size of the directory. Both of the data
structures show better performance than the corresponding state-of-the-art techniques.
Third, we develop a data parallel tree traversal algorithm, Parallel Scan and Backtrack
(PSB), for the k-nearest neighbor search problem on the GPU. Several studies have been proposed
to improve the performance of the query by leveraging the GPU as an accelerator; however, most
of the works focus on the brute-force algorithms. In this work, we overcome the challenges of
traversing multi-dimensional hierarchical indexing structures on the GPU, such as tiny shared
memory and runtime stack, irregular memory access patterns, and warp divergence.
Our evaluation shows that our data parallel PSB algorithm outperforms both the brute-force
algorithm and the traditional branch and bound algorithm.
Analysis and Comparison of Extendible Hashing and B+ Trees Access Methods
This thesis is a discussion and evaluation of both extendible hashing and B+ trees. The study includes a design and implementation under the UNIX system. Comparisons and analysis are made using empirical results.
Computing and Information Science
Advance of the Access Methods
The goal of this paper is to outline the advances in access methods over the last ten years and
to review all of the methods available in the accessible literature.
Grid File Approach to Large Multidimensional Dynamic Data Structures
Computing and Information Science