5 research outputs found

    Leveraging Emerging Hardware to Improve the Performance of Data Analytics Frameworks

    Get PDF
    Department of Computer Science and EngineeringThe data analytics frameworks have evolved along with the growing amount of data. There have been numerous efforts to improve the performance of the data analytics frameworks in- cluding MapReduce frameworks and NoSQL and NewSQL databases. These frameworks have various target workloads and their own characteristicshowever, there is common ground as a data analytics framework. Emerging hardware such as graphics processing units and persistent memory is expected to open up new opportunities for such commonality. The goal of this dis- sertation is to leverage emerging hardware to improve the performance of the data analytics frameworks. First, we design and implement EclipseMR, a novel MapReduce framework that efficiently leverages an extensive amount of memory space distributed among the machines in a cluster. EclipseMR consists of a decentralized DHT-based file system layer and an in-memory cache layer. The in-memory cache layer is designed to store both local and remote data while balancing the load between the servers with proposed Locality-Aware Fair (LAF) job scheduler. The design of EclipseMR is easily extensible with emerging hardwareit can adopt persistent memory as a primary storage layer or cache layer, or it can adopt GPU to improve the performance of map and reduce functions. Our evaluation shows that EclipseMR outperforms Hadoop and Spark for various applications. Second, we propose B 3 -tree and Cache-Conscious Extendible Hashing (CCEH) for the persis- tent memory. The fundamental challenge to design a data structure for the persistent memory is to guarantee consistent transition with 8-bytes of fine-grained atomic write with minimum cost. B 3 -tree is a fully persistent hybrid indexing structure of binary tree and B+-tree that benefits from the strength of both in-memory index and block-based index, and CCEH is a variant of extendible hashing that introduces an intermediate layer between directory and buckets to fully benefit from a cache-sized bucket while minimizing the size of the directory. Both of the data structures show better performance than the corresponding state-of-the-art techniques. Third, we develop a data parallel tree traversal algorithm, Parallel Scan and Backtrack (PSB), for k-nearest neighbor search problem on the GPU. Several studies have been proposed to improve the performance of the query by leveraging GPU as an acceleratorhowever, most of the works focus on the brute-force algorithms. In this work, we overcome the challenges of traversing multi-dimensional hierarchical indexing structure on the GPU such as tiny shared memory and runtime stack, irregular memory access pattern, and warp divergence problem. Our evaluation shows that our data parallel PSB algorithm outperforms both the brute-force algorithm and the traditional branch and bound algorithm.clos

    Parallel Tree Traversal for Nearest Neighbor Query on the GPU

    No full text
    The similarity search problem is found in many application domains including computer graphics, information retrieval, statistics, computational biology, and scientific data processing just to name a few. Recently several studies have been performed to accelerate the k-nearest neighbor (kNN) queries using GPUs, but most of the works develop brute-force exhaustive scanning algorithms leveraging a large number of GPU cores and none of the prior works employ GPUs for an n-ary tree structured index. It is known that multi-dimensional hierarchical indexing trees such as R-trees are inherently not well suited for GPUs because of their irregular tree traversal and memory access patterns. Traversing hierarchical tree structures in an irregular manner makes it difficult to exploit parallelism since GPUs are tailored for deterministic memory accesses. In this work, we develop a data parallel tree traversal algorithm, Parallel Scan and Backtrack (PSB), for kNN query processing on the GPU, this algorithm traverses a multi-dimensional tree structured index while avoiding warp divergence problems. In order to take advantage of accessing contiguous memory blocks, the proposed PSB algorithm performs linear scanning of sibling leaf nodes, which increases the chance to optimize the parallel SIMD algorithm. We evaluate the performance of the PSB algorithm against the classic branch-and-bound kNN query processing algorithm. Our experiments with real datasets show that the PSB algorithm is faster by a large margin than the branch-and-bound algorithm

    Write-Optimized Dynamic Hashing for Persistent Memory

    No full text
    Low latency storage media such as byte-addressable persistent memory (PM) requires rethinking of various data structures in terms of optimization. One of the main challenges in implementing hash-based indexing structures on PM is how to achieve efficiency by making effective use of cachelines while guaranteeing failure-atomicity for dynamic hash expansion and shrinkage. In this paper, we present Cacheline-Conscious Extendible Hashing (CCEH) that reduces the overhead of dynamic memory block management while guaranteeing constant hash table lookup time. CCEH guarantees failure-atomicity without making use of explicit logging. Our experiments show that CCEH effectively adapts its size as the demand increases under the finegrained ailure-atomicity constraint and its maximum query latency is an order of magnitude lower compared to the stateof-the-art ashing techniques

    EclipseMR: Distributed and Parallel Task Processing with Consistent Hashing

    No full text
    We present EclipseMR, a novel MapReduce framework prototype that efficiently utilizes a large distributed memory in cluster environments. EclipseMR consists of double-layered consistent hash rings - a decentralized DHT-based file system and an in-memory key-value store that employs consistent hashing. The in-memory key-value store in EclipseMR is designed not only to cache local data but also remote data as well so that globally popular data can be distributed across cluster servers and found by consistent hashing. In order to leverage large distributed memories and increase the cache hit ratio, we propose a locality-aware fair (LAF) job scheduler that works as the load balancer for the distributed in-memory caches. Based on hash keys, the LAF job scheduler predicts which servers have reusable data, and assigns tasks to the servers so that they can be reused. The LAF job scheduler makes its best efforts to strike a balance between data locality and load balance, which often conflict with each other. We evaluate EclipseMR by quantifying the performance effect of each component using several representative MapReduce applications and show EclipseMR is faster than Hadoop and Spark by a large margin for various applications

    B-3-Tree: Byte-Addressable Binary B-Tree for Persistent Memory

    No full text
    In this work, we propose B-3-tree, a hybrid index for persistent memory that leverages the byte-addressability of the in-memory index and the page locality of B-trees. As in the byte-addressable in-memory index, B-3-tree is updated by 8-byte store instructions. Also, as in disk-based index, B-3-tree is failure-atomic since it makes every 8-byte store instruction transform a consistent index into another consistent index without the help of expensive logging. Since expensive logging becomes unnecessary, the number of cacheline flush instructions required for B-3-tree is significantly reduced. Our performance study shows that B-3-tree outperforms other state-of-the-art persistent indexes in terms of insert and delete performance. While B-3-tree shows slightly worse performance for point query performance, the range query performance of B-3-tree is 2x faster than FAST and FAIR B-tree because the leaf page size of B-3-tree can be set to 8x larger than that of FAST and FAIR B-tree without degrading insertion performance. We also show that read transactions can access B-3- tree without acquiring a shared lock because B-3-tree remains always consistent while a sequence of 8-byte write operations are making changes to it. As a result, B-3-tree provides high concurrency level comparable to FAST and FAIR B-tree
    corecore