Search CORE

5 research outputs found

GPU-aided edge computing for processing the k nearest-neighbor query on SSD-resident data

Author: Corral Antonio
Vassilakopoulos Michael
Velentzas Polychronis
Publication venue: 'Elsevier BV'
Publication date: 01/01/2021
Field of study

Edge computing aims at improving performance by storing and processing data closer to their source. The Nearest-Neighbor (-NN) query is a common spatial query in several applications. For example, this query can be used for distance classification of a group of points against a big reference dataset to derive the dominating feature class. Typically, GPU devices have much larger numbers of processing cores than CPUs and faster device memory than main memory accessed by CPUs, thus, providing higher computing power. However, since device and/or main memory may not be able to host an entire reference dataset, the use of secondary storage is inevitable. Solid State Disks (SSDs) could be used for storing such a dataset. In this paper, we propose an architecture of a distributed edge-computing environment where large-scale processing of the -NN query can be accomplished by executing an efficient algorithm for processing the -NN query on its (GPU and SSD enabled) edge nodes. We also propose a new algorithm for this purpose, a GPU-based partitioning algorithm for processing the -NN query on big reference data stored on SSDs. We implement this algorithm in a GPU-enabled edge-computing device, hosting reference data on an SSD. Using synthetic datasets, we present an extensive experimental performance comparison of the new algorithm against two existing ones (working on memory-resident data) proposed by other researchers and two existing ones (working on SSD-resident data) recently proposed by us. The new algorithm excels in all the conducted experiments and outperforms its competitors

Repositorio Institucional de la Universidad de Almería (Spain)

Leveraging Emerging Hardware to Improve the Performance of Data Analytics Frameworks

Author: Nam Moohyeon
Publication venue: Graduate School of UNIST
Publication date: 01/02/2019
Field of study

Department of Computer Science and EngineeringThe data analytics frameworks have evolved along with the growing amount of data. There have been numerous efforts to improve the performance of the data analytics frameworks in- cluding MapReduce frameworks and NoSQL and NewSQL databases. These frameworks have various target workloads and their own characteristicshowever, there is common ground as a data analytics framework. Emerging hardware such as graphics processing units and persistent memory is expected to open up new opportunities for such commonality. The goal of this dis- sertation is to leverage emerging hardware to improve the performance of the data analytics frameworks. First, we design and implement EclipseMR, a novel MapReduce framework that efficiently leverages an extensive amount of memory space distributed among the machines in a cluster. EclipseMR consists of a decentralized DHT-based file system layer and an in-memory cache layer. The in-memory cache layer is designed to store both local and remote data while balancing the load between the servers with proposed Locality-Aware Fair (LAF) job scheduler. The design of EclipseMR is easily extensible with emerging hardwareit can adopt persistent memory as a primary storage layer or cache layer, or it can adopt GPU to improve the performance of map and reduce functions. Our evaluation shows that EclipseMR outperforms Hadoop and Spark for various applications. Second, we propose B 3 -tree and Cache-Conscious Extendible Hashing (CCEH) for the persis- tent memory. The fundamental challenge to design a data structure for the persistent memory is to guarantee consistent transition with 8-bytes of fine-grained atomic write with minimum cost. B 3 -tree is a fully persistent hybrid indexing structure of binary tree and B+-tree that benefits from the strength of both in-memory index and block-based index, and CCEH is a variant of extendible hashing that introduces an intermediate layer between directory and buckets to fully benefit from a cache-sized bucket while minimizing the size of the directory. Both of the data structures show better performance than the corresponding state-of-the-art techniques. Third, we develop a data parallel tree traversal algorithm, Parallel Scan and Backtrack (PSB), for k-nearest neighbor search problem on the GPU. Several studies have been proposed to improve the performance of the query by leveraging GPU as an acceleratorhowever, most of the works focus on the brute-force algorithms. In this work, we overcome the challenges of traversing multi-dimensional hierarchical indexing structure on the GPU such as tiny shared memory and runtime stack, irregular memory access pattern, and warp divergence problem. Our evaluation shows that our data parallel PSB algorithm outperforms both the brute-force algorithm and the traditional branch and bound algorithm.clos

ScholarWorks@UNIST

GPU-parallelisation of Haar wavelet-based grid resolution adaptation for fast finite volume modelling: application to shallow water flows

Author: Chowdhury A.A.
Kesserwani G.
Richmond P.
Rougé C.
Publication venue: IWA Publishing
Publication date: 23/06/2023
Field of study

Wavelet-based grid resolution adaptation driven by the ‘multiresolution analysis’ (MRA) of the Haar wavelet (HW) allows to devise an adaptive first-order finite volume (FV1) model (HWFV1) that can readily preserve the modelling fidelity of its reference uniform-grid FV1 counterpart. However, the MRA entails an enormous computational effort as it involves ‘encoding’ (coarsening), ‘decoding’ (refining), analysing and traversing modelled data across a deep hierarchy of nested, uniform grids. GPU-parallelisation of the MRA is needed to handle its computational effort, but its algorithmic structure (1) hinders coalesced memory access on the GPU and (2) involves an inherently sequential tree traversal problem. This work redesigns the algorithmic structure of the MRA in order to parallelise it on the GPU, addressing (1) by applying Z-order space-filling curves and (2) by adopting a parallel tree traversal algorithm. This results in a GPU-parallelised HWFV1 model (GPU-HWFV1). GPU-HWFV1 is verified against its CPU predecessor (CPU-HWFV1) and its GPU-parallelised reference uniform-grid counterpart (GPU-FV1) over five shallow water flow test cases. GPU-HWFV1 preserves the modelling fidelity of GPU-FV1 while being up to 30 times faster. Compared to CPU-HWFV1, it is up to 200 times faster, suggesting that the GPU-parallelised MRA could be used to speed up other FV1 models

ZENODO

White Rose Research Online

Parallel Tree Traversal for Nearest Neighbor Query on the GPU

Author: Kim Jinwoong
Nam Beomseok
Nam Moohyeon
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 16/08/2016
Field of study

The similarity search problem is found in many application domains including computer graphics, information retrieval, statistics, computational biology, and scientific data processing just to name a few. Recently several studies have been performed to accelerate the k-nearest neighbor (kNN) queries using GPUs, but most of the works develop brute-force exhaustive scanning algorithms leveraging a large number of GPU cores and none of the prior works employ GPUs for an n-ary tree structured index. It is known that multi-dimensional hierarchical indexing trees such as R-trees are inherently not well suited for GPUs because of their irregular tree traversal and memory access patterns. Traversing hierarchical tree structures in an irregular manner makes it difficult to exploit parallelism since GPUs are tailored for deterministic memory accesses. In this work, we develop a data parallel tree traversal algorithm, Parallel Scan and Backtrack (PSB), for kNN query processing on the GPU, this algorithm traverses a multi-dimensional tree structured index while avoiding warp divergence problems. In order to take advantage of accessing contiguous memory blocks, the proposed PSB algorithm performs linear scanning of sibling leaf nodes, which increases the chance to optimize the parallel SIMD algorithm. We evaluate the performance of the PSB algorithm against the classic branch-and-bound kNN query processing algorithm. Our experiments with real datasets show that the PSB algorithm is faster by a large margin than the branch-and-bound algorithm

ScholarWorks@UNIST