DEANN: Speeding up Kernel-Density Estimation using Approximate Nearest Neighbor Search
Kernel Density Estimation (KDE) is a nonparametric method for estimating the
shape of a density function, given a set of samples from the distribution.
Recently, locality-sensitive hashing, originally proposed as a tool for nearest
neighbor search, has been shown to enable fast KDE data structures. However,
these approaches do not take advantage of the many other advances that have
been made in nearest neighbor search algorithms. We present an
algorithm called Density Estimation from Approximate Nearest Neighbors (DEANN)
where we apply Approximate Nearest Neighbor (ANN) algorithms as a black box
subroutine to compute an unbiased KDE. The idea is to find points that have a
large contribution to the KDE using ANN, compute their contribution exactly,
and approximate the remainder with Random Sampling (RS). We present a
theoretical argument that supports the idea that an ANN subroutine can speed up
the evaluation. Furthermore, we provide a C++ implementation with a Python
interface that can make use of an arbitrary ANN implementation as a subroutine
for KDE evaluation. We show empirically that our implementation outperforms
state-of-the-art implementations on all high-dimensional datasets we
considered, and matches the performance of RS in cases where the ANN yields no
performance gains.
Comment: 24 pages, 1 figure. Submitted for review
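The estimator the abstract describes, with exact contributions from the ANN-retrieved near points plus a random-sampling estimate of the tail, can be sketched as follows. This is an illustrative sketch, not the paper's implementation: exact nearest neighbors stand in for the ANN black box, and the Gaussian kernel and all parameter names are our choices.

```python
import numpy as np

def deann_kde(query, data, bandwidth, k=32, m=64, rng=None):
    """Unbiased KDE estimate in the spirit of DEANN (sketch).

    The k nearest points (found exactly here, standing in for the ANN
    black-box subroutine) contribute to the kernel sum exactly; the
    tail is estimated by uniform random sampling over the rest.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = len(data)
    d = np.linalg.norm(data - query, axis=1)
    kernel = np.exp(-(d / bandwidth) ** 2)      # Gaussian kernel values
    near = np.argsort(d)[:k]                    # "ANN" subroutine stand-in
    exact = kernel[near].sum()                  # exact near contribution
    far = np.setdiff1d(np.arange(n), near)
    sample = rng.choice(far, size=min(m, len(far)), replace=True)
    tail = len(far) * kernel[sample].mean()     # unbiased tail estimate
    return (exact + tail) / n
```

Because the sampled tail term has expectation equal to the true far-point sum, the whole estimator is unbiased; the ANN step removes the large, high-variance contributions from the sampled part.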
Drawbacks and Proposed Solutions for Real-time Processing on Existing State-of-the-art Locality Sensitive Hashing Techniques
Nearest-neighbor query processing is a fundamental operation for many image
retrieval applications. Often, images are stored and represented by
high-dimensional vectors that are generated by feature-extraction algorithms.
Since tree-based index structures are shown to be ineffective for high
dimensional processing due to the well-known "Curse of Dimensionality",
approximate nearest neighbor techniques are used for faster query processing.
Locality Sensitive Hashing (LSH) is a very popular and efficient approximate
nearest neighbor technique that is known for its sublinear query processing
complexity and theoretical guarantees. Nowadays, several diverse application
domains require the capacity to store and process high-dimensional data in
real time. Existing LSH techniques are not suited to handling real-time data
and queries. In this paper, we
discuss the challenges and drawbacks of existing LSH techniques for processing
real-time high-dimensional image data. Additionally, through experimental
analysis, we propose improvements for existing state-of-the-art LSH techniques
for efficient processing of high-dimensional image data.
Comment: Accepted and presented at the 5th International Conference on Signal
and Image Processing (SIGI-2019), Dubai, UAE
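As background for the discussion above, a minimal random-hyperplane LSH index (one classic LSH family, for cosine similarity) can be sketched as follows. This is a generic illustration, not one of the specific state-of-the-art techniques the paper evaluates; the class and parameter names are ours.

```python
import numpy as np

class HyperplaneLSH:
    """Minimal random-hyperplane LSH index (sketch).

    Each of num_bits random hyperplanes contributes one sign bit;
    vectors with the same bit pattern land in the same bucket, so
    similar vectors collide with high probability.
    """
    def __init__(self, dim, num_bits=16, seed=0):
        rng = np.random.default_rng(seed)
        self.planes = rng.normal(size=(num_bits, dim))
        self.buckets = {}

    def _key(self, v):
        # sign of the projection onto each hyperplane normal
        return tuple((self.planes @ v > 0).astype(int))

    def insert(self, idx, v):
        self.buckets.setdefault(self._key(v), []).append(idx)

    def query(self, v):
        # candidate set: everything hashed to the same bucket
        return self.buckets.get(self._key(v), [])
```

A production LSH index uses multiple hash tables and multi-probe strategies to trade recall against query time; the single-table version above only shows the core bucketing idea.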
GPUs as Storage System Accelerators
Massively multicore processors, such as Graphics Processing Units (GPUs),
provide, at a comparable price, a one order of magnitude higher peak
performance than traditional CPUs. Like any order-of-magnitude drop in the
cost per unit of performance for a class of system components, this drop in
the cost of computation creates an opportunity to redesign systems and to
explore new ways of engineering them that recalibrate the
cost-to-performance relation. This
project explores the feasibility of harnessing GPUs' computational power to
improve the performance, reliability, or security of distributed storage
systems. In this context, we present the design of a storage system prototype
that uses GPU offloading to accelerate a number of computationally intensive
primitives based on hashing, and introduce techniques to efficiently leverage
the processing power of GPUs. We evaluate the performance of this prototype
under two configurations: as a content addressable storage system that
facilitates online similarity detection between successive versions of the same
file and as a traditional system that uses hashing to preserve data integrity.
Further, we evaluate the impact of offloading to the GPU on competing
applications' performance. Our results show that this technique can bring
tangible performance gains without negatively impacting the performance of
concurrently running applications.
Comment: IEEE Transactions on Parallel and Distributed Systems, 201
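The hashing primitives in question, chunking a file, hashing each chunk, and comparing hash lists across versions for integrity checking and similarity detection, can be sketched on the CPU as below. The chunk size and function names are illustrative; the paper's point is that a GPU can accelerate exactly this kind of computation.

```python
import hashlib

def chunk_hashes(data: bytes, chunk_size: int = 4096):
    """Hash fixed-size chunks of a byte string (the hashing primitive
    a content-addressable store computes for each stored object)."""
    return [hashlib.sha256(data[i:i + chunk_size]).hexdigest()
            for i in range(0, len(data), chunk_size)]

def similarity(old: bytes, new: bytes, chunk_size: int = 4096) -> float:
    """Fraction of aligned chunks unchanged between two file versions;
    identical chunks need not be stored or transferred again."""
    a = chunk_hashes(old, chunk_size)
    b = chunk_hashes(new, chunk_size)
    same = sum(x == y for x, y in zip(a, b))
    return same / max(len(a), len(b))
```

Verifying a chunk on read against its stored hash gives the data-integrity use; comparing the hash lists of two versions gives the similarity-detection use.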
Leveraging Emerging Hardware to Improve the Performance of Data Analytics Frameworks
Department of Computer Science and Engineering
Data analytics frameworks have evolved along with the growing amount of data. There
have been numerous efforts to improve the performance of data analytics frameworks,
including MapReduce frameworks and NoSQL and NewSQL databases. These frameworks
target various workloads and have their own characteristics; however, they share
common ground as data analytics frameworks. Emerging hardware such as graphics
processing units and persistent memory is expected to open up new opportunities
around this commonality. The goal of this dissertation is to leverage emerging
hardware to improve the performance of data analytics frameworks.
First, we design and implement EclipseMR, a novel MapReduce framework that efficiently
leverages an extensive amount of memory space distributed among the machines in a cluster.
EclipseMR consists of a decentralized DHT-based file system layer and an in-memory cache layer.
The in-memory cache layer is designed to store both local and remote data while
balancing the load between the servers with the proposed Locality-Aware Fair (LAF)
job scheduler. The design of EclipseMR is easily extensible with emerging hardware;
it can adopt persistent memory as a primary storage layer or cache layer, or a GPU
to improve the performance of the map and reduce functions. Our evaluation shows
that EclipseMR outperforms Hadoop and Spark for various applications.
Second, we propose B3-tree and Cache-Conscious Extendible Hashing (CCEH) for
persistent memory. The fundamental challenge in designing a data structure for
persistent memory is to guarantee consistent state transitions using 8-byte
fine-grained atomic writes at minimum cost. B3-tree is a fully persistent hybrid
index combining a binary tree and a B+-tree that benefits from the strengths of
both in-memory and block-based indexes, and CCEH is a variant of extendible
hashing that introduces an intermediate layer between the directory and the
buckets to fully benefit from cache-line-sized buckets while minimizing the size
of the directory. Both data structures outperform the corresponding
state-of-the-art techniques.
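The directory-to-segment indirection that CCEH introduces can be illustrated with a volatile, in-memory sketch: high hash bits pick a segment through the directory, low bits pick a small bucket inside the segment, and the segment (not the bucket) is the unit of splitting, which keeps the directory small. This sketch omits all persistence and failure-atomicity logic, and every class and parameter name is ours, not CCEH's.

```python
import hashlib

class Segment:
    """Groups several small buckets; in CCEH the buckets are
    cache-line sized and the segment is the unit of splitting."""
    def __init__(self, local_depth, nbuckets, slots):
        self.local_depth = local_depth
        self.nbuckets = nbuckets
        self.slots = slots
        self.buckets = [[] for _ in range(nbuckets)]  # lists of (key, value)

class CCEHSketch:
    """Volatile two-level extendible hashing sketch."""
    BITS = 32

    def __init__(self, nbuckets=4, slots=2):
        self.global_depth = 1
        self.nbuckets, self.slots = nbuckets, slots
        self.dir = [Segment(1, nbuckets, slots) for _ in range(2)]

    def _hash(self, key):
        digest = hashlib.blake2b(str(key).encode(), digest_size=4).digest()
        return int.from_bytes(digest, "big")

    def _segment(self, h):
        # high bits of the hash index the directory
        return self.dir[h >> (self.BITS - self.global_depth)]

    def get(self, key):
        h = self._hash(key)
        seg = self._segment(h)
        for k, v in seg.buckets[h % seg.nbuckets]:
            if k == key:
                return v
        return None

    def put(self, key, value):
        h = self._hash(key)
        seg = self._segment(h)
        bucket = seg.buckets[h % seg.nbuckets]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)
                return
        if len(bucket) < seg.slots:
            bucket.append((key, value))
            return
        self._split(seg)
        self.put(key, value)  # retry after the split

    def _split(self, seg):
        if seg.local_depth == self.global_depth:
            self.dir = [s for s in self.dir for _ in (0, 1)]  # double directory
            self.global_depth += 1
        depth = seg.local_depth + 1
        a = Segment(depth, self.nbuckets, self.slots)
        b = Segment(depth, self.nbuckets, self.slots)
        # rewire every directory entry that pointed at the old segment
        for i, s in enumerate(self.dir):
            if s is seg:
                self.dir[i] = b if (i >> (self.global_depth - depth)) & 1 else a
        # redistribute the old segment's entries into the two new ones
        for bucket in seg.buckets:
            for k, v in bucket:
                h = self._hash(k)
                t = self._segment(h)
                t.buckets[h % t.nbuckets].append((k, v))
```

Note how a split rewrites only the directory entries of one segment rather than rehashing the whole table; that locality is what a persistent-memory variant must make failure-atomic with 8-byte writes.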
Third, we develop a data-parallel tree traversal algorithm, Parallel Scan and
Backtrack (PSB), for the k-nearest neighbor search problem on the GPU. Several
studies have proposed to improve query performance by leveraging the GPU as an
accelerator; however, most of these works focus on brute-force algorithms. In
this work, we overcome the challenges of traversing a multi-dimensional
hierarchical indexing structure on the GPU, such as the tiny shared memory and
runtime stack, irregular memory access patterns, and the warp divergence
problem. Our evaluation shows that our data-parallel PSB algorithm outperforms
both the brute-force algorithm and the traditional branch-and-bound algorithm.
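For contrast with the tree-based PSB traversal, the brute-force baseline it is compared against amounts to an all-pairs distance computation followed by a per-query partial sort. A CPU sketch of that baseline (the GPU versions parallelize the same computation across threads; the function name is ours):

```python
import numpy as np

def knn_brute_force(queries, data, k):
    """Brute-force k-NN: all pairwise squared distances, then the k
    smallest per query, ordered by distance. Embarrassingly parallel,
    which is why it maps so easily onto a GPU."""
    # (num_queries, num_data) matrix of squared Euclidean distances
    d2 = ((queries[:, None, :] - data[None, :, :]) ** 2).sum(-1)
    # unordered k smallest per row via a partial sort
    idx = np.argpartition(d2, k - 1, axis=1)[:, :k]
    # order those k candidates by their actual distance
    order = np.take_along_axis(d2, idx, 1).argsort(1)
    return np.take_along_axis(idx, order, 1)
```

The cost is O(nm) distance evaluations per batch of m queries over n points; a hierarchical index such as the one PSB traverses prunes most of that work, at the price of the irregular control flow the dissertation addresses.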