RkNN Query Processing in Distributed Spatial Infrastructures: A Performance Study

A Aji; A Eldawy; C Ji; F García-García; F Li; H Zhang; M Tang; S Yang; W Wu


Publication date: 1 January 2017
Publisher: Springer Science and Business Media LLC

Abstract

The Reverse k-Nearest Neighbor (RkNN) problem, i.e., finding all objects in a dataset that have a given query point among their k nearest neighbors, has received increasing attention in recent years. RkNN queries are of particular interest in a wide range of applications, such as decision support systems, resource allocation, profile-based marketing, and location-based services. As the volume of spatial data grows, it becomes difficult to perform RkNN queries efficiently in spatial data-intensive applications because of limited computational capability and storage resources. In this paper, we investigate how to design and implement distributed RkNN query algorithms on shared-nothing spatial cloud infrastructures such as SpatialHadoop and LocationSpark. SpatialHadoop is a framework that inherently supports spatial indexing on top of Hadoop to perform spatial queries efficiently. LocationSpark is a recent spatial data processing system built on top of Spark. We have evaluated the performance of the distributed RkNN query algorithms on both SpatialHadoop and LocationSpark with large real-world datasets. The experiments demonstrate the efficiency and scalability of our proposal in both distributed spatial data management systems and show the performance advantages of LocationSpark.
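To make the RkNN definition concrete, the following is a minimal single-machine sketch in Python. It is not the distributed algorithm studied in the paper, and it uses neither SpatialHadoop nor LocationSpark; it is a naive quadratic-time check of the definition only. The function name `rknn_brute_force`, the sample points, and the choice of Euclidean distance are assumptions made for illustration.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def rknn_brute_force(points, query, k):
    """Return the points that have `query` among their k nearest neighbors.

    Hypothetical helper for illustration: a naive O(n^2) scan, not the
    paper's distributed method.
    """
    result = []
    for i, p in enumerate(points):
        d_query = dist(p, query)
        # Count the other dataset points strictly closer to p than the query is.
        closer = sum(
            1
            for j, other in enumerate(points)
            if j != i and dist(p, other) < d_query
        )
        # The query is one of p's k nearest neighbors when fewer than k
        # dataset points are closer to p than the query.
        if closer < k:
            result.append(p)
    return result


# Made-up sample data: two points near the query and two far from it.
points = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (6.0, 5.0)]
print(rknn_brute_force(points, (0.5, 0.0), 1))
```

With k = 1, only the two points near the query are returned, because each of the two distant points has the other as a closer neighbor. The quadratic cost of this scan illustrates why large datasets call for the indexed, distributed processing that the abstract describes.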