
RkNN Query Processing in Distributed Spatial Infrastructures: A Performance Study

Abstract

The Reverse k-Nearest Neighbor (RkNN) problem, i.e., finding all objects in a dataset that have a given query point among their k nearest neighbors, has received increasing attention in recent years. RkNN queries are of particular interest in a wide range of applications, such as decision support systems, resource allocation, profile-based marketing, and location-based services. As the volume of spatial data grows, it becomes difficult to perform RkNN queries efficiently in spatial data-intensive applications because of limited computational capability and storage resources. In this paper, we investigate how to design and implement distributed RkNN query algorithms on shared-nothing spatial cloud infrastructures such as SpatialHadoop and LocationSpark. SpatialHadoop is a framework that inherently supports spatial indexing on top of Hadoop to perform spatial queries efficiently. LocationSpark is a recent spatial data processing system built on top of Spark. We have evaluated the performance of the distributed RkNN query algorithms on both SpatialHadoop and LocationSpark with large real-world datasets. The experiments demonstrate the efficiency and scalability of our proposal in both distributed spatial data management systems and show the performance advantages of LocationSpark.
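To make the RkNN definition above concrete, the following is a minimal single-machine sketch, not the distributed algorithm evaluated in the paper. It assumes the monochromatic variant (the candidates and their neighbors come from the same dataset), Euclidean distance, and ties resolved in favor of the query point. The function name `rknn_brute_force` and the sample data are hypothetical and chosen only for illustration.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def rknn_brute_force(points, query, k):
    """Return the points that have `query` among their k nearest neighbors.

    Brute-force O(n^2) check of the definition. Assumption: a point's
    neighbors are the other dataset points plus the query point.
    """
    result = []
    for i, p in enumerate(points):
        d_query = dist(p, query)
        # Count the other dataset points strictly closer to p than the query is.
        closer = sum(
            1
            for j, other in enumerate(points)
            if j != i and dist(p, other) < d_query
        )
        # The query is one of p's k nearest neighbors if fewer than k points beat it.
        if closer < k:
            result.append(p)
    return result


points = [(0.0, 0.0), (1.5, 0.0), (5.0, 0.0), (6.0, 0.0)]
query = (2.0, 0.0)
print(rknn_brute_force(points, query, k=1))
```

With k = 1, only (1.5, 0.0) is returned, because every other point has a dataset neighbor closer than the query. The quadratic cost of this check is what makes partitioned, index-based processing attractive once datasets become large.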
