Providing Diversity in K-Nearest Neighbor Query Results
Given a point query Q in multi-dimensional space, K-Nearest Neighbor (KNN)
queries return the K answers in the database closest to Q according to a given
distance metric. In this scenario, it is possible that a majority of
the answers may be very similar to one another, especially when the data has
clusters. For a variety of applications, such homogeneous result sets may not
add value to the user. In this paper, we consider the problem of providing
diversity in the results of KNN queries, that is, to produce the closest result
set such that each answer is sufficiently different from the rest. We first
propose a user-tunable definition of diversity, and then present an algorithm,
called MOTLEY, for producing a diverse result set as per this definition.
Through a detailed experimental evaluation on real and synthetic data, we show
that MOTLEY can produce diverse result sets by reading only a small fraction of
the tuples in the database. Further, it imposes no additional overhead on the
evaluation of traditional KNN queries, thereby providing a seamless interface
between diversity and distance.
Comment: 20 pages, 11 figures
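To make the idea of a diverse KNN result concrete, here is a minimal sketch. It is not the MOTLEY algorithm from the abstract, only a brute-force greedy filter under stated assumptions: diversity is measured with plain Euclidean distance, and `min_div` is a hypothetical user-tunable threshold (the paper's actual diversity definition is not given in the abstract).

```python
import math

def diverse_knn(points, q, k, min_div):
    """Greedy sketch: scan points by increasing distance to q and keep a point
    only if it lies at least min_div away from every point already kept.
    min_div is an assumed threshold, not the paper's diversity definition."""
    result = []
    for p in sorted(points, key=lambda p: math.dist(p, q)):
        if all(math.dist(p, r) >= min_div for r in result):
            result.append(p)
            if len(result) == k:
                break
    return result

points = [(1.0, 0.0), (1.1, 0.0), (1.2, 0.0), (0.0, 2.0), (3.0, 3.0)]
q = (0.0, 0.0)
print(diverse_knn(points, q, 3, 0.0))  # plain KNN: three clustered points
print(diverse_knn(points, q, 3, 1.0))  # diverse: one point per neighbourhood
```

With `min_div` set to 0 the function degenerates to an ordinary KNN query, which mirrors the abstract's point that diversity and distance can share one interface.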
Distance Range Queries in SpatialHadoop
Efficient processing of Distance Range Queries (DRQs) is of great importance in spatial databases due to their wide range of applications. This type of spatial query is characterized by a distance range over one or two datasets. The best-known DRQs are the ε Distance Range Query (εDRQ) and the ε Distance Range Join Query (εDRJQ). Given the increasing volume of spatial data, it is difficult to perform a DRQ on a centralized machine efficiently. Moreover, the εDRJQ is an expensive spatial operation, since it can be considered a combination of the εDRQ and the spatial join query. For this reason, this paper addresses the problem of computing DRQs on big spatial datasets in SpatialHadoop, an extension of Hadoop that supports spatial operations efficiently, and proposes new algorithms in SpatialHadoop to perform efficient parallel DRQs on large-scale spatial datasets. We have evaluated the performance of the proposed algorithms in several situations with big synthetic and real-world datasets. The experiments have demonstrated the efficiency and scalability of our proposal.
Location Selection Query in Google Maps using Voronoi-based Spatial Skyline (VS2) Algorithm
Google Maps is one of the most popular location selection systems, and one of its popular features is nearby search. For example, someone who wants to find the restaurants closest to their location can use the nearby search feature. This feature considers only one specific location when suggesting places. In a real-world situation, there may be a need to consider more than one location when selecting a place. Assume someone would like to choose a hotel close to a conference hall, a museum, a beach, and a souvenir store. In this situation, the nearby search feature in Google Maps may not be able to suggest a list of hotels that are interesting to them based on the distance from each destination. In this paper, we developed a web-based Google Maps search application that uses the Voronoi-based Spatial Skyline (VS2) algorithm and lets users choose Points Of Interest (POIs) from Google Maps as the locations to consider when selecting a desired place. We used the Google Maps API to provide POI information for our web-based application. The experimental results showed that the execution time increases as the number of considered locations increases.
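The hotel example can be illustrated with a brute-force spatial skyline. This is a sketch only, not the VS2 algorithm (which relies on Voronoi diagrams to avoid comparing all pairs), and it assumes planar Euclidean distance with made-up coordinates. A candidate is kept if no other candidate is at least as close to every considered location and strictly closer to at least one.

```python
import math

def spatially_dominates(a, b, query_points):
    """True if a is no farther than b from every query point
    and strictly closer to at least one of them."""
    da = [math.dist(a, q) for q in query_points]
    db = [math.dist(b, q) for q in query_points]
    return (all(x <= y for x, y in zip(da, db))
            and any(x < y for x, y in zip(da, db)))

def spatial_skyline(candidates, query_points):
    """Brute-force O(n^2) skyline: keep candidates that nobody dominates."""
    return [p for p in candidates
            if not any(spatially_dominates(o, p, query_points)
                       for o in candidates if o != p)]

# Hypothetical coordinates: two considered locations and four hotels.
considered = [(0, 0), (4, 0)]
hotels = [(2, 0), (1, 0), (2, 5), (3, 0)]
print(spatial_skyline(hotels, considered))
```

In this toy data the hotel at (2, 5) is farther from both considered locations than the hotel at (2, 0), so it drops out; the remaining three each offer a different trade-off.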
Enhancing SpatialHadoop with Closest Pair Queries
Given two datasets P and Q, the K Closest Pair Query (KCPQ) finds the K closest pairs of objects from P × Q. It is an operation widely adopted by many spatial and GIS applications. As a combination of the K Nearest Neighbor (KNN) and the spatial join queries, KCPQ is an expensive operation. Given the increasing volume of spatial data, it is difficult to perform a KCPQ on a centralized machine efficiently. For this reason, this paper addresses the problem of computing the KCPQ on big spatial datasets in SpatialHadoop, an extension of Hadoop that supports spatial operations efficiently, and proposes a novel algorithm in SpatialHadoop to perform efficient parallel KCPQ on large-scale spatial datasets. We have evaluated the performance of the algorithm in several situations with big synthetic and real-world datasets. The experiments have demonstrated the efficiency and scalability of our proposal.
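As a reference point for what the KCPQ computes, the following sketch evaluates it by brute force on a single machine. It is not the SpatialHadoop algorithm from the abstract; it assumes points are coordinate tuples and uses Euclidean distance. Its quadratic cost over P × Q is exactly what makes the query expensive on large datasets.

```python
import heapq
import math
from itertools import product

def k_closest_pairs(P, Q, k):
    """Brute-force KCPQ: examine every pair in P x Q and keep the k pairs
    with the smallest Euclidean distance, in increasing order of distance."""
    return heapq.nsmallest(k, product(P, Q), key=lambda pq: math.dist(*pq))

P = [(0, 0), (5, 5)]
Q = [(1, 0), (5, 7), (10, 10)]
print(k_closest_pairs(P, Q, 2))
```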
Discovering Attractive Products based on Influence Sets
Skyline queries have been widely used as a practical tool for multi-criteria
decision analysis and for applications involving preference queries. For
example, in a typical online retail application, skyline queries can help
customers select the most interesting, among a pool of available, products.
Recently, reverse skyline queries have been proposed, highlighting the
manufacturer's perspective, i.e. how to determine the expected buyers of a
given product. In this work we develop novel algorithms for two important
classes of queries involving customer preferences. We first propose a novel
algorithm, termed RSA, for answering reverse skyline queries. We then
introduce a new type of query, namely the k-Most Attractive Candidates (k-MAC)
query. In this type of query, given a set of existing product specifications
P, a set of customer preferences C and a set of new candidate products Q, the
k-MAC query returns the set of k candidate products from Q that jointly
maximizes the total number of expected buyers, measured as the cardinality of
the union of individual reverse skyline sets (i.e., influence sets). Applying
existing approaches to solve this problem would require calculating the reverse
skyline set for each candidate, which is prohibitively expensive for large data
sets. We, thus, propose a batched algorithm for this problem and compare its
performance against a branch-and-bound variant that we devise. Both of these
algorithms use variants of our RSA algorithm at their core. Our experimental
study using both synthetic and real data sets demonstrates that our proposed
algorithms outperform existing or naive solutions to our studied classes of
queries.
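The k-MAC definition can be spelled out with a naive baseline of the kind the abstract calls prohibitively expensive. This sketch is not RSA or the batched algorithm. It assumes the usual dynamic-dominance definition of reverse skylines (a customer belongs to a candidate's influence set if no existing product is at least as close to the customer's preference in every dimension and strictly closer in one), and it tries every k-subset of candidates. All names and data are hypothetical.

```python
from itertools import combinations

def dyn_dominates(p, q, c):
    """True if product p dynamically dominates product q with respect to
    customer preference c (compares per-dimension absolute differences)."""
    dp = [abs(x - y) for x, y in zip(p, c)]
    dq = [abs(x - y) for x, y in zip(q, c)]
    return (all(a <= b for a, b in zip(dp, dq))
            and any(a < b for a, b in zip(dp, dq)))

def influence_set(q, products, customers):
    """Indices of customers for whom no existing product dominates q."""
    return {i for i, c in enumerate(customers)
            if not any(dyn_dominates(p, q, c) for p in products)}

def k_mac(candidates, products, customers, k):
    """Exhaustive k-MAC: pick the k candidates whose influence sets
    have the largest union. Exponential in k; for illustration only."""
    infl = [influence_set(q, products, customers) for q in candidates]
    best = max(combinations(range(len(candidates)), k),
               key=lambda idx: len(set().union(*(infl[i] for i in idx))))
    return [candidates[i] for i in best], set().union(*(infl[i] for i in best))

products = [(5, 5)]                      # existing products P
customers = [(1, 1), (9, 9), (5, 1)]     # customer preferences C
candidates = [(2, 2), (8, 8), (3, 3)]    # new candidate products Q
print(k_mac(candidates, products, customers, 2))
```

Note that candidates (2, 2) and (3, 3) attract the same two customers here, so pairing either with (8, 8) covers all three; maximizing the union rather than the sum of influence sets is what rewards that kind of complementarity.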
Efficient Large-scale Distance-Based Join Queries in SpatialHadoop
Efficient processing of Distance-Based Join Queries (DBJQs) in spatial databases is of paramount importance in many application domains. The best-known DBJQs are the K Closest Pairs Query (KCPQ) and the ε Distance Join Query (εDJQ). These types of join queries are characterized by a number of desired pairs (K) or a distance threshold (ε) between the components of the pairs in the final result, over two spatial datasets. Both are expensive operations, since two spatial datasets are combined with additional constraints. Given the increasing volume of spatial data originating from multiple sources and stored in distributed servers, it is not always efficient to perform DBJQs on a centralized server. For this reason, this paper addresses the problem of computing DBJQs on big spatial datasets in SpatialHadoop, an extension of Hadoop that supports efficient processing of spatial queries in a cloud-based setting. We propose novel algorithms, based on plane-sweep, to perform efficient parallel DBJQs on large-scale spatial datasets in SpatialHadoop. We evaluate the performance of the proposed algorithms in several situations with large real-world as well as synthetic datasets. The experiments demonstrate the efficiency and scalability of our proposed methodologies.
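The plane-sweep idea behind the ε Distance Join Query can be sketched on a single machine as follows. This is a generic textbook-style sweep along the x-axis, assuming 2D points as tuples and Euclidean distance; it is not the paper's distributed SpatialHadoop algorithm, whose partitioning and pruning details are not given in the abstract.

```python
import math

def epsilon_distance_join(P, Q, eps):
    """Plane-sweep sketch: sort both datasets by x, then for each p compare
    only against points of Q whose x-coordinate lies within eps of p's."""
    P, Q = sorted(P), sorted(Q)
    result, start = [], 0
    for p in P:
        # Points of Q left of the window can never match this or any later p.
        while start < len(Q) and Q[start][0] < p[0] - eps:
            start += 1
        j = start
        while j < len(Q) and Q[j][0] <= p[0] + eps:
            if math.dist(p, Q[j]) <= eps:
                result.append((p, Q[j]))
            j += 1
    return result

P = [(0, 0), (5, 5)]
Q = [(1, 0), (5, 7), (10, 10), (0, 3)]
print(epsilon_distance_join(P, Q, 2))
```

The sweep only prunes on the x-coordinate, so the exact distance check inside the window is still required; the benefit is that pairs far apart along the sweep axis are never examined.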