
    Efficient query processing on spatial and textual data: beyond individual queries

    With the increasing popularity of GPS-enabled mobile devices, queries with locational intent are quickly becoming the most common type of search task on the web. This development has driven considerable research on efficient processing of spatial and spatial-textual queries over the past few decades. While most existing work focuses on answering queries independently, i.e., one query at a time, many real-life applications require processing multiple queries in a short period of time and can benefit from sharing computations. This thesis focuses on efficient processing of queries on spatial and spatial-textual data for applications where multiple queries are of interest. Specifically, the following queries are studied: (i) batch processing of top-k spatial-textual queries; (ii) optimal location and keyword selection queries; and (iii) top-m rank aggregation on streaming spatial queries. Batch processing is motivated by application scenarios that require computing the results of multiple queries efficiently, including (a) multiple-query optimization, where overall efficiency and throughput can be improved by grouping or partitioning a large set of queries; and (b) continuous processing of a query stream, where the queries that have arrived in each time slot can be processed together. In this thesis, given a set of top-k spatial-textual queries, the problem of computing the results for all the queries concurrently and efficiently as a batch is addressed. Some applications require an aggregation over the results of multiple queries. An example application is to identify the optimal attribute values (e.g., location, text) for a new facility or service, so that the facility appears in the query results of the maximum number of potential customers. This problem is essentially an aggregation (maximization) over the results of queries issued by multiple potential customers, where each user can be treated as a top-k query. In this thesis, we address this problem for spatial and textual data, sharing the computations for multiple users to find the final result. Rank aggregation is the problem of combining multiple rank orderings to produce a single ordering of the objects. Aggregating the ranks of spatial objects can therefore provide key insights into the importance of the objects in many different scenarios. A natural extension is the problem of finding the top-m objects with the highest aggregate rank over multiple queries. As users issue new queries, the rank aggregations change continuously over time, and recency also plays an important role when interpreting the final results. The top-m rank aggregation of spatial objects for streaming queries is studied in this thesis, where the problem is to report the updated top-m objects with the highest aggregate rank over a subset of the most recent queries from a stream.
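The streaming top-m rank aggregation problem described above can be illustrated with a minimal Python sketch. The abstract does not specify an aggregation function or index structure, so everything here is an assumption made for illustration: queries are plain 2D points answered by a linear scan, ranks are combined with a Borda-style score (rank 1 earns k points, rank k earns 1), and recency is modeled as a count-based sliding window. The names `top_k` and `WindowedRankAggregator` are hypothetical.

```python
from collections import Counter, deque
from math import hypot


def top_k(objects, query, k):
    """Return the ids of the k objects nearest to the query point (linear scan)."""
    ranked = sorted(
        objects,
        key=lambda oid: (hypot(objects[oid][0] - query[0],
                               objects[oid][1] - query[1]), oid),
    )
    return ranked[:k]


class WindowedRankAggregator:
    """Aggregate top-k ranks over the most recent `window` queries.

    Assumption: Borda-style scoring; the thesis may use a different measure.
    """

    def __init__(self, objects, k, window):
        self.objects = objects      # object id -> (x, y)
        self.k = k
        self.size = window
        self.window = deque()       # per-query score contributions, oldest first
        self.scores = Counter()     # object id -> aggregate score in the window

    def add_query(self, query):
        result = top_k(self.objects, query, self.k)
        contrib = {oid: self.k - pos for pos, oid in enumerate(result)}
        self.window.append(contrib)
        for oid, score in contrib.items():
            self.scores[oid] += score
        # Expire the oldest query once the window overflows.
        if len(self.window) > self.size:
            expired = self.window.popleft()
            for oid, score in expired.items():
                self.scores[oid] -= score
                if self.scores[oid] == 0:
                    del self.scores[oid]

    def top_m(self, m):
        """Return the m (id, score) pairs with the highest aggregate score."""
        return sorted(self.scores.items(), key=lambda kv: (-kv[1], kv[0]))[:m]
```

Each arriving query adds its contribution and, once the window is full, the oldest query's contribution is subtracted, so the aggregate is updated incrementally rather than recomputed from scratch.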

    MaxBRkNN Queries for Streaming Geo-Data

    The problem of maximizing bichromatic reverse k nearest neighbor queries (MaxBRkNN) has been extensively studied in spatial databases, where given a set of facilities and a set of customers, a MaxBRkNN query returns a region to establish a new facility p such that p is a kNN of the maximum number of customers. In the literature, current solutions for MaxBRkNN queries are predominantly static. However, there are numerous applications for dynamic variations of these queries, including advertisements and resource reallocation based on streaming customer locations via social media check-ins, or GPS location updates from mobile devices. In this paper, we address the problem of continuous MaxBRkNN queries for streaming objects (customers). As customer data can arrive at a very high rate, we adopt two different models for recency information (sliding windows and micro-batching). We propose an efficient solution where results are incrementally updated by reusing computations from the previous result. We present a safe interval to reduce the number of computations for the new objects, and prune the objects that cannot affect the result. We perform extensive experiments on datasets integrated from four different real-life data sources, and demonstrate the efficiency of our solution by rigorously comparing how different properties of the datasets can affect the performance.
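To make the MaxBRkNN definition concrete, here is a brute-force Python sketch. It is not the paper's method: it evaluates a finite list of candidate points rather than returning a region, and it recomputes every count from scratch instead of updating results incrementally. The function names `brknn_count` and `best_candidate` are hypothetical, and ties in distance are assumed to be resolved in favor of the new facility.

```python
from math import hypot


def dist(a, b):
    """Euclidean distance between two 2D points."""
    return hypot(a[0] - b[0], a[1] - b[1])


def brknn_count(candidate, customers, facilities, k):
    """Count customers for whom `candidate` would be one of their k nearest facilities.

    A customer counts if fewer than k existing facilities are strictly
    closer to it than the candidate (assumed tie-breaking rule).
    """
    count = 0
    for customer in customers:
        d = dist(customer, candidate)
        closer = sum(1 for f in facilities if dist(customer, f) < d)
        if closer < k:
            count += 1
    return count


def best_candidate(candidates, customers, facilities, k):
    """Return the candidate location with the largest BRkNN count."""
    return max(candidates,
               key=lambda p: brknn_count(p, customers, facilities, k))
```

For example, with facilities at (0, 0) and (10, 0), customers at (1, 0), (2, 0) and (9, 0), and k = 1, a candidate at (1.5, 0) attracts the first two customers, while a candidate at (9, 0.5) attracts only the third. The cost is proportional to candidates × customers × facilities per evaluation, which is the kind of recomputation that incremental updates and pruning aim to avoid in the streaming setting.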