    Efficient query processing on spatial and textual data: beyond individual queries

    With the increasing popularity of GPS-enabled mobile devices, queries with locational intent are quickly becoming the most common type of search task on the web. This development has driven a substantial body of research on the efficient processing of spatial and spatial-textual queries in the past few decades. While most existing work focuses on answering queries independently, i.e., one query at a time, many real-life applications must process multiple queries in a short period of time and can benefit from sharing computations. This thesis focuses on the efficient processing of queries on spatial and spatial-textual data for applications where multiple queries are of interest. Specifically, the following queries are studied: (i) batch processing of top-k spatial-textual queries; (ii) optimal location and keyword selection queries; and (iii) top-m rank aggregation on streaming spatial queries. The batch processing of queries is motivated by different application scenarios that require computing the results of multiple queries efficiently, including (i) multiple-query optimization, where overall efficiency and throughput can be improved by grouping or partitioning a large set of queries; and (ii) continuous processing of a query stream, where the queries that arrive in each time slot can be processed together. In this thesis, given a set of top-k spatial-textual queries, the problem of computing the results for all the queries concurrently and efficiently as a batch is addressed. Some applications require an aggregation over the results of multiple queries. An example application is to identify the optimal values of attributes (e.g., location, text) for a new facility or service, so that the facility will appear in the query results of the maximum number of potential customers.
This problem is essentially an aggregation (maximization) over the results of queries issued by multiple potential customers, where each user can be treated as a top-k query. In this thesis, we address this problem for spatial and textual data, sharing the computations for multiple users to find the final result. Rank aggregation is the problem of combining multiple rank orderings to produce a single ordering of the objects. Aggregating the ranks of spatial objects can provide key insights into the importance of the objects in many different scenarios. This leads to a natural extension of the problem: finding the top-m objects with the highest aggregate rank over multiple queries. As users issue new queries, the rank aggregations change continuously over time, and recency also plays an important role when interpreting the final results. This thesis studies top-m rank aggregation of spatial objects for streaming queries, where the problem is to report the updated top-m objects with the highest aggregate rank over a subset of the most recent queries from a stream.
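As a rough illustration of top-m rank aggregation over a query stream, the sketch below keeps the ranked results of only the W most recent queries and maintains an aggregate score per object incrementally. The Borda-style scoring (an object at position p of a top-k list, counting from 0, earns k - p points), the count-based sliding window, and the names `SlidingRankAggregator`, `add_query_result`, and `top_m` are hypothetical choices made for this example; the thesis may define aggregate rank and recency differently.

```python
import heapq
from collections import Counter, deque


class SlidingRankAggregator:
    """Hypothetical sketch: Borda-style top-m aggregation over the W most recent query results."""

    def __init__(self, window: int):
        self.window = window
        self.recent = deque()   # ranked result lists of the most recent queries
        self.score = Counter()  # object id -> aggregate Borda points inside the window

    @staticmethod
    def _points(ranking):
        # Assumed scoring: position 0 earns k points, position k-1 earns 1 point.
        k = len(ranking)
        return {obj: k - pos for pos, obj in enumerate(ranking)}

    def add_query_result(self, ranking):
        # Add the new query's contribution.
        self.recent.append(ranking)
        for obj, pts in self._points(ranking).items():
            self.score[obj] += pts
        # Expire the oldest query once the window is full.
        if len(self.recent) > self.window:
            expired = self.recent.popleft()
            for obj, pts in self._points(expired).items():
                self.score[obj] -= pts
                if self.score[obj] == 0:
                    del self.score[obj]  # no remaining contribution in the window

    def top_m(self, m):
        # Highest aggregate score first; ties broken by object id.
        best = heapq.nsmallest(m, self.score.items(), key=lambda kv: (-kv[1], kv[0]))
        return [obj for obj, _ in best]
```

Because each arrival and each expiry touches only the objects of one result list, updating the aggregate costs O(k) per query; `top_m` then selects the m best objects from the current scores with a heap.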

    Batch processing of top-K spatial-textual queries

    Top-k spatial-textual queries have received significant attention in the research community. Several techniques to efficiently process this class of queries are now widely used in a variety of applications. However, the problem of how best to process multiple queries efficiently is not well understood. Applications that process continuous streams of queries, or that pre-process other queries offline, could benefit from solutions to this problem. In this work, we study practical solutions to efficiently process a set of top-k spatial-textual queries. We propose an efficient best-first algorithm for the batch processing of top-k spatial-textual queries that promotes shared processing and reduces I/O in each query batch. By grouping similar queries and processing them simultaneously, we demonstrate significant performance gains on publicly available datasets.
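To make the idea of shared processing concrete, here is a minimal sketch that answers a batch of top-k spatial-textual queries with a single scan over the objects, keeping one size-k min-heap per query. It is not the best-first algorithm described above: it swaps in a plain linear scan, and it assumes a linear scoring function (alpha times spatial proximity plus 1 - alpha times Jaccard keyword overlap) and the hypothetical names `SpatialObject`, `Query`, `score`, and `batch_top_k`. The `reads` counter is only a crude stand-in for I/O.

```python
import heapq
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SpatialObject:
    oid: str
    x: float
    y: float
    keywords: frozenset


@dataclass(frozen=True)
class Query:
    x: float
    y: float
    keywords: frozenset
    k: int


def score(q, o, alpha=0.5, max_dist=1.0):
    # Assumed linear scoring: alpha * spatial proximity + (1 - alpha) * Jaccard keyword overlap.
    dist = math.hypot(q.x - o.x, q.y - o.y)
    proximity = max(0.0, 1.0 - dist / max_dist)
    union = q.keywords | o.keywords
    relevance = len(q.keywords & o.keywords) / len(union) if union else 0.0
    return alpha * proximity + (1 - alpha) * relevance


def batch_top_k(queries, objects, alpha=0.5, max_dist=1.0):
    """One shared pass over the objects answers every query in the batch."""
    heaps = [[] for _ in queries]  # per-query min-heaps of (score, oid)
    reads = 0
    for o in objects:  # each object is read once for the whole batch
        reads += 1
        for heap, q in zip(heaps, queries):
            entry = (score(q, o, alpha, max_dist), o.oid)
            if len(heap) < q.k:
                heapq.heappush(heap, entry)
            elif entry > heap[0]:  # better than the current k-th best result
                heapq.heapreplace(heap, entry)
    results = [[oid for _, oid in sorted(h, reverse=True)] for h in heaps]
    return results, reads
```

With three objects and two queries, `batch_top_k` reports `reads == 3` rather than the 6 object reads of two independent scans; an index-based method would additionally prune objects that cannot enter any query's top-k.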

    Efficient Data Modelling, Indexing and Processing in Large Datasets

    Many devices and applications in social networks and online services produce, store, and use the description, location, and occurrence time of objects. Various systems exist to study, model, index, and process such large volumes of data. In this thesis, we study graphs and publish/subscribe systems. First, we study the problem of continuously updating the top-k messages with the highest ranks, where each message must contain all the requested keywords and its rank is calculated from its freshness and its distance to the query's location. Since new messages arrive all the time and the scores of existing top-k results decrease over time, providing the most recent information requires continuously computing and maintaining the best results. We propose an efficient indexing and matching method using keywords, location, and the most recent top-k results of queries. Second, we study the problem of (k,s)-core decomposition. As both the user engagement of nodes and the strength of relationships are important, the (k,s)-core model has been proposed in the literature to discover strong communities. Nevertheless, decomposition algorithms for the (k,s)-core have not yet been investigated. We propose (k,s)-core algorithms to decompose a graph into its hierarchical structures, considering both user engagement and tie strength. We first present the basic (k,s)-core decomposition methods. Then, we propose the advanced algorithms DES and DEK, which index the support of edges to enable higher-level cost sharing in the peeling process. In addition, effective pruning strategies are applied to DES/DEK to further enhance performance. Moreover, we build a novel index based on the decomposition result and investigate efficient (k,s)-core query algorithms based on this index. Finally, we develop an efficient algorithm for maintaining the (k,s)-core index of a dynamic graph in which vertices and edges are inserted and deleted.
The algorithm uses pruning strategies that exploit lower and upper bounds of the core number. We define a new Smax core and develop an efficient method for updating the (k,s) numbers of nodes.
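The basic peeling process for a single (k,s) pair can be sketched as follows. This assumes, for illustration only, that a (k,s)-core is the maximal subgraph in which every vertex has degree at least k and every edge lies in at least s triangles of that subgraph (edge support as a proxy for tie strength); the exact support threshold used in the (k,s)-core literature may differ. The function name `ks_core` is hypothetical, and the sketch does not reproduce DES, DEK, or the index-based maintenance algorithm; a full decomposition would repeat or share this peeling across parameter values.

```python
def ks_core(edges, k, s):
    """Peel until every vertex has degree >= k and every edge lies in >= s triangles (assumed definition)."""
    adj = {}
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)

    changed = True
    while changed:
        changed = False
        # Edge support = number of common neighbours = triangles containing the edge.
        weak = [(u, v) for u in adj for v in adj[u] if u < v and len(adj[u] & adj[v]) < s]
        for u, v in weak:
            adj[u].discard(v)
            adj[v].discard(u)
            changed = True
        # Vertices whose degree fell below k are removed with their incident edges.
        low = [u for u in adj if len(adj[u]) < k]
        for u in low:
            for v in adj[u]:
                adj[v].discard(u)
            del adj[u]
            changed = True

    return {(u, v) for u in adj for v in adj[u] if u < v}
```

Removing all under-threshold edges (or vertices) in one round is safe because support and degree can only decrease as the graph shrinks, so anything that fails a threshold now can never satisfy it later.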

    Batch processing of Top-k Spatial-textual Queries

    Since the mid-2000s, several indexing techniques have been proposed to efficiently answer top-k spatial-textual queries. However, all of these approaches focus on answering one query at a time. In contrast, how to design efficient algorithms that exploit similarities between incoming queries to improve performance has received little attention. In this article, we study a series of efficient approaches to batch process multiple top-k spatial-textual queries concurrently. We carefully design a variety of indexing structures for the problem space by exploring the effect of prioritizing spatial and textual properties on system performance. Specifically, we present an efficient traversal method, SF-Sep, over an existing space-prioritized index structure. Then, we propose a new space-prioritized index structure, the MIR-Tree, to support a filter-and-refine technique, SF-Grp. To support the processing of text-intensive data, we propose an augmented inverted index structure that can easily be added to existing text search engine architectures, together with a novel traversal method for batch processing of the queries. In all of these approaches, the goal is to improve overall performance by sharing the I/O costs of similar queries. Finally, extensive experiments on three real datasets demonstrate significant I/O savings for our algorithms over traditional approaches and show how properties of the different datasets affect performance. Many applications in streaming, micro-batching of continuous queries, and privacy-aware search can benefit from this line of work.
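The I/O-sharing idea behind batch processing on an inverted index can be illustrated with a small sketch: collect the distinct keywords of all queries in a batch, fetch each posting list once, and then score every query from the cached lists. The `CountingIndex` class, the `(oid, term_weight)` posting format, the separate `locations` table, and the linear scoring function are hypothetical stand-ins, not the augmented inverted index proposed in the article; only objects that contain at least one query keyword are considered as candidates.

```python
import heapq
import math
from collections import defaultdict


class CountingIndex:
    """Hypothetical inverted index that counts posting-list fetches as a stand-in for I/O."""

    def __init__(self, postings):
        self._postings = postings  # term -> list of (oid, term_weight)
        self.fetches = 0

    def fetch(self, term):
        self.fetches += 1
        return self._postings.get(term, [])


def batch_text_spatial_top_k(index, locations, queries, k, alpha=0.5, max_dist=1.0):
    # queries: list of (x, y, terms); each distinct term is fetched once for the whole batch.
    terms = {t for _, _, q_terms in queries for t in q_terms}
    cached = {t: index.fetch(t) for t in sorted(terms)}
    results = []
    for x, y, q_terms in queries:
        text = defaultdict(float)
        for t in q_terms:
            for oid, w in cached[t]:
                text[oid] += w  # accumulate textual relevance from shared posting lists

        def score(oid):
            # Assumed linear combination of spatial proximity and textual relevance.
            ox, oy = locations[oid]
            proximity = max(0.0, 1.0 - math.hypot(x - ox, y - oy) / max_dist)
            return alpha * proximity + (1 - alpha) * text[oid]

        results.append(heapq.nlargest(k, text, key=score))
    return results
```

With three queries over the keywords {cafe}, {cafe, wifi}, and {pizza}, the batch issues three posting-list fetches instead of the four that per-query processing would need; the savings grow as more queries in a batch share keywords.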