31 research outputs found

    UV-Diagram: A Voronoi Diagram for Uncertain Spatial Databases

    Get PDF
    published_or_final_versio

    多次元データに対するランキング問合せ処理に関する研究

    Get PDF
    筑波大学 (University of Tsukuba)201

    Efficient query processing on spatial and textual data: beyond individual queries

    Get PDF
    With the increasing popularity of GPS enabled mobile devices, queries with locational intent are quickly becoming the most common type of search task on the web. This development has driven several research work on efficient processing of spatial and spatial-textual queries in the past few decades. While most of the existing work focus on answering queries independently, e.g., one query at a time, many real-life applications require the processing of multiple queries in a short period of time, and can benefit from sharing computations. This thesis focuses on efficient processing of the queries on spatial and spatial-textual data for the applications where multiple queries are of interest. Specifically, the following queries are studied: (i) batch processing of top-k spatial-textual queries; (ii) optimal location and keyword selection queries; and (iii) top-m rank aggregation on streaming spatial queries. The batch processing of queries is motivated from different application scenarios that require computing the result of multiple queries efficiently, including (i) multiple-query optimization, where the overall efficiency and throughput can be improved by grouping or partitioning a large set of queries; and (ii) continuous processing of a query stream, where in each time slot, the queries that have arrived can be processed together. In this thesis, given a set of top-k spatial-textual queries, the problem of computing the results for all the queries concurrently and efficiently as a batch is addressed. Some applications require an aggregation over the results of multiple queries. An exam- ple application is to identify the optimal value of attributes (e.g., location, text) for a new facility/service, so that the facility will appear in the query result of the maximum number of potential customers. This problem is essentially an aggregation (maximization) over the results of queries issued by multiple potential customers, where each user can be treated as a top-k query. In this thesis, we address this problem for spatial and textual data where the computations for multiple users are shared to find the final result. Rank aggregation is the problem of combining multiple rank orderings to produce a single ordering of the objects. Thus, aggregating the ranks of spatial objects can provide key insights into the importance of the objects in many different scenarios. This translates into a natural extension of the problem that finds the top-m objects with the highest aggregate rank over multiple queries. As the users issue new queries, clearly the rank aggregations continuously change over time, and recency also play an important role when interpreting the final results. The top-m rank aggregation of spatial objects for streaming queries is studied in this thesis, where the problem is to report the updated top-m objects with the highest aggregate rank over a subset of the most recent queries from a stream

    Efficient algorithms for optimal location queries in road networks

    Full text link

    空間Webデータにおけるm-最近接キーワード検索問題のトップダウン解法に関する研究

    Get PDF
    This thesis addresses the problem of m-closest keywords queries (mCK queries) over spatial web objects that contain descriptive texts and spatial information. The mCK query is a problem to find the optimal set of records in the sense that they are the spatially-closest records that satisfy m user-given keywords in their texts. The mCK query can be widely used in various applications to find the place of user’s interest. Generally, top-down search techniques using tree-style data structures are appropriate for finding optimal results of queries over spatial datasets. Thus in order to solve the mCK query problem, a previous study of NUS group assumed a specialized R*-tree (called bR*-tree) to store all records and proposed a top-down approach which uses an Apriori-based node-set enumeration in top-down process. However this assumption of prepared bR*-tree is not applicable to practical spatial web datasets, and the pruning ability of Apriori-based enumeration is highly dependent on the data distribution. In this thesis, we do not expect any prepared data-partitioning, but assume that we create a grid partitioning from necessary data only when an mCK query is given. Under this assumption, we propose a new search strategy termed Diameter Candidate Check (DCC), which can find a smaller node-set at an earlier stage of search so that it can reduce search space more efficiently. According to DCC search strategy, we firstly employ an implementation of DCC strategy in a nested loop search algorithm (called DCC-NL). Next, we improve the DCC-NL in a recursive way (called RDCC). RDCC can afford a more reasonable priority order of node-set enumeration. We also uses a tight lower bound to improve pruning ability in RDCC. RDCC performs well in a wide variey of data distributions, but it has still deficiency when one data-point has many query keywords and numerous node-sets are generated. Hence in order to avoid the generation of node-sets which is an unstable factor of search efficiency, we propose another different top-down search approach called Pairwise Expansion. Finally, we discuss some optimization techniques to enhance Pairwise Expansion approach. We first discuss the index structure in the Pairwise Expansion approach, and try to use an on-the-fly kd-tree to reduce building cost in the query process. Also a new lower bound and an upper bound are employed for more powerful pruning in Pairwise Expansion. We evaluate these approaches by using both real datasets and synthetic datasets for different data distributions, including 1.6 million of Flickr photo data. The result shows that DCC strategy can provide more stable search performance than the Apriori-based approach. And the Pairwise Expansion approach enhanced with lower/upper bounds, has more advantages over those algorithms having node-set generation, and is applicable for real spatial web data.電気通信大学201

    Managing moving objects and their trajectories

    Get PDF
    Ph.DDOCTOR OF PHILOSOPH

    Algorithms for continuous queries: A geometric approach

    Get PDF
    <p>There has been an unprecedented growth in both the amount of data and the number of users interested in different types of data. Users often want to keep track of the data that match their interests over a period of time. A continuous query, once issued by a user, maintains the matching results for the user as new data (as well as updates to the existing data) continue to arrive in a stream. However, supporting potentially millions of continuous queries is a huge challenge. This dissertation addresses the problem of scalably processing a large number of continuous queries over a wide-area network. </p><p>Conceptually, the task of supporting distributed continuous queries can be divided into two components--event processing (computing the set of affected users for each data update) and notification dissemination (notifying the set of affected users). The first part of this dissertation focuses on event processing. Since interacting with large-scale data can easily frustrate and overwhelm the users, top-k queries have attracted considerable interest from the database community as they allow users to focus on the top-ranked results only. However, it is nearly impossible to find a set of common top-ranked data that everyone is interested in, therefore, users are allowed to specify their interest in different forms of preferences, such as personalized ranking function and range selection. This dissertation presents geometric frameworks, data structures, and algorithms for answering several types of preference queries efficiently. Experimental evaluations show that our approaches outperform the previous ones by orders of magnitude.</p><p>The second part of the dissertation presents comprehensive solutions to the problem of processing and notifying a large number of continuous range top-k queries across a wide-area network. Simple solutions include using a content-driven network to notify all continuous queries whose ranges contain the update (ignoring top-k), or using a server to compute only the affected continuous queries and notifying them individually. The former solution generates too much network traffic, while the latter overwhelms the server. This dissertation presents a geometric framework which allows the set of affected continuous queries to be described succinctly with messages that can be efficiently disseminated using content-driven networks. Fast algorithms are also developed to reformulate each update into a set of messages whose number is provably optimal, with or without knowing all continuous queries. </p><p>The final component of this dissertation is the design of a wide-area dissemination network for continuous range queries. In particular, this dissertation addresses the problem of assigning users to servers in a wide-area content-based publish/subscribe system. A good assignment should consider both users' interests and locations, and balance multiple performance criteria including bandwidth, delay, and load balance. This dissertation presents a Monte Carlo approximation algorithm as well as a simple greedy algorithm. The Monte Carlo algorithm jointly considers multiple performance criteria to find a broker-subscriber assignment and provides theoretical performance guarantees. Using this algorithm as a yardstick, the greedy algorithm is also concluded to work well across a wide range of workloads.</p>Dissertatio
    corecore