14 research outputs found

    Location- and keyword-based querying of geo-textual data: a survey

    With the broad adoption of mobile devices, notably smartphones, keyword-based search for content has seen increasing use by mobile users, who are often interested in content related to their geographical location. We have also witnessed a proliferation of geo-textual content that encompasses both textual and geographical information. Examples include geo-tagged microblog posts, yellow pages, and web pages related to entities with physical locations. Over the past decade, substantial research has been conducted on integrating location into keyword-based querying of geo-textual content in settings where the underlying data is assumed to be either relatively static or is assumed to stream into a system that maintains a set of continuous queries. This paper offers a survey of both the research problems studied and the solutions proposed in these two settings. As such, it aims to offer the reader a first understanding of key concepts and techniques, and it serves as an “index” for researchers who are interested in exploring the concepts and techniques underlying proposed solutions to the querying of geo-textual data. Agency for Science, Technology and Research (A*STAR); Ministry of Education (MOE); Nanyang Technological University. This research was supported in part by MOE Tier-2 Grant MOE2019-T2-2-181, MOE Tier-1 Grant RG114/19, an NTU ACE Grant, and the Singtel Cognitive and Artificial Intelligence Lab for Enterprises (SCALE@NTU), which is a collaboration between Singapore Telecommunications Limited (Singtel) and Nanyang Technological University (NTU) that is funded by the Singapore Government through the Industry Alignment Fund Industry Collaboration Projects Grant, and by the Innovation Fund Denmark centre, DIREC.

    Reverse spatial visual top-k query

    With the wide application of mobile Internet techniques and location-based services (LBS), massive multimedia data with geo-tags has been generated and collected. In this paper, we investigate a novel type of spatial query problem, named the reverse spatial visual top-k query (RSVQk), which aims to retrieve a set of geo-images that have the query as one of the most relevant geo-images in both geographical proximity and visual similarity. Existing approaches for reverse top-k queries are not suitable for this problem because they cannot effectively process unstructured data, such as images. To this end, we first define the RSVQk problem and introduce the similarity measurement. A novel hybrid index, named VR²-Tree, is designed, which combines a visual representation of geo-images with an R-Tree. We also introduce an extension of VR²-Tree, called CVR²-Tree, discuss the calculation of lower and upper bounds, and propose an optimization technique based on CVR²-Tree for further pruning. In addition, a search algorithm named the RSVQk algorithm is developed to support efficient RSVQk queries. Comprehensive experiments are conducted on four geo-image datasets, and the results illustrate that our approach can address the RSVQk problem effectively and efficiently.

    Approximate Reverse Top-k Spatial-Keyword Queries

    Location-based services are becoming more involved with our daily lives, so many works have considered efficiently retrieving useful objects from spatial-keyword databases. These works are promising on the user side, but none of them considers the service provider side. To gain profits and enrich recommendation lists, service providers conduct market analyses and want to know potential users who may be interested in their services. In this paper, to satisfy this requirement, we propose a new query, the approximate reverse top-k spatial-keyword (ART) query. Given a set O of spatial-keyword objects, a set S of users (their locations and preferable keywords), a query object q, k, and an approximation ratio ϵ, an ART query retrieves the users for whom q is included in their approximate top-k results among O and q. A straightforward approach to processing this query is to run a top-k spatial-keyword search for each user in S. This is clearly expensive, as the number of users is generally large. We therefore propose PART, an efficient algorithm for ART query processing. In addition, we propose B-PART, which enables the processing of multiple ART queries in a batch. We conduct extensive experiments using real datasets, and the results demonstrate the efficiency of our algorithms. Citation: Nishio S., Amagata D., Hara T. Approximate Reverse Top-k Spatial-Keyword Queries. Proceedings - IEEE International Conference on Mobile Data Management 2023-July, 96 (2023); https://doi.org/10.1109/MDM58254.2023.00026
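The straightforward per-user approach mentioned in this abstract can be sketched roughly as follows. This is only an illustration of the exact baseline, not of PART or B-PART: the scoring function (a weighted mix of spatial proximity and Jaccard keyword overlap), the `alpha` weight, the dictionary layout, and the function names are all assumptions, and the approximation ratio ϵ is ignored.

```python
import math

def score(user, obj, alpha=0.5):
    """Hypothetical ranking score (higher is better).

    Mixes spatial proximity with Jaccard keyword overlap. The abstract does
    not specify a scoring function; this one is assumed for illustration.
    """
    proximity = 1.0 / (1.0 + math.dist(user["loc"], obj["loc"]))
    u, o = set(user["keywords"]), set(obj["keywords"])
    union = u | o
    jaccard = len(u & o) / len(union) if union else 0.0
    return alpha * proximity + (1.0 - alpha) * jaccard

def naive_reverse_top_k(objects, users, q, k, alpha=0.5):
    """Return ids of users for whom q is in the exact top-k among objects plus q."""
    result = []
    for user in users:
        q_score = score(user, q, alpha)
        # q is in the top-k if fewer than k objects score strictly higher.
        better = sum(1 for o in objects if score(user, o, alpha) > q_score)
        if better < k:
            result.append(user["id"])
    return result

# Toy data (made up for this example).
objects = [
    {"loc": (0.0, 0.0), "keywords": ["coffee", "wifi"]},
    {"loc": (5.0, 5.0), "keywords": ["pizza"]},
]
users = [
    {"id": "u1", "loc": (0.0, 1.0), "keywords": ["coffee"]},
    {"id": "u2", "loc": (5.0, 4.0), "keywords": ["pizza"]},
]
q = {"loc": (0.0, 2.0), "keywords": ["coffee"]}

print(naive_reverse_top_k(objects, users, q, k=1))  # ['u1']
```

Every user triggers a full scan of the object set, so the cost grows with |S| × |O|, which is the expense the abstract points to as motivation for a dedicated algorithm.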

    Efficient query processing on spatial and textual data: beyond individual queries

    With the increasing popularity of GPS-enabled mobile devices, queries with locational intent are quickly becoming the most common type of search task on the web. This development has driven several research efforts on efficient processing of spatial and spatial-textual queries in the past few decades. While most of the existing work focuses on answering queries independently, i.e., one query at a time, many real-life applications require the processing of multiple queries in a short period of time, and can benefit from sharing computations. This thesis focuses on efficient processing of queries on spatial and spatial-textual data for applications where multiple queries are of interest. Specifically, the following queries are studied: (i) batch processing of top-k spatial-textual queries; (ii) optimal location and keyword selection queries; and (iii) top-m rank aggregation on streaming spatial queries. The batch processing of queries is motivated by different application scenarios that require computing the result of multiple queries efficiently, including (i) multiple-query optimization, where the overall efficiency and throughput can be improved by grouping or partitioning a large set of queries; and (ii) continuous processing of a query stream, where in each time slot, the queries that have arrived can be processed together. In this thesis, given a set of top-k spatial-textual queries, the problem of computing the results for all the queries concurrently and efficiently as a batch is addressed. Some applications require an aggregation over the results of multiple queries. An example application is to identify the optimal value of attributes (e.g., location, text) for a new facility/service, so that the facility will appear in the query result of the maximum number of potential customers.
This problem is essentially an aggregation (maximization) over the results of queries issued by multiple potential customers, where each user can be treated as a top-k query. In this thesis, we address this problem for spatial and textual data where the computations for multiple users are shared to find the final result. Rank aggregation is the problem of combining multiple rank orderings to produce a single ordering of the objects. Thus, aggregating the ranks of spatial objects can provide key insights into the importance of the objects in many different scenarios. This translates into a natural extension of the problem that finds the top-m objects with the highest aggregate rank over multiple queries. As users issue new queries, the rank aggregations change continuously over time, and recency also plays an important role when interpreting the final results. The top-m rank aggregation of spatial objects for streaming queries is studied in this thesis, where the problem is to report the updated top-m objects with the highest aggregate rank over a subset of the most recent queries from a stream.
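As a rough illustration of top-m rank aggregation over the most recent queries, the sketch below keeps a fixed-size window of query locations and recomputes a Borda-style aggregate from scratch on every arrival. The abstract does not say which aggregation function or window model the thesis uses, so the Borda count, the count-based window, distance-only ranking, and the class name `SlidingRankAggregator` are all assumptions.

```python
import math
from collections import deque

class SlidingRankAggregator:
    """Naive top-m rank aggregation over the `window` most recent queries."""

    def __init__(self, objects, window, m):
        self.objects = objects              # dict: object id -> (x, y)
        self.m = m
        self.recent = deque(maxlen=window)  # oldest query is dropped automatically

    def add_query(self, loc):
        """Register a new query location and return the updated top-m ids."""
        self.recent.append(loc)
        return self.top_m()

    def top_m(self):
        n = len(self.objects)
        points = {oid: 0 for oid in self.objects}
        for loc in self.recent:
            # Rank objects by distance to this query (ties broken by id).
            ranked = sorted(
                self.objects,
                key=lambda oid: (math.dist(loc, self.objects[oid]), oid),
            )
            for rank, oid in enumerate(ranked):
                points[oid] += n - rank     # Borda points: best rank earns n
        return sorted(points, key=lambda oid: (-points[oid], oid))[: self.m]

agg = SlidingRankAggregator({"a": (0, 0), "b": (10, 0), "c": (5, 0)}, window=2, m=1)
print(agg.add_query((0, 0)))   # ['a']
agg.add_query((10, 0))
print(agg.add_query((10, 0)))  # ['b'], the first query has left the window
```

Recomputing every ranking per arrival is deliberately simple; the point of the sketch is only to show how results shift as old queries expire.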

    A Study on Ranking Query Processing for Multidimensional Data (多次元データに対するランキング問合せ処理に関する研究)

    筑波大学 (University of Tsukuba)201

    Reverse Thinking in Spatial Queries

    In recent years, an increasing amount of research has been conducted on spatial queries regarding the influence of query objects. Among these queries, the reverse k nearest neighbors (RkNN) query is the one studied most extensively. The reverse k furthest neighbors (RkFN) query is the natural complement of the RkNN query. The RkNN query was introduced to reflect the influence of the query object. Since this representation is intuitive, the RkNN query has attracted significant attention in the database community. Later, reverse top-k queries were introduced and have also been used extensively to represent influence. In many scenarios, when we consider the influence of a spatial object, reverse thinking is involved. That is, whether an object is influential to another object depends on how the other object assesses this object, rather than on how this object considers the other object. In this thesis, we study three problems that involve reverse thinking. We first study the problem of efficiently computing RkFN queries. We are the first to propose a solution for an arbitrary value of k. Based on several interesting observations, we present an efficient algorithm to process RkFN queries. We also present a rigorous theoretical analysis to study various important aspects of the problem and our algorithm. An extensive experimental study demonstrates that our algorithm outperforms the state-of-the-art algorithm even for k=1. The accuracy of our theoretical analysis is also verified. We then study the problem of selecting a set of representative products considering both diversity and coverage based on reverse top-k queries. Since this problem is NP-hard, we employ a greedy algorithm. We adopt MinHash and KMV synopses to assist set operations. Our experimental study demonstrates the performance of the proposed algorithm. We also study the problem of maximizing the spatial influence of a facility bundle based on RkNN queries. We are the first to study this problem.
We prove its NP-hardness, and propose a branch-and-bound best-first search algorithm that greedily selects the currently best facility until we obtain the required number of facilities. We introduce the concept of a kNN region, which allows us to avoid redundant calculation with a dynamic programming technique. Experiments show that our algorithm is orders of magnitude better than our baseline algorithm.
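For readers unfamiliar with the two reverse queries named in this abstract, a brute-force sketch of the bichromatic variants may help: a user belongs to the RkNN (or RkFN) result of q when q is among that user's k nearest (or furthest) facilities. This is not the thesis's algorithm; the bichromatic setting, the function name `reverse_k_neighbors`, and the data layout are assumptions made for the example.

```python
import math

def reverse_k_neighbors(users, facilities, q, k, furthest=False):
    """Brute-force bichromatic RkNN (or RkFN when furthest=True).

    users: dict mapping user name -> (x, y)
    facilities: list of (x, y) competitors of the query facility q
    """
    result = []
    for name, loc in users.items():
        d_q = math.dist(loc, q)
        if furthest:
            # Facilities that are strictly further from the user than q.
            beaten = sum(1 for f in facilities if math.dist(loc, f) > d_q)
        else:
            # Facilities that are strictly closer to the user than q.
            beaten = sum(1 for f in facilities if math.dist(loc, f) < d_q)
        if beaten < k:
            result.append(name)
    return result

users = {"u1": (0.0, 0.0), "u2": (10.0, 0.0)}
facilities = [(1.0, 0.0), (9.0, 0.0)]
q = (0.5, 0.0)

print(reverse_k_neighbors(users, facilities, q, k=1))                 # ['u1']
print(reverse_k_neighbors(users, facilities, q, k=1, furthest=True))  # ['u2']
```

The toy output shows the complementarity the abstract mentions: the user closest to q is in its RkNN result, while the user far from q is in its RkFN result.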

    Execution and authentication of function queries

    We introduce a new query primitive called Function Query (FQ). An FQ operates on a set of math functions and retrieves the functions whose output for a given input satisfies a query condition (e.g., being among the top-k, or within a given range). While FQ finds its natural uses in querying a database of math functions, it can also be applied to a database of discrete values. We show that by interpreting the database as a set of user-defined functions, FQ can retrieve the same information as existing analytic queries, such as the top-k query and the scalar product query, and more. Our research addresses the challenges of FQ execution and authentication. The former is how to minimize the computation and storage costs in processing an FQ, whereas the latter is how to verify that the result of an FQ returned by a potentially untrustworthy server is indeed correct. Our solutions are inspired by the observations that 1) the intersections of a set of continuous functions partition their domain into a number of subdomains, and 2) in each of these subdomains, the functions can be sorted based on their output. We prove the correctness of the proposed techniques and evaluate their performance through analysis, prototyping, and experiments using both synthetic and real-world data. In all settings, our techniques exhibit excellent performance. In addition to FQ, our research has developed another query primitive called Improvement Query, which we also include in this dissertation.
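The two observations in this abstract can be illustrated with a small sketch for the special case of univariate linear functions f(x) = slope · x + intercept. Restricting to linear functions, precomputing one sorted order per subdomain, and the names `build_index` and `top_k` are assumptions made for the example; the dissertation's actual data structures and its authentication scheme are not reproduced here.

```python
from bisect import bisect_right
from itertools import combinations

def build_index(funcs):
    """funcs: list of (slope, intercept) pairs for f(x) = slope * x + intercept.

    Pairwise intersections split the x-axis into subdomains; inside each
    subdomain the ordering of the functions by output does not change.
    """
    cuts = sorted({
        (b2 - b1) / (a1 - a2)
        for (a1, b1), (a2, b2) in combinations(funcs, 2)
        if a1 != a2  # parallel lines never intersect
    })
    if cuts:
        # One sample point strictly inside each subdomain.
        samples = ([cuts[0] - 1.0]
                   + [(lo + hi) / 2 for lo, hi in zip(cuts, cuts[1:])]
                   + [cuts[-1] + 1.0])
    else:
        samples = [0.0]
    orders = [
        sorted(range(len(funcs)),
               key=lambda i: (-(funcs[i][0] * x + funcs[i][1]), i))
        for x in samples
    ]
    return cuts, orders

def top_k(index, x, k):
    """Indices of the k functions with the largest output at input x."""
    cuts, orders = index
    return orders[bisect_right(cuts, x)][:k]

# f0(x) = x, f1(x) = -x + 4, f2(x) = 1; intersections at x = 1, 2, 3.
index = build_index([(1, 0), (-1, 4), (0, 1)])
print(top_k(index, 0.5, 1))   # [1]
print(top_k(index, 2.5, 2))   # [0, 1]
```

A query then costs one binary search plus a slice, at the price of storing an ordering per subdomain; with n functions there can be on the order of n² subdomains, so this layout only suits small inputs.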

    Algorithms for continuous queries: A geometric approach

    There has been an unprecedented growth in both the amount of data and the number of users interested in different types of data. Users often want to keep track of the data that match their interests over a period of time. A continuous query, once issued by a user, maintains the matching results for the user as new data (as well as updates to the existing data) continue to arrive in a stream. However, supporting potentially millions of continuous queries is a huge challenge. This dissertation addresses the problem of scalably processing a large number of continuous queries over a wide-area network. Conceptually, the task of supporting distributed continuous queries can be divided into two components: event processing (computing the set of affected users for each data update) and notification dissemination (notifying the set of affected users). The first part of this dissertation focuses on event processing. Since interacting with large-scale data can easily frustrate and overwhelm users, top-k queries have attracted considerable interest from the database community, as they allow users to focus on the top-ranked results only. However, it is nearly impossible to find a set of common top-ranked data that everyone is interested in; therefore, users are allowed to specify their interest in different forms of preferences, such as a personalized ranking function and range selection. This dissertation presents geometric frameworks, data structures, and algorithms for answering several types of preference queries efficiently. Experimental evaluations show that our approaches outperform the previous ones by orders of magnitude. The second part of the dissertation presents comprehensive solutions to the problem of processing and notifying a large number of continuous range top-k queries across a wide-area network.
Simple solutions include using a content-driven network to notify all continuous queries whose ranges contain the update (ignoring top-k), or using a server to compute only the affected continuous queries and notifying them individually. The former solution generates too much network traffic, while the latter overwhelms the server. This dissertation presents a geometric framework which allows the set of affected continuous queries to be described succinctly with messages that can be efficiently disseminated using content-driven networks. Fast algorithms are also developed to reformulate each update into a set of messages whose number is provably optimal, with or without knowing all continuous queries. The final component of this dissertation is the design of a wide-area dissemination network for continuous range queries. In particular, this dissertation addresses the problem of assigning users to servers in a wide-area content-based publish/subscribe system. A good assignment should consider both users' interests and locations, and balance multiple performance criteria including bandwidth, delay, and load balance. This dissertation presents a Monte Carlo approximation algorithm as well as a simple greedy algorithm. The Monte Carlo algorithm jointly considers multiple performance criteria to find a broker-subscriber assignment and provides theoretical performance guarantees. Using this algorithm as a yardstick, the greedy algorithm is also shown to work well across a wide range of workloads.

    Efficient Processing of Ranking Queries in Novel Applications

    Ranking queries, which return only a subset of results matching a user query, have been studied extensively in the past decade due to their importance in a wide range of applications. In this thesis, we study ranking queries in novel environments and settings where they have not been considered so far. With the advancements in sensor technologies, these small devices are today present in all corners of human life. Millions of them are deployed in various places and are sending data on a continuous basis. These sensors, which previously monitored mainly environmental phenomena or production chains, have now found their way into our daily lives as well; health monitoring is a plausible example of how much we rely on continuous observation of measurements. As Web technology evolves and facilitates data stream transmissions, sensors do not remain the sole producers of data in the form of streams. Web 2.0 has escalated the production of user-generated content, which appears in the form of annotated posts in a Weblog (blog), pictures and videos, or small textual snippets reflecting the current activity or status of users, and can be regarded as natural items of a temporal stream. A major part of this thesis is devoted to developing novel methods which assist in keeping track of this ever-increasing flow of information with continuous monitoring of ranking queries over it, particularly when traditional approaches fail to meet the newly raised requirements. We consider the ranking problem when the information flow is not synchronized among its sources. This is a recurring situation, since sensors are run by different organizations, measure moving entities, or are simply represented by users, who are inherently not synchronizable. Our methods are in particular designed for handling unsynchronized streams, calculating an object's score based on both its currently observed contribution to the registered queries and the contribution it might have in the future.
While this uncertainty in score calculation causes linear growth in the space necessary for providing exact results, we are able to define criteria that allow unpromising objects to be evicted as early as possible. We also leverage statistical properties that reflect the correlation between multiple streams to predict the future and provide better bounds for the best possible contribution of an object, consequently limiting the necessary storage dramatically. To achieve this, we make use of small statistical synopses that are periodically refreshed during runtime. Furthermore, we consider user-generated queries in the context of Web 2.0 applications which aim at filtering data streams in the form of textual documents, based on personal interests. In this case, the dimensionality of the data, the large cardinality of the subscribed queries, as well as the desire for consuming recent information, raise new challenges. We develop new approaches which efficiently filter the information and provide real-time updates to the user-subscribed queries. Our methods rely on a novel ordering of user queries in traditional inverted lists which allows the system to effectively prune those queries for which a new piece of information is of no interest. Finally, we investigate high-quality search in user-generated content in Web 2.0 applications in the form of images or videos. These resources are inherently dispersed all over the globe and can therefore best be managed in a purely distributed peer-to-peer network, which eliminates single points of failure. Search in such a huge repository of high-dimensional data involves evaluating ranking queries in the form of nearest neighbor queries. Therefore, we study ranking queries in high-dimensional spaces, where the index of the objects is maintained in a purely distributed fashion.
Our solution meets the two major requirements of a viable solution in distributing the index and evaluating ranking queries: the underlying peer-to-peer network remains load balanced, and efficient query evaluation is feasible as similar objects are assigned to nearby peers