15,911 research outputs found

    Efficient Spatial Keyword Search in Trajectory Databases

    Full text link
    An increasing amount of trajectory data is being annotated with text descriptions to better capture the semantics associated with locations. The fusion of spatial locations and text descriptions in trajectories engenders a new type of top-kk queries that take into account both aspects. Each trajectory in consideration consists of a sequence of geo-spatial locations associated with text descriptions. Given a user location λ\lambda and a keyword set ψ\psi, a top-kk query returns kk trajectories whose text descriptions cover the keywords ψ\psi and that have the shortest match distance. To the best of our knowledge, previous research on querying trajectory databases has focused on trajectory data without any text description, and no existing work has studied such kind of top-kk queries on trajectories. This paper proposes one novel method for efficiently computing top-kk trajectories. The method is developed based on a new hybrid index, cell-keyword conscious B+^+-tree, denoted by \cellbtree, which enables us to exploit both text relevance and location proximity to facilitate efficient and effective query processing. The results of our extensive empirical studies with an implementation of the proposed algorithms on BerkeleyDB demonstrate that our proposed methods are capable of achieving excellent performance and good scalability.Comment: 12 page

    A Density-Based Approach to the Retrieval of Top-K Spatial Textual Clusters

    Full text link
    Keyword-based web queries with local intent retrieve web content that is relevant to supplied keywords and that represent points of interest that are near the query location. Two broad categories of such queries exist. The first encompasses queries that retrieve single spatial web objects that each satisfy the query arguments. Most proposals belong to this category. The second category, to which this paper's proposal belongs, encompasses queries that support exploratory user behavior and retrieve sets of objects that represent regions of space that may be of interest to the user. Specifically, the paper proposes a new type of query, namely the top-k spatial textual clusters (k-STC) query that returns the top-k clusters that (i) are located the closest to a given query location, (ii) contain the most relevant objects with regard to given query keywords, and (iii) have an object density that exceeds a given threshold. To compute this query, we propose a basic algorithm that relies on on-line density-based clustering and exploits an early stop condition. To improve the response time, we design an advanced approach that includes three techniques: (i) an object skipping rule, (ii) spatially gridded posting lists, and (iii) a fast range query algorithm. An empirical study on real data demonstrates that the paper's proposals offer scalability and are capable of excellent performance

    Geo-Social Group Queries with Minimum Acquaintance Constraint

    Full text link
    The prosperity of location-based social networking services enables geo-social group queries for group-based activity planning and marketing. This paper proposes a new family of geo-social group queries with minimum acquaintance constraint (GSGQs), which are more appealing than existing geo-social group queries in terms of producing a cohesive group that guarantees the worst-case acquaintance level. GSGQs, also specified with various spatial constraints, are more complex than conventional spatial queries; particularly, those with a strict kkNN spatial constraint are proved to be NP-hard. For efficient processing of general GSGQ queries on large location-based social networks, we devise two social-aware index structures, namely SaR-tree and SaR*-tree. The latter features a novel clustering technique that considers both spatial and social factors. Based on SaR-tree and SaR*-tree, efficient algorithms are developed to process various GSGQs. Extensive experiments on real-world Gowalla and Dianping datasets show that our proposed methods substantially outperform the baseline algorithms based on R-tree.Comment: This is the preprint version that is accepted by the Very Large Data Bases Journa

    The Flexible Group Spatial Keyword Query

    Full text link
    We present a new class of service for location based social networks, called the Flexible Group Spatial Keyword Query, which enables a group of users to collectively find a point of interest (POI) that optimizes an aggregate cost function combining both spatial distances and keyword similarities. In addition, our query service allows users to consider the tradeoffs between obtaining a sub-optimal solution for the entire group and obtaining an optimimized solution but only for a subgroup. We propose algorithms to process three variants of the query: (i) the group nearest neighbor with keywords query, which finds a POI that optimizes the aggregate cost function for the whole group of size n, (ii) the subgroup nearest neighbor with keywords query, which finds the optimal subgroup and a POI that optimizes the aggregate cost function for a given subgroup size m (m <= n), and (iii) the multiple subgroup nearest neighbor with keywords query, which finds optimal subgroups and corresponding POIs for each of the subgroup sizes in the range [m, n]. We design query processing algorithms based on branch-and-bound and best-first paradigms. Finally, we provide theoretical bounds and conduct extensive experiments with two real datasets which verify the effectiveness and efficiency of the proposed algorithms.Comment: 12 page

    URL Recommender using Parallel Processing

    Get PDF
    The main purpose of this project is to section similar news and articles from a vast variety of news articles. Let’s say, you want to read about latest news related to particular topic like sports. Usually, user goes to a particular website and goes through some news but he won’t be able to cover all the news coverage in a single website. So, he would be going through some other news website to checking it out and this continues. Also, some news websites might be containing some old news and the user might be going through that. To solve this, I have developed a web application where in user can see all the latest news from different websites in a single place. Users are given choice to select the news websites from which they want to view the latest news. The articles which we get from news websites are very random and we will be applying the DBSCAN algorithm and place the news articles in the cluster form for each specific topic for user to view. If the user wants to see sports, he will be provided with sports news section. And this process of extracting random news articles and forming news clusters are done at run time and at all times we will get the latest news as we will be extracting the data from web at run time. This is an effective way to watch all news at single place. And in turn this can be used as articles (URL) recommender as the user has to just go through the specific cluster which interests him and not visit all news websites to find articles. This way the user does not have to visit different sites to view all latest news. This idea can be expanded to not just news articles but also in other areas like collecting statistics of financial information etc. As the processing is done at runtime, the performance has to be improved. To improve the performance, the distributed data mining is used and multiple servers are being used which communicate with each other

    Adaptive Processing of Spatial-Keyword Data Over a Distributed Streaming Cluster

    Full text link
    The widespread use of GPS-enabled smartphones along with the popularity of micro-blogging and social networking applications, e.g., Twitter and Facebook, has resulted in the generation of huge streams of geo-tagged textual data. Many applications require real-time processing of these streams. For example, location-based e-coupon and ad-targeting systems enable advertisers to register millions of ads to millions of users. The number of users is typically very high and they are continuously moving, and the ads change frequently as well. Hence sending the right ad to the matching users is very challenging. Existing streaming systems are either centralized or are not spatial-keyword aware, and cannot efficiently support the processing of rapidly arriving spatial-keyword data streams. This paper presents Tornado, a distributed spatial-keyword stream processing system. Tornado features routing units to fairly distribute the workload, and furthermore, co-locate the data objects and the corresponding queries at the same processing units. The routing units use the Augmented-Grid, a novel structure that is equipped with an efficient search algorithm for distributing the data objects and queries. Tornado uses evaluators to process the data objects against the queries. The routing units minimize the redundant communication by not sending data updates for processing when these updates do not match any query. By applying dynamically evaluated cost formulae that continuously represent the processing overhead at each evaluator, Tornado is adaptive to changes in the workload. Extensive experimental evaluation using spatio-textual range queries over real Twitter data indicates that Tornado outperforms the non-spatio-textually aware approaches by up to two orders of magnitude in terms of the overall system throughput

    Keyword-aware Optimal Route Search

    Full text link
    Identifying a preferable route is an important problem that finds applications in map services. When a user plans a trip within a city, the user may want to find "a most popular route such that it passes by shopping mall, restaurant, and pub, and the travel time to and from his hotel is within 4 hours." However, none of the algorithms in the existing work on route planning can be used to answer such queries. Motivated by this, we define the problem of keyword-aware optimal route query, denoted by KOR, which is to find an optimal route such that it covers a set of user-specified keywords, a specified budget constraint is satisfied, and an objective score of the route is optimal. The problem of answering KOR queries is NP-hard. We devise an approximation algorithm OSScaling with provable approximation bounds. Based on this algorithm, another more efficient approximation algorithm BucketBound is proposed. We also design a greedy approximation algorithm. Results of empirical studies show that all the proposed algorithms are capable of answering KOR queries efficiently, while the BucketBound and Greedy algorithms run faster. The empirical studies also offer insight into the accuracy of the proposed algorithms.Comment: VLDB201
    corecore