15,832 research outputs found

    Report on prototype implementation of Pausanias

    Get PDF
    Classic web search engines, such as Google and Yahoo, provide efficient algorithms for retrieval of ranked search results given a set of keywords. However, there are also applications that provide the user the opportunity to search using geotagged criteria. These applications are Google Maps, Bing Maps etc. Such applications depict points of interest on the map and combine their location with the keywords provided by the associated document(s). These queries are so called spatial queries. Although in many cases spatial constraints seem to suffice, sometimes there is a need to combine spatial and textual information to find points of interests. These queries are called Spatio-Textual Queries. There is an active research interest in spatio-textual queries and more specifically in the efficient retrieval of topK Spatio-Textual Queries. This project employs 5 different queries ranging from simple Boolean Spatial Range Queries to TopK Spatio-Textual Range Queries. In this way the user will be able to choose among 5 queries to satisfy his/her constraints. In the following, there will be an explanation of the data structure and the algorithms that Pausanias employed

    Adaptive Processing of Spatial-Keyword Data Over a Distributed Streaming Cluster

    Full text link
    The widespread use of GPS-enabled smartphones along with the popularity of micro-blogging and social networking applications, e.g., Twitter and Facebook, has resulted in the generation of huge streams of geo-tagged textual data. Many applications require real-time processing of these streams. For example, location-based e-coupon and ad-targeting systems enable advertisers to register millions of ads to millions of users. The number of users is typically very high and they are continuously moving, and the ads change frequently as well. Hence sending the right ad to the matching users is very challenging. Existing streaming systems are either centralized or are not spatial-keyword aware, and cannot efficiently support the processing of rapidly arriving spatial-keyword data streams. This paper presents Tornado, a distributed spatial-keyword stream processing system. Tornado features routing units to fairly distribute the workload, and furthermore, co-locate the data objects and the corresponding queries at the same processing units. The routing units use the Augmented-Grid, a novel structure that is equipped with an efficient search algorithm for distributing the data objects and queries. Tornado uses evaluators to process the data objects against the queries. The routing units minimize the redundant communication by not sending data updates for processing when these updates do not match any query. By applying dynamically evaluated cost formulae that continuously represent the processing overhead at each evaluator, Tornado is adaptive to changes in the workload. Extensive experimental evaluation using spatio-textual range queries over real Twitter data indicates that Tornado outperforms the non-spatio-textually aware approaches by up to two orders of magnitude in terms of the overall system throughput

    Selectivity estimation on streaming spatio-textual data using local correlations

    Full text link
    In this paper, we investigate the selectivity estimation prob- lem for streaming spatio-textual data, which arises in many social network and geo-location applications. Specifically, given a set of continuously and rapidly arriving spatio- textual objects, each of which is described by a geo-location and a short text, we aim to accurately estimate the cardinal- ity of a spatial keyword query on objects seen so far, where a spatial keyword query consists of a search region and a set of query keywords. To the best of our knowledge, this is the first work to ad- dress this important problem. We first extend two existing techniques to solve this problem, and show their limitations. Inspired by two key observations on the "locality" of the correlations among query keywords, we propose a local cor- relation based method by utilizing an augmented adaptive space partition tree (A2SP-tree for short) to approximately learn a local Bayesian network on-the-fly for a given query and estimate its selectivity. A novel local boosting approach is presented to further enhance the learning accuracy of lo- cal Bayesian networks. Our comprehensive experiments on real-life datasets demonstrate the superior performance of the local correlation based algorithm in terms of estimation accuracy compared to other competitors. © 2014 VLDB Endowment 21508097/ 14/10

    Spatio-textual indexing for geographical search on the web

    Get PDF
    Many web documents refer to specific geographic localities and many people include geographic context in queries to web search engines. Standard web search engines treat the geographical terms in the same way as other terms. This can result in failure to find relevant documents that refer to the place of interest using alternative related names, such as those of included or nearby places. This can be overcome by associating text indexing with spatial indexing methods that exploit geo-tagging procedures to categorise documents with respect to geographic space. We describe three methods for spatio-textual indexing based on multiple spatially indexed text indexes, attaching spatial indexes to the document occurrences of a text index, and merging text index access results with results of access to a spatial index of documents. These schemes are compared experimentally with a conventional text index search engine, using a collection of geo-tagged web documents, and are shown to be able to compete in speed and storage performance with pure text indexing

    Event detection in location-based social networks

    Get PDF
    With the advent of social networks and the rise of mobile technologies, users have become ubiquitous sensors capable of monitoring various real-world events in a crowd-sourced manner. Location-based social networks have proven to be faster than traditional media channels in reporting and geo-locating breaking news, i.e. Osama Bin Laden’s death was first confirmed on Twitter even before the announcement from the communication department at the White House. However, the deluge of user-generated data on these networks requires intelligent systems capable of identifying and characterizing such events in a comprehensive manner. The data mining community coined the term, event detection , to refer to the task of uncovering emerging patterns in data streams . Nonetheless, most data mining techniques do not reproduce the underlying data generation process, hampering to self-adapt in fast-changing scenarios. Because of this, we propose a probabilistic machine learning approach to event detection which explicitly models the data generation process and enables reasoning about the discovered events. With the aim to set forth the differences between both approaches, we present two techniques for the problem of event detection in Twitter : a data mining technique called Tweet-SCAN and a machine learning technique called Warble. We assess and compare both techniques in a dataset of tweets geo-located in the city of Barcelona during its annual festivities. Last but not least, we present the algorithmic changes and data processing frameworks to scale up the proposed techniques to big data workloads.This work is partially supported by Obra Social “la Caixa”, by the Spanish Ministry of Science and Innovation under contract (TIN2015-65316), by the Severo Ochoa Program (SEV2015-0493), by SGR programs of the Catalan Government (2014-SGR-1051, 2014-SGR-118), Collectiveware (TIN2015-66863-C2-1-R) and BSC/UPC NVIDIA GPU Center of Excellence.We would also like to thank the reviewers for their constructive feedback.Peer ReviewedPostprint (author's final draft

    Objects2action: Classifying and localizing actions without any video example

    Get PDF
    The goal of this paper is to recognize actions in video without the need for examples. Different from traditional zero-shot approaches we do not demand the design and specification of attribute classifiers and class-to-attribute mappings to allow for transfer from seen classes to unseen classes. Our key contribution is objects2action, a semantic word embedding that is spanned by a skip-gram model of thousands of object categories. Action labels are assigned to an object encoding of unseen video based on a convex combination of action and object affinities. Our semantic embedding has three main characteristics to accommodate for the specifics of actions. First, we propose a mechanism to exploit multiple-word descriptions of actions and objects. Second, we incorporate the automated selection of the most responsive objects per action. And finally, we demonstrate how to extend our zero-shot approach to the spatio-temporal localization of actions in video. Experiments on four action datasets demonstrate the potential of our approach

    Contextual Media Retrieval Using Natural Language Queries

    Full text link
    The widespread integration of cameras in hand-held and head-worn devices as well as the ability to share content online enables a large and diverse visual capture of the world that millions of users build up collectively every day. We envision these images as well as associated meta information, such as GPS coordinates and timestamps, to form a collective visual memory that can be queried while automatically taking the ever-changing context of mobile users into account. As a first step towards this vision, in this work we present Xplore-M-Ego: a novel media retrieval system that allows users to query a dynamic database of images and videos using spatio-temporal natural language queries. We evaluate our system using a new dataset of real user queries as well as through a usability study. One key finding is that there is a considerable amount of inter-user variability, for example in the resolution of spatial relations in natural language utterances. We show that our retrieval system can cope with this variability using personalisation through an online learning-based retrieval formulation.Comment: 8 pages, 9 figures, 1 tabl
    corecore