6,890 research outputs found
A Density-Based Approach to the Retrieval of Top-K Spatial Textual Clusters
Keyword-based web queries with local intent retrieve web content that is
relevant to supplied keywords and that represent points of interest that are
near the query location. Two broad categories of such queries exist. The first
encompasses queries that retrieve single spatial web objects that each satisfy
the query arguments. Most proposals belong to this category. The second
category, to which this paper's proposal belongs, encompasses queries that
support exploratory user behavior and retrieve sets of objects that represent
regions of space that may be of interest to the user. Specifically, the paper
proposes a new type of query, namely the top-k spatial textual clusters (k-STC)
query that returns the top-k clusters that (i) are located the closest to a
given query location, (ii) contain the most relevant objects with regard to
given query keywords, and (iii) have an object density that exceeds a given
threshold. To compute this query, we propose a basic algorithm that relies on
on-line density-based clustering and exploits an early stop condition. To
improve the response time, we design an advanced approach that includes three
techniques: (i) an object skipping rule, (ii) spatially gridded posting lists,
and (iii) a fast range query algorithm. An empirical study on real data
demonstrates that the paper's proposals offer scalability and are capable of
excellent performance
Keyword-aware Optimal Route Search
Identifying a preferable route is an important problem that finds
applications in map services. When a user plans a trip within a city, the user
may want to find "a most popular route such that it passes by shopping mall,
restaurant, and pub, and the travel time to and from his hotel is within 4
hours." However, none of the algorithms in the existing work on route planning
can be used to answer such queries. Motivated by this, we define the problem of
keyword-aware optimal route query, denoted by KOR, which is to find an optimal
route such that it covers a set of user-specified keywords, a specified budget
constraint is satisfied, and an objective score of the route is optimal. The
problem of answering KOR queries is NP-hard. We devise an approximation
algorithm OSScaling with provable approximation bounds. Based on this
algorithm, another more efficient approximation algorithm BucketBound is
proposed. We also design a greedy approximation algorithm. Results of empirical
studies show that all the proposed algorithms are capable of answering KOR
queries efficiently, while the BucketBound and Greedy algorithms run faster.
The empirical studies also offer insight into the accuracy of the proposed
algorithms.Comment: VLDB201
Adaptive Processing of Spatial-Keyword Data Over a Distributed Streaming Cluster
The widespread use of GPS-enabled smartphones along with the popularity of
micro-blogging and social networking applications, e.g., Twitter and Facebook,
has resulted in the generation of huge streams of geo-tagged textual data. Many
applications require real-time processing of these streams. For example,
location-based e-coupon and ad-targeting systems enable advertisers to register
millions of ads to millions of users. The number of users is typically very
high and they are continuously moving, and the ads change frequently as well.
Hence sending the right ad to the matching users is very challenging. Existing
streaming systems are either centralized or are not spatial-keyword aware, and
cannot efficiently support the processing of rapidly arriving spatial-keyword
data streams. This paper presents Tornado, a distributed spatial-keyword stream
processing system. Tornado features routing units to fairly distribute the
workload, and furthermore, co-locate the data objects and the corresponding
queries at the same processing units. The routing units use the Augmented-Grid,
a novel structure that is equipped with an efficient search algorithm for
distributing the data objects and queries. Tornado uses evaluators to process
the data objects against the queries. The routing units minimize the redundant
communication by not sending data updates for processing when these updates do
not match any query. By applying dynamically evaluated cost formulae that
continuously represent the processing overhead at each evaluator, Tornado is
adaptive to changes in the workload. Extensive experimental evaluation using
spatio-textual range queries over real Twitter data indicates that Tornado
outperforms the non-spatio-textually aware approaches by up to two orders of
magnitude in terms of the overall system throughput
Geo-Social Group Queries with Minimum Acquaintance Constraint
The prosperity of location-based social networking services enables
geo-social group queries for group-based activity planning and marketing. This
paper proposes a new family of geo-social group queries with minimum
acquaintance constraint (GSGQs), which are more appealing than existing
geo-social group queries in terms of producing a cohesive group that guarantees
the worst-case acquaintance level. GSGQs, also specified with various spatial
constraints, are more complex than conventional spatial queries; particularly,
those with a strict NN spatial constraint are proved to be NP-hard. For
efficient processing of general GSGQ queries on large location-based social
networks, we devise two social-aware index structures, namely SaR-tree and
SaR*-tree. The latter features a novel clustering technique that considers both
spatial and social factors. Based on SaR-tree and SaR*-tree, efficient
algorithms are developed to process various GSGQs. Extensive experiments on
real-world Gowalla and Dianping datasets show that our proposed methods
substantially outperform the baseline algorithms based on R-tree.Comment: This is the preprint version that is accepted by the Very Large Data
Bases Journa
Efficient Spatial Keyword Search in Trajectory Databases
An increasing amount of trajectory data is being annotated with text
descriptions to better capture the semantics associated with locations. The
fusion of spatial locations and text descriptions in trajectories engenders a
new type of top- queries that take into account both aspects. Each
trajectory in consideration consists of a sequence of geo-spatial locations
associated with text descriptions. Given a user location and a
keyword set , a top- query returns trajectories whose text
descriptions cover the keywords and that have the shortest match
distance. To the best of our knowledge, previous research on querying
trajectory databases has focused on trajectory data without any text
description, and no existing work has studied such kind of top- queries on
trajectories. This paper proposes one novel method for efficiently computing
top- trajectories. The method is developed based on a new hybrid index,
cell-keyword conscious B-tree, denoted by \cellbtree, which enables us to
exploit both text relevance and location proximity to facilitate efficient and
effective query processing. The results of our extensive empirical studies with
an implementation of the proposed algorithms on BerkeleyDB demonstrate that our
proposed methods are capable of achieving excellent performance and good
scalability.Comment: 12 page
Reverse spatial visual top-k query
With the wide application of mobile Internet techniques an location-based services (LBS), massive multimedia data with geo-tags has been generated and collected. In this paper, we investigate a novel type of spatial query problem, named reverse spatial visual top- query (RSVQ k ) that aims to retrieve a set of geo-images that have the query as one of the most relevant geo-images in both geographical proximity and visual similarity. Existing approaches for reverse top- queries are not suitable to address this problem because they cannot effectively process unstructured data, such as image. To this end, firstly we propose the definition of RSVQ k problem and introduce the similarity measurement. A novel hybrid index, named VR 2 -Tree is designed, which is a combination of visual representation of geo-image and R-Tree. Besides, an extension of VR 2 -Tree, called CVR 2 -Tree is introduced and then we discuss the calculation of lower/upper bound, and then propose the optimization technique via CVR 2 -Tree for further pruning. In addition, a search algorithm named RSVQ k algorithm is developed to support the efficient RSVQ k query. Comprehensive experiments are conducted on four geo-image datasets, and the results illustrate that our approach can address the RSVQ k problem effectively and efficiently
SVS-JOIN : efficient spatial visual similarity join for geo-multimedia
In the big data era, massive amount of multimedia data with geo-tags has been generated and collected by smart devices equipped with mobile communications module and position sensor module. This trend has put forward higher request on large-scale geo-multimedia retrieval. Spatial similarity join is one of the significant problems in the area of spatial database. Previous works focused on spatial textual document search problem, rather than geo-multimedia retrieval. In this paper, we investigate a novel geo-multimedia retrieval paradigm named spatial visual similarity join (SVS-JOIN for short), which aims to search similar geo-image pairs in both aspects of geo-location and visual content. Firstly, the definition of SVS-JOIN is proposed and then we present the geographical similarity and visual similarity measurement. Inspired by the approach for textual similarity join, we develop an algorithm named SVS-JOIN B by combining the PPJOIN algorithm and visual similarity. Besides, an extension of it named SVS-JOIN G is developed, which utilizes spatial grid strategy to improve the search efficiency. To further speed up the search, a novel approach called SVS-JOIN Q is carefully designed, in which a quadtree and a global inverted index are employed. Comprehensive experiments are conducted on two geo-image datasets and the results demonstrate that our solution can address the SVS-JOIN problem effectively and efficiently
- …