28,136 research outputs found
Accelerating Spatio-Textual Queries with Learned Indices
Efficiently computing spatio-textual queries has become increasingly
important in various applications that need to quickly retrieve geolocated
entities associated with textual information, such as in location-based
services and social networks. To accelerate such queries, several works have
proposed combining spatial and textual indices into hybrid index structures.
Recently, the novel idea of replacing traditional indices with ML models has
attracted a lot of attention. This includes works on learned spatial indices,
where the main challenge is to address the lack of a total ordering among
objects in a multidimensional space. In this work, we investigate how to extend
this novel type of index design to the case of spatio-textual data. We study
different design choices, based on either loose or tight coupling between the
spatial and textual part, as well as a hybrid index that combines a traditional
and a learned component. We also perform an experimental evaluation using
several real-world datasets to assess the potential benefits of using a learned
index for evaluating spatio-textual queries.
Enhancing In-Memory Spatial Indexing with Learned Search
Spatial data is ubiquitous. Massive amounts of data are generated every day from a plethora of sources such as billions of GPS-enabled devices (e.g., cell phones, cars, and sensors), consumer-based applications (e.g., Uber and Strava), and social media platforms (e.g., location-tagged posts on Facebook, Twitter, and Instagram). This exponential growth in spatial data has led the research community to build systems and applications for efficient spatial data processing. In this study, we apply a recently developed machine-learned search technique for single-dimensional sorted data to spatial indexing. Specifically, we partition spatial data using six traditional spatial partitioning techniques and employ machine-learned search within each partition to support point, range, distance, and spatial join queries. Adhering to the latest research trends, we tune the partitioning techniques to be instance-optimized. By tuning each partitioning technique for optimal performance, we demonstrate that: (i) grid-based index structures outperform tree-based index structures (from 1.23× to 2.47×), (ii) learning-enhanced variants of commonly used spatial index structures outperform their original counterparts (from 1.44× to 53.34× faster), (iii) machine-learned search within a partition is faster than binary search by 11.79% - 39.51% when filtering on one dimension, (iv) the benefit of machine-learned search diminishes in the presence of other compute-intensive operations (e.g. scan costs in higher selectivity queries, Haversine distance computation, and point-in-polygon tests), and (v) index lookup is the bottleneck for tree-based structures, which could potentially be reduced by linearizing the indexed partitions.
Additional Key Words and Phrases: spatial data, indexing, machine-learning, spatial queries, geospatia
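To make the idea of machine-learned search over sorted one-dimensional data concrete, here is a minimal sketch. It assumes a single least-squares line as the model and a recorded maximum prediction error to bound the final search; the abstract does not say which learned-search technique the authors use, so the function names (`build_model`, `learned_lookup`) and the linear model are illustrative assumptions only.

```python
import bisect

def build_model(keys):
    """Fit position ~ slope * key + intercept over a sorted list of keys.

    Assumption: a plain least-squares line stands in for the learned model.
    Returns (slope, intercept, max_err), where max_err is the largest
    distance between a key's true position and its predicted position.
    """
    n = len(keys)
    mean_k = sum(keys) / n
    mean_p = (n - 1) / 2
    var = sum((k - mean_k) ** 2 for k in keys)
    cov = sum((k - mean_k) * (i - mean_p) for i, k in enumerate(keys))
    slope = 0.0 if var == 0 else cov / var
    intercept = mean_p - slope * mean_k
    max_err = max(abs(i - round(slope * k + intercept)) for i, k in enumerate(keys))
    return slope, intercept, max_err

def learned_lookup(keys, model, key):
    """Return the index of `key` in `keys`, or -1 if it is absent."""
    slope, intercept, max_err = model
    n = len(keys)
    guess = round(slope * key + intercept)
    # Binary search only inside the error window around the prediction.
    lo = min(max(0, guess - max_err), n)
    hi = max(lo, min(n, guess + max_err + 1))
    i = bisect.bisect_left(keys, key, lo, hi)
    return i if i < n and keys[i] == key else -1

if __name__ == "__main__":
    keys = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
    model = build_model(keys)
    print(learned_lookup(keys, model, 13), learned_lookup(keys, model, 4))
```

The search window shrinks as the model fits the key distribution better, which is where any gain over a full binary search would come from in this sketch.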
GLIN: A Lightweight Learned Indexing Mechanism for Complex Geometries
Although spatial index structures shorten the query response time, they rely
on complex tree structures to narrow down the search space. Such structures in
turn yield additional storage overhead and take a toll on index maintenance.
Recently, there has been a flurry of works attempting to leverage
machine learning (ML) models to simplify the index structures. Some follow-up
works extend the idea to support geospatial point data. These approaches
partition the multidimensional space into cells and assign IDs to these cells
using a space-filling curve (e.g., Z-order curve) or mathematical equations. These
approaches work well for geospatial points but are not able to handle complex
geometries such as polygons and trajectories which are widely available in
geospatial data.
This paper introduces GLIN, a lightweight learned index for spatial range
queries on complex geometries. To achieve that, GLIN transforms geometries to
Z-address intervals, and builds a hierarchical model to learn the cumulative
distribution function between these intervals and the record positions. The
lightweight hierarchical model greatly shortens the index probing time.
Furthermore, GLIN augments spatial query windows using an add-on function to
guarantee the query accuracy for both Contains and Intersects spatial
relationships. Our experiments on real-world and synthetic datasets show that
GLIN occupies 40-70 times less storage overhead than popular spatial indexes
such as Quad-Tree while still showing similar query response time in medium
selectivity queries. Moreover, GLIN's maintenance speed is around 1.5 times
higher on insertion and 3-5 times higher on deletion.
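The mapping from a geometry to a Z-address interval can be sketched as follows. This is an illustration under stated assumptions, not GLIN's actual code: it assumes geometries are summarized by a bounding box whose corners are already snapped to non-negative integer grid cells, and the helper names (`z_address`, `z_interval`) and the 16-bit resolution are hypothetical.

```python
def z_address(x, y, bits=16):
    """Interleave the bits of grid coordinates x and y (Z-order / Morton code).

    Bit i of x lands at position 2*i, bit i of y at position 2*i + 1.
    """
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z

def z_interval(min_x, min_y, max_x, max_y, bits=16):
    """Z-address interval covering a bounding box given in grid cells.

    Bit interleaving is monotone in each coordinate, so the lower-left
    corner gives the smallest Z-address in the box and the upper-right
    corner gives the largest.
    """
    return z_address(min_x, min_y, bits), z_address(max_x, max_y, bits)

if __name__ == "__main__":
    print(z_interval(1, 1, 2, 3))
```

A one-dimensional model can then be trained on such intervals in sorted order. The interval may also contain cells outside the box, which is one reason a refinement step against the exact geometry is still needed.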
Region-Based Image Retrieval Revisited
The region-based image retrieval (RBIR) technique is revisited. In early attempts
at RBIR in the late 90s, researchers found many ways to specify region-based
queries and spatial relationships; however, the ways to characterize the
regions, such as by using color histograms, were very poor at that time. Here,
we revisit RBIR by incorporating semantic specification of objects and
intuitive specification of spatial relationships. Our contributions are the
following. First, to support multiple aspects of semantic object specification
(category, instance, and attribute), we propose a multitask CNN feature that
allows us to use deep learning techniques and to jointly handle multi-aspect
object specification. Second, to help users specify spatial relationships among
objects in an intuitive way, we propose recommendation techniques of spatial
relationships. In particular, by mining the search results, a system can
recommend feasible spatial relationships among the objects. The system also can
recommend likely spatial relationships from the assigned object category names, based
on a language prior. Moreover, object-level inverted indexing supports very fast
shortlist generation, and re-ranking based on spatial constraints provides
users with instant RBIR experiences. Comment: To appear in ACM Multimedia 2017 (Oral)
EAGLE—A Scalable Query Processing Engine for Linked Sensor Data
Recently, many approaches have been proposed to manage sensor data using semantic web technologies for effective heterogeneous data integration. However, our empirical observations revealed that these solutions primarily focused on semantic relationships and unfortunately paid less attention to spatio–temporal correlations. Most semantic approaches do not have spatio–temporal support. Some of them have attempted to provide full spatio–temporal support, but have poor performance for complex spatio–temporal aggregate queries. In addition, while the volume of sensor data is rapidly growing, the challenge of querying and managing the massive volumes of data generated by sensing devices still remains unsolved. In this article, we introduce EAGLE, a spatio–temporal query engine for querying sensor data based on the linked data model. The ultimate goal of EAGLE is to provide an elastic and scalable system which allows fast searching and analysis with respect to the relationships of space, time and semantics in sensor data. We also extend SPARQL with a set of new query operators in order to support spatio–temporal computing in the linked sensor data context.
EC/H2020/732679/EU/ACTivating InnoVative IoT smart living environments for AGEing well/ACTIVAGE
EC/H2020/661180/EU/A Scalable and Elastic Platform for Near-Realtime Analytics for The Graph of Everything/SMARTE
Query by String word spotting based on character bi-gram indexing
In this paper we propose a segmentation-free query by string word spotting
method. Both the documents and query strings are encoded using a recently
proposed word representation that projects images and strings into a common
attribute space based on a pyramidal histogram of characters (PHOC). These
attribute models are learned using linear SVMs over the Fisher Vector
representation of the images along with the PHOC labels of the corresponding
strings. In order to search through the whole page, document regions are
indexed per character bi-gram using a similar attribute representation. On top
of that, we propose an integral image representation of the document using a
simplified version of the attribute model for efficient computation. Finally, we
introduce a re-ranking step in order to boost retrieval performance. We show
state-of-the-art results for segmentation-free query by string word spotting in
single-writer and multi-writer standard datasets. Comment: To be published in ICDAR201
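The integral image mentioned in this abstract is a standard device for summing values over any rectangular window in constant time. The sketch below shows the general technique on a grid of scalar scores; how the paper applies it to its attribute model is not described in the abstract, so the scalar grid and the function names (`integral_image`, `window_sum`) are assumptions for illustration.

```python
def integral_image(grid):
    """Build a summed-area table with one extra zero row and column.

    ii[r][c] holds the sum of grid[0..r-1][0..c-1].
    """
    h, w = len(grid), len(grid[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for r in range(h):
        for c in range(w):
            ii[r + 1][c + 1] = grid[r][c] + ii[r][c + 1] + ii[r + 1][c] - ii[r][c]
    return ii

def window_sum(ii, top, left, bottom, right):
    """Sum of grid rows top..bottom-1 and columns left..right-1, in O(1)."""
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]

if __name__ == "__main__":
    scores = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    ii = integral_image(scores)
    print(window_sum(ii, 1, 1, 3, 3))
```

With the table built once per page, every candidate region costs four lookups regardless of its size, which is what makes exhaustive window scoring affordable in a segmentation-free setting.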