17 research outputs found
Towards application-specific query processing systems
Database systems use query processing subsystems for enabling efficient
query-based data retrieval. An essential aspect of designing any
query-intensive application is tuning the query system to fit the application's
requirements and workload characteristics. However, the configuration
parameters provided by traditional database systems do not cover the design
decisions and trade-offs that arise from the geo-distribution of users and
data. In this paper, we present a vision towards a new type of query system
architecture that addresses this challenge by enabling query systems to be
designed and deployed on a per-use-case basis. We propose a distributed
abstraction called Query Processing Unit that encapsulates primitive query
processing tasks, and show how it can be used as a building block for
assembling query systems. Using this approach, application architects can
construct query systems specialized to their use cases, by controlling the
query system's architecture and the placement of its state. We demonstrate the
expressiveness of this approach by applying it to the design of a query system
that can flexibly place its state in the data center or at the edge, and show
that state placement decisions affect the trade-off between query response time
and query result freshness.
Efficient top K temporal spatial keyword search
Massive amounts of data that are geo-tagged and associated with text information are being generated at an unprecedented scale in many emerging applications such as location-based services and social networks. Due to their importance, a large body of work has focused on efficiently computing various spatial keyword queries. In this paper, we study the top-k temporal spatial keyword query, which considers three important constraints during the search: time, spatial proximity, and textual relevance. A novel index structure, namely the SSG-tree, is proposed to efficiently insert and delete spatio-temporal web objects at high rates. Based on the SSG-tree, an efficient algorithm is developed to support the top-k temporal spatial keyword query. We show via extensive experimentation with real spatial databases that our method achieves better performance than alternative techniques.
qwLSH: Cache-conscious Indexing for Processing Similarity Search Query Workloads in High-Dimensional Spaces
Similarity search queries in high-dimensional spaces are an important type of
query in many domains such as image processing and machine learning. Since
exact similarity search indexing techniques suffer from the well-known curse of
dimensionality in high-dimensional spaces, approximate search techniques are
often utilized instead. Locality Sensitive Hashing (LSH) has been shown to be
an effective approximate search method for solving similarity search queries in
high-dimensional spaces. Oftentimes, queries in real-world settings arrive as
part of a query workload. LSH and its variants are particularly designed to
solve single queries effectively. They suffer from one major drawback while
executing query workloads: they do not take into consideration important data
characteristics for effective cache utilization while designing the index
structures. In this paper, we present qwLSH, an index structure for efficiently
processing similarity search query workloads in high-dimensional spaces. We
intelligently divide a given cache during processing of a query workload by
using novel cost models. Experimental results show that, given a query
workload, qwLSH is able to perform faster than existing techniques due to its
unique cost models and strategies.
Comment: Extended version of the published wor
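To make the LSH idea mentioned in the abstract above concrete, here is a minimal sketch of one common LSH family (sign random projections for cosine similarity). It is not qwLSH and contains no cache-aware cost model; the class name `SignRandomProjectionLSH`, its parameters (`num_bits`, `num_tables`, `seed`), and the helper `cosine_distance` are all assumptions chosen for illustration.

```python
import math
import random
from collections import defaultdict


def cosine_distance(a, b):
    """Cosine distance between two non-zero vectors (0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


class SignRandomProjectionLSH:
    """Illustrative LSH index: each table hashes a vector to the sign pattern
    of its dot products with a set of random hyperplanes."""

    def __init__(self, dim, num_bits=8, num_tables=4, seed=0):
        rng = random.Random(seed)
        # One independent set of random hyperplanes per hash table.
        self.planes = [
            [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]
            for _ in range(num_tables)
        ]
        self.tables = [defaultdict(list) for _ in range(num_tables)]
        self.points = []

    @staticmethod
    def _key(planes, x):
        # The bucket key is the tuple of signs of the projections.
        return tuple(sum(p * v for p, v in zip(plane, x)) >= 0 for plane in planes)

    def add(self, x):
        """Index a vector and return its integer id."""
        idx = len(self.points)
        self.points.append(list(x))
        for planes, table in zip(self.planes, self.tables):
            table[self._key(planes, x)].append(idx)
        return idx

    def query(self, q, k=1):
        """Return ids of up to k colliding points, closest first."""
        candidates = set()
        for planes, table in zip(self.planes, self.tables):
            candidates.update(table.get(self._key(planes, q), ()))
        # Only colliding points are ranked exactly; the rest are never touched.
        ranked = sorted(candidates, key=lambda i: cosine_distance(self.points[i], q))
        return ranked[:k]


index = SignRandomProjectionLSH(dim=3)
index.add([1.0, 0.0, 0.0])
index.add([0.0, 1.0, 0.0])
index.add([-1.0, 0.0, 0.0])
nearest = index.query([1.0, 0.0, 0.0], k=1)
```

The approximation comes from the candidate step: a true neighbor that collides with the query in none of the tables is missed, which is why several tables are typically used.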
Extended high dimensional indexing approach for reachability queries on very large graphs
Given a directed acyclic graph G = (V, A) and two vertices u, v ∈ V, the reachability problem is to answer whether there is a path from u to v in the graph. In the context of very large graphs, with millions of vertices and a series of queries to be answered, it is not practical to search the graph for each query. On the other hand, storing the full transitive closure of the graph is also impractical due to its O(|V|²) size. Scalable approaches aim to create indices used to prune the search during its execution. Negative indices may be able to determine (in constant time) that a query has a negative answer, while positive indices may determine (again in constant time) that a query has a positive answer. In this paper we propose a novel scalable approach called LYNX that uses a large number of topological sorts of G as a negative cut index without degrading the query time. A similar strategy is applied regarding a positive cut index. In addition, LYNX proposes a user-defined index size that enables the user to control the ratio between negative and positive cuts depending on the expected query pattern. We show by computational experiments that LYNX consistently outperforms the state-of-the-art approach in terms of query time using the same index size for graphs with a high reachability ratio. In intelligent computer systems that rely on frequent tests of connectivity in graphs, LYNX can reduce the time delay experienced by end users through a reduced query time. This comes at the expense of an increased setup time whenever the underlying graph is updated. Keywords: directed acyclic graphs, topological sorts, reachability queries, graph indexing
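The negative-cut idea described in the abstract above can be sketched as follows. This is an illustration under stated assumptions, not the LYNX implementation: it relies only on the fact that if u appears after v in any topological order of G, then u cannot reach v. The function names, the randomized Kahn-style ordering, and the depth-first fallback search are all hypothetical choices; the positive cut index and the user-defined index size are not modeled.

```python
import random
from collections import defaultdict


def build_successors(edges):
    """Adjacency lists of the DAG, built from (u, v) arcs."""
    succ = defaultdict(list)
    for u, v in edges:
        succ[u].append(v)
    return succ


def random_topological_order(n, edges, rng):
    """Return pos, where pos[v] is v's rank in a randomized topological order."""
    succ = build_successors(edges)
    indeg = [0] * n
    for _, v in edges:
        indeg[v] += 1
    ready = [v for v in range(n) if indeg[v] == 0]
    pos = [0] * n
    rank = 0
    while ready:
        # Pick a random ready vertex so different calls yield different orders.
        i = rng.randrange(len(ready))
        ready[i], ready[-1] = ready[-1], ready[i]
        u = ready.pop()
        pos[u] = rank
        rank += 1
        for w in succ[u]:
            indeg[w] -= 1
            if indeg[w] == 0:
                ready.append(w)
    if rank != n:
        raise ValueError("input graph contains a cycle")
    return pos


def build_negative_index(n, edges, k, seed=0):
    """Hypothetical index: k topological orders, each stored as a rank array."""
    rng = random.Random(seed)
    return [random_topological_order(n, edges, rng) for _ in range(k)]


def negative_cut(u, v, orders):
    """True if some order places u after v, which proves u cannot reach v."""
    return any(pos[u] > pos[v] for pos in orders)


def reachable(u, v, succ, orders):
    """Answer a reachability query, using the index to prune a DFS fallback."""
    if u == v:
        return True
    if negative_cut(u, v, orders):
        return False
    stack, seen = [u], {u}
    while stack:
        x = stack.pop()
        for w in succ.get(x, ()):
            if w == v:
                return True
            if w not in seen and not negative_cut(w, v, orders):
                seen.add(w)
                stack.append(w)
    return False


# Small example DAG: 0 -> 1 -> 2 and 0 -> 3 -> 4.
edges = [(0, 1), (1, 2), (0, 3), (3, 4)]
succ = build_successors(edges)
orders = build_negative_index(5, edges, k=4)
```

In this sketch each check against the index costs O(k) for k stored orders, so keeping more orders prunes more queries at the price of a larger index and a longer setup whenever the graph changes.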