17 research outputs found

    Towards application-specific query processing systems

    Database systems use query processing subsystems to enable efficient query-based data retrieval. An essential aspect of designing any query-intensive application is tuning the query system to fit the application's requirements and workload characteristics. However, the configuration parameters provided by traditional database systems do not cover the design decisions and trade-offs that arise from the geo-distribution of users and data. In this paper, we present a vision of a new type of query system architecture that addresses this challenge by enabling query systems to be designed and deployed on a per-use-case basis. We propose a distributed abstraction called Query Processing Unit that encapsulates primitive query processing tasks, and show how it can be used as a building block for assembling query systems. Using this approach, application architects can construct query systems specialized to their use cases by controlling the query system's architecture and the placement of its state. We demonstrate the expressiveness of this approach by applying it to the design of a query system that can flexibly place its state in the data center or at the edge, and show that state placement decisions affect the trade-off between query response time and query result freshness.

    Efficient top K temporal spatial keyword search

    Massive amounts of data that are geo-tagged and associated with text information are being generated at an unprecedented scale in many emerging applications such as location-based services and social networks. Due to their importance, a large body of work has focused on efficiently computing various spatial keyword queries. In this paper, we study the top-k temporal spatial keyword query, which considers three important constraints during the search: time, spatial proximity, and textual relevance. A novel index structure, namely the SSG-tree, is proposed to efficiently insert and delete spatio-temporal web objects at high rates. Based on the SSG-tree, an efficient algorithm is developed to support the top-k temporal spatial keyword query. We show via extensive experimentation with real spatial databases that our method has increased performance over alternative techniques.

    qwLSH: Cache-conscious Indexing for Processing Similarity Search Query Workloads in High-Dimensional Spaces

    Similarity search queries in high-dimensional spaces are an important type of query in many domains such as image processing and machine learning. Since exact similarity search indexing techniques suffer from the well-known curse of dimensionality in high-dimensional spaces, approximate search techniques are often used instead. Locality Sensitive Hashing (LSH) has been shown to be an effective approximate search method for solving similarity search queries in high-dimensional spaces. Often, queries in real-world settings arrive as part of a query workload. LSH and its variants are designed to solve single queries effectively. They suffer from one major drawback when executing query workloads: they do not take into consideration important data characteristics for effective cache utilization while designing the index structures. In this paper, we present qwLSH, an index structure for efficiently processing similarity search query workloads in high-dimensional spaces. We intelligently divide a given cache during processing of a query workload by using novel cost models. Experimental results show that, given a query workload, qwLSH is able to perform faster than existing techniques due to its unique cost models and strategies. Comment: Extended version of the published wor
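To make the LSH idea mentioned in this abstract concrete, here is a minimal sketch of a generic random-hyperplane LSH index. It is not qwLSH: the abstract does not specify its cost models or cache partitioning, so none of that is modeled. The class name `HyperplaneLSH` and the parameters `num_bits`, `num_tables`, and `seed` are hypothetical choices made for illustration.

```python
import random
from collections import defaultdict


class HyperplaneLSH:
    """Generic random-hyperplane LSH (illustrative only, not qwLSH)."""

    def __init__(self, dim, num_bits=8, num_tables=4, seed=0):
        rng = random.Random(seed)
        # One set of random hyperplanes per hash table (hypothetical defaults).
        self.planes = [
            [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]
            for _ in range(num_tables)
        ]
        self.tables = [defaultdict(list) for _ in range(num_tables)]

    def _signature(self, table_idx, vec):
        # Each bit records which side of a hyperplane the vector lies on.
        return tuple(
            sum(p * x for p, x in zip(plane, vec)) >= 0
            for plane in self.planes[table_idx]
        )

    def insert(self, key, vec):
        for t, table in enumerate(self.tables):
            table[self._signature(t, vec)].append(key)

    def candidates(self, vec):
        # Union of the buckets the query falls into, one bucket per table.
        found = set()
        for t, table in enumerate(self.tables):
            found.update(table.get(self._signature(t, vec), ()))
        return found
```

Vectors separated by a small angle tend to share signatures and therefore land in the same buckets, so a query only needs to compare against the returned candidates rather than the whole data set.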

    Extended high dimensional indexing approach for reachability queries on very large graphs

    Given a directed acyclic graph G = (V, A) and two vertices u, v ∈ V, the reachability problem is to answer whether there is a path from u to v in the graph. In the context of very large graphs, with millions of vertices and a series of queries to be answered, it is not practical to search the graph for each query. On the other hand, storing the full transitive closure of the graph is also impractical due to its O(|V|²) size. Scalable approaches aim to create indices used to prune the search during its execution. Negative indices may be able to determine (in constant time) that a query has a negative answer, while positive indices may determine (again in constant time) that a query has a positive answer. In this paper we propose a novel scalable approach called LYNX that uses a large number of topological sorts of G as a negative cut index without degrading the query time. A similar strategy is applied to a positive cut index. In addition, LYNX proposes a user-defined index size that enables the user to control the ratio between negative and positive cuts depending on the expected query pattern. We show by computational experiments that LYNX consistently outperforms the state-of-the-art approach in terms of query time using the same index size for graphs with a high reachability ratio. In intelligent computer systems that rely on frequent tests of connectivity in graphs, LYNX can reduce the time delay experienced by end users through a reduced query time. This comes at the expense of an increased setup time whenever the underlying graph is updated. Keywords: directed acyclic graphs, topological sorts, reachability queries, graph indexing
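The negative-cut idea in this abstract rests on a standard fact: if any topological order of G places u after v, no path from u to v can exist. The following is a minimal sketch of that principle only, not the LYNX implementation; the function names, the randomized Kahn-style ordering, and the parameter `k` (number of stored orders) are assumptions made for illustration.

```python
import random
from collections import defaultdict


def random_topological_order(n, edges, rng):
    """Return position[v] for one randomized topological order (Kahn's algorithm)."""
    succ = defaultdict(list)
    indeg = [0] * n
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
    ready = [v for v in range(n) if indeg[v] == 0]
    position = [0] * n
    rank = 0
    while ready:
        # Pick a random ready vertex so different calls give different orders.
        i = rng.randrange(len(ready))
        ready[i], ready[-1] = ready[-1], ready[i]
        u = ready.pop()
        position[u] = rank
        rank += 1
        for w in succ[u]:
            indeg[w] -= 1
            if indeg[w] == 0:
                ready.append(w)
    if rank != n:
        raise ValueError("graph contains a cycle")
    return position


def build_negative_index(n, edges, k, seed=0):
    """Store k topological orders (k is a hypothetical tuning parameter)."""
    rng = random.Random(seed)
    return [random_topological_order(n, edges, rng) for _ in range(k)]


def definitely_unreachable(index, u, v):
    """True if some stored order places u after v, which rules out a path u -> v."""
    return any(pos[u] > pos[v] for pos in index)


def reachable(n, edges, index, u, v):
    """Answer a reachability query, using the negative cut before searching."""
    if u == v:
        return True
    if definitely_unreachable(index, u, v):
        return False
    # The index could not rule the query out, so fall back to a plain DFS.
    succ = defaultdict(list)
    for a, b in edges:
        succ[a].append(b)
    stack, seen = [u], {u}
    while stack:
        x = stack.pop()
        if x == v:
            return True
        for w in succ[x]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return False
```

Each stored order costs one integer per vertex, and a query checks at most k pairs of positions before deciding whether a search is needed. Storing more orders increases the chance that an unreachable pair is cut without touching the graph.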