Query Mesh: An Efficient Multi-Route Approach to Query Optimization

Abstract

In most database systems, traditional and stream systems alike, the optimizer picks a single query plan for all data based on overall data statistics. However, it has been repeatedly observed that real-life datasets are non-uniform, so a single execution plan may be ineffective for possibly large portions of the actual data. In this paper, we present a practical alternative to current state-of-the-art query optimization techniques, termed the multi-route query mesh model (QM for short). The main idea of QM is to compute multiple routes (query plans), each designed for a particular subset of the data with distinct statistical properties. Based on the execution routes and the data characteristics, a classifier model is induced. The classifier is then used to partition new data efficiently and to assign each partition the best route for query processing. We formulate the QM search space and analyze its complexity. To find optimal query meshes, we design the Opt-QM algorithm. Faced with the dilemma of whether to determine distinct data subsets first or to compute a set of execution routes first, we design several heuristics that find good-quality query meshes efficiently. For runtime query processing, we employ a Self-Routing Fabric (SRF) infrastructure, which supports shared operator processing and has near-zero routing overhead. Results of our experimental study with real-life and synthetic data indicate that the QM-based approach consistently provides better query execution performance for skewed datasets than the state-of-the-art alternatives, namely traditional systems that execute a single pre-computed plan and systems that determine different routes on the fly.
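The route-per-subset idea can be illustrated with a small Python sketch. It is only a toy under stated assumptions, not the Opt-QM algorithm or the SRF infrastructure: the operator names, the two-route mesh, the hand-written `classify` function (standing in for the induced classifier), and the sample rows are all hypothetical. Cost is counted as the number of predicate evaluations, and the mesh is compared against both single-plan orderings.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, int]
Predicate = Callable[[Row], bool]

# Hypothetical selection operators of one conjunctive query.
OPERATORS: Dict[str, Predicate] = {
    "price_lt_100": lambda r: r["price"] < 100,
    "qty_gt_5": lambda r: r["qty"] > 5,
}

# Hypothetical mesh: one operator ordering (route) per data subset.
ROUTES: Dict[str, List[str]] = {
    "region_0": ["qty_gt_5", "price_lt_100"],
    "region_1": ["price_lt_100", "qty_gt_5"],
}

# Hypothetical skewed sample data.
ROWS: List[Row] = [
    {"region": 0, "price": 10, "qty": 1},
    {"region": 0, "price": 20, "qty": 9},
    {"region": 1, "price": 500, "qty": 9},
    {"region": 1, "price": 50, "qty": 9},
]


def classify(row: Row) -> str:
    """Stand-in for the induced classifier: a single hand-written split."""
    return "region_0" if row["region"] == 0 else "region_1"


def run_route(row: Row, route: List[str]) -> Tuple[bool, int]:
    """Apply operators in route order; stop at the first failing predicate."""
    evaluated = 0
    for name in route:
        evaluated += 1
        if not OPERATORS[name](row):
            return False, evaluated
    return True, evaluated


def process_with_mesh(rows: List[Row]) -> Tuple[List[Row], int]:
    """Send each row down the route chosen by the classifier."""
    results, cost = [], 0
    for row in rows:
        passed, n = run_route(row, ROUTES[classify(row)])
        cost += n
        if passed:
            results.append(row)
    return results, cost


def process_with_single_plan(rows: List[Row], plan: List[str]) -> Tuple[List[Row], int]:
    """Baseline: every row follows the same operator ordering."""
    results, cost = [], 0
    for row in rows:
        passed, n = run_route(row, plan)
        cost += n
        if passed:
            results.append(row)
    return results, cost


mesh_results, mesh_cost = process_with_mesh(ROWS)
plan_a_results, plan_a_cost = process_with_single_plan(ROWS, ["price_lt_100", "qty_gt_5"])
plan_b_results, plan_b_cost = process_with_single_plan(ROWS, ["qty_gt_5", "price_lt_100"])
```

On this sample, all three strategies return the same rows, but the mesh evaluates fewer predicates than either single ordering, because each subset is filtered by its most selective operator first.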
