19 research outputs found

    Enhancing In-Memory Spatial Indexing with Learned Search

    Get PDF
    Spatial data is ubiquitous. Massive amounts of data are generated every day from a plethora of sources such as billions of GPS-enableddevices (e.g., cell phones, cars, and sensors), consumer-based applications (e.g., Uber and Strava), and social media platforms (e.g.,location-tagged posts on Facebook, Twitter, and Instagram). This exponential growth in spatial data has led the research communityto build systems and applications for efficient spatial data processing.In this study, we apply a recently developed machine-learned search technique for single-dimensional sorted data to spatial indexing.Specifically, we partition spatial data using six traditional spatial partitioning techniques and employ machine-learned search withineach partition to support point, range, distance, and spatial join queries. Adhering to the latest research trends, we tune the partitioningtechniques to be instance-optimized. By tuning each partitioning technique for optimal performance, we demonstrate that: (i) grid-basedindex structures outperform tree-based index structures (from 1.23Ă— to 2.47Ă—), (ii) learning-enhanced variants of commonly used spatialindex structures outperform their original counterparts (from 1.44Ă— to 53.34Ă— faster), (iii) machine-learned search within a partitionis faster than binary search by 11.79% - 39.51% when filtering on one dimension, (iv) the benefit of machine-learned search diminishesin the presence of other compute-intensive operations (e.g. scan costs in higher selectivity queries, Haversine distance computation, andpoint-in-polygon tests), and (v) index lookup is the bottleneck for tree-based structures, which could potentially be reduced by linearizingthe indexed partitions.Additional Key Words and Phrases: spatial data, indexing, machine-learning, spatial queries, geospatia

    An experimental study of learned cardinality estimation

    Get PDF
    Cardinality estimation is a fundamental but long unresolved problem in query optimization. Recently, multiple papers from different research groups consistently report that learned models have the potential to replace existing cardinality estimators. In this thesis, we ask a forward-thinking question: Are we ready to deploy these learned cardinality models in production? Our study consists of three main parts. Firstly, we focus on the static environment (i.e., no data updates) and compare five new learned methods with eight traditional methods on four real-world datasets under a unified workload setting. The results show that learned models are indeed more accurate than traditional methods, but they often suffer from high training and inference costs. Secondly, we explore whether these learned models are ready for dynamic environments (i.e., frequent data updates). We find that they can- not catch up with fast data updates and return large errors for different reasons. For less frequent updates, they can perform better but there is no clear winner among themselves. Thirdly, we take a deeper look into learned models and explore when they may go wrong. Our results show that the performance of learned methods can be greatly affected by the changes in correlation, skewness, or domain size. More importantly, their behaviors are much harder to interpret and often unpredictable. Based on these findings, we identify two promising research directions (control the cost of learned models and make learned models trustworthy) and suggest a number of research opportunities. We hope that our study can guide researchers and practitioners to work together to eventually push learned cardinality estimators into real database systems

    Learned Query Superoptimization

    Full text link
    Traditional query optimizers are designed to be fast and stateless: each query is quickly optimized using approximate statistics, sent off to the execution engine, and promptly forgotten. Recent work on learned query optimization have shown that it is possible for a query optimizer to "learn from its mistakes," correcting erroneous query plans the next time a plan is produced. But what if query optimizers could avoid mistakes entirely? This paper presents the idea of learned query superoptimization. A new generation of query superoptimizers could autonomously experiment to discover optimal plans using exploration-driven algorithms, iterative Bayesian optimization, and program synthesis. While such superoptimizers will take significantly longer to optimize a given query, superoptimizers have the potential to massively accelerate a large number of important repetitive queries being executed on data systems today

    Bao: Learning to Steer Query Optimizers

    Full text link
    Query optimization remains one of the most challenging problems in data management systems. Recent efforts to apply machine learning techniques to query optimization challenges have been promising, but have shown few practical gains due to substantive training overhead, inability to adapt to changes, and poor tail performance. Motivated by these difficulties and drawing upon a long history of research in multi-armed bandits, we introduce Bao (the BAndit Optimizer). Bao takes advantage of the wisdom built into existing query optimizers by providing per-query optimization hints. Bao combines modern tree convolutional neural networks with Thompson sampling, a decades-old and well-studied reinforcement learning algorithm. As a result, Bao automatically learns from its mistakes and adapts to changes in query workloads, data, and schema. Experimentally, we demonstrate that Bao can quickly (an order of magnitude faster than previous approaches) learn strategies that improve end-to-end query execution performance, including tail latency. In cloud environments, we show that Bao can offer both reduced costs and better performance compared with a sophisticated commercial system

    Weiterentwicklung analytischer Datenbanksysteme

    Get PDF
    This thesis contributes to the state of the art in analytical database systems. First, we identify and explore extensions to better support analytics on event streams. Second, we propose a novel polygon index to enable efficient geospatial data processing in main memory. Third, we contribute a new deep learning approach to cardinality estimation, which is the core problem in cost-based query optimization.Diese Arbeit trägt zum aktuellen Forschungsstand von analytischen Datenbanksystemen bei. Wir identifizieren und explorieren Erweiterungen um Analysen auf Eventströmen besser zu unterstützen. Wir stellen eine neue Indexstruktur für Polygone vor, die eine effiziente Verarbeitung von Geodaten im Hauptspeicher ermöglicht. Zudem präsentieren wir einen neuen Ansatz für Kardinalitätsschätzungen mittels maschinellen Lernens

    Neo: A Learned Query Optimizer

    Full text link
    Query optimization is one of the most challenging problems in database systems. Despite the progress made over the past decades, query optimizers remain extremely complex components that require a great deal of hand-tuning for specific workloads and datasets. Motivated by this shortcoming and inspired by recent advances in applying machine learning to data management challenges, we introduce Neo (Neural Optimizer), a novel learning-based query optimizer that relies on deep neural networks to generate query executions plans. Neo bootstraps its query optimization model from existing optimizers and continues to learn from incoming queries, building upon its successes and learning from its failures. Furthermore, Neo naturally adapts to underlying data patterns and is robust to estimation errors. Experimental results demonstrate that Neo, even when bootstrapped from a simple optimizer like PostgreSQL, can learn a model that offers similar performance to state-of-the-art commercial optimizers, and in some cases even surpass them

    Local Learning Strategies for Data Management Components

    Get PDF
    In a world with an ever-increasing amount of data processed, providing tools for highquality and fast data processing is imperative. Database Management Systems (DBMSs) are complex adaptive systems supplying reliable and fast data analysis and storage capabilities. To boost the usability of DBMSs even further, a core research area of databases is performance optimization, especially for query processing. With the successful application of Artificial Intelligence (AI) and Machine Learning (ML) in other research areas, the question arises in the database community if ML can also be beneficial for better data processing in DBMSs. This question has spawned various works successfully replacing DBMS components with ML models. However, these global models have four common drawbacks due to their large, complex, and inflexible one-size-fits-all structures. These drawbacks are the high complexity of model architectures, the lower prediction quality, the slow training, and the slow forward passes. All these drawbacks stem from the core expectation to solve a certain problem with one large model at once. The full potential of ML models as DBMS components cannot be reached with a global model because the model’s complexity is outmatched by the problem’s complexity. Therefore, we present a novel general strategy for using ML models to solve data management problems and to replace DBMS components. The novel strategy is based on four advantages derived from the four disadvantages of global learning strategies. In essence, our local learning strategy utilizes divide-and-conquer to place less complex but more expressive models specializing in sub-problems of a data management problem. It splits the problem space into less complex parts that can be solved with lightweight models. This circumvents the one-size-fits-all characteristics and drawbacks of global models. We will show that this approach and the lesser complexity of the specialized local models lead to better problem-solving qualities and DBMS performance. The local learning strategy is applied and evaluated in three crucial use cases to replace DBMS components with ML models. These are cardinality estimation, query optimizer hinting, and integer algorithm selection. In all three applications, the benefits of the local learning strategy are demonstrated and compared to related work. We also generalize the strategy’s usability for a broader application and formulate best practices with instructions for others
    corecore