Neo: A Learned Query Optimizer
Query optimization is one of the most challenging problems in database
systems. Despite the progress made over the past decades, query optimizers
remain extremely complex components that require a great deal of hand-tuning
for specific workloads and datasets. Motivated by this shortcoming and inspired
by recent advances in applying machine learning to data management challenges,
we introduce Neo (Neural Optimizer), a novel learning-based query optimizer
that relies on deep neural networks to generate query execution plans. Neo
bootstraps its query optimization model from existing optimizers and continues
to learn from incoming queries, building upon its successes and learning from
its failures. Furthermore, Neo naturally adapts to underlying data patterns and
is robust to estimation errors. Experimental results demonstrate that Neo, even
when bootstrapped from a simple optimizer like PostgreSQL, can learn a model
that offers similar performance to state-of-the-art commercial optimizers, and
in some cases even surpass them.
Query optimizers based on machine learning techniques
Integrated master's dissertation in Informatics Engineering
Query optimizers are considered one of the most relevant and sophisticated components
in a database management system. However, despite currently producing nearly optimal
results, optimizers rely on statistical estimates and heuristics to reduce the search space
of alternative execution plans for a single query. As a result, for more complex queries,
errors may grow exponentially, often translating into sub-optimal plans resulting in less
than ideal performance. Recent advances in machine learning techniques have opened
new opportunities for many of the existing problems related to system optimization.
This document proposes a solution built on top of PostgreSQL that learns to select
the most efficient set of optimizer strategy settings for a particular query. Instead of
depending entirely on the optimizer’s estimates to compare different plans under different
configurations, it relies on a greedy selection algorithm that supports several types of
predictive modeling techniques, from more traditional modeling techniques to a deep
learning approach.
The system is evaluated experimentally with the standard TPC-H and Join Ordering Benchmark workloads to measure the costs and benefits of adding machine learning
capabilities to traditional query optimizers.
This work is financed by National Funds through the Portuguese funding agency, FCT - Fundação para a Ciência e a Tecnologia, within project UIDB/50014/2020.
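The abstract above does not spell out its greedy selection algorithm, so the following is only a minimal sketch of one plausible reading: start from the default configuration and repeatedly flip the single optimizer setting that most reduces a model's predicted cost. The setting names are real PostgreSQL planner parameters, but the choice of these four, the `greedy_select` function, and the `toy_predict` stand-in for a trained model are all assumptions made for illustration.

```python
from typing import Callable, Dict, Sequence, Tuple

Config = Dict[str, bool]

# Real PostgreSQL planner toggles; which ones the dissertation tunes is an assumption.
SETTINGS = ("enable_nestloop", "enable_hashjoin", "enable_mergejoin", "enable_seqscan")


def greedy_select(
    settings: Sequence[str],
    predict: Callable[[Config], float],
) -> Tuple[Config, float]:
    """Greedily flip one setting at a time while the predicted cost improves."""
    config: Config = {name: True for name in settings}  # PostgreSQL defaults: all on
    best = predict(config)
    while True:
        best_flip = None
        for name in settings:
            candidate = dict(config)
            candidate[name] = not candidate[name]
            cost = predict(candidate)
            if cost < best:  # keep only the best single flip of this round
                best, best_flip = cost, name
        if best_flip is None:  # no single flip helps: local optimum reached
            return config, best
        config[best_flip] = not config[best_flip]


def toy_predict(config: Config) -> float:
    """Hypothetical stand-in for a learned model that predicts query cost."""
    cost = 100.0
    if not config["enable_nestloop"]:
        cost -= 30.0  # pretend nested loops hurt this query
    if not config["enable_seqscan"]:
        cost += 20.0  # pretend sequential scans help this query
    return cost


chosen, predicted = greedy_select(SETTINGS, toy_predict)
print(chosen, predicted)
```

In a real deployment the chosen configuration would be applied per query with `SET` statements before execution; the predictor here could be any regression model, which matches the abstract's remark that several predictive modeling techniques are supported.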
Incremental Processing and Optimization of Update Streams
In recent years, we have seen an increasing number of applications in networking, sensor networks, cloud computing, and environmental monitoring that monitor, plan, control, and make decisions over data streams from multiple sources. We are interested in extending traditional stream processing techniques to meet the new challenges of these applications. To support genuine continuous query optimization and processing over data streams, we need to systematically understand how to address incremental optimization and processing of update streams for a rich class of queries commonly used in these applications.
Our general thesis is that efficient incremental processing and re-optimization of update streams can be achieved with incremental view maintenance techniques, provided the problems are cast as incremental view maintenance problems over data streams. We focus on two challenges in the incremental processing of update streams that existing work on stream query processing does not address: incremental processing of transitive closure queries over data streams, and incremental re-optimization of queries. In addition to addressing these specific challenges, we develop a working prototype system, Aspen, an end-to-end stream processing system that has been deployed as the foundation for a case study of our SmartCIS application. We validate our solutions both analytically and empirically on top of Aspen, over a variety of benchmark workloads such as the TPC-H and LinearRoad benchmarks.
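To make the idea of treating a transitive closure query as a maintained view more concrete, here is a minimal sketch of delta-based maintenance under edge insertions only. It is a textbook technique and not the thesis's algorithm or Aspen's implementation; the `TransitiveClosure` class and its method names are hypothetical, and deletions, which an update stream would also carry, are deliberately left out.

```python
from collections import defaultdict
from typing import Dict, Hashable, Set, Tuple

Node = Hashable


class TransitiveClosure:
    """Reachability view maintained incrementally as edges are inserted."""

    def __init__(self) -> None:
        self.succ: Dict[Node, Set[Node]] = defaultdict(set)  # node -> nodes it reaches
        self.pred: Dict[Node, Set[Node]] = defaultdict(set)  # node -> nodes reaching it

    def insert_edge(self, u: Node, v: Node) -> Set[Tuple[Node, Node]]:
        """Apply one insertion and return the delta: pairs that became reachable."""
        # Everything that reaches u (plus u itself) now reaches
        # everything reachable from v (plus v itself).
        sources = {u} | self.pred[u]
        targets = {v} | self.succ[v]
        delta: Set[Tuple[Node, Node]] = set()
        for s in sources:
            for t in targets:
                if t not in self.succ[s]:
                    self.succ[s].add(t)
                    self.pred[t].add(s)
                    delta.add((s, t))
        return delta


tc = TransitiveClosure()
tc.insert_edge("a", "b")
tc.insert_edge("c", "d")
# Connecting the two chains yields four new reachable pairs in one step.
print(sorted(tc.insert_edge("b", "c")))
```

Each insertion touches only the pairs it can affect instead of recomputing the closure from scratch, which is the property that makes a view maintenance formulation attractive for streams.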