Learning a Partitioning Advisor with Deep Reinforcement Learning
Commercial data analytics products such as Microsoft Azure SQL Data Warehouse
or Amazon Redshift provide ready-to-use scale-out database solutions for
OLAP-style workloads in the cloud. While the provisioning of a database cluster
is usually fully automated by cloud providers, customers typically still have
to make important design decisions that were traditionally made by the
database administrator, such as selecting a partitioning scheme.
In this paper we introduce a learned partitioning advisor for analytical
OLAP-style workloads based on Deep Reinforcement Learning (DRL). The main idea
is that a DRL agent learns its decisions based on experience by monitoring the
rewards for different workloads and partitioning schemes. We evaluate our
learned partitioning advisor in an experimental evaluation with different
databases schemata and workloads of varying complexity. In the evaluation, we
show that our advisor is not only able to find partitionings that outperform
existing approaches for automated partitioning design but that it also can
easily adjust to different deployments. This is especially important in cloud
setups where customers can easily migrate their cluster to a new set of
(virtual) machines.
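The advisor's core idea can be illustrated with a minimal reinforcement-learning sketch: an agent monitors the reward (negative runtime) of candidate partitioning schemes and converges on the best one. This is a toy bandit-style version under assumed scheme names and simulated runtimes; the paper's actual agent is a deep RL model over full workload features.

```python
import random

# Hypothetical candidate partitioning schemes for a star-schema workload.
SCHEMES = ["hash(orders.id)", "hash(orders.cust)", "replicate(dims)"]

# Simulated workload runtimes (seconds) per scheme -- stand-ins for the
# real measurements the advisor would collect from the cluster.
RUNTIME = {"hash(orders.id)": 12.0, "hash(orders.cust)": 7.5, "replicate(dims)": 9.0}

def train_advisor(episodes=500, eps=0.2, alpha=0.5, seed=0):
    rng = random.Random(seed)
    q = {s: 0.0 for s in SCHEMES}          # learned value per scheme
    for _ in range(episodes):
        # epsilon-greedy: mostly exploit the best-known scheme, sometimes explore
        scheme = rng.choice(SCHEMES) if rng.random() < eps else max(q, key=q.get)
        reward = -RUNTIME[scheme]          # lower runtime => higher reward
        q[scheme] += alpha * (reward - q[scheme])  # incremental value update
    return max(q, key=q.get)               # recommended scheme

print(train_advisor())  # converges on the lowest-runtime scheme
```

With deterministic rewards the agent quickly settles on `hash(orders.cust)`; the real advisor faces noisy rewards and a far larger scheme space, which is what motivates deep RL.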
Deep Learning Data and Indexes in a Database
A database stores and retrieves data and is a critical component of any software application. Databases require configuration for efficiency, yet there are tens of configuration parameters, making manual configuration a challenging task. Furthermore, a database must be reconfigured regularly to keep up with new data and workloads. The goal of this thesis is to use the query workload history to autonomously configure the database and improve its performance. The proposed work proceeds in four stages: (i) we develop an index recommender using deep reinforcement learning for a standalone database and evaluate its effectiveness against several state-of-the-art approaches; (ii) we build a real-time index recommender that dynamically creates and removes indexes for better performance in response to sudden changes in the query workload; (iii) we develop a database advisor framework that learns latent patterns from a workload, enabling it to enhance a query, recommend interesting queries, and summarize a workload; (iv) we develop LinkSocial, a fast, scalable, and accurate framework to gain deeper insights from heterogeneous data.
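The first stage, recommending indexes from the query workload history, can be sketched in miniature: score candidate single-column indexes by how often the workload filters on them, net of a maintenance cost. The benefit/cost model below is invented for illustration; the thesis's recommender learns these trade-offs with deep reinforcement learning.

```python
from collections import Counter

def recommend_indexes(workload, max_indexes=2, maintenance_cost=1.0):
    # workload: list of tuples of columns appearing in each query's predicates.
    # Count how often each column is filtered on across the workload history.
    usage = Counter(col for query_cols in workload for col in query_cols)
    # Score = workload benefit (frequency) minus a flat maintenance cost.
    scored = [(count - maintenance_cost, col) for col, count in usage.items()]
    scored.sort(reverse=True)
    # Keep only indexes whose estimated benefit exceeds their cost.
    return [col for score, col in scored[:max_indexes] if score > 0]

workload = [("user_id",), ("user_id", "created_at"), ("status",), ("user_id",)]
print(recommend_indexes(workload))  # ['user_id']
```

Columns queried only once never repay their maintenance cost here, so only `user_id` is recommended; an RL agent replaces these hand-set weights with values learned from observed query latencies.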
Workload-Aware Performance Tuning for Autonomous DBMSs
Optimal configuration is vital for a database management system (DBMS) to achieve high performance. There is no one-size-fits-all configuration that works for different workloads, since each workload has varying patterns with different resource requirements. There is a relationship between configuration, workload, and system performance: if a configuration cannot adapt to the dynamic changes of a workload, the overall performance of the DBMS can degrade significantly unless a sophisticated administrator continuously re-configures the DBMS. In this tutorial, we focus on autonomous workload-aware performance tuning, which is expected to automatically and continuously tune the configuration as the workload changes. We survey three research directions: 1) workload classification, 2) workload forecasting, and 3) workload-based tuning. While the first two topics address the issue of obtaining accurate workload information, the third tackles the problem of how to properly use that information to optimize performance. We also identify research challenges and open problems, and give real-world examples of leveraging workload information for database tuning in commercial products (e.g., Amazon Redshift). We will demonstrate workload-aware performance tuning in Amazon Redshift in the presentation.
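The first direction, workload classification, can be illustrated with a deliberately simple rule-based sketch: label a window of queries as OLTP-like or OLAP-like from basic statistics. The features and thresholds below are invented for illustration; real classifiers in this line of work use much richer query and resource features.

```python
def classify_workload(queries):
    # Each query is (rows_scanned, is_write); a stand-in for real query features.
    avg_scan = sum(q[0] for q in queries) / len(queries)
    write_ratio = sum(1 for q in queries if q[1]) / len(queries)
    if write_ratio > 0.5 and avg_scan < 1_000:
        return "OLTP-like"    # many small writes -> tune for latency
    if write_ratio < 0.1 and avg_scan > 100_000:
        return "OLAP-like"    # large read scans -> tune for throughput
    return "mixed"

print(classify_workload([(50, True), (10, True), (200, False)]))    # OLTP-like
print(classify_workload([(5_000_000, False), (2_000_000, False)]))  # OLAP-like
```

The classification then drives tuning: an OLTP-like label might favor smaller buffer-flush intervals and more indexes, while an OLAP-like label favors larger scan-oriented memory allocations.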
Auto-tuning Distributed Stream Processing Systems using Reinforcement Learning
Fine tuning distributed systems is considered to be a craftsmanship, relying
on intuition and experience. This becomes even more challenging when the
systems need to react in near real time, as streaming engines have to do to
maintain pre-agreed service quality metrics. In this article, we present an
automated approach that builds on a combination of supervised and reinforcement
learning methods to recommend the most appropriate lever configurations based
on previous load. With this, streaming engines can be automatically tuned
without requiring a human to determine the right way and proper time to deploy
them. This opens the door to new configurations that are not being applied
today since the complexity of managing these systems has surpassed the
abilities of human experts. We show how reinforcement learning systems can find
substantially better configurations in less time than their human counterparts
and adapt to changing workloads.
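The lever-recommendation idea can be sketched as a contextual agent that learns, per observed load level, which parallelism setting minimizes latency. The latency model and the load/parallelism values below are invented stand-ins for real measurements from a streaming engine; the article combines this kind of reinforcement learning with supervised load prediction.

```python
import random

def simulated_latency(load, parallelism):
    # Toy model: work is shared across workers, but each worker adds overhead.
    return load / parallelism + 2 * parallelism

def learn_config(loads=(8, 32, 128), levels=(1, 2, 4, 8), episodes=2000, seed=1):
    rng = random.Random(seed)
    q = {(l, p): 0.0 for l in loads for p in levels}   # value per (state, lever)
    for _ in range(episodes):
        load = rng.choice(loads)                       # observed load = the state
        if rng.random() < 0.3:                         # explore a random lever
            p = rng.choice(levels)
        else:                                          # exploit best-known lever
            p = max(levels, key=lambda lv: q[(load, lv)])
        # Reward is negative latency; incremental value update.
        q[(load, p)] += 0.5 * (-simulated_latency(load, p) - q[(load, p)])
    return {load: max(levels, key=lambda lv: q[(load, lv)]) for load in loads}

print(learn_config())  # higher load -> higher recommended parallelism
```

Under this toy model the agent learns to scale parallelism with load, which is exactly the kind of lever schedule a human operator would otherwise have to hand-tune.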
Learning-Based Data Storage [Vision] (Technical Report)
Deep neural network (DNN) and its variants have been extensively used for a
wide spectrum of real applications such as image classification, face/speech
recognition, fraud detection, and so on. In addition to many important machine
learning tasks, as artificial networks emulating the way brain cells function,
DNNs also show the capability of storing non-linear relationships between input
and output data, which exhibits the potential of storing data via DNNs. We
envision a new paradigm of data storage, "DNN-as-a-Database", where data are
encoded in well-trained machine learning models. Compared with conventional
data storage that directly records data in raw formats, learning-based
structures (e.g., DNN) can implicitly encode data pairs of inputs and outputs
and compute/materialize actual output data of different resolutions only if
input data are provided. This new paradigm can greatly enhance the data
security by allowing flexible data privacy settings on different levels,
achieve low space consumption and fast computation with the acceleration of new
hardware (e.g., Diffractive Neural Network and AI chips), and can be
generalized to distributed DNN-based storage/computing. In this paper, we
propose this novel concept of learning-based data storage, which utilizes a
learning structure called learning-based memory unit (LMU), to store, organize,
and retrieve data. As a case study, we use DNNs as the engine in the LMU, and
study the data capacity and accuracy of the DNN-based data storage. Our
preliminary experimental results show the feasibility of the learning-based
data storage by achieving high (100%) accuracy of the DNN storage. We explore
and design effective solutions to utilize the DNN-based data storage to manage
and query relational tables. We discuss how to generalize our solutions to
other data types (e.g., graphs) and environments such as distributed DNN
storage/computing.
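The core claim, that data pairs can be stored in model weights and materialized only when an input is provided, can be shown with the simplest possible learned structure: a single linear layer over one-hot keys, whose weight matrix exactly encodes the stored values. This toy stand-in for the paper's LMU (which uses trained DNNs) shows how a forward pass recovers stored records with 100% accuracy.

```python
def build_storage(pairs):
    # pairs: list of (key, value_vector) records to encode in the "model".
    keys = [k for k, _ in pairs]
    index = {k: i for i, k in enumerate(keys)}
    # Weight row i holds the value vector for key i -- the data lives in weights.
    weights = [list(v) for _, v in pairs]
    def retrieve(key):
        one_hot = [1 if i == index[key] else 0 for i in range(len(keys))]
        # Matrix-vector product: the one-hot input selects its weight row.
        return [sum(x * w[j] for x, w in zip(one_hot, weights))
                for j in range(len(weights[0]))]
    return retrieve

lookup = build_storage([("alice", (1, 70)), ("bob", (2, 85))])
print(lookup("bob"))  # [2, 85] -- recovered exactly from the weights
```

A real DNN replaces this exact lookup with a compressed, approximate encoding, which is where the paper's questions about capacity, accuracy, and resolution of the materialized outputs arise.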
Improvement of Decision on Coding Unit Split Mode and Intra-Picture Prediction by Machine Learning
High Efficiency Video Coding (HEVC) is the newest video coding standard of the ITU-T Video Coding Experts Group and the ISO/IEC Moving Picture Experts Group. The reference software (i.e., HM) includes implementations of the guidelines in compliance with the new standard, covering both encoder and decoder functionality.
Machine learning (ML) processes data to discover patterns that can later be used to analyze new trends. ML plays a key role in a wide range of critical applications, such as data mining, natural language processing, image recognition, and expert systems.
In this research project, in compliance with the H.265 standard, we focus on improving encoding/decoding performance by optimizing the partitioning of prediction blocks in coding units with the help of supervised machine learning. We used the Keras library as the main tool to implement the experiments, and tuned key parameters of the model in our convolutional neural network. The coding tree unit mode decision time produced by the model was compared with that produced by the HM software and shown to improve significantly. The intra-picture prediction mode decision was also investigated with a modified model and yielded satisfactory results.
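The decision the learned model replaces is binary: split a coding unit into four sub-blocks or keep it whole, which HM otherwise settles by exhaustive rate-distortion search. As a hedged stand-in for the paper's CNN, the sketch below decides from pixel variance alone, a classical hand-crafted proxy; the threshold is invented for illustration.

```python
def should_split(block, threshold=100.0):
    # block: 2D list of pixel intensities for one coding unit.
    pixels = [p for row in block for p in row]
    mean = sum(pixels) / len(pixels)
    variance = sum((p - mean) ** 2 for p in pixels) / len(pixels)
    # Homogeneous blocks keep one large CU; textured blocks are split.
    return variance > threshold

flat = [[128] * 8 for _ in range(8)]             # uniform 8x8 block
edge = [[0] * 4 + [255] * 4 for _ in range(8)]   # sharp vertical edge
print(should_split(flat), should_split(edge))    # False True
```

A trained CNN replaces the variance rule with features learned from examples labeled by the exhaustive search, which is how the mode decision time improves without re-running the full search per block.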
Comparative Analysis of Decision Tree Algorithms for Data Warehouse Fragmentation
One of the main problems faced by Data Warehouse designers is fragmentation. Several studies have proposed data mining-based horizontal fragmentation methods; however, no existing horizontal fragmentation technique uses a decision tree. This paper presents an analysis of different decision tree algorithms to select the best one for implementing the fragmentation method. The analysis was performed with version 3.9.4 of Weka, considering four evaluation metrics (Precision, ROC Area, Recall, and F-measure) on different data sets selected from the Star Schema Benchmark. The results showed that the two best algorithms were J48 and Random Forest in most cases; nevertheless, J48 was selected because it is more efficient in building the model.
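Three of the four evaluation metrics used in the comparison follow directly from confusion counts. A minimal sketch on hypothetical predictions (ROC Area additionally needs ranked scores, so it is omitted here):

```python
def prf(y_true, y_pred, positive=1):
    # Confusion counts for the positive class.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F-measure: harmonic mean of precision and recall.
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f

print(prf([1, 1, 0, 1, 0], [1, 0, 0, 1, 1]))  # precision = recall = F = 2/3
```

Weka reports these per class and weighted across classes, which is the form in which the J48 vs. Random Forest comparison above is made.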
Towards Scalable Circuit Partitioning for Multi-Core Quantum Architectures with Deep Reinforcement Learning
Quantum computing holds immense potential for solving classically intractable problems by leveraging the unique properties of qubits. However, the scalability of quantum architectures remains a significant challenge. To address this issue, multi-core quantum architectures are proposed. Yet, the realization of such multi-core architectures poses multiple challenges in hardware, algorithms, and the interface between them. In particular, one of these challenges is how to optimally partition the algorithms to fit within the cores of a multi-core quantum computer.
This thesis presents a novel approach for scalable circuit partitioning on multi-core quantum architectures using Deep Reinforcement Learning. The objective is to surpass existing meta-heuristic algorithms, such as FGP-rOEE's partitioning algorithm, in terms of accuracy and scalability. This research contributes to the advancement of both quantum computing and graph partitioning techniques, offering new insights into the optimization of quantum systems. By addressing the challenges associated with scaling quantum computers, we pave the way for their practical implementation in solving computationally challenging problems.
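The underlying problem is graph partitioning: qubits are nodes, two-qubit gates are edges, and a good assignment to cores is a balanced split minimizing edges that cross cores (each crossing implies an expensive inter-core qubit transfer). As a hedged classical baseline, in the spirit of the heuristics the DRL agent aims to outperform, here is a single greedy swap pass over a two-core split on a made-up circuit graph:

```python
def cut_size(edges, part):
    # Number of two-qubit interactions that cross the core boundary.
    return sum(1 for u, v in edges if part[u] != part[v])

def greedy_two_core(edges, n):
    part = {q: q % 2 for q in range(n)}            # balanced initial split
    best = cut_size(edges, part)
    improved = True
    while improved:                                 # repeat until no swap helps
        improved = False
        for u in range(n):
            for v in range(u + 1, n):
                if part[u] != part[v]:
                    part[u], part[v] = part[v], part[u]      # trial swap
                    c = cut_size(edges, part)
                    if c < best:
                        best, improved = c, True             # keep the swap
                    else:
                        part[u], part[v] = part[v], part[u]  # revert
    return part, best

# Two tightly-coupled 3-qubit groups joined by one gate: optimal cut is 1.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
part, cut = greedy_two_core(edges, 6)
print(cut)  # 1
```

Greedy swaps can stall in local optima on larger circuits, and circuits also evolve over time slices, which is precisely the scalability gap a learned partitioner targets.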