Search CORE

38 research outputs found

Workload-Aware Performance Tuning for Autonomous DBMSs

Author: Chainani Naresh
Lin Chunbin
Lu Jiaheng
Yan Zhengtong
Publication venue: IEEE
Publication date: 19/04/2021

Optimal configuration is vital for a Database Management System (DBMS) to achieve high performance. There is no one-size-fits-all configuration that works for different workloads, since each workload has varying patterns with different resource requirements. There is a relationship between configuration, workload, and system performance. If a configuration cannot adapt to the dynamic changes of a workload, there could be a significant degradation in the overall performance of the DBMS unless a sophisticated administrator is continuously re-configuring the DBMS. In this tutorial, we focus on autonomous workload-aware performance tuning, which is expected to automatically and continuously tune the configuration as the workload changes. We survey three research directions: 1) workload classification, 2) workload forecasting, and 3) workload-based tuning. While the first two topics address the issue of obtaining accurate workload information, the third tackles the problem of how to properly use the workload information to optimize performance. We also identify research challenges and open problems, and give real-world examples of leveraging workload information for database tuning in commercial products (e.g., Amazon Redshift). We will demonstrate workload-aware performance tuning in Amazon Redshift in the presentation. Peer reviewed

Helsingin yliopiston digitaalinen arkisto

PREDICTION BASED WORKLOAD PERFORMANCE EVALUATION FOR DISASTER MANAGEMENT SPATIAL DATABASE

Author: F. S. Utomo
M. S. Rohman
N. Suryana
Publication venue: 'Copernicus GmbH'
Publication date: 01/09/2018

This paper discusses an implementation of prediction-based workload performance evaluation for disaster management, especially in the response phase, to handle large spatial data in the event of an eruption of the Merapi volcano in Indonesia. The complexity associated with a large spatial database differs from that of a conventional database. This implies that incoming complex workloads are difficult for humans to handle, since they need longer processing time and may lead to failure and undernourishment. Based on the incoming workload, this study aims to classify the workload into OLTP and DSS workload performance types. From the SQL statements, it is clear that the DBMS can obtain and record the process, measure the analysed performance and the workload classifier in the form of DBMS snapshots. Case-Based Reasoning (CBR) optimised with a hash search technique has been adopted in this study to evaluate and predict the workload performance of PostgreSQL. It has been shown that the proposed CBR using the hash search technique achieves acceptable prediction accuracy relative to other machine learning algorithms such as Neural Network and Support Vector Machine. In addition, evaluation using a confusion matrix showed very good accuracy as well as improved execution time. Additionally, the results indicate that the prediction model for workload performance evaluation, using CBR optimised by the hash search technique to determine workload data for shortest-path analysis with Dijkstra's algorithm, could be useful for predicting the incoming workload based on the status of predetermined DBMS parameters. In this way, information is delivered to the DBMS, ensuring the availability of incoming workload information that is crucial to the smooth operation of PostgreSQL
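The abstract above does not give implementation details, so the following is only a minimal sketch of how case retrieval by hash lookup might label a workload snapshot as OLTP or DSS. The two features (share of read statements, share of write statements), the bucket width, and the names `bucket` and `CaseBase` are assumptions made for illustration, not the authors' design.

```python
from math import dist


def bucket(features, width=0.25):
    """Discretise a feature vector so that similar snapshots share a hash key."""
    return tuple(int(f // width) for f in features)


class CaseBase:
    """Hypothetical case base: a hash index first, nearest stored case as fallback."""

    def __init__(self, width=0.25):
        self.width = width
        self.by_key = {}   # hash index: bucket key -> workload label
        self.cases = []    # all stored cases, used when the hash lookup misses

    def retain(self, features, label):
        """Store a solved case (snapshot features plus its known label)."""
        self.by_key[bucket(features, self.width)] = label
        self.cases.append((tuple(features), label))

    def classify(self, features):
        """Reuse the label of a case in the same bucket, else of the nearest case."""
        key = bucket(features, self.width)
        if key in self.by_key:
            return self.by_key[key]
        if not self.cases:
            raise ValueError("case base is empty")
        nearest = min(self.cases, key=lambda case: dist(case[0], features))
        return nearest[1]


# Assumed features: (share of read statements, share of write statements).
cb = CaseBase()
cb.retain((0.9, 0.1), "DSS")    # read-heavy snapshot
cb.retain((0.2, 0.8), "OLTP")   # write-heavy snapshot
hit = cb.classify((0.95, 0.05))   # lands in the same bucket as the DSS case
miss = cb.classify((0.5, 0.5))    # no bucket match, falls back to the nearest case
```

The hash index answers most lookups in constant time, and the linear scan runs only on a miss, which is one plausible reading of "CBR optimised with a hash search technique".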

Directory of Open Access Journals

10381 Summary and Abstracts Collection -- Robust Query Processing

Author: Kuno Harumi Anne
Markl Volker
Sattler Kai-Uwe
Publication venue: Dagstuhl Seminar Proceedings. 10381 - Robust Query Processing
Publication date: 01/01/2011

Dagstuhl seminar 10381 on robust query processing (held 19.09.10 - 24.09.10) brought together a diverse set of researchers and practitioners with a broad range of expertise for the purpose of fostering discussion and collaboration regarding causes, opportunities, and solutions for achieving robust query processing. The seminar strove to build a unified view across the loosely-coupled system components responsible for the various stages of database query processing. Participants were chosen for their experience with database query processing and, where possible, their prior work in academic research or in product development towards robustness in database query processing. In order to pave the way to motivate, measure, and protect future advances in robust query processing, seminar 10381 focused on developing tests for measuring the robustness of query processing. In these proceedings, we first review the seminar topics, goals, and results, then present abstracts or notes of some of the seminar break-out sessions. We also include, as an appendix, the robust query processing reading list that was collected and distributed to participants before the seminar began, as well as summaries of a few of those papers that were contributed by some participants

Dagstuhl Research Online Publication Server


Optimizing Data-Intensive Computing with Efficient Configuration Tuning

Author: Fekry Ayat
Publication venue: University of Cambridge
Publication date: 30/07/2021

As the complexity of distributed analytics systems evolves over time, more configuration parameters get exposed for tuning. While these numerous parameters allow users more control over how their workloads are executed, this flexibility comes at a cost, since finding the right configurations for such systems in a cost-effective way becomes challenging. In practice, several factors contribute to the complexity of tuning the configuration of those systems: the large configuration space, the diversity of the served workloads (each workload possibly requiring a different resource allocation strategy to run optimally), and the dynamic characteristics of these systems’ environment (e.g., increase in input data size, changes in the allocation of resources). Paradoxically, existing solutions for workload tuning either assume a static tuning environment or workloads that are inexpensive to run (i.e. requiring hundreds of execution samples). Recently, Bayesian Optimisation (BO) strategies have been applied as a solution to enable efficient autotuning. They build a probabilistic model incrementally to predict the impact of the parameters on performance using a small number of execution samples. The incrementally constructed BO model is used to guide the tuning process and accelerate convergence to a near-optimal configuration. Unfortunately, for distributed analytics systems, the configuration space is too large to construct a good model using traditional BO, which fails to provide quick convergence in a high-dimensional configuration space. I argue that cost-effective tuning strategies can only be developed when taking into account: the frequent changes that can happen in the analytics workload/environment, the amortization of tuning costs and how this influences tuning profitability, the high dimensionality of the configuration space, and the need to cater for diverse workloads.
To tackle these challenges, I propose Tuneful, an efficient configuration tuning framework for such expensive-to-tune systems. It works efficiently both initially (when little data is available) and later (as more tuning knowledge is acquired). It starts by incrementally learning workload-specific influential parameters and tuning only those; then, when more tuning knowledge becomes available, it detects similarity across workloads and utilizes multitask BO to share the tuning knowledge across similar workloads. I show how augmenting the BO approach with parameters’ significance and workload similarity characteristics enables efficient configuration tuning in a high-dimensional configuration space. Over diverse analytics workloads, this significantly accelerates both configuration tuning and cost amortization, saving search time by 2.7-3.7X at the median compared to state-of-the-art approaches
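The idea of tuning only the influential parameters can be illustrated with a small, dependency-free sketch. It is not Tuneful's algorithm: one-at-a-time screening stands in for the thesis's significance analysis, and seeded random search stands in for Bayesian Optimisation. The parameter names, their ranges, and `toy_cost` are hypothetical.

```python
import random


def rank_parameters(cost, space):
    """Score each parameter by how much the cost changes when only that
    parameter moves across its range (one-at-a-time screening)."""
    mid = {name: (lo + hi) / 2 for name, (lo, hi) in space.items()}
    scores = {}
    for name, (lo, hi) in space.items():
        scores[name] = abs(cost({**mid, name: hi}) - cost({**mid, name: lo}))
    return sorted(space, key=scores.get, reverse=True)


def tune_influential(cost, space, top_k=2, samples=50, seed=0):
    """Random search restricted to the top_k most influential parameters;
    every other parameter stays at its midpoint."""
    rng = random.Random(seed)
    influential = rank_parameters(cost, space)[:top_k]
    best = {name: (lo + hi) / 2 for name, (lo, hi) in space.items()}
    best_cost = cost(best)
    for _ in range(samples):
        candidate = dict(best)
        for name in influential:
            lo, hi = space[name]
            candidate[name] = rng.uniform(lo, hi)
        candidate_cost = cost(candidate)
        if candidate_cost < best_cost:
            best, best_cost = candidate, candidate_cost
    return influential, best, best_cost


# Hypothetical search space and synthetic cost; only two parameters matter much.
space = {
    "executor_memory": (1.0, 8.0),
    "shuffle_partitions": (50.0, 400.0),
    "compress_level": (1.0, 9.0),
    "locality_wait": (0.0, 10.0),
}


def toy_cost(cfg):
    return ((cfg["executor_memory"] - 6.0) ** 2
            + 3.0 * (cfg["shuffle_partitions"] - 200.0) ** 2 / 1000.0
            + 0.001 * cfg["compress_level"])


influential, best, best_cost = tune_influential(toy_cost, space)
baseline = toy_cost({name: (lo + hi) / 2 for name, (lo, hi) in space.items()})
```

Restricting the search to two of the four dimensions means every sample is spent where the cost actually varies, which is the intuition behind screening before tuning in a high-dimensional space.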

Apollo (Cambridge)

Query Interactions in Database Systems

Author: Ahmad Mumtaz
Publication venue: 'University of Waterloo'
Publication date: 01/01/2012

The typical workload in a database system consists of a mix of multiple queries of different types, running concurrently and interacting with each other. The same query may have different performance in different mixes. Hence, optimizing performance requires reasoning about query mixes and their interactions, rather than considering individual queries or query types. In this dissertation, we demonstrate how queries affect each other when they are executing concurrently in different mixes. We show the significant impact that query interactions can have on the end-to-end workload performance. A major hurdle in the understanding of query interactions in database systems is that there is a large spectrum of possible causes of interactions. For example, query interactions can happen because of any of the resource-related, data-related or configuration-related dependencies that exist in the system. This variation in underlying causes makes it very difficult to come up with robust analytical performance models to capture and model query interactions. We present a new approach for modeling performance in the presence of interactions, based on conducting experiments to measure the effect of query interactions and fitting statistical models to the data collected in these experiments to capture the impact of query interactions. The experiments collect samples of the different possible query mixes, and measure the performance metrics of interest for the different queries in these sample mixes. Statistical models such as simple regression and instance-based learning techniques are used to train models from these sample mixes. This approach requires no prior assumptions about the internal workings of the database system or the nature or cause of the interactions, making it portable across systems. 
We demonstrate the potential of capturing, modeling, and exploiting query interactions by developing techniques to help in two database performance related tasks: workload scheduling and estimating the completion time of a workload. These are important workload management problems that database administrators have to deal with routinely. We consider the problem of scheduling a workload of report-generation queries. Our scheduling algorithms employ statistical performance models to schedule appropriate query mixes for the given workload. Our experimental evaluation demonstrates that our interaction-aware scheduling algorithms outperform scheduling policies that are typically used in database systems. The problem of estimating the completion time of a workload is an important problem, and the state of the art does not offer any systematic solution. Typically database administrators rely on heuristics or observations of past behavior to solve this problem. We propose a more rigorous solution to this problem, based on a workload simulator that employs performance models to simulate the execution of the different mixes that make up a workload. This mix-based simulator provides a systematic tool that can help database administrators in estimating workload completion time. Our experimental evaluation shows that our approach can estimate the workload completion times with a high degree of accuracy. Overall, this dissertation demonstrates that reasoning about query interactions holds significant potential for realizing performance improvements in database systems. The techniques developed in this work can be viewed as initial steps in this interesting area of research, with lots of potential for future work
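A mix-based simulator of the kind described above might look like the following sketch. It assumes a performance model `predict_time(qtype, mix)` that returns the runtime of one query when it runs entirely within a given mix; the function names, the fixed concurrency level `mpl`, and the toy model are illustrative assumptions, not the dissertation's implementation.

```python
def estimate_completion_time(queue, mpl, predict_time):
    """Simulate a workload as a sequence of query mixes.

    queue: query types in schedule order.
    mpl: number of queries that run concurrently.
    predict_time(qtype, mix): predicted runtime of qtype inside mix,
        where mix is a sorted tuple of the concurrently running types.
    """
    pending = list(queue)
    running = []   # each entry: [query type, remaining fraction of its work]
    elapsed = 0.0
    while pending or running:
        # Fill free slots so the mix holds up to mpl queries.
        while pending and len(running) < mpl:
            running.append([pending.pop(0), 1.0])
        mix = tuple(sorted(q for q, _ in running))
        # Advance time until the first query in the current mix finishes.
        step = min(rem * predict_time(q, mix) for q, rem in running)
        elapsed += step
        for item in running:
            item[1] -= step / predict_time(item[0], mix)
        running = [item for item in running if item[1] > 1e-9]
    return elapsed


def toy_model(qtype, mix):
    """Hypothetical model: queries share resources evenly, and mixes that
    combine different query types suffer an extra 1.5x slowdown."""
    base = {"A": 10.0, "B": 20.0}[qtype]
    penalty = 1.5 if len(set(mix)) > 1 else 1.0
    return base * len(mix) * penalty


serial = estimate_completion_time(["A", "B"], mpl=1, predict_time=toy_model)
mixed = estimate_completion_time(["A", "B"], mpl=2, predict_time=toy_model)
```

Under this toy model, running A and B together takes longer than running them one after the other, which shows how an interaction-aware estimate can differ from a sum of isolated runtimes.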

CiteSeerX

University of Waterloo's Institutional Repository

Evolving a secure grid-enabled, distributed data warehouse : a standards-based perspective

Author: Li Xiao-Yu
Publication venue: Faculty of Engineering, the Built Environment and Information Technology
Publication date: 01/01/2007

As digital data collections have increased in scale and number, they have become an important type of resource serving a wide community of researchers. Cross-institutional data-sharing and collaboration offer a suitable approach to support research institutions that suffer from a lack of data and related IT infrastructure. Grid computing has become a widely adopted approach to enable cross-institutional resource-sharing and collaboration. It integrates a distributed and heterogeneous collection of locally managed users and resources. This project proposes a distributed data warehouse system, which uses Grid technology to enable data access and integration, and collaborative operations across multiple distributed institutions in the context of HIV/AIDS research. This study is based on wider research into OGSA-based Grid services architecture, comprising a data-analysis system which utilizes a data warehouse, data marts, and a near-line operational database that are hosted by distributed institutions. Within this framework, specific patterns for collaboration, interoperability, resource virtualization and security are included. The heterogeneous and dynamic nature of the Grid environment introduces a number of security challenges. This study also addresses a set of particular security aspects, including PKI-based authentication, single sign-on, dynamic delegation, and attribute-based authorization. These mechanisms, as supported by the Globus Toolkit’s Grid Security Infrastructure, are used to enable interoperability and establish trust relationships between various security mechanisms and policies within different institutions; manage credentials; and ensure secure interactions

SEALS Digital commons

Nelson Mandela University

South East Academic Libraries System (SEALS)

The 11th Conference of PhD Students in Computer Science

Publication date: 01/01/2018

University of Szeged

Web Services and Model-Driven Enterprise Information Systems: Proceedings of the Joint Workshop on Web Services and Model-Driven Enterprise Information Services, WSMDEIS 2005

Author: Bevinakoppa S.
Hammoudi S.
Publication venue: INSTICC PRESS
Publication date: 01/01/2005

University of Twente Research Information

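Gewinnung, Verwaltung und Anwendung von Performance-Daten zur Unterstützung des autonomen Datenbank-Tuning [Acquisition, Management, and Application of Performance Data to Support Autonomous Database Tuning]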

Author: Wiese David
Publication date: 01/12/2011

In recent decades, the complexity and heterogeneity of information systems have increased rapidly. As a result, many modern IT systems, owing to the heterogeneity and variety of their architectures and applications, are very costly to develop, error-prone in use, and difficult for administrators to control and configure. Initiatives such as Autonomic Computing help to master the growing complexity by relieving the “human problem factor” and using technology to manage technology. By adapting or extending the system environment, such approaches attempt to provide automated reactive and proactive performance control alongside today's manual, reactive performance optimization. A central prerequisite for an autonomous infrastructure is a reliable, global data and knowledge base. We work out how performance data about the behavior and state of the system can be collected, consolidated, managed, and evaluated at runtime using techniques known from data warehousing. In addition to the architecture and the functional components of such a Performance Data Warehouse, its data model is explained, and its connection to the preceding monitoring and the subsequent analysis is specified. With the aim of “imitating” the human approach and thereby relieving administrators of their routine tasks, we turn to the design and description of a possible infrastructure for automating typical tuning tasks. We work out, in general terms and by means of examples, how the tuning knowledge and proven practices of DBAs can be captured, formalized as workflows, and applied at runtime to solve problems

Digitale Bibliothek Thüringen