38 research outputs found
Workload-Aware Performance Tuning for Autonomous DBMSs
Optimal configuration is vital for a DataBase Management System (DBMS) to achieve high performance. There is no one-size-fits-all configuration that works for different workloads since each workload has varying patterns with different resource requirements. There is a relationship between configuration, workload, and system performance. If a configuration cannot adapt to the dynamic changes of a workload, there could be a significant degradation in the overall performance of DBMS unless a sophisticated administrator is continuously re-configuring the DBMS. In this tutorial, we focus on autonomous workload-aware performance tuning, which is expected to automatically and continuously tune the configuration as the workload changes. We survey three research directions, including 1) workload classification, 2) workload forecasting, and 3) workload-based tuning. While the first two topics address the issue of obtaining accurate workload information, the third one tackles the problem of how to properly use the workload information to optimize performance. We also identify research challenges and open problems, and give real-world examples about leveraging workload information for database tuning in commercial products (e.g., Amazon Redshift). We will demonstrate workload-aware performance tuning in Amazon Redshift in the presentation.Peer reviewe
PREDICTION BASED WORKLOAD PERFORMANCE EVALUATION FOR DISASTER MANAGEMENT SPATIAL DATABASE
This paper discusses a prediction based workload performance evaluation implementation during Disaster Management, especially at the response phase, to handle large spatial data in the event of an eruption of the Merapi volcano in Indonesia. Complexity associated with a large spatial database are not the same with the conventional database. This implies that in coming complex work loads are difficult to be handled by human from which needs longer processing time and may lead to failure and undernourishment. Based on incoming workload, this study is intended to predict the associated workload into OLTP and DSS workload performance types. From the SQL statements, it is clear that the DBMS can obtain and record the process, measure the analysed performances and the workload classifier in the form of DBMS snapshots. The Case-Based Reasoning (CBR) optimised with Hash Search Technique has been adopted in this study to evaluate and predict the workload performance of PostgreSQL. It has been proven that the proposed CBR using Hash Search technique has resulted in acceptable prediction of the accuracy measurement than other machine learning algorithm like Neural Network and Support Vector Machine. Besides, the results of the evaluation using confusion matrix has resulted in very good accuracy as well as improvement in execution time. Additionally, the results of the study indicated that the prediction model for workload performance evaluation using CBR which is optimised by Hash Search technique for determining workload data on shortest path analysis via the employment of Dijkstra algorithm. It could be useful for the prediction of the incoming workload based on the status of the predetermined DBMS parameters. In this way, information is delivered to DBMS hence ensuring incoming workload information that is very crucial to determine the smooth works of PostgreSQL
10381 Summary and Abstracts Collection -- Robust Query Processing
Dagstuhl seminar 10381 on robust query processing (held 19.09.10 -
24.09.10) brought together a diverse set of researchers and practitioners
with a broad range of expertise for the purpose of fostering discussion
and collaboration regarding causes, opportunities, and solutions for
achieving robust query processing.
The seminar strove to build a unified view across
the loosely-coupled system components responsible for
the various stages of database query processing.
Participants were chosen for their experience with database
query processing and, where possible, their prior work in academic
research or in product development towards robustness in database query
processing.
In order to pave the way to motivate, measure, and protect future advances
in robust query processing, seminar 10381 focused on developing tests
for measuring the robustness of query processing.
In these proceedings, we first review the seminar topics, goals,
and results, then present abstracts or notes of some of the seminar break-out
sessions.
We also include, as an appendix,
the robust query processing reading list that
was collected and distributed to participants before the seminar began,
as well as summaries of a few of those papers that were
contributed by some participants
Recommended from our members
Optimizing Data-Intensive Computing with Efficient Configuration Tuning
As the complexity of distributed analytics systems evolves over time, more configuration parameters get exposed for tuning. While these numerous parameters allow users more control over how their workloads are executed, this flexibility comes at a cost, since finding the right configurations for such systems in a cost-effective way becomes challenging. In practice, several factors contribute to the complexity of tuning the configuration of those systems: the large configuration space, the diversity of the served workloads (each workload possibly requiring a different resource allocation strategy to run optimally), and the dynamic
characteristics of these systems’ environment (e.g., increase in input data size, changes in the allocation of resources). Paradoxically, existing solutions for workload tuning either assume static tuning environment or workloads that are inexpensive to run (i.e. requiring hundreds of execution samples). Recently, Bayesian Optimisation (BO) strategies have been applied as a solution to enable efficient autotuning. They build a probabilistic model incrementally to predict the impact of the parameters on performance using a small number of execution samples. The incrementally constructed BO model is used to guide the tuning process and accelerate convergence to a near-optimal configuration. Unfortunately, for distributed analytics systems, the configuration space is too large to construct a good model using traditional BO, which fails to provide quick convergence in high dimensional configuration space.
I argue that cost-effective tuning strategies can only be developed when taking into account: the frequent changes that can happen in the analytics workload/environment, the amortization of tuning costs and how this influences tuning profitability, the high dimensionality of configuration
space and the need to cater for diverse workloads. To tackle these challenges, I propose Tuneful, an efficient configuration tuning framework
for such expensive to tune systems. It works efficiently both initially (when little data is available) as well as later (as more tuning knowledge is acquired). It starts with learning workload-specific influential parameters incrementally and tunes those only, then when more tuning knowledge becomes available, it detects similarity across workloads and utilizes multitask BO to share the tuning knowledge across similar workloads. I show how augmenting the BO approach with parameters’ significance and workload similarity characteristics enables an
efficient configuration tuning in high dimensional configuration space. Over diverse analytics workloads, this significantly accelerates both configuration tuning and cost amortization, saving search time by 2.7-3.7X at median compared to the-state-of-the-art approaches
Query Interactions in Database Systems
The typical workload in a database system consists of a mix of multiple queries of different types, running concurrently and
interacting with each other. The same query may have different performance in different mixes. Hence, optimizing performance
requires reasoning about query mixes and their interactions, rather than considering individual queries or query types. In this
dissertation, we demonstrate how queries affect each other when they are executing concurrently in different mixes. We show the
significant impact that query interactions can have on the end-to-end workload performance. A major hurdle in the understanding of query interactions in
database systems is that there is a large spectrum of possible causes of interactions. For example, query interactions can happen
because of any of the resource-related, data-related or configuration-related dependencies that exist in the system. This
variation in underlying causes makes it very difficult to come up with robust analytical performance models to capture and model query
interactions. We present a new approach for modeling performance in the presence of interactions, based on conducting experiments to measure the effect of query interactions and fitting statistical models to the data collected in these experiments to capture the impact of query interactions. The experiments collect samples of the different possible query mixes, and measure the performance metrics of interest for the different queries in these sample mixes.
Statistical models such as simple regression and instance-based learning techniques are used to train models from these sample mixes. This approach requires no prior assumptions about the internal workings of the database system or the nature or cause of
the interactions, making it portable across systems. We demonstrate the potential of capturing, modeling, and exploiting query interactions by developing techniques to help in two database performance related tasks: workload scheduling and estimating the
completion time of a workload. These are important workload management problems that database administrators have to deal with
routinely. We consider the problem of scheduling a workload of
report-generation queries. Our scheduling algorithms employ statistical performance models to schedule appropriate query mixes
for the given workload. Our experimental evaluation demonstrates that our interaction-aware scheduling algorithms outperform scheduling policies that are typically used in database systems. The problem of estimating the completion time of a workload is an important problem, and the state of the art does not offer any systematic solution. Typically database administrators rely on heuristics or observations of past behavior to solve this problem. We propose a more rigorous solution to this problem, based on a workload simulator that employs performance models to simulate the execution of the different mixes that make up a workload. This mix-based simulator provides a systematic tool that can help database administrators in estimating workload completion time. Our
experimental evaluation shows that our approach can estimate the workload completion times with a high degree of accuracy. Overall, this dissertation demonstrates that reasoning about query
interactions holds significant potential for realizing performance improvements in database systems. The techniques developed in this work can be viewed as initial steps in this interesting area of research, with lots of potential for future work
Evolving a secure grid-enabled, distributed data warehouse : a standards-based perspective
As digital data-collection has increased in scale and number, it becomes an important type of resource serving a wide community of researchers. Cross-institutional data-sharing and collaboration introduce a suitable approach to facilitate those research institutions that are suffering the lack of data and related IT infrastructures. Grid computing has become a widely adopted approach to enable cross-institutional resource-sharing and collaboration. It integrates a distributed and heterogeneous collection of locally managed users and resources. This project proposes a distributed data warehouse system, which uses Grid technology to enable data-access and integration, and collaborative operations across multi-distributed institutions in the context of HV/AIDS research. This study is based on wider research into OGSA-based Grid services architecture, comprising a data-analysis system which utilizes a data warehouse, data marts, and near-line operational database that are hosted by distributed institutions. Within this framework, specific patterns for collaboration, interoperability, resource virtualization and security are included. The heterogeneous and dynamic nature of the Grid environment introduces a number of security challenges. This study also concerns a set of particular security aspects, including PKI-based authentication, single sign-on, dynamic delegation, and attribute-based authorization. These mechanisms, as supported by the Globus Toolkit’s Grid Security Infrastructure, are used to enable interoperability and establish trust relationship between various security mechanisms and policies within different institutions; manage credentials; and ensure secure interactions
Gewinnung, Verwaltung und Anwendung von Performance-Daten zur UnterstĂĽtzung des autonomen Datenbank-Tuning
In den letzten Jahrzehnten ist die Komplexität und Heterogenität von Informationssystemen rapide gestiegen. Die Folge ist, dass viele moderne IT-Systeme aufgrund ihrer heterogenen Architektur- und Applikationsvielfalt sehr kostenintensiv in der Entwicklung, fehleranfällig in der Nutzung und schwierig durch Administratoren kontrollier- bzw. konfigurierbar sind.
Initiativen wie das Autonomic Computing helfen, der steigenden Komplexität Herr zu werden, indem sie den „Problemfaktor Mensch“ entlasten und Technik nutzen, um Technik zu verwalten. Durch die Anpassung bzw. Erweiterung der System-Umgebung versuchen derartige Ansätze neben derzeitiger manueller, reaktiver Performance-Optimierung, eine automatisierte reaktive und proaktive Performance-Kontrolle zu gewährleisten.
Zentrale Grundvoraussetzung für eine autonome Infrastruktur ist eine verlässliche, globale Daten- bzw. Wissensbasis. Wir erarbeiten, wie Performance-Daten über das Verhalten und den Zustand des Systems mit aus dem Data-Warehousing bekannten Techniken gesammelt, konsolidiert, verwaltet und zur Laufzeit ausgewertet werden können. Neben der Architektur und den funktionalen Komponenten eines solchen Performance Data Warehouse wird zudem dessen Datenmodell erläutert und die Anbindung an das vorausgehende Monitoring sowie die nachfolgende Analyse spezifiziert.
Mit dem Ziel, die menschliche Vorgehensweise „nachzuahmen“ und somit die Administratoren bei ihren Routine-Tätigkeiten zu entlasten, widmen wir uns der Konzipierung und Beschreibung einer möglichen Infrastruktur zur Automatisierung typischer Tuning-Aufgaben. Wir erarbeiten allgemein und anhand von Beispielen, wie Tuning-Wissen und bewährte Praktiken von DBAs abgebildet, in Form von Workflows formalisiert und zur Laufzeit für die Problemlösung angewendet werden können