Extensions of an Empirical Automated Tuning Framework
Empirical auto-tuning has been successfully applied to scientific computing applications and web-based cluster servers over the last few years. However, few studies have focused on applying this method to optimizing the performance of database systems. In this thesis, we present a strategy that uses Active Harmony, an empirical automated tuning framework, to optimize the throughput of a PostgreSQL server by tuning settings such as memory and buffer sizes. We used the Nelder-Mead simplex method as the search engine, and we show how our strategy performs compared to hand-tuned and default configurations.
Another part of this thesis focuses on using data from prior runs of auto-tuning. Prior data has proven useful in many cases, such as modeling the search space or finding a good starting point for hill-climbing. We present several methods developed to manage prior data in Active Harmony. Our intention is to provide tuners with a complete set of information for their tuning tasks.
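To illustrate the search step, the following is a minimal Nelder-Mead loop over two hypothetical knobs named after PostgreSQL's shared_buffers and work_mem. The objective is a mock negated-throughput function, not a real benchmark run, and the optimum values are invented for the example; this is a sketch of the simplex method itself, not of Active Harmony's implementation.

```python
def nelder_mead(f, x0, step=10.0, iters=120, alpha=1.0, gamma=2.0, rho=0.5, sigma=0.5):
    """Minimize f starting from x0 with the Nelder-Mead simplex method."""
    n = len(x0)
    # Initial simplex: x0 plus one perturbed point per dimension.
    simplex = [list(x0)] + [[x0[j] + (step if j == i else 0.0) for j in range(n)]
                            for i in range(n)]
    for _ in range(iters):
        simplex.sort(key=f)
        best, worst = simplex[0], simplex[-1]
        centroid = [sum(p[j] for p in simplex[:-1]) / n for j in range(n)]
        refl = [centroid[j] + alpha * (centroid[j] - worst[j]) for j in range(n)]
        if f(refl) < f(best):
            # Reflection looks promising: try expanding further.
            exp = [centroid[j] + gamma * (refl[j] - centroid[j]) for j in range(n)]
            simplex[-1] = exp if f(exp) < f(refl) else refl
        elif f(refl) < f(simplex[-2]):
            simplex[-1] = refl
        else:
            # Contract toward the centroid; shrink the simplex if that fails too.
            contr = [centroid[j] + rho * (worst[j] - centroid[j]) for j in range(n)]
            if f(contr) < f(worst):
                simplex[-1] = contr
            else:
                simplex = [best] + [[best[j] + sigma * (p[j] - best[j]) for j in range(n)]
                                    for p in simplex[1:]]
    simplex.sort(key=f)
    return simplex[0]

def neg_throughput(cfg):
    # Mock objective: throughput peaks at shared_buffers=512, work_mem=64
    # (invented numbers); lower is better for the minimizer.
    shared_buffers, work_mem = cfg
    return (shared_buffers - 512) ** 2 / 1e3 + (work_mem - 64) ** 2 / 1e2

best = nelder_mead(neg_throughput, [128.0, 8.0])
```

A real tuner would replace `neg_throughput` with an actual benchmark run against the server under the candidate setting.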
BestConfig: Tapping the Performance Potential of Systems via Automatic Configuration Tuning
An ever-increasing number of configuration parameters are exposed to system users, yet many users apply a single configuration setting across different workloads, leaving the performance potential of their systems untapped. A good configuration setting can greatly improve the performance of a deployed system under certain workloads, but with tens or hundreds of parameters it becomes highly costly to decide which setting leads to the best performance. Such a task requires strong expertise in both the system and the application, which users commonly lack.
To help users tap the performance potential of their systems, we present BestConfig, a system for automatically finding the best configuration setting within a resource limit for a deployed system under a given application workload. BestConfig is designed with an extensible architecture to automate configuration tuning for general systems. To tune system configurations within a resource limit, we propose the divide-and-diverge sampling method and the recursive bound-and-search algorithm. Solely by configuration adjustment, BestConfig improves the throughput of Tomcat by 75%, that of Cassandra by 63%, and that of MySQL by 430%, and reduces the running time of a Hive join job by about 50% and that of a Spark join job by about 80%.
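The sampling idea, as we read it, is to partition each parameter's range into k subranges and spread k samples so that every subrange of every parameter is covered once. A minimal Latin-hypercube-style sketch of that coverage property follows; it is our own approximation, not BestConfig's exact implementation, and the example parameter bounds are invented.

```python
import random

def divide_and_diverge(bounds, k, seed=0):
    """Divide each parameter range into k equal subranges and draw k samples
    so every subrange of every parameter is hit exactly once, spreading
    ('diverging') the samples across the whole space."""
    rng = random.Random(seed)
    samples = [[0.0] * len(bounds) for _ in range(k)]
    for d, (lo, hi) in enumerate(bounds):
        width = (hi - lo) / k
        cells = list(range(k))
        rng.shuffle(cells)  # assign each sample a distinct subrange in dimension d
        for i, c in enumerate(cells):
            samples[i][d] = lo + (c + rng.random()) * width
    return samples

# Hypothetical two-parameter space: a size in [0, 100] and a count in [1, 9].
samples = divide_and_diverge([(0.0, 100.0), (1.0, 9.0)], k=5)
```

With only k evaluations, every subrange of every parameter contributes one observation, which is what makes a small sample budget informative.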
ACTS in Need: Automatic Configuration Tuning with Scalability Guarantees
To support the variety of Big Data use cases, many Big Data related systems expose a large number of user-specifiable configuration parameters. As highlighted in our experiments, a MySQL deployment with well-tuned configuration parameters achieves a peak throughput 12 times that of one with the default setting. However, finding the best setting for tens or hundreds of configuration parameters is practically impossible for ordinary users. Worse still, many Big Data applications require the support of multiple systems co-deployed in the same cluster. As these co-deployed systems can interact to affect the overall performance, they must be tuned together. Automatic configuration tuning with scalability guarantees (ACTS) is needed to help system users. Solutions to ACTS must scale to various systems, workloads, deployments, parameters, and resource limits. By proposing and implementing an ACTS solution, we demonstrate that ACTS can benefit users not only in improving system performance and resource utilization, but also in saving costs and enabling fairer benchmarking.
PerfXplain: Debugging MapReduce Job Performance
While users today have access to many tools that assist in performing large
scale data analysis tasks, understanding the performance characteristics of
their parallel computations, such as MapReduce jobs, remains difficult. We
present PerfXplain, a system that enables users to ask questions about the
relative performances (i.e., runtimes) of pairs of MapReduce jobs. PerfXplain
provides a new query language for articulating performance queries and an
algorithm for generating explanations from a log of past MapReduce job
executions. We formally define the notion of an explanation together with three
metrics, relevance, precision, and generality, that measure explanation
quality. We present the explanation-generation algorithm based on techniques
related to decision-tree building. We evaluate the approach on a log of past
executions on Amazon EC2, and show that our approach can generate quality
explanations, outperforming two naive explanation-generation methods.
Comment: VLDB201
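The decision-tree-related core can be illustrated with a toy log: each record describes a pair of jobs by boolean difference features plus a label for whether the second job ran slower, and the candidate explanation is the feature whose split yields the highest information gain. The feature names below are assumptions for the example; this is a simplified stand-in, not PerfXplain's actual algorithm or query language.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_explanation(records, label_key="slower"):
    """Return (feature, gain) for the boolean feature whose split gives
    the largest information gain over the label: the core criterion of
    decision-tree-style explanation generation."""
    labels = [r[label_key] for r in records]
    base = entropy(labels)
    best = None
    for feat in records[0]:
        if feat == label_key:
            continue
        true_side = [r[label_key] for r in records if r[feat]]
        false_side = [r[label_key] for r in records if not r[feat]]
        if not true_side or not false_side:
            continue  # a one-sided split explains nothing
        n = len(records)
        gain = base - (len(true_side) / n * entropy(true_side)
                       + len(false_side) / n * entropy(false_side))
        if best is None or gain > best[1]:
            best = (feat, gain)
    return best

# Toy log of job pairs (invented): "fewer_reducers" perfectly separates
# the slower runs, so it is selected as the explanation.
log = [
    {"same_input_size": True,  "fewer_reducers": True,  "slower": True},
    {"same_input_size": True,  "fewer_reducers": True,  "slower": True},
    {"same_input_size": False, "fewer_reducers": True,  "slower": True},
    {"same_input_size": True,  "fewer_reducers": False, "slower": False},
    {"same_input_size": False, "fewer_reducers": False, "slower": False},
]
feat, gain = best_explanation(log)
```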
Performance Tuning of Database Systems Using a Context-aware Approach
Database system performance problems have a cascading effect on all aspects of an enterprise application. Database vendors and application developers provide guidelines, best practices, and even initial database settings for good performance. But database performance tuning is not a one-off task: database administrators have to keep a constant eye on database performance, as tuning work carried out earlier can be invalidated for a multitude of reasons. Before engaging in a performance tuning endeavor, a database administrator must prioritize which tuning tasks to carry out first. This prioritization is based on which tuning action would yield the highest performance benefit; however, this prediction may not always be accurate. Experiment-based performance tuning methodologies have been introduced as an alternative to prediction-based approaches. Experimenting on a representative system similar to the production one allows a database administrator to accurately gauge the performance gain of a particular tuning task. In this paper we propose a novel approach to experiment-based performance tuning that uses a context-aware application model. Using a proof-of-concept implementation, we show how it can automate the detection of performance changes, the creation of experiments, and the evaluation of performance tuning outcomes for mixed workload types through database configuration parameter changes.
CM-CASL: Comparison-based Performance Modeling of Software Systems via Collaborative Active and Semisupervised Learning
Configuration tuning for large software systems is generally challenging due
to the complex configuration space and expensive performance evaluation. Most
existing approaches follow a two-phase process, first learning a
regression-based performance prediction model on available samples and then
searching for the configurations with satisfactory performance using the
learned model. Such regression-based models often suffer from the scarcity of
samples due to the enormous time and resources required to run a large software
system with a specific configuration. Moreover, previous studies have shown
that even a highly accurate regression-based model may fail to discern the
relative merit between two configurations, whereas performance comparison is
actually one fundamental strategy for configuration tuning. To address these
issues, this paper proposes CM-CASL, a Comparison-based performance Modeling
approach for software systems via Collaborative Active and Semisupervised
Learning. CM-CASL learns a classification model that compares the performance
of two given configurations, and enhances the samples through a collaborative
labeling process by both human experts and classifiers using an integration of
active and semisupervised learning. Experimental results demonstrate that
CM-CASL outperforms two state-of-the-art performance modeling approaches in
terms of both classification accuracy and rank accuracy, and thus provides a
better performance model for the subsequent work of configuration tuning
d-Simplexed: Adaptive Delaunay Triangulation for Performance Modeling and Prediction on Big Data Analytics
Big Data processing systems (e.g., Spark) have a number of resource configuration parameters, such as memory size, CPU allocation, and the number of running nodes. Regular users and even expert administrators struggle to understand the mutual relation between different parameter configurations and the overall performance of the system. In this paper, we address this challenge by proposing a performance prediction framework, called d-Simplexed, to build performance models with varied configurable parameters on Spark. We take inspiration from the field of Computational Geometry to construct a d-dimensional mesh using Delaunay Triangulation over a selected set of features. From this mesh, we predict execution time for various feature configurations. To minimize the time and resources in building a bootstrap model with a large number of configuration values, we propose an adaptive sampling technique that allows us to collect as few training points as required. Our evaluation on a cluster of computers using WordCount, PageRank, Kmeans, and Join workloads from the HiBench benchmarking suite shows that we can achieve less than a 5% error rate in estimation accuracy by sampling less than 1% of the data.
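Once a Delaunay mesh over measured configurations exists, predicting a new point reduces to interpolating within the simplex that contains it. The 2-D case (a triangle) can be sketched with barycentric coordinates; the sample points and runtimes below are invented, and this shows only the per-simplex interpolation step, not d-Simplexed's mesh construction or adaptive sampling.

```python
def barycentric_predict(tri, query):
    """tri: three ((x, y), runtime) samples forming a triangle in a 2-D
    configuration space (e.g. memory size vs. node count). Predicts the
    runtime at `query` by barycentric interpolation, the per-simplex
    prediction a Delaunay-based model relies on."""
    (p1, f1), (p2, f2), (p3, f3) = tri
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    x, y = query
    # Barycentric coordinates of the query relative to the triangle.
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    l1 = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / det
    l2 = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / det
    l3 = 1.0 - l1 - l2
    # Weighted average of the three measured runtimes.
    return l1 * f1 + l2 * f2 + l3 * f3

# Invented measurements: runtime at three sampled configurations.
tri = (((0.0, 0.0), 10.0), ((4.0, 0.0), 18.0), ((0.0, 4.0), 14.0))
```

Inside the triangle the prediction is exact for any performance surface that is linear over that simplex, which is why a finer mesh in curved regions (the role of adaptive sampling) improves accuracy.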
Context-Aware Framework for Performance Tuning via Multi-action Evaluation
Context-aware systems perform adaptive changes in several ways. One way is for the system developers to encompass all possible context changes in a context-aware application and embed them into the system. However, this may not suit situations where the system encounters unknown contexts. In such cases, system inference and adaptive learning are used, whereby the system executes one action and evaluates the outcome to self-adapt/self-learn based on it. Unfortunately, this iterative approach is time-consuming if a high number of actions needs to be evaluated. By contrast, our framework for context-aware systems finds the best action for an unknown context through concurrent multi-action evaluation and self-adaptation, which significantly reduces the evaluation time in comparison to the iterative approach. In our implementation we show how the context-aware multi-action system can be used for context-aware evaluation of database performance tuning.
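The contrast between iterative and concurrent evaluation can be sketched with a thread pool that scores every candidate action in parallel and keeps the best. The action names and scores are invented, and the mock scorer stands in for what would really be an experiment against a sandboxed database replica; this is our own sketch, not the framework's implementation.

```python
from concurrent.futures import ThreadPoolExecutor

def evaluate_concurrently(actions, evaluate):
    """Score every candidate tuning action in parallel and return the best
    (action, score) pair, instead of trying one action per iteration."""
    with ThreadPoolExecutor(max_workers=len(actions)) as pool:
        scores = list(pool.map(evaluate, actions))  # one evaluation per worker
    best = max(range(len(actions)), key=scores.__getitem__)
    return actions[best], scores[best]

# Hypothetical tuning actions and a mock scorer (higher = better throughput).
actions = ["increase_buffer_pool", "add_index", "raise_parallel_workers"]
mock_score = {"increase_buffer_pool": 0.7, "add_index": 0.9, "raise_parallel_workers": 0.4}
best_action, score = evaluate_concurrently(actions, mock_score.get)
```

With n candidate actions and enough replicas, wall-clock evaluation time drops from roughly n sequential experiments to one round.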