
    Extensions of an Empirical Automated Tuning Framework

    Empirical auto-tuning has been successfully applied to scientific computing applications and web-based cluster servers over the last few years. However, few studies have focused on applying this method to optimize the performance of database systems. In this thesis, we present a strategy that uses Active Harmony, an empirical automated tuning framework, to optimize the throughput of a PostgreSQL server by tuning settings such as memory and buffer sizes. We used the Nelder-Mead simplex method as the search engine, and we show how our strategy performs compared to the hand-tuned and default results. Another part of this thesis focuses on using data from prior runs of auto-tuning. Prior data has proven useful in many cases, such as modeling the search space or finding a good starting point for hill-climbing. We present several methods that were developed to manage the prior data in Active Harmony. Our intention was to provide tuners with a complete set of information for their tuning tasks.
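
As an illustration of the search strategy named in this abstract, the following is a minimal sketch of a simplified Nelder-Mead simplex search over two PostgreSQL-style settings. It is not Active Harmony's implementation: the throughput model `synthetic_throughput`, its peak location, and the starting point and step sizes are all made-up assumptions that stand in for real benchmark runs.

```python
def synthetic_throughput(cfg):
    # Hypothetical stand-in for a benchmark run: a smooth peak at
    # shared_buffers = 2048 MB and work_mem = 64 MB (invented numbers).
    shared_buffers_mb, work_mem_mb = cfg
    return (1000.0
            - ((shared_buffers_mb - 2048.0) / 64.0) ** 2
            - ((work_mem_mb - 64.0) / 4.0) ** 2)


def nelder_mead(f, start, step, iters=300):
    """Minimize f with a simplified Nelder-Mead simplex search
    (reflection, expansion, inside contraction, shrink)."""
    n = len(start)
    # Initial simplex: the start point plus one point offset along each axis.
    simplex = [list(start)]
    for i in range(n):
        point = list(start)
        point[i] += step[i]
        simplex.append(point)

    for _ in range(iters):
        simplex.sort(key=f)
        best, worst = simplex[0], simplex[-1]
        # Centroid of every vertex except the worst one.
        centroid = [sum(p[i] for p in simplex[:-1]) / n for i in range(n)]

        def along(t):
            # Point on the line through the worst vertex and the centroid.
            return [c + t * (c - w) for c, w in zip(centroid, worst)]

        reflected = along(1.0)
        if f(reflected) < f(best):
            expanded = along(2.0)
            simplex[-1] = expanded if f(expanded) < f(reflected) else reflected
        elif f(reflected) < f(simplex[-2]):
            simplex[-1] = reflected
        else:
            contracted = along(-0.5)
            if f(contracted) < f(worst):
                simplex[-1] = contracted
            else:
                # Shrink every vertex halfway toward the best one.
                simplex = [best] + [
                    [b + 0.5 * (x - b) for b, x in zip(best, p)]
                    for p in simplex[1:]
                ]
    return min(simplex, key=f)


# Maximizing throughput is the same as minimizing its negative.
best = nelder_mead(lambda cfg: -synthetic_throughput(cfg),
                   start=(128.0, 4.0), step=(256.0, 8.0))
print(best, synthetic_throughput(best))
```

In a real tuning loop each call to the objective would apply the settings, restart or reload the server, and run a workload, so the number of evaluations matters far more than it does in this toy model.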

    BestConfig: Tapping the Performance Potential of Systems via Automatic Configuration Tuning

    An ever-increasing number of configuration parameters are provided to system users. But many users have used one configuration setting across different workloads, leaving untapped the performance potential of systems. A good configuration setting can greatly improve the performance of a deployed system under certain workloads. But with tens or hundreds of parameters, it becomes a highly costly task to decide which configuration setting leads to the best performance. While such a task requires strong expertise in both the system and the application, users commonly lack such expertise. To help users tap the performance potential of systems, we present BestConfig, a system for automatically finding a best configuration setting within a resource limit for a deployed system under a given application workload. BestConfig is designed with an extensible architecture to automate the configuration tuning for general systems. To tune system configurations within a resource limit, we propose the divide-and-diverge sampling method and the recursive bound-and-search algorithm. BestConfig can improve the throughput of Tomcat by 75%, that of Cassandra by 63%, that of MySQL by 430%, and reduce the running time of a Hive join job by about 50% and that of a Spark join job by about 80%, solely by configuration adjustment.
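
The abstract names divide-and-diverge sampling without describing it. The sketch below is one plausible reading, assuming it behaves like Latin hypercube sampling: each parameter range is divided into `k` intervals, and the `k` samples are spread so that every interval of every parameter is used exactly once. The parameter names and bounds are hypothetical, and this is not BestConfig's code.

```python
import random


def divide_and_diverge(bounds, k, rng):
    """Draw k configurations so that, for each parameter, every one of its
    k equal-width intervals contains exactly one sample."""
    columns = []
    for low, high in bounds.values():
        width = (high - low) / k
        slots = list(range(k))
        rng.shuffle(slots)  # "diverge": decorrelate intervals across parameters
        columns.append([low + (slot + rng.random()) * width for slot in slots])
    names = list(bounds)
    return [dict(zip(names, row)) for row in zip(*columns)]


# Hypothetical parameters and ranges, chosen only for illustration.
BOUNDS = {"buffer_pool_mb": (128.0, 8192.0), "max_connections": (10.0, 1000.0)}
K = 5
samples = divide_and_diverge(BOUNDS, K, random.Random(0))
for sample in samples:
    print(sample)
```

With a fixed budget of `k` benchmark runs, this covers each parameter's whole range, which a plain grid cannot do once there are more than a few parameters.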

    ACTS in Need: Automatic Configuration Tuning with Scalability Guarantees

    To support the variety of Big Data use cases, many Big Data related systems expose a large number of user-specifiable configuration parameters. As our experiments highlight, a MySQL deployment with well-tuned configuration parameters achieves a peak throughput 12 times that of one with the default setting. However, finding the best setting for the tens or hundreds of configuration parameters is a mission impossible for ordinary users. Worse still, many Big Data applications require the support of multiple systems co-deployed in the same cluster. As these co-deployed systems can interact to affect the overall performance, they must be tuned together. Automatic configuration tuning with scalability guarantees (ACTS) is needed to help system users. Solutions to ACTS must scale to various systems, workloads, deployments, parameters and resource limits. Proposing and implementing an ACTS solution, we demonstrate that ACTS can benefit users not only in improving system performance and resource utilization, but also in saving costs and enabling fairer benchmarking.

    PerfXplain: Debugging MapReduce Job Performance

    While users today have access to many tools that assist in performing large scale data analysis tasks, understanding the performance characteristics of their parallel computations, such as MapReduce jobs, remains difficult. We present PerfXplain, a system that enables users to ask questions about the relative performances (i.e., runtimes) of pairs of MapReduce jobs. PerfXplain provides a new query language for articulating performance queries and an algorithm for generating explanations from a log of past MapReduce job executions. We formally define the notion of an explanation together with three metrics, relevance, precision, and generality, that measure explanation quality. We present the explanation-generation algorithm based on techniques related to decision-tree building. We evaluate the approach on a log of past executions on Amazon EC2, and show that our approach can generate quality explanations, outperforming two naive explanation-generation methods. Comment: VLDB201

    Performance Tuning of Database Systems Using a Context-aware Approach

    Database system performance problems have a cascading effect on all aspects of an enterprise application. Database vendors and application developers provide guidelines, best practices and even initial database settings for good performance. But database performance tuning is not a one-off task. Database administrators have to keep a constant eye on database performance, as tuning work carried out earlier could be invalidated for a multitude of reasons. Before engaging in a performance tuning endeavor, a database administrator must prioritize which tuning tasks to carry out first. This prioritization is based on which tuning action would yield the highest performance benefit. However, this prediction may not always be accurate. Experiment-based performance tuning methodologies have been introduced as an alternative to prediction-based performance tuning approaches. Experimenting on a representative system similar to the production one allows a database administrator to accurately gauge the performance gain for a particular tuning task. In this paper we propose a novel approach to experiment-based performance tuning with the use of a context-aware application model. Using a proof-of-concept implementation, we show how it could be used to automate the detection of performance changes, the creation of experiments, and the evaluation of performance tuning outcomes for mixed workload types through database configuration parameter changes.

    CM-CASL: Comparison-based Performance Modeling of Software Systems via Collaborative Active and Semisupervised Learning

    Configuration tuning for large software systems is generally challenging due to the complex configuration space and expensive performance evaluation. Most existing approaches follow a two-phase process, first learning a regression-based performance prediction model on available samples and then searching for the configurations with satisfactory performance using the learned model. Such regression-based models often suffer from the scarcity of samples due to the enormous time and resources required to run a large software system with a specific configuration. Moreover, previous studies have shown that even a highly accurate regression-based model may fail to discern the relative merit between two configurations, whereas performance comparison is actually one fundamental strategy for configuration tuning. To address these issues, this paper proposes CM-CASL, a Comparison-based performance Modeling approach for software systems via Collaborative Active and Semisupervised Learning. CM-CASL learns a classification model that compares the performance of two given configurations, and enhances the samples through a collaborative labeling process by both human experts and classifiers using an integration of active and semisupervised learning. Experimental results demonstrate that CM-CASL outperforms two state-of-the-art performance modeling approaches in terms of both classification accuracy and rank accuracy, and thus provides a better performance model for the subsequent work of configuration tuning
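
To make the comparison-based idea concrete, here is a minimal sketch of turning a handful of measured configurations into training data for a pairwise classifier. The abstract does not say how CM-CASL encodes a pair, so using the difference of the two configuration vectors as the feature is an assumption, as are the sample values and the convention that higher performance is better.

```python
from itertools import combinations


def pairwise_examples(samples):
    """Build (features, label) pairs from (config_vector, performance) samples.

    label is 1 when the first configuration of the pair performs better.
    Each pair is emitted in both orders so the classes stay balanced.
    """
    examples = []
    for (cfg_a, perf_a), (cfg_b, perf_b) in combinations(samples, 2):
        if perf_a == perf_b:
            continue  # a tie carries no ordering information
        features = [a - b for a, b in zip(cfg_a, cfg_b)]
        label = 1 if perf_a > perf_b else 0
        examples.append((features, label))
        examples.append(([-f for f in features], 1 - label))
    return examples


# Invented measurements: (configuration vector, throughput).
measured = [([1, 2], 10.0), ([2, 2], 30.0), ([3, 1], 20.0)]
examples = pairwise_examples(measured)
for features, label in examples:
    print(features, label)
```

Three measured configurations already yield six labeled comparisons, which hints at why a comparison-based formulation can ease the sample scarcity the abstract describes.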

    d-Simplexed: Adaptive Delaunay Triangulation for Performance Modeling and Prediction on Big Data Analytics

    Big Data processing systems (e.g., Spark) have a number of resource configuration parameters, such as memory size, CPU allocation, and the number of running nodes. Regular users and even expert administrators struggle to understand the mutual relation between different parameter configurations and the overall performance of the system. In this paper, we address this challenge by proposing a performance prediction framework, called d-Simplexed, to build performance models with varied configurable parameters on Spark. We take inspiration from the field of Computational Geometry to construct a d-dimensional mesh using Delaunay Triangulation over a selected set of features. From this mesh, we predict execution time for various feature configurations. To minimize the time and resources in building a bootstrap model with a large number of configuration values, we propose an adaptive sampling technique to allow us to collect as few training points as required. Our evaluation on a cluster of computers using WordCount, PageRank, Kmeans, and Join workloads from the HiBench benchmarking suite shows that we can achieve an estimation error rate below 5% by sampling less than 1% of the data. Peer reviewed
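
The prediction step can be illustrated for a single simplex of such a mesh. The sketch below assumes the prediction is a linear (barycentric) interpolation of the runtimes measured at the simplex's vertices, which the abstract does not state explicitly, and it works in two dimensions only. The features (executor cores, memory in GB) and the runtimes are invented for illustration.

```python
def barycentric_weights(p, a, b, c):
    """Weights (w_a, w_b, w_c) such that p = w_a*a + w_b*b + w_c*c
    and the weights sum to 1, for a non-degenerate triangle a, b, c."""
    (x, y), (x1, y1), (x2, y2), (x3, y3) = p, a, b, c
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    w_a = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / det
    w_b = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / det
    return w_a, w_b, 1.0 - w_a - w_b


def predict_runtime(p, vertices, runtimes):
    """Linearly interpolate the runtime at p inside one triangle."""
    weights = barycentric_weights(p, *vertices)
    if min(weights) < 0:
        raise ValueError("point lies outside this triangle")
    return sum(w * t for w, t in zip(weights, runtimes))


# Invented training points: (cores, memory_gb) -> measured runtime in seconds.
vertices = [(2.0, 4.0), (8.0, 4.0), (2.0, 16.0)]
runtimes = [100.0, 40.0, 70.0]
predicted = predict_runtime((4.0, 8.0), vertices, runtimes)
print(predicted)
```

A full model would first locate which simplex of the triangulation contains the query point; that lookup and the adaptive choice of new sample points are left out here.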

    Context-Aware Framework for Performance Tuning via Multi-action Evaluation

    Context-aware systems perform adaptive changes in several ways. One way is for the system developers to encompass all possible context changes in a context-aware application and embed them into the system. However, this may not suit situations where the system encounters unknown contexts. In such cases, system inference and adaptive learning are used, whereby the system executes one action and evaluates the outcome in order to self-adapt or self-learn. Unfortunately, this iterative approach is time-consuming if a large number of actions need to be evaluated. By contrast, our framework for context-aware systems finds the best action for an unknown context through concurrent multi-action evaluation and self-adaptation, which significantly reduces the evolution time in comparison to the iterative approach. In our implementation we show how the context-aware multi-action system can be used for a context-aware evaluation for database performance tuning.
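
The contrast between one-at-a-time and concurrent evaluation can be sketched as follows. This is only an illustration of evaluating several candidate actions in parallel and keeping the best one, not the framework described in the abstract. The candidate actions, the `cache_mb` setting, and the scoring function are hypothetical stand-ins for running a workload against separate test instances.

```python
from concurrent.futures import ThreadPoolExecutor


def evaluate_action(action):
    # Hypothetical stand-in for applying a tuning action to a test instance
    # and measuring the result; here the score peaks at cache_mb = 512.
    return -abs(action["cache_mb"] - 512)


def best_action_concurrently(actions, evaluate):
    """Evaluate all candidate actions at once and return the best scoring one."""
    with ThreadPoolExecutor(max_workers=len(actions)) as pool:
        scores = list(pool.map(evaluate, actions))  # results keep input order
    best_index = max(range(len(actions)), key=scores.__getitem__)
    return actions[best_index], scores[best_index]


candidates = [{"cache_mb": 128}, {"cache_mb": 512}, {"cache_mb": 1024}]
best, score = best_action_concurrently(candidates, evaluate_action)
print(best, score)
```

When each evaluation is a long-running experiment, the wall-clock time approaches that of the slowest single evaluation rather than the sum of all of them, provided the experiments run on isolated instances and do not interfere with each other.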