
    BestConfig: Tapping the Performance Potential of Systems via Automatic Configuration Tuning

    Systems expose an ever-increasing number of configuration parameters to their users, yet many users reuse a single configuration setting across different workloads, leaving the performance potential of their systems untapped. A good configuration setting can greatly improve the performance of a deployed system under a given workload, but with tens or hundreds of parameters, deciding which setting performs best is a highly costly task. Such a task requires strong expertise in both the system and the application, which users commonly lack. To help users tap the performance potential of systems, we present BestConfig, a system for automatically finding the best-performing configuration setting within a resource limit for a deployed system under a given application workload. BestConfig is designed with an extensible architecture to automate configuration tuning for general systems. To tune system configurations within a resource limit, we propose the divide-and-diverge sampling method and the recursive bound-and-search algorithm. Solely by adjusting configurations, BestConfig improves the throughput of Tomcat by 75%, of Cassandra by 63%, and of MySQL by 430%, and reduces the running time of a Hive join job by about 50% and of a Spark join job by about 80%.
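
    The abstract names divide-and-diverge sampling as the way to cover a huge parameter space with few experiments. A minimal Python sketch of that idea follows: divide each parameter's range into k intervals, then draw k samples so every interval of every parameter is covered once (Latin-hypercube style). The function name, signature, and parameter names are illustrative assumptions, not the paper's actual API.

        import random

        def divide_and_diverge_sample(param_ranges, k):
            """Draw k samples so that, for every parameter, each of its k
            equal-width intervals is covered exactly once (Latin-hypercube style).
            `param_ranges` maps a parameter name to its (low, high) bounds."""
            samples = [dict() for _ in range(k)]
            for name, (low, high) in param_ranges.items():
                width = (high - low) / k
                # One random point from each interval, assigned in random order
                points = [low + (i + random.random()) * width for i in range(k)]
                random.shuffle(points)
                for sample, point in zip(samples, points):
                    sample[name] = point
            return samples

        # Example: 6 divergent samples over two hypothetical Tomcat-like parameters
        print(divide_and_diverge_sample({"maxThreads": (50, 500), "cacheSize": (16, 1024)}, 6))

    The paper's recursive bound-and-search algorithm would then shrink the bounds around the best sample and repeat; that refinement step is omitted here.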

    Quantifying and Predicting the Influence of Execution Platform on Software Component Performance

    The performance of software components depends on several factors, including the execution platform on which they run. To simplify cross-platform performance prediction in relocation and sizing scenarios, this thesis introduces a novel approach that separates the application performance profile from the platform performance profile. The approach is evaluated using transparent instrumentation of Java applications and automated benchmarks for Java Virtual Machines.
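
    The separation suggests that a cross-platform prediction can be composed from two independently measured profiles: what the application does, and what each primitive operation costs on a target platform. A hedged sketch of that composition in Python (the profile structure and operation names are assumptions, not the thesis's actual model):

        # Application profile: how often the component performs each primitive operation.
        # Platform profile: measured per-operation cost on the target platform (ns).
        # Both structures are hypothetical illustrations of the separation idea.
        app_profile = {"int_arith": 1_200_000, "object_alloc": 50_000, "io_read": 300}
        platform_profile_ns = {"int_arith": 1.2, "object_alloc": 35.0, "io_read": 80_000.0}

        def predict_runtime_ns(app, platform):
            """Predict runtime on a new platform by weighting operation counts
            from the application profile with per-operation platform costs."""
            return sum(count * platform[op] for op, count in app.items())

        print(f"predicted: {predict_runtime_ns(app_profile, platform_profile_ns) / 1e6:.1f} ms")

    Relocating to a new platform then means re-measuring only the platform profile; the application profile is reused unchanged.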

    Achieving Replicability: Is There Life for Our Experiments After Publication?

    Metaheuristics are algorithmic schemes that ease the derivation of novel algorithms to solve optimization problems. These algorithms are typically approximate and stochastic, making experimentation the preeminent means of supporting claims in research and applications. However, the huge number of variants and parameters of most metaheuristics, the ambiguity of the natural language used in papers, and the lack of widely accepted reporting standards threaten the replicability of those experiments. This problem, which has been identified in the literature by several authors, significantly hinders the construction of a complete and cohesive body of knowledge on the behavior of metaheuristics. This paper proposes a set of minimum-information guidelines for reporting metaheuristic experiments, and an experiment description language that supports meeting those guidelines. Using this language, metaheuristic optimization experiments are described in a tool-independent and unambiguous way, while maintaining readability and succinctness. These contributions pave the way for replication with different problem instances and parameters, bringing new life to metaheuristic experiments after publication.
    Funding: Ministerio de Ciencia e Innovación TIN2009-07366; Ministerio de Economía y Competitividad TIN2012-32273; Junta de Andalucía P07-TIC-2533; Junta de Andalucía TIC-590.
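
    The paper's actual description language is not reproduced here. As a loose illustration of what a tool-independent, unambiguous experiment description might need to capture, here is a hypothetical Python data structure covering the minimum-information elements the abstract names; all field names are assumptions.

        # Hypothetical, tool-independent description of a metaheuristic experiment.
        # Field names are illustrative assumptions, not the paper's actual language.
        experiment = {
            "algorithm": {
                "name": "simulated_annealing",
                "parameters": {"initial_temp": 100.0, "cooling_rate": 0.95},
            },
            "problem": {"name": "TSP", "instances": ["berlin52", "kroA100"]},
            "protocol": {
                "independent_runs": 30,           # stochastic algorithms need repetition
                "stopping_criterion": {"max_evaluations": 100_000},
                "random_seeds": list(range(30)),  # fixed seeds aid exact replication
            },
            "reported_measures": ["best_fitness", "mean_fitness", "runtime_seconds"],
        }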

    A Survey on Compiler Autotuning using Machine Learning

    Since the mid-1990s, researchers have been applying machine learning-based approaches to a number of compiler optimization problems. These techniques primarily enhance the quality of the obtained results and, more importantly, make it feasible to tackle two main compiler optimization problems: optimization selection (choosing which optimizations to apply) and phase-ordering (choosing the order in which to apply them). The compiler optimization space continues to grow due to the advancement of applications, the increasing number of compiler optimizations, and new target architectures. Generic optimization passes in compilers cannot fully leverage newly introduced optimizations and, therefore, cannot keep up with the pace of increasing options. This survey summarizes and classifies the recent advances in using machine learning for compiler optimization, particularly on the two major problems of (1) selecting the best optimizations and (2) ordering optimization phases. The survey highlights the approaches taken so far, the results obtained, a fine-grained classification of the different approaches, and the influential papers of the field.
    Comment: version 5.0 (updated September 2018); preprint of the accepted ACM CSUR 2018 journal version (42 pages). The survey will be updated quarterly; newly published papers can be sent to the author for inclusion in subsequent versions. History: received November 2016; revised August 2017; revised February 2018; accepted March 2018.
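
    Phase-ordering is a search over permutations of optimization passes; the machine-learning techniques this survey covers replace or guide that search. To make the problem concrete, here is a hedged random-search baseline in Python; the pass names and the compile-and-measure `evaluate` callback are assumptions, not any real compiler's interface.

        import random

        def random_search_phase_ordering(passes, evaluate, budget=100):
            """Baseline for the phase-ordering problem: try random orderings of
            optimization passes and keep the best one. `evaluate` is a hypothetical
            callback that compiles with the given ordering and returns a runtime."""
            best_order, best_time = None, float("inf")
            for _ in range(budget):
                order = random.sample(passes, len(passes))  # random permutation
                t = evaluate(order)
                if t < best_time:
                    best_order, best_time = order, t
            return best_order, best_time

        # Usage with a stand-in evaluator (a real one would invoke the compiler):
        passes = ["inline", "loop-unroll", "gvn", "licm", "dce"]
        fake_eval = lambda order: random.uniform(0.8, 1.2)  # placeholder measurement
        print(random_search_phase_ordering(passes, fake_eval, budget=20))

    Even this naive baseline makes the cost visible: the search space is factorial in the number of passes, which is why learned predictors are attractive.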

    Is One Hyperparameter Optimizer Enough?

    Hyperparameter tuning is the black art of automatically finding a good combination of control parameters for a data miner. While widely applied in empirical software engineering, there has been little discussion of which hyperparameter tuner is best for software analytics. To address this gap in the literature, this paper applied a range of hyperparameter optimizers (grid search, random search, differential evolution, and Bayesian optimization) to the defect prediction problem. Surprisingly, no hyperparameter optimizer was observed to be 'best' and, for one of the two evaluation measures studied here (F-measure), hyperparameter optimization was no better than using default configurations in 50% of cases. We conclude that hyperparameter optimization is more nuanced than previously believed. While such optimization can certainly lead to large improvements in the performance of classifiers used in software analytics, it remains to be seen which specific optimizers should be applied to a new dataset.
    Comment: 7 pages, 2 columns, accepted for SWAN1
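
    Two of the optimizer families the paper compares, grid search and random search, are available off the shelf in scikit-learn. A hedged sketch of such a comparison, scored with the F-measure as in the paper; the synthetic imbalanced dataset, the random-forest model, and the parameter grid are assumptions standing in for a real defect prediction setup.

        from sklearn.datasets import make_classification
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

        # Imbalanced synthetic data as a stand-in for a defect prediction dataset
        X, y = make_classification(n_samples=500, n_features=20,
                                   weights=[0.8], random_state=0)
        params = {"n_estimators": [50, 100, 200], "max_depth": [3, 5, None]}

        for search in (
            GridSearchCV(RandomForestClassifier(random_state=0), params,
                         scoring="f1", cv=5),
            RandomizedSearchCV(RandomForestClassifier(random_state=0), params,
                               n_iter=5, scoring="f1", cv=5, random_state=0),
        ):
            search.fit(X, y)
            print(type(search).__name__, search.best_params_,
                  round(search.best_score_, 3))

    Comparing the tuned scores against an untuned RandomForestClassifier baseline would reproduce, in miniature, the paper's question of whether tuning beats defaults.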

    Performance Problem Diagnostics by Systematic Experimentation

    Diagnosing performance problems requires deep expertise in performance engineering and entails high manual effort. As a consequence, performance evaluations are often postponed to the last minute of the development process. In this thesis, we introduce an automatic, experiment-based approach for performance problem diagnostics in enterprise software systems. With this approach, performance engineers can concentrate on their core competences instead of conducting repetitive tasks.
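
    One way to read "experiment-based diagnostics" is as a directed search: run a targeted experiment for a suspected problem type, and only refine the suspicion where the experiment confirms the symptom. The following Python sketch is a loose illustration of that search structure only; the hierarchy, problem names, and `run_experiment` callback are assumptions, not the thesis's actual taxonomy or tooling.

        # Walk a hierarchy of suspected problem types, descending only where a
        # targeted experiment confirms the symptom.
        PROBLEM_HIERARCHY = {
            "high_response_time": ["database_congestion", "excessive_gc"],
            "database_congestion": ["missing_index", "lock_contention"],
        }

        def diagnose(symptom, run_experiment, confirmed=None):
            """Depth-first diagnostics: only refine symptoms an experiment confirms."""
            confirmed = confirmed if confirmed is not None else []
            if run_experiment(symptom):   # e.g., drive load and test a detection rule
                confirmed.append(symptom)
                for child in PROBLEM_HIERARCHY.get(symptom, []):
                    diagnose(child, run_experiment, confirmed)
            return confirmed

        # Stand-in experiment: pretend only the database branch shows the symptom.
        fake = lambda s: s in {"high_response_time", "database_congestion", "lock_contention"}
        print(diagnose("high_response_time", fake))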

    Evaluating and Improving the Efficiency of Software and Algorithms for Sequence Data Analysis

    With the ever-growing size of sequence data sets, data processing and analysis are an increasingly large portion of the time and money spent on nucleic acid sequencing projects. Correspondingly, the performance of the software and algorithms used to perform that analysis has a direct effect on the time and expense involved. Although the analytical methods are widely varied, certain types of software and algorithms are applicable to a number of areas, so targeting improvements to these common elements has the potential for wide-reaching rewards. This dissertation research consisted of several projects to characterize and improve the efficiency of common elements of sequence data analysis software and algorithms. The first project sought to improve the efficiency of the short-read mapping process, as mapping is the most time-consuming step in many data analysis pipelines. The result was a new short-read mapping algorithm and software, demonstrated to be more computationally efficient than existing software and enabling more of the raw data to be utilized. While developing this software, it was discovered that a widely used bioinformatics software library introduced a great deal of inefficiency into the application. Given the potential impact of similar libraries on other applications, and because little research had been done to evaluate library efficiency, the second project evaluated the efficiency of seven of the most popular bioinformatics software libraries, written in C++, Java, Python, and Perl. This evaluation showed that two of the libraries written in the most popular language, Java, were an order of magnitude slower and used more memory than expected based on the language in which they were implemented. The third and final project, therefore, was the development of a new general-purpose bioinformatics software library for Java. This library, known as BioMojo, incorporates a new design approach with vastly improved efficiency. Assessing the performance of this new library using the benchmark methods developed for the second project showed that BioMojo outperformed all of the other libraries across all benchmark tasks, being up to 30 times more CPU-efficient than existing Java libraries.
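
    The dissertation's library comparison rests on micro-benchmarks of common tasks. As a hedged sketch of that kind of measurement, the following Python snippet times a deliberately naive FASTA parser over repeated runs; the parser, input, and timing setup are stand-ins for illustration, not the dissertation's actual harness.

        import time, statistics

        def parse_fasta(text):
            """Minimal FASTA parser: return {header: sequence} for each record."""
            records, header, chunks = {}, None, []
            for line in text.splitlines():
                if line.startswith(">"):
                    if header is not None:
                        records[header] = "".join(chunks)
                    header, chunks = line[1:], []
                else:
                    chunks.append(line.strip())
            if header is not None:
                records[header] = "".join(chunks)
            return records

        # Synthetic input: 10,000 records of 400 bases each
        data = "".join(f">seq{i}\n" + "ACGT" * 100 + "\n" for i in range(10_000))
        times = []
        for _ in range(5):                      # repeated runs smooth out noise
            start = time.perf_counter()
            parse_fasta(data)
            times.append(time.perf_counter() - start)
        print(f"median: {statistics.median(times) * 1000:.1f} ms")

    Running the same task through each library under test, with identical inputs and repeated measurements, is what makes order-of-magnitude differences like those reported here visible and comparable.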