8 research outputs found

    Software Microbenchmarking in the Cloud. How Bad is it Really?

    Rigorous performance engineering traditionally assumes measuring on bare-metal environments to control for as many confounding factors as possible. Unfortunately, some researchers and practitioners might not have the access, knowledge, or funds to operate dedicated performance-testing hardware, making public clouds an attractive alternative. However, shared public cloud environments are inherently unpredictable in terms of the system performance they provide. In this study, we explore the effects of cloud environments on the variability of performance test results and to what extent slowdowns can still be reliably detected even in a public cloud. We focus on software microbenchmarks as an example of performance tests and execute extensive experiments on three well-known public cloud services (AWS, GCE, and Azure), using three cloud instance types per service. We also compare the results to a hosted bare-metal offering from IBM Bluemix. In total, we gathered more than 4.5 million unique microbenchmarking data points from benchmarks written in Java and Go. We find that the variability of results differs substantially between benchmarks and instance types (with a coefficient of variation ranging from 0.03% to over 100%). However, executing test and control experiments on the same instances (in randomized order) allows us to detect slowdowns of 10% or less with high confidence, using state-of-the-art statistical tests (i.e., the Wilcoxon rank-sum test and overlapping bootstrapped confidence intervals). Finally, our results indicate that the Wilcoxon rank-sum test manages to detect smaller slowdowns in cloud environments.
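    The detection setup described above pairs test and control measurements taken on the same instances and applies a Wilcoxon rank-sum test. The sketch below, using SciPy on synthetic latency samples (not the paper's data or tooling), illustrates the idea; the ~10% slowdown, sample sizes, and significance threshold are illustrative assumptions.

```python
# Illustrative sketch only: synthetic samples stand in for real
# microbenchmark measurements collected on the same cloud instance.
import numpy as np
from scipy.stats import ranksums  # Wilcoxon rank-sum test

rng = np.random.default_rng(42)
control = rng.normal(loc=100.0, scale=5.0, size=200)  # baseline latencies (ms)
test = rng.normal(loc=110.0, scale=5.0, size=200)     # candidate with ~10% slowdown

# One-sided test: are the candidate's measurements stochastically larger?
stat, p_value = ranksums(test, control, alternative="greater")

cv = np.std(control) / np.mean(control)  # coefficient of variation of the baseline
print(f"CV(control) = {cv:.2%}, rank-sum p-value = {p_value:.4g}")
if p_value < 0.01:  # illustrative significance threshold
    print("Slowdown detected")
```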

    Applying test case prioritization to software microbenchmarks

    Regression testing comprises techniques which are applied during software evolution to uncover faults effectively and efficiently. While regression testing is widely studied for functional tests, performance regression testing, e.g., with software microbenchmarks, is hardly investigated. Applying test case prioritization (TCP), a regression testing technique, to software microbenchmarks may help capture large performance regressions sooner in new versions. This may be especially beneficial for microbenchmark suites, because they take considerably longer to execute than unit test suites. However, it is unclear whether traditional unit testing TCP techniques work equally well for software microbenchmarks. In this paper, we empirically study coverage-based TCP techniques, employing total and additional greedy strategies, applied to software microbenchmarks along multiple parameterization dimensions, leading to 54 unique technique instantiations. We find that TCP techniques have a mean APFD-P (average percentage of fault-detection on performance) effectiveness between 0.54 and 0.71 and are able to capture the three largest performance changes after executing 29% to 66% of the whole microbenchmark suite. Our efficiency analysis reveals that the runtime overhead of TCP varies considerably depending on the exact parameterization. The most effective technique has an overhead of 11% of the total microbenchmark suite execution time, making TCP a viable option for performance regression testing. The results demonstrate that the total strategy is superior to the additional strategy. Finally, dynamic-coverage techniques should be favored over static-coverage techniques due to their acceptable analysis overhead; however, in settings where the time for prioritization is limited, static-coverage techniques provide an attractive alternative.
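    The total and additional greedy strategies named above are standard coverage-based prioritization heuristics. The sketch below uses hypothetical benchmark names and coverage sets (it does not reproduce the paper's instrumentation or the APFD-P computation) to show how the two orderings differ.

```python
# Hypothetical coverage data: which methods each microbenchmark exercises.
coverage = {
    "benchParse": {"Parser.parse", "Lexer.next"},
    "benchIndex": {"Index.put", "Index.get"},
    "benchQuery": {"Index.get", "Parser.parse", "Planner.plan"},
}

def total_greedy(cov):
    # Rank benchmarks by how many methods each covers, counted independently.
    return sorted(cov, key=lambda b: len(cov[b]), reverse=True)

def additional_greedy(cov):
    # Repeatedly pick the benchmark covering the most not-yet-covered methods.
    remaining, covered, order = dict(cov), set(), []
    while remaining:
        best = max(remaining, key=lambda b: len(remaining[b] - covered))
        covered |= remaining.pop(best)
        order.append(best)
    return order

print("total:     ", total_greedy(coverage))
print("additional:", additional_greedy(coverage))
```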

    Towards the detection and analysis of performance regression introducing code changes

    In contemporary software development, developers commonly conduct regression testing to ensure that code changes do not affect software quality. Performance regression testing is an emerging research area within the regression testing domain in software engineering; it aims to maintain the system's performance. Conducting performance regression testing is known to be expensive. It is also complex, considering the growing volume of committed code and the number of team members developing simultaneously. Many automated regression testing techniques have been proposed in prior research. However, challenges in locating and resolving performance regressions in practice still exist. Directing regression testing to the commit level helps locate the root cause, yet it hinders the development process. This thesis outlines motivations and solutions for locating the root causes of performance regressions. First, we challenge a deterministic state-of-the-art approach by expanding the testing data to find areas for improvement. The deterministic approach was found to be limited in searching for the best regression-locating rule. Thus, we present two stochastic approaches to develop models that can learn from historical commits. The goal of the first stochastic approach is to view the research problem as a search-based optimization problem seeking to reach the highest detection rate. We apply different multi-objective evolutionary algorithms and conduct a comparison between them. This thesis also investigates whether simplifying the search space by combining objectives would achieve comparable results. The second stochastic approach addresses the severe class imbalance any system could have, since code changes introducing regressions are rare but costly. We formulate the identification of problematic commits that introduce performance regressions as a binary classification problem that handles class imbalance. Further, the thesis provides an exploratory study on the challenges developers face in resolving performance regressions. The study is based on questions posted on a technical forum, focused on performance regression. We collected around 2k questions discussing regressions of software execution time, and all were manually analyzed. The study resulted in a categorization of the challenges. We also discuss the difficulty level of performance regression issues within the development community. This study provides insights that help developers avoid regression causes during software design and implementation.
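    The second stochastic approach treats regression-introducing commits as a rare positive class. A minimal sketch of such an imbalance-aware binary classifier, assuming scikit-learn and synthetic per-commit features (the thesis' actual features, data, and models are not reproduced here), is shown below.

```python
# Illustrative only: synthetic commit features and labels with ~5% positives.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 6))              # per-commit metrics (churn, #files, ...)
y = (rng.random(2000) < 0.05).astype(int)   # 1 = regression-introducing commit

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
clf = RandomForestClassifier(class_weight="balanced", random_state=0)
clf.fit(X_tr, y_tr)
print(classification_report(y_te, clf.predict(X_te), zero_division=0))
```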

    Performance Test Selection Using Machine Learning and a Study of Binning Effect in Memory Allocators

    Performance testing is an essential part of the development life cycle that must be done in a timely fashion. However, checking for performance regressions in software can be time-consuming, especially for complex systems containing multiple lengthy test cases. The first part of this thesis presents a technique for performance test selection using machine learning. In our approach, we build features using information extracted from previous software versions to train classifiers that assist developers in deciding whether or not to execute a performance test on a new version. Our results show that the classifiers can be used as a mechanism that aids test selection and consequently avoids unnecessary testing. The second part of this work investigates the binning effect in user-space memory allocators. First, we examine how binning events can be a source of performance outliers in the Redis and CPython object allocators. Second, we implement a Pintool to detect the occurrence of binning in Python programs. The tool performs dynamic binary instrumentation on the interpreter and outputs information that helps developers perform code optimizations. Finally, we use our tool to investigate the presence of binning in various widely used Python libraries.
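    The binning effect discussed above arises because allocators round small requests up to fixed size classes, so a small growth in object size can push every allocation into the next bin. The sketch below illustrates the idea with 8-byte size classes up to a 512-byte small-object threshold (an assumption modelled loosely on CPython's pymalloc layout; it does not reproduce the thesis' Pintool).

```python
def size_class(request: int, alignment: int = 8, threshold: int = 512):
    """Return the bin (rounded-up size) a small allocation falls into,
    or None if the request is served by the general-purpose allocator."""
    if request == 0 or request > threshold:
        return None
    return -(-request // alignment) * alignment  # round up to the next class

# A one-byte growth (56 -> 57 bytes) moves allocations into a larger bin.
for size in (56, 57, 64, 65, 512, 513):
    print(f"request {size:>3} B -> bin {size_class(size)}")
```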

    An industrial case study of automatically identifying performance regression-causes


    Software Tracing Comparison Using Data Mining Techniques

    Performance has become a crucial matter in software development, testing, and maintenance. To address this concern, developers and testers use several tools to improve performance or track performance-related bugs. The use of comparative methodologies such as Flame Graphs provides a formal way to verify the causes of regressions and performance issues. The comparison tool provides information for analysis that can be used to drive improvements through a deep profiling mechanism, usually comparing normal with abnormal profiling data. On the other hand, tracing is a popular mechanism that records events in the system while keeping the overhead of its use low. The recorded information can be used to supply developers with data for performance analysis. However, the amount of data provided, and the knowledge required to understand it, may present a challenge for current analysis methods and tools. Combining both methodologies, a comparative profiling mechanism and a low-overhead tracing system, enables easier evaluation of issues and their underlying causes while also meeting stringent performance requirements. The next step is to use this data to develop methods for root cause analysis and bottleneck identification. The objective of this research project is to automate the process of trace analysis and to automatically identify differences among groups of executions. The presented solution highlights differences between the groups and presents a possible cause for each difference; the user can then act on this finding to improve the executions. We present a series of automated techniques that can be used to find the root causes of performance variations while requiring little or no human intervention. The main approach is capable of identifying the cause of a performance difference using a comparative grouping methodology on the executions, and it was applied to real use cases. The proposed solution was implemented in an analysis framework to help developers with similar problems, together with a differential flame graph tool. To our knowledge, this is the first attempt to correlate automatic grouping mechanisms with root cause analysis using tracing data. In this project, most of the data used for evaluations and experiments came from the Linux operating system and was collected using the Linux Trace Toolkit Next Generation (LTTng), which is a very flexible tool with low overhead.
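    The comparative grouping approach described above can be pictured as clustering executions by their trace characteristics and then inspecting which characteristics separate the groups. The sketch below uses hypothetical event counts and scikit-learn's KMeans as a stand-in; it is not the thesis' framework and does not parse LTTng traces.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical per-execution event counts extracted from traces.
events = ["syscall_read", "syscall_write", "sched_switch", "page_fault"]
traces = np.array([
    [120,  40, 300,  5],
    [118,  42, 310,  6],
    [121,  39, 295,  4],
    [119, 400, 305, 90],   # two anomalous runs with many writes and page faults
    [122, 395, 298, 85],
])

# Group the executions, then look at which event differs most between groups.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(traces)
group_a = traces[labels == labels[0]]
group_b = traces[labels != labels[0]]
delta = np.abs(group_a.mean(axis=0) - group_b.mean(axis=0))
print("labels:", labels, "| most differing event:", events[int(np.argmax(delta))])
```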