6 research outputs found

    Towards Automated Performance Bug Identification in Python

    Context: Software performance is a critical non-functional requirement in many fields, such as mission-critical applications, finance, and real-time systems. In this work we focused on early detection of performance bugs; our software under study was a real-time system used in the advertisement/marketing domain. Goal: Find a simple, easy-to-implement solution for predicting performance bugs. Method: We built several models using four machine learning methods commonly used for defect prediction: C4.5 decision trees, Naïve Bayes, Bayesian networks, and logistic regression. Results: Our empirical results show that a C4.5 model, using lines of code changed, file age, and file size as explanatory variables, can be used to predict performance bugs (recall = 0.73, accuracy = 0.85, precision = 0.96). We show that reducing the number of changes delivered in a commit can decrease the chance of injecting a performance bug. Conclusions: We believe that our approach can help practitioners eliminate performance bugs early in the development cycle. Our results are also of interest to theoreticians: they establish a link between functional bugs and (non-functional) performance bugs, and explicitly show that attributes used to predict functional bugs can also be used to predict performance bugs.
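The abstract reports recall = 0.73, accuracy = 0.85, and precision = 0.96 for the C4.5 model. A minimal stdlib sketch of how those three metrics relate to a confusion matrix; the counts below are hypothetical (the abstract does not publish its confusion matrix) and were chosen only to reproduce the reported figures:

```python
def metrics(tp, fp, fn, tn):
    """Compute precision, recall, and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, accuracy

# Hypothetical counts consistent with the reported figures
# (recall=0.73, accuracy=0.85, precision≈0.96).
p, r, a = metrics(tp=73, fp=3, fn=27, tn=97)
print(f"precision={p:.2f} recall={r:.2f} accuracy={a:.2f}")
# → precision=0.96 recall=0.73 accuracy=0.85
```

The high precision with moderate recall means the model rarely raises false alarms but misses roughly a quarter of the performance bugs, a trade-off worth noting when using it as an early-warning filter.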

    Understanding the Impact of Databases on the Energy Efficiency of Cloud Applications

    Cloud-based applications are used in nearly every industry: from finance, retail, education, and communication to manufacturing, utilities, and transportation. Despite their popularity and wide adoption, little is known about the energy footprint of these applications and, in particular, of their databases, which are the backbone of cloud-based applications. Yet reducing the energy consumption of applications is a major objective for society and will continue to be so. Two families of databases are currently used in cloud-based applications: relational and non-relational databases. Consequently, in this thesis we study the energy consumption of three databases used by cloud-based applications: MySQL, PostgreSQL, and MongoDB, which are respectively relational, relational, and non-relational. We devise a series of experiments with three cloud-based applications (a RESTful multi-threaded application, DVD Store, and JPetStore). We also study the impact of cloud patterns on energy consumption, because databases in cloud-based applications are often implemented in conjunction with patterns like Local Database Proxy, Local Sharding-Based Router, and Priority Message Queue. We measure energy consumption using the Power-API tool, which tracks the energy consumed at the process level by each variant of the cloud-based applications; this process-level estimation is more precise than an estimate for the application as a whole. We also measure the response time of the cloud-based applications to contrast response time with energy efficiency, so that developers are aware of the trade-offs between these two quality indicators when selecting a database for their application. We report that the choice of database can reduce the energy consumption of a cloud-based application regardless of which of the three studied cloud patterns is implemented. We show that MySQL is the least energy consuming but the slowest of the three databases. PostgreSQL is the most energy consuming, but faster than MySQL and slower than MongoDB. MongoDB consumes more energy than MySQL but less than PostgreSQL, and is the fastest of the three databases.
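The abstract's conclusion is an ordering over two competing metrics. A small sketch of how such a trade-off comparison might be tabulated; the per-request numbers are entirely hypothetical, chosen only to match the orderings reported above (real values would come from Power-API measurements, not assumptions):

```python
# Hypothetical per-request measurements consistent with the reported
# orderings: MySQL lowest energy but slowest, PostgreSQL highest
# energy, MongoDB fastest.
measurements = {
    "MySQL":      {"energy_j": 1.8, "response_ms": 42.0},
    "PostgreSQL": {"energy_j": 2.9, "response_ms": 35.0},
    "MongoDB":    {"energy_j": 2.3, "response_ms": 28.0},
}

def rank(metric):
    """Return database names sorted from best (lowest value) to worst."""
    return sorted(measurements, key=lambda db: measurements[db][metric])

print("energy ranking: ", rank("energy_j"))
print("latency ranking:", rank("response_ms"))
```

No database wins on both axes here, which is exactly the trade-off the thesis asks developers to weigh when choosing a backend.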

    ConfProfitt: A Configuration-Aware Performance Profiling, Testing, and Tuning Framework

    Modern computer software systems are complicated. Developers can change the behavior of a software system through software configurations. The large number of configuration options and their interactions makes the task of software tuning, testing, and debugging very challenging. Performance is one of the key non-functional qualities, and performance bugs can cause significant performance degradation and lead to poor user experience. However, performance bugs are difficult to expose, primarily because detecting them requires specific inputs as well as specific configurations. While researchers have developed techniques to analyze, quantify, detect, and fix performance bugs, many of these techniques are not effective in highly configurable systems. To improve the non-functional qualities of configurable software systems, testing engineers need to be able to understand the performance influence of configuration options, adjust the performance of a system under different configurations, and detect configuration-related performance bugs. This research provides an automated framework that allows engineers to effectively analyze performance-influencing configuration options, detect performance bugs in highly configurable software systems, and adjust configuration options to achieve higher long-term performance gains. To understand real-world performance bugs in highly configurable software systems, we first perform a study of performance bug characteristics in three large-scale open-source projects. Many researchers have studied the characteristics of performance bugs from bug reports, but few have reported the experience of trying to replicate confirmed performance bugs from the perspective of non-domain experts such as researchers. This study reports the challenges of replicating confirmed performance bugs and potential workarounds. We also share a performance benchmark of real-world performance bugs for evaluating future performance testing techniques. Inspired by our performance bug study, we propose a performance profiling approach that helps developers understand how configuration options and their interactions influence the performance of a system. The approach uses a combination of dynamic analysis and machine learning techniques, together with configuration sampling, to profile the program execution and analyze the configuration options relevant to performance. Next, the framework leverages natural language processing and information retrieval techniques to automatically generate test inputs and configurations that expose performance bugs. Finally, the framework combines reinforcement learning and dynamic state reduction techniques to guide the subject application towards achieving higher long-term performance gains.
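The profiling step described above combines configuration sampling with an analysis of which options influence performance. A minimal stdlib sketch of that idea, not the framework's actual algorithm: sample random configurations, measure a (here simulated) workload, and rank options by the difference in mean runtime with the option on versus off. The option names and cost model are hypothetical:

```python
import random

OPTIONS = ["cache", "compress", "verify", "log_debug"]

def run_workload(config):
    """Stand-in for executing the system under test; in a real
    framework this would be a profiled program run."""
    cost = 100.0
    if not config["cache"]:
        cost += 40.0          # disabling the cache is expensive
    if config["log_debug"]:
        cost += 15.0          # debug logging adds overhead
    return cost + random.uniform(-1, 1)  # measurement noise

random.seed(0)
samples = []
for _ in range(200):          # random configuration sampling
    cfg = {opt: random.random() < 0.5 for opt in OPTIONS}
    samples.append((cfg, run_workload(cfg)))

def influence(opt):
    """Mean runtime with the option on minus mean runtime with it off."""
    on = [t for c, t in samples if c[opt]]
    off = [t for c, t in samples if not c[opt]]
    return sum(on) / len(on) - sum(off) / len(off)

ranked = sorted(OPTIONS, key=lambda o: abs(influence(o)), reverse=True)
print(ranked)  # options ordered by estimated performance influence
```

Real systems add interaction effects between options, which is why the framework pairs sampling with machine learning rather than this one-option-at-a-time estimate.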

    An Industrial Case Study on the Automated Detection of Performance Regressions in Heterogeneous Environments

    Abstract—A key goal of performance testing is the detection of performance degradations (i.e., regressions) compared to previous releases. Prior research has proposed the automation of such analysis through the mining of historical performance data (e.g., CPU and memory usage) from prior test runs. Nevertheless, such research has had limited adoption in practice. Working with a large industrial performance testing lab, we noted that a major hurdle in the adoption of prior work (including our own) is the incorrect assumption that prior tests are always executed in the same environment (i.e., labs). All too often, tests are performed in heterogeneous environments, with each test run in a possibly different lab with different hardware and software configurations. To make automated performance regression analysis techniques work in industry, we propose to model the global expected behaviour of a system as an ensemble (combination) of individual models, one for each successful previous test run (and hence configuration). The ensemble of models of prior test runs is used to flag performance deviations (e.g., CPU counters showing higher usage) in new tests. The deviations are then aggregated using simple voting or more advanced weighting to determine whether the counters really deviate from the expected behaviour or whether the difference is simply due to an environment-specific variation. Case studies on two open-source systems and a very large-scale industrial application show that our weighting approach outperforms a state-of-the-art environment-agnostic approach. Feedback from practitioners who used our approach over a 4-year period (across several major versions) has been very positive.
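The ensemble idea above can be sketched in a few lines of stdlib Python. This is a simplified illustration, not the paper's method: each prior test run yields one trivial model (mean and standard deviation of a counter), each model votes on whether a new counter value deviates, and a simple majority decides. All sample values and lab names are hypothetical:

```python
import statistics

# Hypothetical CPU-usage samples from prior successful test runs,
# one list per lab/configuration; real counters would come from
# historical test data.
prior_runs = {
    "lab_a": [55, 58, 53, 57, 55],
    "lab_b": [60, 63, 58, 61, 59],
    "lab_c": [52, 49, 54, 50, 53],
}

def build_model(samples):
    """One simple model per prior run: mean and standard deviation."""
    return statistics.mean(samples), statistics.stdev(samples)

models = {lab: build_model(s) for lab, s in prior_runs.items()}

def is_regression(new_value, threshold=3.0):
    """Each model votes on whether the new counter deviates from its
    expected range; a simple majority decides. (A weighted variant
    would scale each vote by how similar that lab's environment is
    to the new test's environment.)"""
    votes = sum(
        1 for mean, std in models.values()
        if abs(new_value - mean) > threshold * std
    )
    return votes > len(models) / 2

print(is_regression(56))   # within most labs' expected range → False
print(is_regression(80))   # far above every lab's range → True
```

The voting step is what tolerates heterogeneous environments: a value that looks anomalous to one lab's model but normal to the others is attributed to environment-specific variation rather than flagged as a regression.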