12 research outputs found

    Automatic Generation of Load Tests

    Load tests aim to validate whether system performance is acceptable under peak conditions. Existing test generation techniques induce load by increasing the size or rate of the input. Ignoring the particular input values, however, may lead to test suites that grossly mischaracterize a system’s performance. To address this limitation we introduce a mixed symbolic-execution-based approach that is unique in how it 1) favors program paths associated with a performance measure of interest, 2) operates in an iterative-deepening beam-search fashion to discard paths that are unlikely to lead to high-load tests, and 3) generates a test suite of a given size and level of diversity. An assessment of the approach shows that it generates test suites that induce program response times and memory consumption several times worse than the compared alternatives, that it scales to large and complex inputs, and that it exposes a diversity of resource-consuming program behavior.
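    The core search idea can be sketched compactly. The snippet below is a minimal, illustrative rendition of an iterative-deepening beam search over symbolic paths, assuming a hypothetical `SymbolicPath` abstraction with an accumulated cost; the cost model and branching are stand-ins, not the paper's implementation.

```java
import java.util.*;

// Hypothetical symbolic-path node: accumulated cost plus possible extensions.
final class SymbolicPath {
    final double cost;            // performance measure accumulated so far
    final List<Integer> choices;  // branch decisions taken (illustrative)
    SymbolicPath(double cost, List<Integer> choices) {
        this.cost = cost; this.choices = choices;
    }
    // Expand one more branching point; stubbed with two fake successors.
    List<SymbolicPath> expand() {
        List<SymbolicPath> next = new ArrayList<>();
        for (int b = 0; b < 2; b++) {
            List<Integer> c = new ArrayList<>(choices);
            c.add(b);
            next.add(new SymbolicPath(cost + (b == 1 ? 2.0 : 0.5), c));
        }
        return next;
    }
}

public class LoadTestBeamSearch {
    // Iterative-deepening beam search: at each depth keep only the k
    // highest-cost paths, discarding those unlikely to yield high load.
    static List<SymbolicPath> search(int maxDepth, int beamWidth) {
        List<SymbolicPath> beam = List.of(new SymbolicPath(0.0, List.of()));
        for (int depth = 0; depth < maxDepth; depth++) {
            List<SymbolicPath> frontier = new ArrayList<>();
            for (SymbolicPath p : beam) frontier.addAll(p.expand());
            frontier.sort(Comparator.comparingDouble(
                (SymbolicPath p) -> p.cost).reversed());
            beam = frontier.subList(0, Math.min(beamWidth, frontier.size()));
        }
        return beam; // surviving paths seed the high-load test suite
    }

    public static void main(String[] args) {
        for (SymbolicPath p : search(8, 3))
            System.out.println(p.choices + " cost=" + p.cost);
    }
}
```

    Pruning to the k highest-cost paths at each depth is what keeps exploration tractable while steering it toward high-load tests.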

    Badger: Complexity Analysis with Fuzzing and Symbolic Execution

    Hybrid testing approaches that combine fuzz testing and symbolic execution have shown promising results in achieving high code coverage and uncovering subtle errors and vulnerabilities in a variety of software applications. In this paper we describe Badger, a new hybrid approach for complexity analysis, with the goal of discovering vulnerabilities that occur when the worst-case time or space complexity of an application is significantly higher than the average case. Badger uses fuzz testing to generate a diverse set of inputs that aim to increase not only coverage but also a resource-related cost associated with each path. Since fuzzing may fail to execute deep program paths due to its limited knowledge of the conditions that influence these paths, we complement the analysis with symbolic execution, which is likewise customized to search for paths that increase the resource-related cost. Symbolic execution is particularly good at generating inputs that satisfy various program conditions but by itself suffers from path explosion. Badger therefore uses fuzzing and symbolic execution in tandem, to leverage their benefits and overcome their weaknesses. We implemented our approach for the analysis of Java programs, based on Kelinci and Symbolic PathFinder. We evaluated Badger on Java applications, showing that our approach is significantly faster in generating worst-case executions than fuzzing or symbolic execution on their own.
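    A rough sketch of how such a fuzzing/symbolic-execution tandem can be organized, with the cost model, the mutation step, and the concolic step all stubbed out; the real Badger coordinates Kelinci and Symbolic PathFinder through a shared queue of interesting inputs, which is abstracted away here.

```java
import java.util.*;

// Illustrative hybrid loop: both sides try to raise a resource-related
// cost; fuzzing does cheap random mutation, and the "symbolic" side is a
// placeholder for a solver-driven step that reaches deeper paths.
public class HybridCostSearch {
    // Stand-in cost model: an instrumented program would report e.g.
    // executed bytecodes; here cost is just a toy function of the input.
    static long cost(int[] input) {
        long c = 0;
        for (int v : input) c += Math.abs(v) % 97; // placeholder metric
        return c;
    }

    static int[] mutate(int[] input, Random rnd) { // fuzzing side
        int[] out = input.clone();
        out[rnd.nextInt(out.length)] = rnd.nextInt(1000);
        return out;
    }

    // Symbolic side, abstracted: pretend a solver derived an input that
    // flips a branch condition guarding a costlier path.
    static int[] concolicStep(int[] input) {
        int[] out = input.clone();
        out[0] = 999;
        return out;
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        int[] best = new int[8];
        long bestCost = cost(best);
        for (int i = 0; i < 1000; i++) {
            // alternate: mostly fuzz, occasionally ask the symbolic side
            int[] cand = (i % 100 == 0) ? concolicStep(best)
                                        : mutate(best, rnd);
            long c = cost(cand);
            if (c > bestCost) { best = cand; bestCost = c; }
        }
        System.out.println("worst-case estimate: " + bestCost);
    }
}
```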

    Performance regression testing of concurrent classes

    Developers of thread-safe classes struggle with two opposing goals. The class must be correct, which requires synchronizing concurrent accesses, and the class should provide reasonable performance, which is difficult to realize in the presence of unnecessary synchronization. Validating the performance of a thread-safe class is challenging because it requires diverse workloads that use the class, because existing performance analysis techniques focus on individual bottleneck methods, and because reliably measuring the performance of concurrent executions is difficult. This paper presents SpeedGun, an automatic performance regression testing technique for thread-safe classes. The key idea is to generate multi-threaded performance tests and to compare two versions of a class with each other. The analysis notifies developers when changing a thread-safe class significantly influences the performance of clients of this class. An evaluation with 113 pairs of classes from popular Java projects shows that the analysis effectively identifies 13 performance differences, including performance regressions that the respective developers were not aware of.
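    A minimal sketch of the kind of two-version comparison SpeedGun automates, assuming two stand-in implementations of a thread-safe counter; a real analysis would generate the multi-threaded workloads automatically and apply statistical tests rather than a single timing run.

```java
import java.util.concurrent.*;
import java.util.function.Supplier;

// Run the same multi-threaded workload against two versions of a
// thread-safe class and compare wall-clock times.
public class PerfRegressionCheck {
    interface Counter { void inc(); }

    // "Old" version: coarse synchronization on every call.
    static class SyncCounter implements Counter {
        private long n;
        public synchronized void inc() { n++; }
    }
    // "New" version: lock-free via an atomic.
    static class AtomicCounter implements Counter {
        private final java.util.concurrent.atomic.AtomicLong n =
            new java.util.concurrent.atomic.AtomicLong();
        public void inc() { n.incrementAndGet(); }
    }

    static long timeWorkload(Supplier<Counter> version, int threads,
                             int opsPerThread) throws Exception {
        Counter c = version.get();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long start = System.nanoTime();
        for (int t = 0; t < threads; t++)
            pool.submit(() -> { for (int i = 0; i < opsPerThread; i++) c.inc(); });
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws Exception {
        long oldT = timeWorkload(SyncCounter::new, 8, 1_000_000);
        long newT = timeWorkload(AtomicCounter::new, 8, 1_000_000);
        // A real analysis would repeat runs and check statistical significance.
        System.out.printf("old=%.1fms new=%.1fms ratio=%.2f%n",
            oldT / 1e6, newT / 1e6, (double) oldT / newT);
    }
}
```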

    Performance mutation testing

    Performance bugs are known to be a major threat to the success of software products. Performance tests aim to detect performance bugs by executing the program through test cases and checking whether it exhibits a noticeable performance degradation. The principles of mutation testing, a well-established technique for assessing test suites through the injection of artificial faults, could be exploited to evaluate and improve the detection power of performance tests. However, the application of mutation testing to assess performance tests, henceforth called performance mutation testing (PMT), is a novel research topic with numerous open challenges. In previous papers, we identified some key challenges related to PMT. In this work, we go a step further and explore the feasibility of applying PMT at the source-code level in general-purpose languages. To do so, we revisit concepts associated with classical mutation testing and design seven novel mutation operators to model known bug-inducing patterns. As a proof of concept, we applied traditional mutation operators as well as performance mutation operators to open-source C++ programs. The results reveal the potential of the new performance mutants to help assess and enhance performance tests when compared to traditional mutants. A review of live mutants in these programs suggests that they can induce the design of special test inputs. In addition to these promising results, our work brings a whole new set of challenges related to PMT, which will hopefully serve as a starting point for new contributions in the area.
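    The flavor of a performance mutant can be shown with a toy example: a change that preserves functional behaviour but degrades performance, so only a performance-aware test can kill it. The operator below is illustrative and written in Java for consistency with the other sketches, whereas the paper's seven operators target C++.

```java
// The mutant is functionally equivalent (same return value) but hoists
// loop-invariant work into the loop, modelling a known bug-inducing
// pattern. A test that only checks outputs will not kill it; a test that
// also checks timing should.
public class PerfMutantDemo {
    static int original(int[] data) {
        int max = Integer.MIN_VALUE;             // single linear pass
        for (int v : data) if (v > max) max = v;
        return max;
    }

    static int mutant(int[] data) {
        int max = Integer.MIN_VALUE;
        for (int i = 0; i < data.length; i++) {
            // mutation: recompute a prefix scan on every iteration, O(n^2)
            for (int j = 0; j <= i; j++) if (data[j] > max) max = data[j];
        }
        return max;
    }

    public static void main(String[] args) {
        int[] data = new int[20_000];
        for (int i = 0; i < data.length; i++) data[i] = i * 31 % 997;
        long t0 = System.nanoTime(); int a = original(data);
        long t1 = System.nanoTime(); int b = mutant(data);
        long t2 = System.nanoTime();
        System.out.printf("same result: %b, original=%.2fms mutant=%.2fms%n",
            a == b, (t1 - t0) / 1e6, (t2 - t1) / 1e6);
    }
}
```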

    Reducing the Length of Field-replay Based Load Testing

    As software systems have grown, load testing has become more and more important. Load testing ensures that a software system can provide quality service under a certain load, so one of its common challenges is to design realistic workloads that represent the actual workload in the field. In particular, one of the most widely adopted and intuitive approaches is to directly replay the field workloads in the load testing environment, which is resource- and time-consuming. In this work, we propose an automated approach to reduce the length of load testing that is driven by replaying field workloads. The intuition of our approach is: if the measured performance associated with a particular system behaviour is already stable, we can skip subsequent testing of this system behaviour to reduce the length of the field workloads. In particular, our approach first clusters execution logs that are generated during the system runtime to identify similar system behaviours in the field workloads. Then, we use statistical methods to determine whether the measured performance associated with a system behaviour has become stable. We evaluate our approach on three open-source projects (i.e., OpenMRS, TeaStore, and Apache James). The results show that our approach can significantly reduce the length of field workloads while the reduced workloads remain representative of the original set. More importantly, the load testing results obtained by replaying the reduced workloads correlate highly and trend similarly with those of the original workloads. Practitioners can leverage our approach to perform realistic field-replay based load testing while saving the needed resources and time.
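    A minimal sketch of the stopping rule this approach relies on, assuming behaviour clusters have already been identified from execution logs; the stability test shown (coefficient of variation over a sliding window) is a stand-in for the paper's statistical method, and the response times are simulated.

```java
import java.util.*;

// Keep replaying requests of a given behaviour cluster only until its
// measured performance is stable; then skip the rest of that cluster.
public class WorkloadReducer {
    static final int WINDOW = 30;
    static final double CV_THRESHOLD = 0.05;

    static boolean isStable(Deque<Double> samples) {
        if (samples.size() < WINDOW) return false;
        double mean = samples.stream().mapToDouble(d -> d).average().orElse(0);
        double var = samples.stream()
            .mapToDouble(d -> (d - mean) * (d - mean)).average().orElse(0);
        return mean > 0 && Math.sqrt(var) / mean < CV_THRESHOLD;
    }

    public static void main(String[] args) {
        Map<String, Deque<Double>> perCluster = new HashMap<>();
        Set<String> done = new HashSet<>();
        Random rnd = new Random(7);
        int kept = 0, skipped = 0;
        for (int i = 0; i < 10_000; i++) {        // field workload, in order
            String cluster = "behaviour-" + rnd.nextInt(3);
            if (done.contains(cluster)) { skipped++; continue; } // reduced away
            kept++;
            double responseMs = 50 + rnd.nextGaussian(); // simulated replay
            Deque<Double> w = perCluster.computeIfAbsent(
                cluster, k -> new ArrayDeque<>());
            w.addLast(responseMs);
            if (w.size() > WINDOW) w.removeFirst();
            if (isStable(w)) done.add(cluster);
        }
        System.out.println("replayed " + kept + ", skipped " + skipped);
    }
}
```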

    DASE: Document-Assisted Symbolic Execution for Improving Automated Test Generation

    Software testing is crucial for uncovering software defects and ensuring software reliability. Symbolic execution has been utilized for automatic test generation to improve testing effectiveness. However, existing test generation techniques based on symbolic execution fail to take full advantage of programs’ rich documentation specifying their input constraints, which can further enhance the effectiveness of test generation. In this paper we present a general approach, Document-Assisted Symbolic Execution (DASE), to improve automated test generation and bug detection. DASE leverages natural language processing techniques and heuristics to analyze programs’ readily available documentation and extract input constraints. The input constraints are then used as pruning criteria; inputs far from being valid are trimmed off. In this way, DASE guides symbolic execution to focus on those inputs that are semantically more important. We evaluated DASE on 88 programs from 5 mature real-world software suites: GNU Coreutils, GNU findutils, GNU grep, GNU Binutils, and elftoolchain. Compared to symbolic execution without input constraints, DASE increases line coverage, branch coverage, and call coverage by 5.27–22.10%, 5.83–21.25%, and 2.81–21.43%, respectively. In addition, DASE detected 13 previously unknown bugs, 6 of which have already been confirmed by the developers.
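    The pruning idea can be illustrated with a deliberately simplified example: mine valid flags from a --help text and trim off candidate inputs that violate the extracted constraint before any expensive exploration. Real DASE extracts much richer constraints (e.g., input file formats) with NLP heuristics; everything below is a hypothetical stand-in.

```java
import java.util.*;
import java.util.regex.*;

// Mine input constraints from documentation and use them to prune the
// search space before (symbolic) exploration. Here the "documentation"
// is a --help excerpt and the constraint is the set of valid flags.
public class DocConstraintPruning {
    static Set<String> validFlags(String helpText) {
        Set<String> flags = new HashSet<>();
        Matcher m = Pattern.compile("(--[a-z-]+)").matcher(helpText);
        while (m.find()) flags.add(m.group(1));
        return flags;
    }

    public static void main(String[] args) {
        String help = "Usage: tool [OPTION]...\n"
                    + "  --count    print a count\n"
                    + "  --quiet    suppress output\n";
        Set<String> valid = validFlags(help);
        // Candidate inputs a blind explorer would try; invalid ones are
        // trimmed off so the expensive analysis focuses on valid inputs.
        List<String> candidates =
            List.of("--count", "--xyzzy", "--quiet", "--frobnicate");
        for (String c : candidates)
            System.out.println(c + (valid.contains(c)
                ? "  -> explore symbolically" : "  -> pruned"));
    }
}
```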

    CONFPROFITT: A CONFIGURATION-AWARE PERFORMANCE PROFILING, TESTING, AND TUNING FRAMEWORK

    Modern computer software systems are complicated. Developers can change the behavior of a software system through software configurations. The large number of configuration options and their interactions make the tasks of software tuning, testing, and debugging very challenging. Performance is one of the key non-functional qualities, and performance bugs can cause significant degradation and lead to poor user experience. However, performance bugs are difficult to expose, primarily because detecting them requires specific inputs as well as specific configurations. While researchers have developed techniques to analyze, quantify, detect, and fix performance bugs, many of these techniques are not effective in highly-configurable systems. To improve the non-functional qualities of configurable software systems, testing engineers need to be able to understand the performance influence of configuration options, adjust the performance of a system under different configurations, and detect configuration-related performance bugs. This research provides an automated framework that allows engineers to effectively analyze performance-influencing configuration options, detect performance bugs in highly-configurable software systems, and adjust configuration options to achieve higher long-term performance gains. To understand real-world performance bugs in highly-configurable software systems, we first perform a study of performance bug characteristics in three large-scale open-source projects. Many researchers have studied the characteristics of performance bugs from bug reports, but few have reported the experience of trying to replicate confirmed performance bugs from the perspective of non-domain experts such as researchers. This study reports the challenges of, and potential workarounds for, replicating confirmed performance bugs. We also share a performance benchmark of real-world performance bugs to evaluate future performance testing techniques. Inspired by our performance bug study, we propose a performance profiling approach that helps developers understand how configuration options and their interactions influence the performance of a system. The approach uses a combination of dynamic analysis and machine learning techniques, together with configuration sampling, to profile the program execution and identify configuration options relevant to performance. Next, the framework leverages natural language processing and information retrieval techniques to automatically generate test inputs and configurations that expose performance bugs. Finally, the framework combines reinforcement learning and dynamic state reduction techniques to guide the subject application towards higher long-term performance gains.
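    A minimal sketch of the profiling step, assuming a configurable system whose cost can be measured per configuration: sample random configurations and estimate each option's performance influence as the mean difference between runs with the option on and off. The measured system is simulated here, and the framework's actual analysis (dynamic analysis plus machine learning) would also attribute interactions between options.

```java
import java.util.*;

// Configuration sampling plus a simple influence estimate per option.
public class ConfigInfluence {
    static final int OPTIONS = 5;

    // Simulated cost: option 2 is expensive, options 1+3 interact.
    static double measure(boolean[] cfg, Random rnd) {
        double t = 100;
        if (cfg[2]) t += 40;
        if (cfg[1] && cfg[3]) t += 25;
        return t + rnd.nextGaussian();   // measurement noise
    }

    public static void main(String[] args) {
        Random rnd = new Random(1);
        double[][] sum = new double[OPTIONS][2];
        int[][] cnt = new int[OPTIONS][2];
        for (int s = 0; s < 400; s++) {              // random sampling
            boolean[] cfg = new boolean[OPTIONS];
            for (int o = 0; o < OPTIONS; o++) cfg[o] = rnd.nextBoolean();
            double t = measure(cfg, rnd);
            for (int o = 0; o < OPTIONS; o++) {
                int on = cfg[o] ? 1 : 0;
                sum[o][on] += t; cnt[o][on]++;
            }
        }
        // Influence = mean(cost | option on) - mean(cost | option off).
        for (int o = 0; o < OPTIONS; o++)
            System.out.printf("option %d influence: %+.1f ms%n", o,
                sum[o][1] / cnt[o][1] - sum[o][0] / cnt[o][0]);
    }
}
```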

    Methodologies for Evaluating User Centric Performance of Mobile Network Applications

    Performance is an important attribute of mobile software applications, with a direct impact on the end user's experience. One of the obstacles that makes software performance testing difficult to pursue is the lack of performance requirements, which complicates the process of verifying the correctness of test case output. Moreover, compared to other platforms, quality assurance for mobile applications is more challenging, since their functionality is affected by the surrounding environment. In this work, we propose methodologies and frameworks to evaluate how the interaction between the quality of the wireless network connection and the application configurations affects the performance behaviour and performance robustness of a mobile networked application as perceived by the end user. We follow a model-based approach. The thesis starts by defining the system model of the software applications we target, the network stack the application is assumed to use to provide the service to the end user, and the metric used to capture the quality of the provided network service. First, an analytical performance model that captures the application-network interactions is developed using the Markovian framework. To model realistic interactions with the network, the performance model is developed and solved using the supplementary variable technique (SVT) and is intensively verified with simulation. Second, two input network models are analytically developed. In both models, the mobile application is assumed to have wireless network access through a WiFi access point that implements the IEEE 802.11 protocol. In the first model, data transfer is achieved using the user datagram protocol (UDP), while in the second, data transfer is accomplished using the transmission control protocol (TCP). For the TCP model, two scenarios are considered: in the first, an application data unit (APDU) is assumed to fit in one TCP packet, while in the second, an APDU is assumed to span multiple TCP packets. All models are verified using the well-known NS2 network simulator.
    Third, we propose a model-based test generation methodology to evaluate the impact of the interaction of the environment, the wireless network, and the application configurations on the performance of a mobile networked application. The methodology requires four artefacts as inputs, namely, a behaviour model of the software under test, a network model, a test coverage criterion, and a set of desired performance levels. It consists of three steps: performance model development, test generation, and estimation of test execution parameters. To evaluate the end-user quality of experience, test generation is formulated as an inversion problem and solved as an optimization problem. To generate an efficient set of test cases, two test coverage criteria are proposed: user experience (UE) and user experience and input interaction (UEII). Test execution optimizations are inferred using a performance simulation model. To show the applicability of the methodology, two mobile networked applications are used as examples: multimedia streaming and web browsing. The effectiveness of the methodology is evaluated by comparing the time cost of designing a test suite with that of random testing. The obtained results are very promising.
    Fourth, to minimize the incurred cost of performance model evaluations, we utilize metamorphic testing to generate test cases. Metamorphic testing is a technique proposed to alleviate the test oracle problem: by utilizing certain inherent properties of the system under test (metamorphic relations), test cases are generated and verified without the need to know the expected output of each individual test case in advance. By hybridizing our proposed test generation methodology with metamorphic testing, the time cost of generating a test suite is reduced tremendously. We first generate a limited set of seed test cases using our test generation methodology. Then, we generate a set of follow-up test cases by utilizing the developed network models as metamorphic relations, without the need to invoke the performance model. Follow-up test generation is formulated as a maximization problem: the objective is to maximize the distance between a seed test case and its follow-up test cases, so as to generate a non-redundant set of test cases. Three distance metrics are used: Euclidean, squared Euclidean, and Manhattan. The modified methodology is used to generate test cases for a multimedia streaming application and is empirically evaluated using two metrics: the incurred time cost and the percentage of redundancy in the generated test suite. The obtained results show the advantage of the modified methodology in minimizing the cost of the test generation process.
    Fifth, we propose a third methodology to evaluate the impact of wireless network conditions on the performance robustness of adaptive and non-adaptive mobile networked applications. Software robustness is mainly about how the system behaves under stressful conditions; here, we target performance robustness under stressful network conditions. The proposed methodology consists of three steps and requires three different artefacts as inputs. To quantify robustness, two metrics (static and dynamic robustness) are proposed. The main challenge in evaluating robustness is the combinatorial growth of network-application interactions that need to be evaluated. To mitigate this issue, we propose an algorithm that limits the number of interactions by utilizing the monotonicity property of the performance model. To evaluate the dynamic robustness metric, the ability of the adaptive application to tolerate degraded network conditions has to be evaluated; this problem is formulated as a minimization problem. The methodology is used to evaluate the performance robustness of a mobile multimedia streaming application, and its effectiveness is evaluated. The obtained results show a three- to five-fold reduction in total cost compared to the naive approach, in which all combinations are exhaustively evaluated.
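    A minimal sketch of the follow-up test generation step, assuming a seed test case is a vector of network and application parameters within known bounds: random search stands in for the optimizer, and the Manhattan metric is one of the three distances mentioned above. Bounds and parameter meanings are illustrative stand-ins, not the thesis's models.

```java
import java.util.*;

// Pick follow-up test cases inside the valid domain that maximize their
// distance from a seed test case, yielding a non-redundant test suite.
public class MetamorphicFollowUps {
    static double manhattan(double[] a, double[] b) {
        double d = 0;
        for (int i = 0; i < a.length; i++) d += Math.abs(a[i] - b[i]);
        return d;
    }

    public static void main(String[] args) {
        double[] seed = {0.30, 12.0, 1500};   // e.g. loss rate, Mbps, APDU size
        double[] lo   = {0.00,  1.0,  200};   // lower bounds of the domain
        double[] hi   = {0.90, 54.0, 4000};   // upper bounds of the domain
        Random rnd = new Random(3);
        double[] best = null; double bestD = -1;
        for (int k = 0; k < 10_000; k++) {    // random search as the optimizer
            double[] cand = new double[seed.length];
            for (int i = 0; i < cand.length; i++)
                cand[i] = lo[i] + rnd.nextDouble() * (hi[i] - lo[i]);
            double d = manhattan(cand, seed);
            if (d > bestD) { bestD = d; best = cand; }
        }
        System.out.println("follow-up: " + Arrays.toString(best)
            + "  distance=" + bestD);
    }
}
```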