2,243 research outputs found

    Search-based crash reproduction using behavioural model seeding

    Get PDF
    Search-based crash reproduction approaches assist developers during debugging by generating a test case which reproduces a crash given its stack trace. One of the fundamental steps of this approach is creating objects needed to trigger the crash. One way to overcome this limitation is seeding: using information about the application during the search process. With seeding, the existing usages of classes can be used in the search process to produce realistic sequences of method calls which create the required objects. In this study, we introduce behavioral model seeding: a new seeding method which learns class usages from both the system under test and existing test cases. Learned usages are then synthesized in a behavioral model (state machine). Then, this model serves to guide the evolutionary process. To assess behavioral model-seeding, we evaluate it against test-seeding (the state-of-the-art technique for seeding realistic objects) and no-seeding (without seeding any class usage). For this evaluation, we use a benchmark of 124 hard-to-reproduce crashes stemming from six open-source projects. Our results indicate that behavioral model-seeding outperforms both test seeding and no-seeding by a minimum of 6% without any notable negative impact on efficiency

    Basic block coverage for search-based unit testing and crash reproduction

    Get PDF
    Search-based techniques have been widely used for white-box test generation. Many of these approaches rely on the approach level and branch distance heuristics to guide the search process and generate test cases with high line and branch coverage. Despite the positive results achieved by these two heuristics, they only use the information related to the coverage of explicit branches (e.g., indicated by conditional and loop statements), but ignore potential implicit branchings within basic blocks of code. If such implicit branching happens at runtime (e.g., if an exception is thrown in a branchless-method), the existing fitness functions cannot guide the search process. To address this issue, we introduce a new secondary objective, called Basic Block Coverage (BBC), which takes into account the coverage level of relevant basic blocks in the control flow graph. We evaluated the impact of BBC on search-based unit test generation (using the DynaMOSA algorithm) and search-based crash reproduction (using the STDistance and WeightedSum fitness functions). Our results show that for unit test generation, BBC improves the branch coverage of the generated tests. Although small (∼ 1.5%), this improvement in the branch coverage is systematic and leads to an increase of the output domain coverage and implicit runtime exception coverage, and of the diversity of runtime states. In terms of crash reproduction, in the combination of STDistance and WeightedSum, BBC helps in reproducing 3 new crashes for each fitness function. BBC significantly decreases the time required to reproduce 43.5% and 45.1% of the crashes using STDistance and WeightedSum, respectively. For these crashes, BBC reduces the consumed time by 71.7% (for STDistance) and 68.7% (for WeightedSum) on average.Software Engineerin

    Search-Based Test Generation Targeting Non-Functional Quality Attributes of Android Apps

    Get PDF
    Mobile apps form a major proportion of the software marketplace and it is crucial to ensure that they meet both functional and nonfunctional quality thresholds. Automated test input generation can reduce the cost of the testing process. However, existing Android test generation approaches are focused on code coverage and cannot be customized to a tester\u27s diverse goals---in particular, quality attributes such as resource use.We propose a flexible multi-objective search-based test generation framework for interface testing of Android apps---STGFA-SMOG. This framework allows testers to target a variety of fitness functions, corresponding to different software quality attributes, code coverage, and other test case properties. We find that STGFA-SMOG outperforms random test generation in exposing potential quality issues and triggering crashes. Our study also offers insights on how different combinations of fitness functions can affect test generation for Android apps

    Exploring means to facilitate software debugging

    Get PDF
    In this thesis, several aspects of software debugging from automated crash reproduction to bug report analysis and use of contracts have been studied.Algorithms and the Foundations of Software technolog

    Sapienz: Multi-objective automated testing for android applications

    Get PDF
    We introduce Sapienz, an approach to Android testing that uses multi-objective search-based testing to automatically explore and optimise test sequences, minimising length, while simultaneously maximising coverage and fault revelation. Sapienz combines random fuzzing, systematic and search-based exploration, exploiting seeding and multi-level instrumentation. Sapienz significantly outperforms (with large effect size) both the state-of-the-art technique Dynodroid and the widely-used tool, Android Monkey, in 7/10 experiments for coverage, 7/10 for fault detection and 10/10 for fault-revealing sequence length. When applied to the top 1, 000 Google Play apps, Sapienz found 558 unique, previously unknown crashes. So far we have managed to make contact with the developers of 27 crashing apps. Of these, 14 have confirmed that the crashes are caused by real faults. Of those 14, six already have developer-confirmed fixes

    Capture-based Automated Test Input Generation

    Get PDF
    Testing object-oriented software is critical because object-oriented languages have been commonly used in developing modern software systems. Many efficient test input generation techniques for object-oriented software have been proposed; however, state-of-the-art algorithms yield very low code coverage (e.g., less than 50%) on large-scale software. Therefore, one important and yet challenging problem is to generate desirable input objects for receivers and arguments that can achieve high code coverage (such as branch coverage) or help reveal bugs. Desirable objects help tests exercise the new parts of the code. However, generating desirable objects has been a significant challenge for automated test input generation tools, partly because the search space for such desirable objects is huge. To address this significant challenge, we propose a novel approach called Capture-based Automated Test Input Generation for Objected-Oriented Unit Testing (CAPTIG). The contributions of this proposed research are the following. First, CAPTIG enhances method-sequence generation techniques. Our approach intro-duces a set of new algorithms for guided input and method selection that increase code coverage. In addition, CAPTIG efficently reduces the amount of generated input. Second, CAPTIG captures objects dynamically from program execution during either system testing or real use. These captured inputs can support existing automated test input generation tools, such as a random testing tool called Randoop, to achieve higher code coverage. Third, CAPTIG statically analyzes the observed branches that had not been covered and attempts to exercise them by mutating existing inputs, based on the weakest precon-dition analysis. This technique also contributes to achieve higher code coverage. Fourth, CAPTIG can be used to reproduce software crashes, based on crash stack trace. This feature can considerably reduce cost for analyzing and removing causes of the crashes. In addition, each CAPTIG technique can be independently applied to leverage existing testing techniques. We anticipate our approach can achieve higher code coverage with a reduced duration of time with smaller amount of test input. To evaluate this new approach, we performed experiments with well-known large-scale open-source software and discovered our approach can help achieve higher code coverage with fewer amounts of time and test inputs

    Diversity-guided Search Exploration for Self-driving Cars Test Generation through Frenet Space Encoding

    Full text link
    The rise of self-driving cars (SDCs) presents important safety challenges to address in dynamic environments. While field testing is essential, current methods lack diversity in assessing critical SDC scenarios. Prior research introduced simulation-based testing for SDCs, with Frenetic, a test generation approach based on Frenet space encoding, achieving a relatively high percentage of valid tests (approximately 50%) characterized by naturally smooth curves. The "minimal out-of-bound distance" is often taken as a fitness function, which we argue to be a sub-optimal metric. Instead, we show that the likelihood of leading to an out-of-bound condition can be learned by the deep-learning vanilla transformer model. We combine this "inherently learned metric" with a genetic algorithm, which has been shown to produce a high diversity of tests. To validate our approach, we conducted a large-scale empirical evaluation on a dataset comprising over 1,174 simulated test cases created to challenge the SDCs behavior. Our investigation revealed that our approach demonstrates a substantial reduction in generating non-valid test cases, increased diversity, and high accuracy in identifying safety violations during SDC test execution

    Automated Software Transplantation

    Get PDF
    Automated program repair has excited researchers for more than a decade, yet it has yet to find full scale deployment in industry. We report our experience with SAPFIX: the first deployment of automated end-to-end fault fixing, from test case design through to deployed repairs in production code. We have used SAPFIX at Facebook to repair 6 production systems, each consisting of tens of millions of lines of code, and which are collectively used by hundreds of millions of people worldwide. In its first three months of operation, SAPFIX produced 55 repair candidates for 57 crashes reported to SAPFIX, of which 27 have been deem as correct by developers and 14 have been landed into production automatically by SAPFIX. SAPFIX has thus demonstrated the potential of the search-based repair research agenda by deploying, to hundreds of millions of users worldwide, software systems that have been automatically tested and repaired. Automated software transplantation (autotransplantation) is a form of automated software engineering, where we use search based software engineering to be able to automatically move a functionality of interest from a ‘donor‘ program that implements it into a ‘host‘ program that lacks it. Autotransplantation is a kind of automated program repair where we repair the ‘host‘ program by augmenting it with the missing functionality. Automated software transplantation would open many exciting avenues for software development: suppose we could autotransplant code from one system into another, entirely unrelated, system, potentially written in a different programming language. Being able to do so might greatly enhance the software engineering practice, while reducing the costs. Automated software transplantation manifests in two different flavors: monolingual, when the languages of the host and donor programs is the same, or multilingual when the languages differ. This thesis introduces a theory of automated software transplantation, and two algorithms implemented in two tools that achieve this: µSCALPEL for monolingual software transplantation and τSCALPEL for multilingual software transplantation. Leveraging lightweight annotation, program analysis identifies an organ (interesting behavior to transplant); testing validates that the organ exhibits the desired behavior during its extraction and after its implantation into a host. We report encouraging results: in 14 of 17 monolingual transplantation experiments involving 6 donors and 4 hosts, popular real-world systems, we successfully autotransplanted 6 new functionalities; and in 10 out of 10 multlingual transplantation experiments involving 10 donors and 10 hosts, popular real-world systems written in 4 different programming languages, we successfully autotransplanted 10 new functionalities. That is, we have passed all the test suites that validates the new functionalities behaviour and the fact that the initial program behaviour is preserved. Additionally, we have manually checked the behaviour exercised by the organ. Autotransplantation is also very useful: in just 26 hours computation time we successfully autotransplanted the H.264 video encoding functionality from the x264 system to the VLC media player, a task that is currently done manually by the developers of VLC, since 12 years ago. We autotransplanted call graph generation and indentation for C programs into Kate, (a popular KDE based test editor used as an IDE by a lot of C developers) two features currently missing from Kate, but requested by the users of Kate. Autotransplantation is also efficient: the total runtime across 15 monolingual transplants is 5 hours and a half; the total runtime across 10 multilingual transplants is 33 hours
    • …
    corecore