92,183 research outputs found

    Dynamic Analysis can be Improved with Automatic Test Suite Refactoring

    Full text link
    Context: Developers design test suites to automatically verify that software meets its expected behaviors. Many dynamic analysis techniques are performed on the exploitation of execution traces from test cases. However, in practice, there is only one trace that results from the execution of one manually-written test case. Objective: In this paper, we propose a new technique of test suite refactoring, called B-Refactoring. The idea behind B-Refactoring is to split a test case into small test fragments, which cover a simpler part of the control flow to provide better support for dynamic analysis. Method: For a given dynamic analysis technique, our test suite refactoring approach monitors the execution of test cases and identifies small test cases without loss of the test ability. We apply B-Refactoring to assist two existing analysis tasks: automatic repair of if-statements bugs and automatic analysis of exception contracts. Results: Experimental results show that test suite refactoring can effectively simplify the execution traces of the test suite. Three real-world bugs that could previously not be fixed with the original test suite are fixed after applying B-Refactoring; meanwhile, exception contracts are better verified via applying B-Refactoring to original test suites. Conclusions: We conclude that applying B-Refactoring can effectively improve the purity of test cases. Existing dynamic analysis tasks can be enhanced by test suite refactoring

    An empirical investigation into branch coverage for C programs using CUTE and AUSTIN

    Get PDF
    Automated test data generation has remained a topic of considerable interest for several decades because it lies at the heart of attempts to automate the process of Software Testing. This paper reports the results of an empirical study using the dynamic symbolic-execution tool. CUTE, and a search based tool, AUSTIN on five non-trivial open source applications. The aim is to provide practitioners with an assessment of what can be achieved by existing techniques with little or no specialist knowledge and to provide researchers with baseline data against which to measure subsequent work. To achieve this, each tool is applied 'as is', with neither additional tuning nor supporting harnesses and with no adjustments applied to the subject programs under test. The mere fact that these tools can be applied 'out of the box' in this manner reflects the growing maturity of Automated test data generation. However, as might be expected, the study reveals opportunities for improvement and suggests ways to hybridize these two approaches that have hitherto been developed entirely independently. (C) 2010 Elsevier Inc. All rights reserved

    Finding The Lazy Programmer's Bugs

    Get PDF
    Traditionally developers and testers created huge numbers of explicit tests, enumerating interesting cases, perhaps biased by what they believe to be the current boundary conditions of the function being tested. Or at least, they were supposed to. A major step forward was the development of property testing. Property testing requires the user to write a few functional properties that are used to generate tests, and requires an external library or tool to create test data for the tests. As such many thousands of tests can be created for a single property. For the purely functional programming language Haskell there are several such libraries; for example QuickCheck [CH00], SmallCheck and Lazy SmallCheck [RNL08]. Unfortunately, property testing still requires the user to write explicit tests. Fortunately, we note there are already many implicit tests present in programs. Developers may throw assertion errors, or the compiler may silently insert runtime exceptions for incomplete pattern matches. We attempt to automate the testing process using these implicit tests. Our contributions are in four main areas: (1) We have developed algorithms to automatically infer appropriate constructors and functions needed to generate test data without requiring additional programmer work or annotations. (2) To combine the constructors and functions into test expressions we take advantage of Haskell's lazy evaluation semantics by applying the techniques of needed narrowing and lazy instantiation to guide generation. (3) We keep the type of test data at its most general, in order to prevent committing too early to monomorphic types that cause needless wasted tests. (4) We have developed novel ways of creating Haskell case expressions to inspect elements inside returned data structures, in order to discover exceptions that may be hidden by laziness, and to make our test data generation algorithm more expressive. In order to validate our claims, we have implemented these techniques in Irulan, a fully automatic tool for generating systematic black-box unit tests for Haskell library code. We have designed Irulan to generate high coverage test suites and detect common programming errors in the process

    Dissection of a Bug Dataset: Anatomy of 395 Patches from Defects4J

    Full text link
    Well-designed and publicly available datasets of bugs are an invaluable asset to advance research fields such as fault localization and program repair as they allow directly and fairly comparison between competing techniques and also the replication of experiments. These datasets need to be deeply understood by researchers: the answer for questions like "which bugs can my technique handle?" and "for which bugs is my technique effective?" depends on the comprehension of properties related to bugs and their patches. However, such properties are usually not included in the datasets, and there is still no widely adopted methodology for characterizing bugs and patches. In this work, we deeply study 395 patches of the Defects4J dataset. Quantitative properties (patch size and spreading) were automatically extracted, whereas qualitative ones (repair actions and patterns) were manually extracted using a thematic analysis-based approach. We found that 1) the median size of Defects4J patches is four lines, and almost 30% of the patches contain only addition of lines; 2) 92% of the patches change only one file, and 38% has no spreading at all; 3) the top-3 most applied repair actions are addition of method calls, conditionals, and assignments, occurring in 77% of the patches; and 4) nine repair patterns were found for 95% of the patches, where the most prevalent, appearing in 43% of the patches, is on conditional blocks. These results are useful for researchers to perform advanced analysis on their techniques' results based on Defects4J. Moreover, our set of properties can be used to characterize and compare different bug datasets.Comment: Accepted for SANER'18 (25th edition of IEEE International Conference on Software Analysis, Evolution and Reengineering), Campobasso, Ital

    IntRepair: Informed Repairing of Integer Overflows

    Full text link
    Integer overflows have threatened software applications for decades. Thus, in this paper, we propose a novel technique to provide automatic repairs of integer overflows in C source code. Our technique, based on static symbolic execution, fuses detection, repair generation and validation. This technique is implemented in a prototype named IntRepair. We applied IntRepair to 2,052C programs (approx. 1 million lines of code) contained in SAMATE's Juliet test suite and 50 synthesized programs that range up to 20KLOC. Our experimental results show that IntRepair is able to effectively detect integer overflows and successfully repair them, while only increasing the source code (LOC) and binary (Kb) size by around 1%, respectively. Further, we present the results of a user study with 30 participants which shows that IntRepair repairs are more than 10x efficient as compared to manually generated code repairsComment: Accepted for publication at the IEEE TSE journal. arXiv admin note: text overlap with arXiv:1710.0372

    Where and Who? Automatic Semantic-Aware Person Composition

    Full text link
    Image compositing is a method used to generate realistic yet fake imagery by inserting contents from one image to another. Previous work in compositing has focused on improving appearance compatibility of a user selected foreground segment and a background image (i.e. color and illumination consistency). In this work, we instead develop a fully automated compositing model that additionally learns to select and transform compatible foreground segments from a large collection given only an input image background. To simplify the task, we restrict our problem by focusing on human instance composition, because human segments exhibit strong correlations with their background and because of the availability of large annotated data. We develop a novel branching Convolutional Neural Network (CNN) that jointly predicts candidate person locations given a background image. We then use pre-trained deep feature representations to retrieve person instances from a large segment database. Experimental results show that our model can generate composite images that look visually convincing. We also develop a user interface to demonstrate the potential application of our method.Comment: 10 pages, 9 figure

    SmartUnit: Empirical Evaluations for Automated Unit Testing of Embedded Software in Industry

    Full text link
    In this paper, we aim at the automated unit coverage-based testing for embedded software. To achieve the goal, by analyzing the industrial requirements and our previous work on automated unit testing tool CAUT, we rebuild a new tool, SmartUnit, to solve the engineering requirements that take place in our partner companies. SmartUnit is a dynamic symbolic execution implementation, which supports statement, branch, boundary value and MC/DC coverage. SmartUnit has been used to test more than one million lines of code in real projects. For confidentiality motives, we select three in-house real projects for the empirical evaluations. We also carry out our evaluations on two open source database projects, SQLite and PostgreSQL, to test the scalability of our tool since the scale of the embedded software project is mostly not large, 5K-50K lines of code on average. From our experimental results, in general, more than 90% of functions in commercial embedded software achieve 100% statement, branch, MC/DC coverage, more than 80% of functions in SQLite achieve 100% MC/DC coverage, and more than 60% of functions in PostgreSQL achieve 100% MC/DC coverage. Moreover, SmartUnit is able to find the runtime exceptions at the unit testing level. We also have reported exceptions like array index out of bounds and divided-by-zero in SQLite. Furthermore, we analyze the reasons of low coverage in automated unit testing in our setting and give a survey on the situation of manual unit testing with respect to automated unit testing in industry.Comment: In Proceedings of 40th International Conference on Software Engineering: Software Engineering in Practice Track, Gothenburg, Sweden, May 27-June 3, 2018 (ICSE-SEIP '18), 10 page
    • …
    corecore