2,633 research outputs found
Classifying the Correctness of Generated White-Box Tests: An Exploratory Study
White-box test generator tools rely only on the code under test to select
test inputs, and capture the implementation's output as assertions. If there is
a fault in the implementation, it could get encoded in the generated tests.
Tool evaluations usually measure fault-detection capability using the number of
such fault-encoding tests. However, these faults are only detected, if the
developer can recognize that the encoded behavior is faulty. We designed an
exploratory study to investigate how developers perform in classifying
generated white-box test as faulty or correct. We carried out the study in a
laboratory setting with 54 graduate students. The tests were generated for two
open-source projects with the help of the IntelliTest tool. The performance of
the participants were analyzed using binary classification metrics and by
coding their observed activities. The results showed that participants
incorrectly classified a large number of both fault-encoding and correct tests
(with median misclassification rate 33% and 25% respectively). Thus the real
fault-detection capability of test generators could be much lower than
typically reported, and we suggest to take this human factor into account when
evaluating generated white-box tests.Comment: 13 pages, 7 figure
Improving Automatic Content Type Identification from a Data Set
Data file layout inference refers to building the structure and determining the metadata of a text file. The text files dealt within this research are personal information records that have a consistent structure. Traditionally, if the layout structure of a text file is unknown, the human user must undergo manual labor of identifying the metadata. This is inefficient and prone to error. Content-based oracles are the current state-of-the-art automation technology that attempts to solve the layout inference problem by using databases of known metadata. This paper builds upon the information and documentation of the content-based oracles, and improves the databases of the oracles through experimentation
Automating test oracles generation
Software systems play a more and more important role in our everyday life. Many relevant human activities nowadays involve the execution of a piece of software. Software has to be reliable to deliver the expected behavior, and assessing the quality of software is of primary importance to reduce the risk of runtime errors. Software testing is the most common quality assessing technique for software. Testing consists in running the system under test on a finite set of inputs, and checking the correctness of the results. Thoroughly testing a software system is expensive and requires a lot of manual work to define test inputs (stimuli used to trigger different software behaviors) and test oracles (the decision procedures checking the correctness of the results). Researchers have addressed the cost of testing by proposing techniques to automatically generate test inputs. While the generation of test inputs is well supported, there is no way to generate cost-effective test oracles: Existing techniques to produce test oracles are either too expensive to be applied in practice, or produce oracles with limited effectiveness that can only identify blatant failures like system crashes. Our intuition is that cost-effective test oracles can be generated using information produced as a byproduct of the normal development activities. The goal of this thesis is to create test oracles that can detect faults leading to semantic and non-trivial errors, and that are characterized by a reasonable generation cost. We propose two ways to generate test oracles, one derives oracles from the software redundancy and the other from the natural language comments that document the source code of software systems. We present a technique that exploits redundant sequences of method calls encoding the software redundancy to automatically generate test oracles named CCOracles. We describe how CCOracles are automatically generated, deployed, and executed. We prove the effectiveness of CCOracles by measuring their fault-finding effectiveness when combined with both automatically generated and hand-written test inputs. We also present Toradocu, a technique that derives executable specifications from Javadoc comments of Java constructors and methods. From such specifications, Toradocu generates test oracles that are then deployed into existing test suites to assess the outputs of given test inputs. We empirically evaluate Toradocu, showing that Toradocu accurately translates Javadoc comments into procedure specifications. We also show that Toradocu oracles effectively identify semantic faults in the SUT. CCOracles and Toradocu oracles stem from independent information sources and are complementary in the sense that they check different aspects of the system undertest
Detecting Floating-Point Errors via Atomic Conditions
This paper tackles the important, difficult problem of detecting program inputs that trigger large floating-point errors in numerical code. It introduces a novel, principled dynamic analysis that leverages the mathematically rigorously analyzed condition numbers for atomic numerical operations, which we call atomic conditions, to effectively guide the search for large floating-point errors. Compared with existing approaches, our work based on atomic conditions has several distinctive benefits: (1) it does not rely on high-precision implementations to act as approximate oracles, which are difficult to obtain in general and computationally costly; and (2) atomic conditions provide accurate, modular search guidance. These benefits in combination lead to a highly effective approach that detects more significant errors in real-world code (e.g., widely-used numerical library functions) and achieves several orders of speedups over the state-of-the-art, thus making error analysis significantly more practical. We expect the methodology and principles behind our approach to benefit other floating-point program analysis tasks such as debugging, repair and synthesis. To facilitate the reproduction of our work, we have made our implementation, evaluation data and results publicly available on GitHub at https://github.com/FP-Analysis/atomic-condition.ISSN:2475-142
The Oracle Problem in Software Testing: A Survey
Testing involves examining the behaviour of a system in order to discover potential faults. Given an input for a system, the challenge of distinguishing the corresponding desired, correct behaviour from potentially incorrect behavior is called the “test oracle problem”. Test oracle automation is important to remove a current bottleneck that inhibits greater overall test automation. Without test oracle automation, the human has to determine whether observed behaviour is correct. The literature on test oracles has introduced techniques for oracle automation, including modelling, specifications, contract-driven development and metamorphic testing. When none of these is completely adequate, the final source of test oracle information remains the human, who may be aware of informal specifications, expectations, norms and domain specific information that provide informal oracle guidance. All forms of test oracles, even the humble human, involve challenges of reducing cost and increasing benefit. This paper provides a comprehensive survey of current approaches to the test oracle problem and an analysis of trends in this important area of software testing research and practice
Automatic Test Generation for Space
The European Space Agency (ESA) uses an engine to perform tests in the Ground
Segment infrastructure, specially the Operational Simulator. This engine uses
many different tools to ensure the development of regression testing
infrastructure and these tests perform black-box testing to the C++ simulator
implementation. VST (VisionSpace Technologies) is one of the companies that
provides these services to ESA and they need a tool to infer automatically
tests from the existing C++ code, instead of writing manually scripts to
perform tests. With this motivation in mind, this paper explores automatic
testing approaches and tools in order to propose a system that satisfies VST
needs
Automatic Repair of Buggy If Conditions and Missing Preconditions with SMT
We present Nopol, an approach for automatically repairing buggy if conditions
and missing preconditions. As input, it takes a program and a test suite which
contains passing test cases modeling the expected behavior of the program and
at least one failing test case embodying the bug to be repaired. It consists of
collecting data from multiple instrumented test suite executions, transforming
this data into a Satisfiability Modulo Theory (SMT) problem, and translating
the SMT result -- if there exists one -- into a source code patch. Nopol
repairs object oriented code and allows the patches to contain nullness checks
as well as specific method calls.Comment: CSTVA'2014, India (2014
Automatically Discovering, Reporting and Reproducing Android Application Crashes
Mobile developers face unique challenges when detecting and reporting crashes
in apps due to their prevailing GUI event-driven nature and additional sources
of inputs (e.g., sensor readings). To support developers in these tasks, we
introduce a novel, automated approach called CRASHSCOPE. This tool explores a
given Android app using systematic input generation, according to several
strategies informed by static and dynamic analyses, with the intrinsic goal of
triggering crashes. When a crash is detected, CRASHSCOPE generates an augmented
crash report containing screenshots, detailed crash reproduction steps, the
captured exception stack trace, and a fully replayable script that
automatically reproduces the crash on a target device(s). We evaluated
CRASHSCOPE's effectiveness in discovering crashes as compared to five
state-of-the-art Android input generation tools on 61 applications. The results
demonstrate that CRASHSCOPE performs about as well as current tools for
detecting crashes and provides more detailed fault information. Additionally,
in a study analyzing eight real-world Android app crashes, we found that
CRASHSCOPE's reports are easily readable and allow for reliable reproduction of
crashes by presenting more explicit information than human written reports.Comment: 12 pages, in Proceedings of 9th IEEE International Conference on
Software Testing, Verification and Validation (ICST'16), Chicago, IL, April
10-15, 2016, pp. 33-4
- …