915 research outputs found
Semantics-based Automated Web Testing
We present TAO, a software testing tool performing automated test and oracle
generation based on a semantic approach. TAO entangles grammar-based test
generation with automated semantics evaluation using a denotational semantics
framework. We show how TAO can be incorporated with the Selenium automation
tool for automated web testing, and how TAO can be further extended to support
automated delta debugging, where a failing web test script can be
systematically reduced based on grammar-directed strategies. A real-life
parking website is adopted throughout the paper to demonstrate the effectivity
of our semantics-based web testing approach.Comment: In Proceedings WWV 2015, arXiv:1508.0338
Automatic Software Repair: a Bibliography
This article presents a survey on automatic software repair. Automatic
software repair consists of automatically finding a solution to software bugs
without human intervention. This article considers all kinds of repairs. First,
it discusses behavioral repair where test suites, contracts, models, and
crashing inputs are taken as oracle. Second, it discusses state repair, also
known as runtime repair or runtime recovery, with techniques such as checkpoint
and restart, reconfiguration, and invariant restoration. The uniqueness of this
article is that it spans the research communities that contribute to this body
of knowledge: software engineering, dependability, operating systems,
programming languages, and security. It provides a novel and structured
overview of the diversity of bug oracles and repair operators used in the
literature
Identifying Patch Correctness in Test-Based Program Repair
Test-based automatic program repair has attracted a lot of attention in
recent years. However, the test suites in practice are often too weak to
guarantee correctness and existing approaches often generate a large number of
incorrect patches.
To reduce the number of incorrect patches generated, we propose a novel
approach that heuristically determines the correctness of the generated
patches. The core idea is to exploit the behavior similarity of test case
executions. The passing tests on original and patched programs are likely to
behave similarly while the failing tests on original and patched programs are
likely to behave differently. Also, if two tests exhibit similar runtime
behavior, the two tests are likely to have the same test results. Based on
these observations, we generate new test inputs to enhance the test suites and
use their behavior similarity to determine patch correctness.
Our approach is evaluated on a dataset consisting of 139 patches generated
from existing program repair systems including jGenProg, Nopol, jKali, ACS and
HDRepair. Our approach successfully prevented 56.3\% of the incorrect patches
to be generated, without blocking any correct patches.Comment: ICSE 201
Classifying the Correctness of Generated White-Box Tests: An Exploratory Study
White-box test generator tools rely only on the code under test to select
test inputs, and capture the implementation's output as assertions. If there is
a fault in the implementation, it could get encoded in the generated tests.
Tool evaluations usually measure fault-detection capability using the number of
such fault-encoding tests. However, these faults are only detected, if the
developer can recognize that the encoded behavior is faulty. We designed an
exploratory study to investigate how developers perform in classifying
generated white-box test as faulty or correct. We carried out the study in a
laboratory setting with 54 graduate students. The tests were generated for two
open-source projects with the help of the IntelliTest tool. The performance of
the participants were analyzed using binary classification metrics and by
coding their observed activities. The results showed that participants
incorrectly classified a large number of both fault-encoding and correct tests
(with median misclassification rate 33% and 25% respectively). Thus the real
fault-detection capability of test generators could be much lower than
typically reported, and we suggest to take this human factor into account when
evaluating generated white-box tests.Comment: 13 pages, 7 figure
Automatically Discovering, Reporting and Reproducing Android Application Crashes
Mobile developers face unique challenges when detecting and reporting crashes
in apps due to their prevailing GUI event-driven nature and additional sources
of inputs (e.g., sensor readings). To support developers in these tasks, we
introduce a novel, automated approach called CRASHSCOPE. This tool explores a
given Android app using systematic input generation, according to several
strategies informed by static and dynamic analyses, with the intrinsic goal of
triggering crashes. When a crash is detected, CRASHSCOPE generates an augmented
crash report containing screenshots, detailed crash reproduction steps, the
captured exception stack trace, and a fully replayable script that
automatically reproduces the crash on a target device(s). We evaluated
CRASHSCOPE's effectiveness in discovering crashes as compared to five
state-of-the-art Android input generation tools on 61 applications. The results
demonstrate that CRASHSCOPE performs about as well as current tools for
detecting crashes and provides more detailed fault information. Additionally,
in a study analyzing eight real-world Android app crashes, we found that
CRASHSCOPE's reports are easily readable and allow for reliable reproduction of
crashes by presenting more explicit information than human written reports.Comment: 12 pages, in Proceedings of 9th IEEE International Conference on
Software Testing, Verification and Validation (ICST'16), Chicago, IL, April
10-15, 2016, pp. 33-4
Automatic Repair of Buggy If Conditions and Missing Preconditions with SMT
We present Nopol, an approach for automatically repairing buggy if conditions
and missing preconditions. As input, it takes a program and a test suite which
contains passing test cases modeling the expected behavior of the program and
at least one failing test case embodying the bug to be repaired. It consists of
collecting data from multiple instrumented test suite executions, transforming
this data into a Satisfiability Modulo Theory (SMT) problem, and translating
the SMT result -- if there exists one -- into a source code patch. Nopol
repairs object oriented code and allows the patches to contain nullness checks
as well as specific method calls.Comment: CSTVA'2014, India (2014
A Critical Review of "Automatic Patch Generation Learned from Human-Written Patches": Essay on the Problem Statement and the Evaluation of Automatic Software Repair
At ICSE'2013, there was the first session ever dedicated to automatic program
repair. In this session, Kim et al. presented PAR, a novel template-based
approach for fixing Java bugs. We strongly disagree with key points of this
paper. Our critical review has two goals. First, we aim at explaining why we
disagree with Kim and colleagues and why the reasons behind this disagreement
are important for research on automatic software repair in general. Second, we
aim at contributing to the field with a clarification of the essential ideas
behind automatic software repair. In particular we discuss the main evaluation
criteria of automatic software repair: understandability, correctness and
completeness. We show that depending on how one sets up the repair scenario,
the evaluation goals may be contradictory. Eventually, we discuss the nature of
fix acceptability and its relation to the notion of software correctness.Comment: ICSE 2014, India (2014
Mapping the Structure and Evolution of Software Testing Research Over the Past Three Decades
Background: The field of software testing is growing and rapidly-evolving.
Aims: Based on keywords assigned to publications, we seek to identify
predominant research topics and understand how they are connected and have
evolved.
Method: We apply co-word analysis to map the topology of testing research as
a network where author-assigned keywords are connected by edges indicating
co-occurrence in publications. Keywords are clustered based on edge density and
frequency of connection. We examine the most popular keywords, summarize
clusters into high-level research topics, examine how topics connect, and
examine how the field is changing.
Results: Testing research can be divided into 16 high-level topics and 18
subtopics. Creation guidance, automated test generation, evolution and
maintenance, and test oracles have particularly strong connections to other
topics, highlighting their multidisciplinary nature. Emerging keywords relate
to web and mobile apps, machine learning, energy consumption, automated program
repair and test generation, while emerging connections have formed between web
apps, test oracles, and machine learning with many topics. Random and
requirements-based testing show potential decline.
Conclusions: Our observations, advice, and map data offer a deeper
understanding of the field and inspiration regarding challenges and connections
to explore.Comment: To appear, Journal of Systems and Softwar
- …