12,664 research outputs found
Software Verification and Graph Similarity for Automated Evaluation of Students' Assignments
In this paper we promote introducing software verification and control flow
graph similarity measurement in automated evaluation of students' programs. We
present a new grading framework that merges results obtained by combination of
these two approaches with results obtained by automated testing, leading to
improved quality and precision of automated grading. These two approaches are
also useful in providing a comprehensible feedback that can help students to
improve the quality of their programs We also present our corresponding tools
that are publicly available and open source. The tools are based on LLVM
low-level intermediate code representation, so they could be applied to a
number of programming languages. Experimental evaluation of the proposed
grading framework is performed on a corpus of university students' programs
written in programming language C. Results of the experiments show that
automatically generated grades are highly correlated with manually determined
grades suggesting that the presented tools can find real-world applications in
studying and grading
Automatic Repair of Real Bugs: An Experience Report on the Defects4J Dataset
Defects4J is a large, peer-reviewed, structured dataset of real-world Java
bugs. Each bug in Defects4J is provided with a test suite and at least one
failing test case that triggers the bug. In this paper, we report on an
experiment to explore the effectiveness of automatic repair on Defects4J. The
result of our experiment shows that 47 bugs of the Defects4J dataset can be
automatically repaired by state-of- the-art repair. This sets a baseline for
future research on automatic repair for Java. We have manually analyzed 84
different patches to assess their real correctness. In total, 9 real Java bugs
can be correctly fixed with test-suite based repair. This analysis shows that
test-suite based repair suffers from under-specified bugs, for which trivial
and incorrect patches still pass the test suite. With respect to practical
applicability, it takes in average 14.8 minutes to find a patch. The experiment
was done on a scientific grid, totaling 17.6 days of computation time. All
their systems and experimental results are publicly available on Github in
order to facilitate future research on automatic repair
IntRepair: Informed Repairing of Integer Overflows
Integer overflows have threatened software applications for decades. Thus, in
this paper, we propose a novel technique to provide automatic repairs of
integer overflows in C source code. Our technique, based on static symbolic
execution, fuses detection, repair generation and validation. This technique is
implemented in a prototype named IntRepair. We applied IntRepair to 2,052C
programs (approx. 1 million lines of code) contained in SAMATE's Juliet test
suite and 50 synthesized programs that range up to 20KLOC. Our experimental
results show that IntRepair is able to effectively detect integer overflows and
successfully repair them, while only increasing the source code (LOC) and
binary (Kb) size by around 1%, respectively. Further, we present the results of
a user study with 30 participants which shows that IntRepair repairs are more
than 10x efficient as compared to manually generated code repairsComment: Accepted for publication at the IEEE TSE journal. arXiv admin note:
text overlap with arXiv:1710.0372
A Critical Review of "Automatic Patch Generation Learned from Human-Written Patches": Essay on the Problem Statement and the Evaluation of Automatic Software Repair
At ICSE'2013, there was the first session ever dedicated to automatic program
repair. In this session, Kim et al. presented PAR, a novel template-based
approach for fixing Java bugs. We strongly disagree with key points of this
paper. Our critical review has two goals. First, we aim at explaining why we
disagree with Kim and colleagues and why the reasons behind this disagreement
are important for research on automatic software repair in general. Second, we
aim at contributing to the field with a clarification of the essential ideas
behind automatic software repair. In particular we discuss the main evaluation
criteria of automatic software repair: understandability, correctness and
completeness. We show that depending on how one sets up the repair scenario,
the evaluation goals may be contradictory. Eventually, we discuss the nature of
fix acceptability and its relation to the notion of software correctness.Comment: ICSE 2014, India (2014
Automatic Repair of Buggy If Conditions and Missing Preconditions with SMT
We present Nopol, an approach for automatically repairing buggy if conditions
and missing preconditions. As input, it takes a program and a test suite which
contains passing test cases modeling the expected behavior of the program and
at least one failing test case embodying the bug to be repaired. It consists of
collecting data from multiple instrumented test suite executions, transforming
this data into a Satisfiability Modulo Theory (SMT) problem, and translating
the SMT result -- if there exists one -- into a source code patch. Nopol
repairs object oriented code and allows the patches to contain nullness checks
as well as specific method calls.Comment: CSTVA'2014, India (2014
Dissection of a Bug Dataset: Anatomy of 395 Patches from Defects4J
Well-designed and publicly available datasets of bugs are an invaluable asset
to advance research fields such as fault localization and program repair as
they allow directly and fairly comparison between competing techniques and also
the replication of experiments. These datasets need to be deeply understood by
researchers: the answer for questions like "which bugs can my technique
handle?" and "for which bugs is my technique effective?" depends on the
comprehension of properties related to bugs and their patches. However, such
properties are usually not included in the datasets, and there is still no
widely adopted methodology for characterizing bugs and patches. In this work,
we deeply study 395 patches of the Defects4J dataset. Quantitative properties
(patch size and spreading) were automatically extracted, whereas qualitative
ones (repair actions and patterns) were manually extracted using a thematic
analysis-based approach. We found that 1) the median size of Defects4J patches
is four lines, and almost 30% of the patches contain only addition of lines; 2)
92% of the patches change only one file, and 38% has no spreading at all; 3)
the top-3 most applied repair actions are addition of method calls,
conditionals, and assignments, occurring in 77% of the patches; and 4) nine
repair patterns were found for 95% of the patches, where the most prevalent,
appearing in 43% of the patches, is on conditional blocks. These results are
useful for researchers to perform advanced analysis on their techniques'
results based on Defects4J. Moreover, our set of properties can be used to
characterize and compare different bug datasets.Comment: Accepted for SANER'18 (25th edition of IEEE International Conference
on Software Analysis, Evolution and Reengineering), Campobasso, Ital
- …