Automatic Repair of Real Bugs: An Experience Report on the Defects4J Dataset
Defects4J is a large, peer-reviewed, structured dataset of real-world Java
bugs. Each bug in Defects4J is provided with a test suite and at least one
failing test case that triggers the bug. In this paper, we report on an
experiment to explore the effectiveness of automatic repair on Defects4J. The
result of our experiment shows that 47 bugs of the Defects4J dataset can be
automatically repaired by state-of-the-art repair. This sets a baseline for
future research on automatic repair for Java. We have manually analyzed 84
different patches to assess their real correctness. In total, 9 real Java bugs
can be correctly fixed with test-suite based repair. This analysis shows that
test-suite based repair suffers from under-specified bugs, for which trivial
and incorrect patches still pass the test suite. With respect to practical
applicability, it takes on average 14.8 minutes to find a patch. The experiment
was done on a scientific grid, totaling 17.6 days of computation time. All
our systems and experimental results are publicly available on GitHub in
order to facilitate future research on automatic repair.
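
The under-specification problem mentioned in this abstract can be illustrated with a minimal sketch. Everything below is hypothetical and is not taken from Defects4J or from the repair systems evaluated in the paper: the toy function, the three-case test suite, and the names `buggy_abs`, `overfitting_patch`, `correct_patch`, and `passes_suite` are assumptions made for illustration only. The point is that a test suite with a single bug-triggering test cannot tell a trivial patch from a correct one.

```python
# Toy illustration (hypothetical, not from Defects4J): a weak test suite
# accepts both a correct patch and a trivial, incorrect one.

def buggy_abs(x):
    # Bug: negative inputs are returned unchanged.
    return x

def overfitting_patch(x):
    # Trivial patch: special-cases the one failing test input.
    return 3 if x == -3 else x

def correct_patch(x):
    # Intended behavior: absolute value for every input.
    return -x if x < 0 else x

# Assumed test suite as (input, expected output) pairs.
# Only the last case triggers the bug.
TEST_SUITE = [(2, 2), (0, 0), (-3, 3)]

def passes_suite(candidate, suite=TEST_SUITE):
    """Return True if the candidate passes every test case in the suite."""
    return all(candidate(arg) == expected for arg, expected in suite)

if __name__ == "__main__":
    for name, fn in [("buggy", buggy_abs),
                     ("overfitting", overfitting_patch),
                     ("correct", correct_patch)]:
        print(name, passes_suite(fn))
```

Both patches pass the suite, yet `overfitting_patch(-5)` still returns a wrong value, which is why manual analysis of patches is needed on top of test results.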
Software Verification and Graph Similarity for Automated Evaluation of Students' Assignments
In this paper we promote introducing software verification and control flow
graph similarity measurement in automated evaluation of students' programs. We
present a new grading framework that merges results obtained by a combination of
these two approaches with results obtained by automated testing, leading to
improved quality and precision of automated grading. These two approaches are
also useful in providing comprehensible feedback that can help students to
improve the quality of their programs. We also present our corresponding tools
that are publicly available and open source. The tools are based on LLVM
low-level intermediate code representation, so they could be applied to a
number of programming languages. Experimental evaluation of the proposed
grading framework is performed on a corpus of university students' programs
written in the C programming language. Results of the experiments show that
automatically generated grades are highly correlated with manually determined
grades, suggesting that the presented tools can find real-world applications in
studying and grading.
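
One way to picture merging the three sources of evidence named in this abstract (automated testing, software verification, and control flow graph similarity) is a weighted combination of normalized scores. The sketch below is only an assumption about how such a merge could look: the `Evidence` fields, the `combined_grade` function, the weights, and the 10-point scale are all hypothetical and are not the model or the tools described in the paper.

```python
from dataclasses import dataclass

@dataclass
class Evidence:
    """Normalized scores in [0, 1] for one student program (hypothetical)."""
    tests_passed: float     # fraction of automated tests passed
    verification: float     # 1.0 if the verifier reports no bugs, lower otherwise
    cfg_similarity: float   # similarity of the program's CFG to a reference solution

def combined_grade(evidence, weights=(0.5, 0.25, 0.25), max_points=10.0):
    """Merge the three scores into one grade using assumed, illustrative weights."""
    scores = (evidence.tests_passed,
              evidence.verification,
              evidence.cfg_similarity)
    if any(not 0.0 <= s <= 1.0 for s in scores):
        raise ValueError("all scores must lie in [0, 1]")
    return max_points * sum(w * s for w, s in zip(weights, scores))

if __name__ == "__main__":
    # Passes half the tests, verifies cleanly, but is structurally
    # unlike the reference solution.
    print(combined_grade(Evidence(0.5, 1.0, 0.0)))
```

In practice the weights would have to be calibrated against manually assigned grades; the values used here are placeholders.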