1,223 research outputs found
Automatic Software Repair: a Bibliography
This article presents a survey on automatic software repair. Automatic
software repair consists of automatically finding a solution to software bugs
without human intervention. This article considers all kinds of repairs. First,
it discusses behavioral repair where test suites, contracts, models, and
crashing inputs are taken as oracle. Second, it discusses state repair, also
known as runtime repair or runtime recovery, with techniques such as checkpoint
and restart, reconfiguration, and invariant restoration. The uniqueness of this
article is that it spans the research communities that contribute to this body
of knowledge: software engineering, dependability, operating systems,
programming languages, and security. It provides a novel and structured
overview of the diversity of bug oracles and repair operators used in the
literature
BigDansing
Data cleansing approaches have usually focused on detecting and fixing errors with little attention to scaling to big datasets. This presents a serious impediment since data cleansing often involves costly computations such as enumerating pairs of tuples, handling inequality joins, and dealing with user-defined functions. In this paper, we present BigDansing, a Big Data Cleansing system to tackle efficiency, scalability, and ease-of-use issues in data cleansing. The system can run on top of most common general purpose data processing platforms, ranging from DBMSs to MapReduce-like frameworks. A user-friendly programming interface allows users to express data quality rules both declaratively and procedurally, with no requirement of being aware of the underlying distributed platform. BigDansing takes these rules into a series of transformations that enable distributed computations and several optimizations, such as shared scans and specialized joins operators. Experimental results on both synthetic and real datasets show that BigDansing outperforms existing baseline systems up to more than two orders of magnitude without sacrificing the quality provided by the repair algorithms
GAMMA: Revisiting Template-based Automated Program Repair via Mask Prediction
Automated program repair (APR) aims to fix software bugs without human
intervention and template-based APR has been widely investigated with promising
results. However, it is challenging for template-based APR to select the
appropriate donor code, which is an important repair ingredient for generating
candidate patches. Inappropriate donor code may cause plausible but incorrect
patch generation even with correct fix patterns, limiting the repair
performance.
In this paper, we aim to revisit template-based APR, and propose GAMMA, to
directly leverage large pre-trained language models for donor code generation.
Our main insight is that instead of retrieving donor code in the local buggy
file, we can directly predict the correct code tokens based on the context code
snippets and repair patterns by a cloze task. Specifically, (1) GAMMA revises a
variety of fix templates from state-of-the-art template-based APR techniques
(i.e., TBar) and transforms them into mask patterns. (2) GAMMA adopts a
pre-trained language model to predict the correct code for masked code as a
fill-in-the-blank task. The experimental results demonstrate that GAMMA
correctly repairs 82 bugs on Defects4J-v1.2, which achieves 20.59\% (14 bugs)
and 26.15\% (17 bugs) improvement over the previous state-of-the-art
template-based approach TBar and learning-based one Recoder. Furthermore, GAMMA
repairs 45 bugs and 22 bugs from the additional Defects4J-v2.0 and QuixBugs,
indicating the generalizability of GAMMA in addressing the dataset overfitting
issue. We also prove that adopting other pre-trained language models can
provide substantial advancement, e.g., CodeBERT-based and ChatGPT-based GAMMA
is able to fix 80 and 67 bugs on Defects4J-v1.2, indicating the scalability of
GAMMA. Overall, our study highlights the promising future of adopting
pre-trained models to generate correct patches on top of fix patterns.Comment: Accepted to 38th IEEE/ACM International Conference on Automated
Software Engineering (ASE2023
Faults in Linux 2.6
In August 2011, Linux entered its third decade. Ten years before, Chou et al.
published a study of faults found by applying a static analyzer to Linux
versions 1.0 through 2.4.1. A major result of their work was that the drivers
directory contained up to 7 times more of certain kinds of faults than other
directories. This result inspired numerous efforts on improving the reliability
of driver code. Today, Linux is used in a wider range of environments, provides
a wider range of services, and has adopted a new development and release model.
What has been the impact of these changes on code quality? To answer this
question, we have transported Chou et al.'s experiments to all versions of
Linux 2.6; released between 2003 and 2011. We find that Linux has more than
doubled in size during this period, but the number of faults per line of code
has been decreasing. Moreover, the fault rate of drivers is now below that of
other directories, such as arch. These results can guide further development
and research efforts for the decade to come. To allow updating these results as
Linux evolves, we define our experimental protocol and make our checkers
available
- …