Automatic Software Repair: a Bibliography
This article presents a survey on automatic software repair. Automatic
software repair consists of automatically finding a solution to software bugs
without human intervention. This article considers all kinds of repairs. First,
it discusses behavioral repair where test suites, contracts, models, and
crashing inputs are taken as oracle. Second, it discusses state repair, also
known as runtime repair or runtime recovery, with techniques such as checkpoint
and restart, reconfiguration, and invariant restoration. The uniqueness of this
article is that it spans the research communities that contribute to this body
of knowledge: software engineering, dependability, operating systems,
programming languages, and security. It provides a novel and structured
overview of the diversity of bug oracles and repair operators used in the
literature.
A Survey of Learning-based Automated Program Repair
Automated program repair (APR) aims to fix software bugs automatically and
plays a crucial role in software development and maintenance. With the recent
advances in deep learning (DL), an increasing number of APR techniques have
been proposed to leverage neural networks to learn bug-fixing patterns from
massive open-source code repositories. Such learning-based techniques usually
treat APR as a neural machine translation (NMT) task, where buggy code snippets
(i.e., source language) are translated into fixed code snippets (i.e., target
language) automatically. Benefiting from the powerful capability of DL to learn
hidden relationships from previous bug-fixing datasets, learning-based APR
techniques have achieved remarkable performance. In this paper, we provide a
systematic survey to summarize the current state-of-the-art research in the
learning-based APR community. We illustrate the general workflow of
learning-based APR techniques and detail the crucial components, including
fault localization, patch generation, patch ranking, patch validation, and
patch correctness phases. We then discuss the widely-adopted datasets and
evaluation metrics and outline existing empirical studies. We discuss several
critical aspects of learning-based APR techniques, such as repair domains,
industrial deployment, and the open science issue. We highlight several
practical guidelines on applying DL techniques for future APR studies, such as
exploring explainable patch generation and utilizing code features. Overall,
our paper can help researchers gain a comprehensive understanding of the
achievements of the existing learning-based APR techniques and promote the
practical application of these techniques. Our artifacts are publicly available
at https://github.com/QuanjunZhang/AwesomeLearningAPR.
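The workflow named in this abstract (fault localization, patch generation, patch ranking, patch validation) can be illustrated with a toy generate-and-validate loop. This is only a sketch under stated assumptions: the buggy function, the test cases, and every helper name are hypothetical, and a hand-written operator swap stands in for the neural patch generator that a learning-based technique would use.

```python
from typing import List, Optional, Tuple

# Hypothetical buggy program and test suite, invented for illustration.
BUGGY_SOURCE = "def add(a, b):\n    return a - b\n"
TESTS: List[Tuple[Tuple[int, int], int]] = [((1, 2), 3), ((0, 5), 5), ((-1, 1), 0)]


def localize(source: str) -> int:
    """Toy fault localization: flag the first 'return' line as suspicious."""
    for index, line in enumerate(source.splitlines()):
        if line.strip().startswith("return"):
            return index
    raise ValueError("no suspicious line found")


def generate_patches(source: str, line_no: int) -> List[str]:
    """Toy patch generation: swap the operator on the suspicious line.

    A learning-based tool would query a trained model here instead.
    The list order doubles as the (trivial) patch ranking.
    """
    lines = source.splitlines()
    patches = []
    for operator in ["*", "/", "+"]:
        patched = lines.copy()
        patched[line_no] = lines[line_no].replace("-", operator)
        patches.append("\n".join(patched) + "\n")
    return patches


def passes_tests(source: str, tests) -> bool:
    """Patch validation: load the candidate and run it against the tests."""
    namespace: dict = {}
    exec(source, namespace)
    func = namespace["add"]
    try:
        return all(func(*args) == expected for args, expected in tests)
    except Exception:
        return False


def repair(source: str, tests) -> Optional[str]:
    """Return the first candidate that passes all tests, or None."""
    suspicious_line = localize(source)
    for candidate in generate_patches(source, suspicious_line):
        if passes_tests(candidate, tests):
            return candidate
    return None


if __name__ == "__main__":
    print(repair(BUGGY_SOURCE, TESTS))
```

A candidate that passes the tests is only plausible, not necessarily correct, which is why the abstract lists patch correctness as a separate phase.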
Out of Sight, Out of Mind: Better Automatic Vulnerability Repair by Broadening Input Ranges and Sources
Advances in deep learning (DL) have paved the way for automatic software vulnerability repair approaches, which effectively learn the mapping from vulnerable code to fixed code. Nevertheless, existing DL-based vulnerability repair methods face notable limitations: 1) they struggle to handle lengthy vulnerable code, 2) they treat code as natural language text, neglecting its inherent structure, and 3) they do not tap into the valuable expert knowledge present in the expert system. To address these limitations, we propose VulMaster, a Transformer-based neural network model that excels at generating vulnerability repairs by comprehensively understanding the entire vulnerable code, irrespective of its length. This model also integrates diverse information, encompassing vulnerable code structures and expert knowledge from the CWE system. We evaluated VulMaster on a real-world C/C++ vulnerability repair dataset comprising 1,754 projects with 5,800 vulnerable functions. The experimental results demonstrated that VulMaster exhibits substantial improvements compared to the learning-based state-of-the-art vulnerability repair approach. Specifically, VulMaster improves the EM, BLEU, and CodeBLEU scores from 10.2% to 20.0%, 21.3% to 29.3%, and 32.5% to 40.9%, respectively.
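To make the EM (exact match) figure concrete, here is a minimal sketch of how such a score is commonly computed. The abstract does not specify the normalization used, so the whitespace collapsing below is an assumption, and the function names are hypothetical.

```python
from typing import Sequence


def normalize(code: str) -> str:
    """Collapse runs of whitespace (an assumed normalization step)."""
    return " ".join(code.split())


def exact_match(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Fraction of predictions identical to their reference fix."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    if not references:
        return 0.0
    hits = sum(
        normalize(pred) == normalize(ref)
        for pred, ref in zip(predictions, references)
    )
    return hits / len(references)
```

Under this definition a prediction counts only if it reproduces the reference fix in full, which is why EM scores are typically much lower than overlap-based scores such as BLEU.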
StaticFixer: From Static Analysis to Static Repair
Static analysis tools are traditionally used to detect and flag programs that
violate properties. We show that static analysis tools can also be used to
perturb programs that satisfy a property to construct variants that violate the
property. Using this insight we can construct paired data sets of unsafe-safe
program pairs, and learn strategies to automatically repair property
violations. We present a system called StaticFixer, which automatically repairs
information flow vulnerabilities using this approach. Since information flow
properties are non-local (both to check and repair), StaticFixer also introduces a
novel domain specific language (DSL) and strategy learning algorithms for
synthesizing non-local repairs. We use StaticFixer to synthesize strategies for
repairing two types of information flow vulnerabilities, unvalidated dynamic
calls and cross-site scripting, and show that StaticFixer successfully repairs
several hundred vulnerabilities from open source JavaScript repositories,
outperforming neural baselines built using CodeT5 and Codex. Our
datasets can be downloaded from http://aka.ms/StaticFixer.
Code Security Vulnerability Repair Using Reinforcement Learning with Large Language Models
With the recent advancement of Large Language Models (LLMs), generating
functionally correct code has become less complicated for a wide array of
developers. While using LLMs has sped up the functional development process, it
poses a heavy risk to code security. Generating code with proper security
measures using LLMs is a significantly more challenging task than functional
code generation. Security measures may involve adding a couple of lines to the
original code, such as null pointer checks or prepared statements for SQL
injection prevention. Currently available code repair LLMs generate repairs
through supervised fine-tuning, where the model is trained on a cross-entropy
loss. However, the original and repaired code are mostly similar functionally
and syntactically, except for a few (1-2) lines that act as security measures.
This imbalance between the lines needed for security measures and the
functional code pushes the supervised fine-tuned model to prioritize
generating functional code without adding proper security measures, since
doing so still results in minimal loss. Therefore, in this
work, for security hardening and strengthening of generated code from LLMs, we
propose a reinforcement learning-based method for program-specific repair with
the combination of semantic and syntactic reward mechanisms that focus heavily
on adding security and functional measures in the code, respectively.
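The abstract's point that a security fix can amount to one or two lines is easy to see with prepared statements. The sketch below uses Python's standard sqlite3 module; the table, data, and function names are hypothetical and chosen only for illustration, not taken from the paper.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Build a small in-memory database with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "a1"), ("bob", "b2")],
    )
    return conn


def find_user_unsafe(conn: sqlite3.Connection, name: str):
    # Vulnerable: user input is concatenated directly into the SQL text.
    query = "SELECT name, secret FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_user_safe(conn: sqlite3.Connection, name: str):
    # Repaired: the value is bound through a placeholder, so it is
    # treated as data and never parsed as SQL.
    query = "SELECT name, secret FROM users WHERE name = ?"
    return conn.execute(query, (name,)).fetchall()


if __name__ == "__main__":
    db = make_db()
    payload = "x' OR '1'='1"
    print(find_user_unsafe(db, payload))  # leaks every row
    print(find_user_safe(db, payload))    # no matching user
```

The two functions differ in only a couple of tokens, so a token-level cross-entropy loss barely distinguishes them. That is the imbalance the abstract describes.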