Automated Mapping of Vulnerability Advisories onto their Fix Commits in Open Source Repositories
The lack of comprehensive sources of accurate vulnerability data represents a
critical obstacle to studying and understanding software vulnerabilities (and
their corrections). In this paper, we present an approach that combines
heuristics stemming from practical experience with machine learning (ML),
specifically natural language processing (NLP), to address this problem. Our
method consists of three phases. First, an advisory record containing key
information about a vulnerability is extracted from an advisory (expressed in
natural language). Second, using heuristics, a subset of candidate fix commits
is obtained from the source code repository of the affected project by
filtering out commits that are known to be irrelevant for the task at hand.
Finally, for each such candidate commit, our method builds a numerical feature
vector reflecting the characteristics of the commit that are relevant to
predicting its match with the advisory at hand. The feature vectors are then
used to build a final ranked list of candidate fix commits. The score
attributed by the ML model to each feature is kept visible to the users,
allowing them to interpret the predictions.
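To make the final phase concrete, the following is a minimal, hypothetical sketch of scoring candidate commits with a linear model whose per-feature contributions stay visible; the feature names, weights, and Candidate structure are illustrative assumptions, not Prospector's actual implementation.

```python
# Illustrative sketch of ranking candidate fix commits; feature names,
# weights, and data are hypothetical, not Prospector's actual code.
from dataclasses import dataclass

@dataclass
class Candidate:
    sha: str
    features: dict  # feature name -> numeric value

# Hypothetical learned weights (e.g., from a linear model).
WEIGHTS = {
    "msg_mentions_cve": 3.0,       # commit message cites the CVE identifier
    "touches_affected_file": 1.5,  # commit changes a file named in the advisory
    "time_proximity": 0.8,         # commit is close in time to the advisory
}

def score(candidate: Candidate) -> float:
    """Linear score; each weighted term stays inspectable by the user."""
    return sum(w * candidate.features.get(name, 0.0)
               for name, w in WEIGHTS.items())

def rank(candidates: list) -> list:
    """Sort candidates by descending score to form the ranked list."""
    return sorted(candidates, key=score, reverse=True)

candidates = [
    Candidate("a1b2c3", {"msg_mentions_cve": 1.0, "touches_affected_file": 1.0}),
    Candidate("d4e5f6", {"touches_affected_file": 1.0, "time_proximity": 0.2}),
]
for c in rank(candidates):
    print(c.sha, round(score(c), 2))
```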
We evaluated our approach using a prototype implementation named Prospector
on a manually curated data set that comprises 2,391 known fix commits
corresponding to 1,248 public vulnerability advisories. When considering the
top-10 commits in the ranked results, our implementation could successfully
identify at least one fix commit for up to 84.03% of the vulnerabilities (with
a fix commit in the first position for 65.06% of the vulnerabilities). In
conclusion, our method considerably reduces the effort needed to search OSS
repositories for the commits that fix known vulnerabilities.
Enhancing Large Language Models for Secure Code Generation: A Dataset-driven Study on Vulnerability Mitigation
Large language models (LLMs) have brought significant advancements to code
generation, benefiting both novice and experienced developers. However, their
training on unsanitized data from open-source repositories, such as GitHub,
introduces the risk of inadvertently propagating security vulnerabilities. To
effectively mitigate this concern, this paper presents a comprehensive study
focused on evaluating and enhancing code LLMs from a software security
perspective. We introduce SecuCoGen (uploaded as supplemental material and to
be made publicly available after publication),
a meticulously curated dataset targeting 21 critical vulnerability types.
SecuCoGen comprises 180 samples and serves as the foundation for conducting
experiments on three crucial code-related tasks: code generation, code repair
and vulnerability classification, with a strong emphasis on security. Our
experimental results reveal that existing models often overlook security
concerns during code generation, leading to the generation of vulnerable code.
To address this, we propose effective approaches to mitigate the security
vulnerabilities and enhance the overall robustness of code generated by LLMs.
Moreover, our study identifies weaknesses in existing models' ability to repair
vulnerable code, even when provided with vulnerability information.
Additionally, certain vulnerability types pose challenges for the models,
hindering their performance in vulnerability classification. Based on these
findings, we believe our study will have a positive impact on the software
engineering community, inspiring the development of improved methods for
training and utilizing LLMs, thereby leading to safer and more trustworthy
model deployment.
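As a purely hypothetical illustration of what a sample in such a dataset might look like (the abstract does not specify SecuCoGen's actual schema), one record could pair a CWE type with material for all three studied tasks:

```python
# Purely hypothetical record layout for a security-focused code dataset;
# the abstract does not describe SecuCoGen's actual schema.
sample = {
    "cwe_id": "CWE-89",  # one of the 21 targeted vulnerability types
    "prompt": "Build a SQL query from a user-supplied name",  # code generation
    "vulnerable_code": 'cursor.execute(f"SELECT * FROM users WHERE name = \'{name}\'")',
    "fixed_code": 'cursor.execute("SELECT * FROM users WHERE name = %s", (name,))',
    "label": "vulnerable",  # vulnerability classification
}
# A code-repair task would pair "vulnerable_code" (plus the CWE hint)
# with "fixed_code" as the target.
```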
Exploring Security Commits in Python
Python has become the most popular programming language, in part because it is
friendly for beginners to work with. However, a recent study has found that most security
issues in Python have not been indexed by CVE and may only be fixed by 'silent'
security commits, which pose a threat to software security and hinder the
security fixes to downstream software. It is critical to identify the hidden
security commits; however, the existing datasets and methods are insufficient
for security commit detection in Python, due to the limited data variety,
non-comprehensive code semantics, and uninterpretable learned features. In this
paper, we construct the first security commit dataset in Python, namely
PySecDB, which consists of three subsets including a base dataset, a pilot
dataset, and an augmented dataset. The base dataset contains the security
commits associated with CVE records provided by MITRE. To increase the variety
of security commits, we build the pilot dataset from GitHub by filtering
keywords within the commit messages. Since not all commits provide commit
messages, we further construct the augmented dataset by understanding the
semantics of code changes. To build the augmented dataset, we propose a new
graph representation named CommitCPG and a multi-attributed graph learning
model named SCOPY to identify the security commit candidates through both
sequential and structural code semantics. The evaluation shows our proposed
algorithms can improve the data collection efficiency by up to 40 percentage
points. After manual verification by three security experts, PySecDB consists
of 1,258 security commits and 2,791 non-security commits. Furthermore, we
conduct an extensive case study on PySecDB and discover four common security
fix patterns that cover over 85% of security commits in Python, providing
insight into secure software maintenance, vulnerability detection, and
automated program repair.
Comment: Accepted to the 2023 IEEE International Conference on Software Maintenance and Evolution (ICSME).
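As a minimal sketch of the keyword-filtering step used to build the pilot dataset (the keyword list below is an assumption; the abstract does not enumerate the actual keywords):

```python
# Minimal sketch of keyword filtering over commit messages to collect
# likely security commits; the keyword set is an illustrative assumption.
SECURITY_KEYWORDS = {"security", "vulnerability", "cve", "exploit",
                     "overflow", "injection", "xss", "csrf"}

def looks_security_related(commit_message: str) -> bool:
    """Flag a commit whose message mentions any security keyword."""
    words = commit_message.lower().split()
    return any(kw in words for kw in SECURITY_KEYWORDS)

messages = [
    "Fix SQL injection in user search endpoint",
    "Bump version to 2.1.0",
]
print([m for m in messages if looks_security_related(m)])
```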
PreciseBugCollector: Extensible, Executable and Precise Bug-fix Collection
Bug datasets are vital for enabling deep learning techniques to address
software maintenance tasks related to bugs. However, existing bug datasets
suffer from precision and scale limitations: they are either small-scale but
precise with manual validation or large-scale but imprecise with simple commit
message processing. In this paper, we introduce PreciseBugCollector, a precise,
multi-language bug collection approach that overcomes these two limitations.
PreciseBugCollector is based on two novel components: a) A bug tracker to map
the codebase repositories with external bug repositories to trace bug type
information, and b) A bug injector to generate project-specific bugs by
injecting noise into the correct codebases and then executing them against
their test suites to obtain test failure messages.
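The bug-injection idea can be sketched as follows; the single toy rule and the pytest-based runner are illustrative assumptions, not the paper's 16 actual injection rules:

```python
# Illustrative sketch of rule-based bug injection: mutate a correct
# source file, run the project's test suite, and capture the failure
# message that characterizes the injected bug.
import re
import subprocess

def inject_operator_bug(source: str) -> str:
    """One toy injection rule: flip the first strict '<' comparison to '<='."""
    return re.sub(r"<(?![=<])", "<=", source, count=1)

def run_tests(project_dir: str) -> str:
    """Run the test suite and return its combined output."""
    proc = subprocess.run(
        ["python", "-m", "pytest", project_dir, "-x", "-q"],
        capture_output=True, text=True,
    )
    return proc.stdout + proc.stderr

# Hypothetical usage on a checked-out project:
#   src = open("pkg/module.py").read()
#   open("pkg/module.py", "w").write(inject_operator_bug(src))
#   failure_message = run_tests(".")
```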
We implement PreciseBugCollector against three sources: 1) a bug tracker that
links to the National Vulnerability Database (NVD) to collect general
vulnerabilities, 2) a bug tracker that links to OSS-Fuzz to collect general
bugs, and 3) a bug injector based on 16 injection rules to generate
project-specific bugs. To date, PreciseBugCollector comprises 1,057,818 bugs
extracted from 2,968 open-source projects. Of these, 12,602 bugs are sourced
from bug repositories (NVD and OSS-Fuzz), while the remaining 1,045,216
project-specific bugs are generated by the bug injector. Considering the
challenge objectives, we argue that a bug injection approach is highly valuable
for the industrial setting, since project-specific bugs align with domain
knowledge, share the same codebase, and adhere to the coding style employed in
industrial projects.
Comment: Accepted at the industry challenge track of ASE 2023.
Multi-Granularity Detector for Vulnerability Fixes
With the increasing reliance on Open Source Software, users are exposed to
third-party library vulnerabilities. Software Composition Analysis (SCA) tools
have been created to alert users of such vulnerabilities. SCA requires the
identification of vulnerability-fixing commits. Prior works have proposed
methods that can automatically identify such vulnerability-fixing commits.
However, identifying such commits is highly challenging, as only a very small
minority of commits are vulnerability fixing. Moreover, code changes can be
noisy and difficult to analyze. We observe that noise can occur at different
levels of detail, making it challenging to detect vulnerability fixes
accurately.
To address these challenges and boost the effectiveness of prior works, we
propose MiDas (Multi-Granularity Detector for Vulnerability Fixes). Unlike
prior works, MiDas constructs different neural networks for each level of code
change granularity, corresponding to commit-level, file-level, hunk-level, and
line-level, following their natural organization. It then utilizes an ensemble
model that combines all base models to generate the final prediction. This
design allows MiDas to better handle the noisy and highly imbalanced nature of
vulnerability-fixing commit data. Additionally, to reduce the human effort
required to inspect code changes, we have designed an effort-aware adjustment
for MiDas's outputs based on commit length. The evaluation results demonstrate
that MiDas outperforms the current state-of-the-art baseline in terms of AUC by
4.9% and 13.7% on Java and Python-based datasets, respectively. Furthermore, in
terms of two effort-aware metrics, EffortCost@L and Popt@L, MiDas also
outperforms the state-of-the-art baseline, achieving improvements of up to
28.2% and 15.9% on Java, and 60% and 51.4% on Python, respectively.
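A minimal sketch of the ensemble idea follows, assuming precomputed base-model scores and hypothetical weights; the paper's exact combination scheme and effort-aware formula are not given in the abstract:

```python
# Hypothetical sketch of the multi-granularity ensemble: base models
# score a commit at four granularities, and a weighted combination
# yields the final prediction. Weights, scores, and the adjustment
# formula are illustrative assumptions.
GRANULARITIES = ("commit", "file", "hunk", "line")

def ensemble(scores: dict, weights: dict) -> float:
    """Weighted combination of the granularity-specific base scores."""
    return sum(weights[g] * scores[g] for g in GRANULARITIES)

def effort_aware_adjust(score: float, commit_length: int,
                        scale: float = 1000.0) -> float:
    """Toy effort-aware adjustment: down-weight long commits that cost
    reviewers more effort to inspect."""
    return score / (1.0 + commit_length / scale)

scores = {"commit": 0.9, "file": 0.7, "hunk": 0.8, "line": 0.6}
weights = {g: 0.25 for g in GRANULARITIES}
raw = ensemble(scores, weights)
print(round(effort_aware_adjust(raw, commit_length=250), 3))
```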
How Effective Are Neural Networks for Fixing Security Vulnerabilities
Security vulnerability repair is a difficult task that is in dire need of
automation. Two groups of techniques have shown promise: (1) large code
language models (LLMs) that have been pre-trained on source code for tasks such
as code completion, and (2) automated program repair (APR) techniques that use
deep learning (DL) models to automatically fix software bugs.
This paper is the first to study and compare Java vulnerability repair
capabilities of LLMs and DL-based APR models. The contributions include that we
(1) apply and evaluate five LLMs (Codex, CodeGen, CodeT5, PLBART and InCoder),
four fine-tuned LLMs, and four DL-based APR techniques on two real-world Java
vulnerability benchmarks (Vul4J and VJBench), (2) design code transformations
to address the training and test data overlapping threat to Codex, (3) create a
new Java vulnerability repair benchmark VJBench, and its transformed version
VJBench-trans and (4) evaluate LLMs and APR techniques on the transformed
vulnerabilities in VJBench-trans.
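One simple instance of such a code transformation is identifier renaming, sketched below; the renaming map and regex-based approach are illustrative only, and the paper's transformations are richer:

```python
# Minimal sketch of a semantics-preserving code transformation
# (identifier renaming) used to reduce overlap between a model's
# training data and a test benchmark. The renaming map is hypothetical.
import re

RENAMES = {"userInput": "s1", "sanitize": "m1"}

def rename_identifiers(code: str, renames: dict) -> str:
    """Replace each whole-word identifier with its new name."""
    for old, new in renames.items():
        code = re.sub(rf"\b{re.escape(old)}\b", new, code)
    return code

java_snippet = "String clean = sanitize(userInput);"
print(rename_identifiers(java_snippet, RENAMES))
# -> String clean = m1(s1);
```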
Our findings include that (1) existing LLMs and APR models fix very few Java
vulnerabilities. Codex fixes 10.2 (20.4%) vulnerabilities, the most of any model.
(2) Fine-tuning with general APR data improves LLMs' vulnerability-fixing
capabilities. (3) Our new VJBench reveals that LLMs and APR models fail to fix
many Common Weakness Enumeration (CWE) types, such as CWE-325 Missing
cryptographic step and CWE-444 HTTP request smuggling. (4) Codex still fixes
8.3 transformed vulnerabilities, outperforming all the other LLMs and APR
models on transformed vulnerabilities. The results call for innovations to
enhance automated Java vulnerability repair such as creating larger
vulnerability repair training data, tuning LLMs with such data, and applying
code simplification transformations to facilitate vulnerability repair.
Comment: This paper has been accepted to appear in the proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA 2023), to be held in Seattle, USA, 17-21 July 2023.
SecBench.js: An Executable Security Benchmark Suite for Server-Side JavaScript
Npm is the largest software ecosystem in the world, offering millions of free, reusable packages. In recent years, various security threats to packages published on npm have been reported, including vulnerabilities that affect millions of users. To continuously improve techniques for detecting vulnerabilities and mitigating attacks that exploit them, a reusable benchmark of vulnerabilities would be highly desirable. Ideally, such a benchmark should be realistic, come with executable exploits, and include fixes of vulnerabilities. Unfortunately, no such benchmark currently exists, forcing researchers to repeatedly develop their own evaluation datasets and making it difficult to compare techniques with each other. This paper presents SecBench.js, the first comprehensive benchmark suite of vulnerabilities and executable exploits for npm. The benchmark comprises 600 vulnerabilities, which cover the five most common vulnerability classes for server-side JavaScript. Each vulnerability comes with a payload that exploits the vulnerability and an oracle that validates successful exploitation. SecBench.js enables various applications, of which we explore three in this paper: (i) cross-checking SecBench.js against existing security advisories reveals 168 vulnerable versions in 19 packages that are mislabeled in the advisories; (ii) applying simple code transformations to the exploits in our suite helps identify flawed fixes of vulnerabilities; (iii) dynamically analyzing calls to common sink APIs, e.g., exec(), yields a ground truth of code locations for evaluating vulnerability detectors. Beyond providing a reusable benchmark to the community, our work identified 20 zero-day vulnerabilities, most of which have already been acknowledged by practitioners.
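The payload-plus-oracle pattern can be sketched as follows (in Python for consistency with the other examples, although SecBench.js itself targets server-side JavaScript); the vulnerable function, payload, and oracle are hypothetical:

```python
# Conceptual sketch of the payload/oracle pattern for one vulnerability
# class (command injection). All names here are hypothetical.
import os
import tempfile

def vulnerable_archive(filename: str) -> None:
    """Toy command-injection vulnerability: input reaches a shell."""
    os.system(f"tar -cf backup.tar {filename}")  # unsanitized input

marker = os.path.join(tempfile.gettempdir(), "pwned")
payload = f"x; touch {marker}"  # exploit payload

def oracle() -> bool:
    """Oracle: exploitation succeeded iff the marker file was created."""
    return os.path.exists(marker)

vulnerable_archive(payload)
print("exploited:", oracle())
```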