301 research outputs found

    Automated Mapping of Vulnerability Advisories onto their Fix Commits in Open Source Repositories

    Full text link
    The lack of comprehensive sources of accurate vulnerability data represents a critical obstacle to studying and understanding software vulnerabilities (and their corrections). In this paper, we present an approach that combines heuristics stemming from practical experience and machine-learning (ML) - specifically, natural language processing (NLP) - to address this problem. Our method consists of three phases. First, an advisory record containing key information about a vulnerability is extracted from an advisory (expressed in natural language). Second, using heuristics, a subset of candidate fix commits is obtained from the source code repository of the affected project by filtering out commits that are known to be irrelevant for the task at hand. Finally, for each such candidate commit, our method builds a numerical feature vector reflecting the characteristics of the commit that are relevant to predicting its match with the advisory at hand. The feature vectors are then exploited for building a final ranked list of candidate fixing commits. The score attributed by the ML model to each feature is kept visible to the users, allowing them to interpret of the predictions. We evaluated our approach using a prototype implementation named Prospector on a manually curated data set that comprises 2,391 known fix commits corresponding to 1,248 public vulnerability advisories. When considering the top-10 commits in the ranked results, our implementation could successfully identify at least one fix commit for up to 84.03% of the vulnerabilities (with a fix commit on the first position for 65.06% of the vulnerabilities). In conclusion, our method reduces considerably the effort needed to search OSS repositories for the commits that fix known vulnerabilities

    Enhancing Large Language Models for Secure Code Generation: A Dataset-driven Study on Vulnerability Mitigation

    Full text link
    Large language models (LLMs) have brought significant advancements to code generation, benefiting both novice and experienced developers. However, their training using unsanitized data from open-source repositories, like GitHub, introduces the risk of inadvertently propagating security vulnerabilities. To effectively mitigate this concern, this paper presents a comprehensive study focused on evaluating and enhancing code LLMs from a software security perspective. We introduce SecuCoGen\footnote{SecuCoGen has been uploaded as supplemental material and will be made publicly available after publication.}, a meticulously curated dataset targeting 21 critical vulnerability types. SecuCoGen comprises 180 samples and serves as the foundation for conducting experiments on three crucial code-related tasks: code generation, code repair and vulnerability classification, with a strong emphasis on security. Our experimental results reveal that existing models often overlook security concerns during code generation, leading to the generation of vulnerable code. To address this, we propose effective approaches to mitigate the security vulnerabilities and enhance the overall robustness of code generated by LLMs. Moreover, our study identifies weaknesses in existing models' ability to repair vulnerable code, even when provided with vulnerability information. Additionally, certain vulnerability types pose challenges for the models, hindering their performance in vulnerability classification. Based on these findings, we believe our study will have a positive impact on the software engineering community, inspiring the development of improved methods for training and utilizing LLMs, thereby leading to safer and more trustworthy model deployment

    Exploring Security Commits in Python

    Full text link
    Python has become the most popular programming language as it is friendly to work with for beginners. However, a recent study has found that most security issues in Python have not been indexed by CVE and may only be fixed by 'silent' security commits, which pose a threat to software security and hinder the security fixes to downstream software. It is critical to identify the hidden security commits; however, the existing datasets and methods are insufficient for security commit detection in Python, due to the limited data variety, non-comprehensive code semantics, and uninterpretable learned features. In this paper, we construct the first security commit dataset in Python, namely PySecDB, which consists of three subsets including a base dataset, a pilot dataset, and an augmented dataset. The base dataset contains the security commits associated with CVE records provided by MITRE. To increase the variety of security commits, we build the pilot dataset from GitHub by filtering keywords within the commit messages. Since not all commits provide commit messages, we further construct the augmented dataset by understanding the semantics of code changes. To build the augmented dataset, we propose a new graph representation named CommitCPG and a multi-attributed graph learning model named SCOPY to identify the security commit candidates through both sequential and structural code semantics. The evaluation shows our proposed algorithms can improve the data collection efficiency by up to 40 percentage points. After manual verification by three security experts, PySecDB consists of 1,258 security commits and 2,791 non-security commits. Furthermore, we conduct an extensive case study on PySecDB and discover four common security fix patterns that cover over 85% of security commits in Python, providing insight into secure software maintenance, vulnerability detection, and automated program repair.Comment: Accepted to 2023 IEEE International Conference on Software Maintenance and Evolution (ICSME

    PreciseBugCollector: Extensible, Executable and Precise Bug-fix Collection

    Full text link
    Bug datasets are vital for enabling deep learning techniques to address software maintenance tasks related to bugs. However, existing bug datasets suffer from precise and scale limitations: they are either small-scale but precise with manual validation or large-scale but imprecise with simple commit message processing. In this paper, we introduce PreciseBugCollector, a precise, multi-language bug collection approach that overcomes these two limitations. PreciseBugCollector is based on two novel components: a) A bug tracker to map the codebase repositories with external bug repositories to trace bug type information, and b) A bug injector to generate project-specific bugs by injecting noise into the correct codebases and then executing them against their test suites to obtain test failure messages. We implement PreciseBugCollector against three sources: 1) A bug tracker that links to the national vulnerability data set (NVD) to collect general-wise vulnerabilities, 2) A bug tracker that links to OSS-Fuzz to collect general-wise bugs, and 3) A bug injector based on 16 injection rules to generate project-wise bugs. To date, PreciseBugCollector comprises 1057818 bugs extracted from 2968 open-source projects. Of these, 12602 bugs are sourced from bug repositories (NVD and OSS-Fuzz), while the remaining 1045216 project-specific bugs are generated by the bug injector. Considering the challenge objectives, we argue that a bug injection approach is highly valuable for the industrial setting, since project-specific bugs align with domain knowledge, share the same codebase, and adhere to the coding style employed in industrial projects.Comment: Accepted at the industry challenge track of ASE 202

    Multi-Granularity Detector for Vulnerability Fixes

    Full text link
    With the increasing reliance on Open Source Software, users are exposed to third-party library vulnerabilities. Software Composition Analysis (SCA) tools have been created to alert users of such vulnerabilities. SCA requires the identification of vulnerability-fixing commits. Prior works have proposed methods that can automatically identify such vulnerability-fixing commits. However, identifying such commits is highly challenging, as only a very small minority of commits are vulnerability fixing. Moreover, code changes can be noisy and difficult to analyze. We observe that noise can occur at different levels of detail, making it challenging to detect vulnerability fixes accurately. To address these challenges and boost the effectiveness of prior works, we propose MiDas (Multi-Granularity Detector for Vulnerability Fixes). Unique from prior works, Midas constructs different neural networks for each level of code change granularity, corresponding to commit-level, file-level, hunk-level, and line-level, following their natural organization. It then utilizes an ensemble model that combines all base models to generate the final prediction. This design allows MiDas to better handle the noisy and highly imbalanced nature of vulnerability-fixing commit data. Additionally, to reduce the human effort required to inspect code changes, we have designed an effort-aware adjustment for Midas's outputs based on commit length. The evaluation results demonstrate that MiDas outperforms the current state-of-the-art baseline in terms of AUC by 4.9% and 13.7% on Java and Python-based datasets, respectively. Furthermore, in terms of two effort-aware metrics, EffortCost@L and Popt@L, MiDas also outperforms the state-of-the-art baseline, achieving improvements of up to 28.2% and 15.9% on Java, and 60% and 51.4% on Python, respectively

    How Effective Are Neural Networks for Fixing Security Vulnerabilities

    Full text link
    Security vulnerability repair is a difficult task that is in dire need of automation. Two groups of techniques have shown promise: (1) large code language models (LLMs) that have been pre-trained on source code for tasks such as code completion, and (2) automated program repair (APR) techniques that use deep learning (DL) models to automatically fix software bugs. This paper is the first to study and compare Java vulnerability repair capabilities of LLMs and DL-based APR models. The contributions include that we (1) apply and evaluate five LLMs (Codex, CodeGen, CodeT5, PLBART and InCoder), four fine-tuned LLMs, and four DL-based APR techniques on two real-world Java vulnerability benchmarks (Vul4J and VJBench), (2) design code transformations to address the training and test data overlapping threat to Codex, (3) create a new Java vulnerability repair benchmark VJBench, and its transformed version VJBench-trans and (4) evaluate LLMs and APR techniques on the transformed vulnerabilities in VJBench-trans. Our findings include that (1) existing LLMs and APR models fix very few Java vulnerabilities. Codex fixes 10.2 (20.4%), the most number of vulnerabilities. (2) Fine-tuning with general APR data improves LLMs' vulnerability-fixing capabilities. (3) Our new VJBench reveals that LLMs and APR models fail to fix many Common Weakness Enumeration (CWE) types, such as CWE-325 Missing cryptographic step and CWE-444 HTTP request smuggling. (4) Codex still fixes 8.3 transformed vulnerabilities, outperforming all the other LLMs and APR models on transformed vulnerabilities. The results call for innovations to enhance automated Java vulnerability repair such as creating larger vulnerability repair training data, tuning LLMs with such data, and applying code simplification transformation to facilitate vulnerability repair.Comment: This paper has been accepted to appear in the proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA 2023), and to be presented at the conference, that will be held in Seattle, USA, 17-21 July 202

    SecBench.js: An Executable Security Benchmark Suite for Server-Side JavaScript

    Get PDF
    Npm is the largest software ecosystem in the world, offering millions of free, reusable packages. In recent years, various security threats to packages published on npm have been reported, including vulnerabilities that affect millions of users. To continuously improve techniques for detecting vulnerabilities and mitigating attacks that exploit them, a reusable benchmark of vulnerabilities would be highly desirable. Ideally, such a benchmark should be realistic, come with executable exploits, and include fixes of vulnerabilities. Unfortunately, there currently is no such benchmark, forcing researchers to repeatedly develop their own evaluation datasets and making it difficult to compare techniques with each other. This paper presents SecBench.js, the first comprehensive benchmark suite of vulnerabilities and executable exploits for npm. The benchmark comprises 600 vulnerabilities, which cover the five most common vulnerability classes for server-side JavaScript. Each vulnerability comes with a payload that exploits the vulnerability and an oracle that validates successful exploitation. SecBench.js enables various applications, of which we explore three in this paper: (i) crosschecking SecBench.js against existing security advisories reveals 168 vulnerable versions in 19 packages that are mislabeled in the advisories; (ii) applying simple code transformations to the exploits in our suite helps identify flawed fixes of vulnerabilities; (iii) dynamically analyzing calls to common sink APIs, e.g., exec(), yields a ground truth of code locations for evaluating vulnerability detectors. Beyond providing a reusable benchmark to the community, our work identified 20 zero-day vulnerabilities, most of which are already acknowledged by practitioners
    • …
    corecore