
    Tracing Vulnerabilities Across Product Releases

    When a software development team becomes aware of a vulnerability, it generally only knows that the latest version of the software product is vulnerable. However, most software products today have more than one version in active use at a time. Knowing which versions contain a vulnerability, and which do not, is crucial for users, who need to know which versions of a product are safe to use, and for developers, who need to know where to apply the patch. The patch, i.e. the fix of the vulnerability, contains valuable information in the form of the changes made to the known vulnerable code. This information can be leveraged to analyze the presence of a known vulnerability across releases of a software product. The problem of tracing vulnerabilities across releases has been addressed in two prior research projects. Both rely on the lines of code changed to fix a vulnerability, and conclude whether a version is vulnerable based on the presence of those lines. However, relying on raw lines of code fails to account for changes in the source code context surrounding the patch from one version to another. To address this problem, this research project represents the patch and the versions under evaluation in a more flexible format, an Abstract Syntax Tree (AST). This approach is more robust than the line-based approach because ASTs abstract away contextual changes and let us focus on the structure and behavior of the patched code. Accordingly, the unit of comparison in our approach is AST nodes rather than lines of code. Moreover, our approach generates comprehensive artifacts that can guide developers in patching the different versions of their product more efficiently. We implemented our approach in a Java tool named Patchilyzer and tested it on 174 Tomcat versions covering a total of 39 vulnerabilities.
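The core idea of comparing AST nodes instead of raw lines can be sketched in a few lines. This is a minimal Python illustration only (Patchilyzer itself targets Java, and its actual matching algorithm is not described here); the function names and the naive containment check are hypothetical:

```python
import ast

def stmt_signatures(source: str) -> set:
    """Dump every statement node of the AST without line/column
    attributes, so formatting and line-number shifts are ignored."""
    tree = ast.parse(source)
    return {ast.dump(node, include_attributes=False)
            for node in ast.walk(tree)
            if isinstance(node, ast.stmt)}

def contains_patch(version_source: str, patch_snippet: str) -> bool:
    """Naive illustration: a version 'contains' the patch if every
    top-level statement of the patch snippet appears, as an identical
    AST node, somewhere in the version's AST."""
    patch = {ast.dump(node, include_attributes=False)
             for node in ast.parse(patch_snippet).body}
    return patch <= stmt_signatures(version_source)

patch = "x = sanitize(x)"
patched_version = "def handler(x):\n    x = sanitize( x )\n    return x"
old_version = "def handler(x):\n    return x"
```

Here `contains_patch(patched_version, patch)` is true even though the statement sits at a different indentation level and with different spacing, a case where a purely line-based comparison could fail; `contains_patch(old_version, patch)` is false.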

    Towards an Improved Understanding of Software Vulnerability Assessment Using Data-Driven Approaches

    Software Vulnerabilities (SVs) can expose software systems to cyber-attacks, potentially causing enormous financial and reputational damage for organizations. There have been significant research efforts to detect these SVs so that developers can promptly fix them. However, fixing SVs is complex and time-consuming in practice, and developers usually do not have sufficient time and resources to fix all SVs at once. As a result, developers often need SV information, such as exploitability, impact, and overall severity, to prioritize fixing the more critical SVs. Such information, required for planning and prioritizing fixes, is typically provided in the SV assessment step of the SV lifecycle. Recently, data-driven methods have been increasingly proposed to automate SV assessment tasks. However, there are still numerous shortcomings in the existing studies on data-driven SV assessment that hinder their application in practice. This PhD thesis aims to contribute to the growing literature on data-driven SV assessment by investigating and addressing the constant changes in SV data, as well as the lack of consideration of source code and developers' needs, that impede the practical applicability of the field. In particular, we make the following five contributions in this thesis. (1) We systematize the knowledge of data-driven SV assessment to reveal the best practices of the field and the main challenges affecting its application in practice. Subsequently, we propose various solutions to tackle these challenges to better support real-world applications of data-driven SV assessment. (2) We first demonstrate the existence of the concept drift (changing data) issue in the descriptions of SV reports that current studies have mostly used for predicting Common Vulnerability Scoring System (CVSS) metrics. We augment report-level SV assessment models with subwords of terms extracted from SV descriptions to help the models more effectively capture the semantics of ever-increasing SVs. (3) We also identify that SV reports are usually released after SV fixing. Thus, we propose using vulnerable code to enable earlier SV assessment without waiting for SV reports. We are the first to use Machine Learning techniques to predict CVSS metrics at the function level, leveraging the vulnerable statements directly causing SVs and their context in code functions. The performance of our function-level SV assessment models is promising, opening up research opportunities in this new direction. (4) To support the continuous integration practices prevalent today, we present a novel deep multi-task learning model, DeepCVA, to simultaneously and efficiently predict multiple CVSS assessment metrics at the commit level, specifically using vulnerability-contributing commits. DeepCVA is the first work that enables practitioners to perform SV assessment as soon as vulnerable changes are added to a codebase, supporting just-in-time prioritization of SV fixing. (5) Besides code artifacts produced from a software project of interest, SV assessment tasks can also benefit from crowdsourced SV information on developer Question and Answer (Q&A) websites. We automatically retrieve large-scale security/SV-related posts from these Q&A websites. We then apply a topic modeling technique to these posts to distill developers' real-world SV concerns that can be used for data-driven SV assessment. Overall, we believe that this thesis provides evidence-based knowledge and useful guidelines for researchers and practitioners to automate SV assessment using data-driven approaches.
    Thesis (Ph.D.) -- University of Adelaide, School of Computer Science, 202
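The subword idea in contribution (2) can be illustrated with a fastText-style character n-gram extractor: even a previously unseen SV term shares n-grams with known vocabulary, which helps models cope with ever-changing SV descriptions. This is a hedged sketch under that assumption; the thesis's exact subword scheme may differ, and `subwords` is a hypothetical helper name:

```python
def subwords(term: str, min_n: int = 3, max_n: int = 4) -> list:
    """Extract fastText-style character n-grams from a term.
    Boundary markers '<' and '>' distinguish prefixes/suffixes
    from mid-word fragments."""
    wrapped = f"<{term}>"
    grams = []
    for n in range(min_n, max_n + 1):
        for i in range(len(wrapped) - n + 1):
            grams.append(wrapped[i:i + n])
    return grams
```

For example, `subwords("xss")` yields `['<xs', 'xss', 'ss>', '<xss', 'xss>']`, so a new term like "xssi" would still overlap with the known vocabulary through shared n-grams such as `'<xs'` and `'xss'`.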