
    Method-Level Bug Severity Prediction using Source Code Metrics and LLMs

    In the past couple of decades, significant research effort has been devoted to the prediction of software bugs. However, most existing work in this domain treats all bugs the same, which is not the case in practice. It is important for a defect prediction method to estimate the severity of the identified bugs so that the higher-severity ones receive immediate attention. In this study, we investigate source code metrics, source code representation using large language models (LLMs), and their combination for predicting the bug severity labels of two prominent datasets. We leverage several source code metrics at method-level granularity to train eight different machine-learning models. Our results suggest that the Decision Tree and Random Forest models outperform the others across several evaluation metrics. We then use the pre-trained CodeBERT LLM to study the effectiveness of source code representations in predicting bug severity. Fine-tuning CodeBERT improves bug severity prediction significantly, in the range of 29%-140% across several evaluation metrics, compared to the best classic prediction model trained on source code metrics. Finally, we integrate source code metrics into CodeBERT as an additional input using our two proposed architectures, both of which further enhance the model's effectiveness.
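
    As a rough illustration of the metric-augmented architecture described above, the following sketch concatenates a CodeBERT [CLS] embedding with a vector of method-level metrics before classification. It is a minimal, assumption-laden example: microsoft/codebert-base is the public checkpoint, but the fusion head, metric choice, and label count are placeholders, not the paper's exact architectures.

        import torch
        import torch.nn as nn
        from transformers import AutoModel, AutoTokenizer

        class MetricsCodeBERT(nn.Module):
            def __init__(self, n_metrics, n_labels):
                super().__init__()
                self.encoder = AutoModel.from_pretrained("microsoft/codebert-base")
                hidden = self.encoder.config.hidden_size  # 768 for codebert-base
                # The classifier sees the [CLS] embedding concatenated with the
                # raw metric vector (hypothetical fusion head, for illustration).
                self.head = nn.Linear(hidden + n_metrics, n_labels)

            def forward(self, input_ids, attention_mask, metrics):
                out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
                cls = out.last_hidden_state[:, 0]          # [batch, hidden]
                return self.head(torch.cat([cls, metrics], dim=-1))

        tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
        enc = tokenizer(["int f(int x) { return 100 / x; }"],
                        return_tensors="pt", truncation=True, padding=True)
        model = MetricsCodeBERT(n_metrics=4, n_labels=4)
        metrics = torch.tensor([[12.0, 3.0, 1.0, 45.0]])   # e.g. LOC, complexity
        logits = model(enc["input_ids"], enc["attention_mask"], metrics)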

    A New Approach for Predicting Security Vulnerability Severity in Attack Prone Software Using Architecture and Repository Mined Change Metrics

    Billions of dollars are lost every year to successful cyber attacks that are fundamentally enabled by software vulnerabilities. Modern cyber attacks increasingly threaten individuals, organizations, and governments, causing service disruption, inconvenience, and costly incident response. Given that such attacks are primarily enabled by software vulnerabilities, this work examines the efficacy of using change metrics, along with architectural burst and maintainability metrics, to predict the modules and files that should be analyzed or tested further to excise vulnerabilities prior to release. The problem addressed by this research is the residual vulnerability problem: vulnerabilities that evade detection and persist in released software. Many modern software projects are over a million lines of code and are composed of reused components of varying maturity. The sheer size of modern software, along with the reuse of existing open source modules, complicates the questions of where to look, and in what order to look, for residual vulnerabilities. Traditional code complexity metrics, along with newer frequency-based churn metrics (mined from software repository change history), are selected specifically for their relevance to the residual vulnerability problem. We compare the performance of these complexity and churn metrics to architectural-level change burst metrics, automatically mined from the git repositories of the Mozilla Firefox Web Browser, the Apache HTTP Web Server, and the MySQL Database Server, for the purpose of predicting attack-prone files and modules. We offer new empirical data quantifying the relationship between our selected metrics and the severity of vulnerable files and modules. Severity is assessed using the Base Score Metric of the Common Vulnerability Scoring System (CVSS) for the applicable CVE entries, extracted from the NIST National Vulnerability Database and associated with known-vulnerable files and modules via automated and semi-automated techniques, using the unique identifiers defined by the Common Vulnerabilities and Exposures (CVE) vulnerability catalog. Our results show that architectural-level change burst metrics can perform well in situations where more traditional complexity metrics fail as reliable estimators of vulnerability severity. In particular, results from our experiments on the Apache HTTP Web Server indicate that architectural-level change burst metrics show high correlation with the severity of known vulnerable modules, and do so with information directly available from the version control repository change-set (i.e., commit) history.
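
    For a flavor of the repository mining involved, the sketch below computes a simple per-file churn measure from the output of git log --numstat. It is only illustrative; the dissertation's architectural change burst metrics are more elaborate than raw line churn.

        import subprocess
        from collections import defaultdict

        def churn_per_file(repo_path):
            # One numstat row per file per commit: "added\tdeleted\tpath".
            log = subprocess.run(
                ["git", "-C", repo_path, "log", "--numstat", "--format="],
                capture_output=True, text=True, check=True).stdout
            churn = defaultdict(lambda: {"commits": 0, "added": 0, "deleted": 0})
            for line in log.splitlines():
                parts = line.split("\t")
                if len(parts) != 3:
                    continue
                added, deleted, path = parts
                entry = churn[path]
                entry["commits"] += 1       # number of commits touching this file
                # Binary files report "-"; count only textual changes.
                if added.isdigit():
                    entry["added"] += int(added)
                if deleted.isdigit():
                    entry["deleted"] += int(deleted)
            return churn

        # Example: rank the files of a local clone by total churn.
        # stats = churn_per_file("/path/to/httpd")
        # top = sorted(stats.items(),
        #              key=lambda kv: kv[1]["added"] + kv[1]["deleted"],
        #              reverse=True)[:10]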

    The Software Vulnerability Ecosystem: Software Development In The Context Of Adversarial Behavior

    Software vulnerabilities are the root cause of many computer system security failures. This dissertation addresses software vulnerabilities in the context of a software lifecycle, with a particular focus on three stages: (1) improving software quality during development; (2) pre-release bug discovery and repair; and (3) revising software as vulnerabilities are found. The question I pose regarding software quality during development is whether long-standing software engineering principles and practices, such as code reuse, help or hurt with respect to vulnerabilities. Using a novel data-driven analysis of large databases of vulnerabilities, I show the surprising result that software quality and software security are distinct. Most notably, the analysis uncovered a counterintuitive phenomenon, namely that newly introduced software enjoys a period with no vulnerability discoveries, and further that this “Honeymoon Effect” (a term I coined) is well explained by the unfamiliarity of the code to malicious actors. An important consequence for code reuse, intended to raise software quality, is that the protection inherent in the delayed discovery of vulnerabilities in new code is reduced. The second question I pose concerns the predictive power of this effect. My experimental design exploited a large-scale open source software system, Mozilla Firefox, in which two development methodologies are pursued in parallel, making the methodology the sole variable in outcomes. Comparing the methodologies using a novel synthesis of data from vulnerability databases, the results suggest that the rapid-release cycles used in agile software development (in which new software is introduced frequently) have a vulnerability discovery rate equivalent to that of conventional development. Finally, I pose the question of the relationship between the intrinsic security of software, stemming from design and development, and the ecosystem into which the software is embedded and in which it operates. I use the early development lifecycle to examine this question, and again use vulnerability data as the means of answering it. In a purely intrinsic model, defect discovery rates should decrease as software matures, making vulnerabilities increasingly rare. The data, which show that vulnerability rates increase after a delay, contradict this. Software security must therefore be modeled including extrinsic factors, thus comprising an ecosystem.

    Toward Data-Driven Discovery of Software Vulnerabilities

    Over the years, Software Engineering, as a discipline, has recognized the potential for engineers to make mistakes and has incorporated processes to prevent such mistakes from becoming exploitable vulnerabilities. These processes span the spectrum from using unit/integration/fuzz testing, static/dynamic/hybrid analysis, and (automatic) patching to discover instances of vulnerabilities, to leveraging data mining and machine learning to collect metrics that characterize attributes indicative of vulnerabilities. Among these processes, metrics have the potential to uncover systemic problems in the product, process, or people that could lead to vulnerabilities being introduced, rather than identifying specific instances of vulnerabilities. The insights from metrics can be used to support developers and managers in making decisions to improve the product, process, and/or people with the goal of engineering secure software. Despite empirical evidence of metrics' association with historical software vulnerabilities, their adoption in the software development industry has been limited. The level of granularity at which the metrics are defined, the high false positive rate of models that use the metrics as explanatory variables, and, more importantly, the difficulty in deriving actionable intelligence from the metrics are often cited as factors that inhibit metrics' adoption in practice. Our research vision is to assist software engineers in building secure software by providing a technique that generates scientific, interpretable, and actionable feedback on security as the software evolves. In this dissertation, we present our approach toward achieving this vision through (1) systematization of the vulnerability discovery metrics literature, (2) unsupervised generation of metrics-informed security feedback, and (3) continuous developer-in-the-loop improvement of the feedback. We systematically reviewed the literature to enumerate metrics that have been proposed and/or evaluated as indicative of vulnerabilities in software, and to identify the validation criteria used to assess the decision-informing ability of these metrics. In addition to enumerating the metrics, we implemented a subset of them as containerized microservices. We collected the metric values from six large open-source projects and assessed the metrics' generalizability across projects, application domains, and programming languages. We then used an unsupervised approach from the literature to compute threshold values for each metric and assessed the thresholds' ability to classify risk from historical vulnerabilities. We used the metrics' values, thresholds, and interpretation to provide developers with natural language feedback on security as they contributed changes, and used a survey to assess their perception of the feedback. We initiated an open dialogue to gain insight into their expectations of such feedback. In response to developer comments, we assessed the effectiveness of an existing vulnerability discovery approach (static analysis) and that of vulnerability discovery metrics in identifying risk from vulnerability-contributing commits.
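
    As an illustrative stand-in for the unsupervised thresholding step described above, the following sketch derives risk thresholds from a metric's own distribution (nearest-rank quantiles here; the dissertation's exact scheme may differ) and buckets files into low/medium/high risk.

        def quantile(sorted_values, q):
            # Nearest-rank quantile on a pre-sorted list.
            idx = min(len(sorted_values) - 1, int(q * len(sorted_values)))
            return sorted_values[idx]

        def risk_labels(metric_values):
            # Unsupervised: thresholds come from the data, not from bug labels.
            s = sorted(metric_values)
            medium, high = quantile(s, 0.70), quantile(s, 0.90)
            return ["high" if v >= high else "medium" if v >= medium else "low"
                    for v in metric_values]

        # e.g. cyclomatic complexity per file, collected by a metrics microservice
        complexity = [3, 5, 2, 41, 8, 13, 55, 7, 4, 21]
        print(list(zip(complexity, risk_labels(complexity))))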

    Understanding the Impact of Diversity in Software Bugs on Bug Prediction Models

    Nowadays, software systems are essential for businesses, users, and society. At the same time, such systems are growing in both complexity and size. In this context, developing high-quality software is a challenging and expensive activity for the software industry. Since software organizations are always limited by budget, personnel, and time, it is not a trivial task to allocate testing and code-review resources to the areas that require the most attention. To overcome this problem, researchers have developed software bug prediction models that can help practitioners predict the most bug-prone software entities. Although software bug prediction is a very popular research area, its industrial adoption remains limited. In this thesis, we investigate three possible issues with the current state-of-the-art in software bug prediction that affect the practical usability of prediction models. First, we argue that current bug prediction models implicitly assume that all bugs are the same, without taking their impact into consideration. We study the impact of bugs in terms of the experience of the developers required to fix them. Second, only a few studies investigate the impact of specific types of bugs. Therefore, we characterize a severe type of bug called Blocking bugs, and provide approaches to predict them early on. Third, false-negative files are buggy files that bug prediction models incorrectly classify as non-buggy. We argue that a large number of false-negative files makes bug prediction models less attractive to developers. In this thesis, we quantify the extent of false-negative files and manually inspect them in order to better understand their nature.
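
    A minimal sketch of the third issue, quantifying false-negative files, assuming the model's predictions and the ground truth are available as sets of file paths (the file names below are hypothetical).

        def false_negative_rate(predicted_buggy, actually_buggy):
            # False negatives: buggy files the model labeled as non-buggy.
            false_negatives = actually_buggy - predicted_buggy
            return false_negatives, len(false_negatives) / len(actually_buggy)

        predicted = {"core/parser.c", "ui/menu.c"}
        actual = {"core/parser.c", "net/socket.c", "db/query.c"}
        missed, rate = false_negative_rate(predicted, actual)
        print(f"missed {len(missed)} buggy files ({rate:.0%}): {sorted(missed)}")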

    Analyze the Performance of Software by Machine Learning Methods for Fault Prediction Techniques

    The use of software in daily life is increasing day by day, and software system development is growing more difficult as these technologies are integrated into everyday activities. Creating highly effective software is therefore a significant challenge, and among all the required characteristics, the quality of a software system remains the most important. Nearly one-third of the total cost of software development goes toward testing, so it is always advantageous to find a software bug early in the development process; a bug found late drives up the cost of development. Software fault prediction aims to address this problem. There is a constant need for better prediction models that can forecast faults before actual testing, reducing both the defects and the time and expense of software projects. This paper discusses various machine learning techniques for classifying software bugs.
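
    A hypothetical sketch of the kind of comparison such surveys discuss: cross-validating a few common classifiers on static code metrics. The features and labels below are synthetic placeholders, not a real defect dataset.

        import numpy as np
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import cross_val_score
        from sklearn.naive_bayes import GaussianNB
        from sklearn.tree import DecisionTreeClassifier

        rng = np.random.default_rng(0)
        X = rng.random((200, 4))                   # e.g. LOC, complexity, churn, fan-out
        y = (X[:, 1] + X[:, 2] > 1.0).astype(int)  # synthetic "faulty" label

        models = {
            "naive_bayes": GaussianNB(),
            "decision_tree": DecisionTreeClassifier(random_state=0),
            "random_forest": RandomForestClassifier(random_state=0),
            "logistic_regression": LogisticRegression(max_iter=1000),
        }
        for name, model in models.items():
            # 5-fold cross-validated F1, a common defect-prediction metric.
            scores = cross_val_score(model, X, y, cv=5, scoring="f1")
            print(f"{name:20s} mean F1 = {scores.mean():.3f}")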

    Evaluating Vulnerability Prediction Models

    Today almost every device depends on a piece of software; as a result, our lives increasingly depend on software in some form, such as smartphone apps, laundry machines, web applications, computers, and transportation systems. Inevitably, this dependence raises the issue of software vulnerabilities and their possible impact on our lifestyle. Over the years, researchers and industrialists have suggested several approaches to detect such issues and vulnerabilities. A particularly popular branch of such approaches, usually called Vulnerability Prediction Modelling (VPM) techniques, leverages prediction modelling to flag suspicious (likely vulnerable) code components. These techniques rely on source code features as indicators of vulnerabilities to build the prediction models. However, the emerging question is how effective such methods are and how they can be used in practice. The present dissertation studies vulnerability prediction models and evaluates them on a real and reliable playground. To this end, it suggests a toolset that automatically collects real vulnerable code instances from major open source systems, suitable for applying VPM. These code instances are then used to analyze, replicate, compare, and develop new VPMs. Specifically, the dissertation has three main axes. The first regards the analysis of vulnerabilities. To build VPMs accurately, a large amount of data is required; however, by their nature, vulnerabilities are scarce and the information about them is spread over different sources (NVD, Git, bug trackers). The suggested toolset, which automates the construction of a large dataset, thus enables the reliable and relevant analysis of VPMs. The second axis focuses on the empirical comparison and analysis of existing vulnerability prediction models. It develops and replicates existing VPMs; to this end, the thesis introduces a framework that builds, analyses, and compares existing prediction models (using the already proposed sets of features) on the dataset developed in the first axis. The third axis explores the use of cross-entropy (a metric used in natural language processing) as a potential feature for developing new VPMs. Cross-entropy, usually referred to as the naturalness of code, is a recent approach that measures the repetitiveness of code using statistical language models. Using cross-entropy, the thesis investigates different ways of building and using VPMs. Overall, this thesis provides a fully-fledged study of vulnerability prediction models, aiming at assessing and improving their performance.
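
    For the third axis, the sketch below shows the idea behind cross-entropy as naturalness: a bigram language model trained on code tokens scores familiar code with low cross-entropy (bits per token) and unfamiliar code with high cross-entropy. Published naturalness work uses higher-order n-gram models with better smoothing; add-one smoothing keeps the example short.

        import math
        from collections import Counter

        def train_bigram(corpus_tokens):
            bigrams = Counter(zip(corpus_tokens, corpus_tokens[1:]))
            unigrams = Counter(corpus_tokens)
            vocab = len(set(corpus_tokens))
            def prob(prev, tok):
                # Add-one smoothed P(tok | prev); unseen pairs get small mass.
                return (bigrams[(prev, tok)] + 1) / (unigrams[prev] + vocab)
            return prob

        def cross_entropy(prob, tokens):
            # Average negative log2 probability over the token sequence.
            bits = [-math.log2(prob(p, t)) for p, t in zip(tokens, tokens[1:])]
            return sum(bits) / len(bits)

        corpus = "if ( x > 0 ) { return x ; } return 0 ;".split()
        prob = train_bigram(corpus)
        print(cross_entropy(prob, "if ( x > 0 ) { return x ; }".split()))  # low: seen
        print(cross_entropy(prob, "while ( y < n ) y ++ ;".split()))       # high: unfamiliar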

    Understanding the Impact of Release Processes and Practices on Software Quality

    Release engineering encompasses all the activities aimed at “building a pipeline that transforms source code into an integrated, compiled, packaged, tested, and signed product that is ready for release”. The strategy of the release processes and practices can impact the quality of a software artefact. Although such impact has been extensively discussed and studied in the software engineering community, there are still many pending issues to resolve. The goal of this thesis is to study and solve some of these pending issues. More specifically, we examine: 1) why code review practices can miss crash-prone code; 2) how urgent patches (also called patch uplift) are approved for release, and how to prevent regressions due to urgent patches; and 3) how to mitigate, within a software ecosystem, the risk of defects due to DLL injections. We chose to study these problems because they correspond to three important phases of software release processes, i.e., code review, patch uplift, and releasing software in an ecosystem. The solutions to these problems can help software organizations improve their release strategy, increasing their development productivity and the overall user-perceived quality of their products.

    Fine-grained code changes and bugs: Improving bug prediction

    Software development and, in particular, software maintenance are time consuming and require detailed knowledge of the structure and the past development activities of a software system. Limited resources and time constraints make the situation even more difficult. Therefore, a significant amount of research effort has been dedicated to learning software prediction models that allow project members to allocate and spend the limited resources efficiently on the (most) critical parts of their software system. Prominent examples are bug prediction models and change prediction models: bug prediction models identify the bug-prone modules of a software system that should be tested with care; change prediction models identify modules that change frequently and in combination with other modules, i.e., that are change coupled. By combining statistical methods, data mining approaches, and machine learning techniques, software prediction models provide a structured and analytical basis for making decisions.

    Researchers have proposed a wide range of approaches to build effective prediction models that take into account multiple aspects of the software development process. They achieved especially good prediction performance, guiding developers towards those parts of their system where a large share of bugs can be expected. For that, they rely on change data provided by version control systems (VCS). However, because current VCS track code changes only at file level and on a textual basis, most of those approaches suffer from coarse-grained and rather generic change information. More fine-grained change information, for instance at the level of source code statements, and the type of changes, e.g., whether a method was renamed or a condition expression was changed, are often not taken into account. Therefore, investigating the development process and the evolution of software at a fine-grained change level has recently received increasing attention in research.

    The key contribution of this thesis is to improve software prediction models by using fine-grained source code changes. Those changes are based on the abstract syntax tree structure of source code and allow us to track code changes at the fine-grained level of individual statements. With a series of empirical studies using the change history of open-source projects, we show how prediction models can benefit from the more detailed change information, both in prediction performance and in prediction granularity.

    First, we compare fine-grained source code changes and code churn, i.e., lines modified, for bug prediction. The results, with data from the Eclipse platform, show that fine-grained source code changes significantly outperform code churn when classifying source files into bug-prone and not bug-prone, as well as when predicting the number of bugs in source files. Moreover, these results give more insight into the relation between individual types of code changes and bugs. For instance, in our dataset, method declaration changes exhibit a stronger correlation with the number of bugs than class declaration changes.

    Second, we leverage fine-grained source code changes to predict bugs at method level. This is beneficial because files can grow arbitrarily large; hence, if bugs are predicted at the level of files, a developer needs to manually inspect all methods of a file one by one until a particular bug is located.

    Third, we build models using source code properties, e.g., complexity, to predict whether a source file will be affected by a certain type of code change. Predicting the type of changes is of practical interest, for instance, in the context of software testing, as different change types require different levels of testing: while for small statement changes local unit tests are mostly sufficient, API changes, e.g., method declaration changes, might require system-wide integration tests, which are more expensive. Hence, knowing (in advance) which types of changes will most likely occur in a source file can help to better plan and develop tests and, in case of limited resources, prioritize among different types of testing.

    Finally, to assist developers in bug triaging, we compute prediction models based on the attributes of a bug report that can be used to estimate whether a bug will be fixed quickly or whether it will take more time to resolve. The results and findings of this thesis give evidence that fine-grained source code changes can improve software prediction models and provide more accurate results.
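
    To illustrate what fine-grained, AST-based change extraction looks like, the sketch below compares two versions of a (Python) source file and reports method declaration changes, one of the change types discussed above. The thesis works on Java ASTs with full tree differencing; this is only a signature-level approximation.

        import ast

        def signatures(source):
            """Map function name -> argument names, from the module's AST."""
            tree = ast.parse(source)
            return {node.name: [a.arg for a in node.args.args]
                    for node in ast.walk(tree)
                    if isinstance(node, ast.FunctionDef)}

        def declaration_changes(old_src, new_src):
            old, new = signatures(old_src), signatures(new_src)
            changes = ["method removed: " + n for n in old.keys() - new.keys()]
            changes += ["method added: " + n for n in new.keys() - old.keys()]
            for name in old.keys() & new.keys():
                if old[name] != new[name]:
                    changes.append("declaration changed: %s(%s) -> %s(%s)"
                                   % (name, ", ".join(old[name]),
                                      name, ", ".join(new[name])))
            return changes

        v1 = "def pay(amount):\n    return amount\n"
        v2 = "def pay(amount, currency='EUR'):\n    return amount\n"
        print(declaration_changes(v1, v2))
        # ['declaration changed: pay(amount) -> pay(amount, currency)']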