
    CVE-driven Attack Technique Prediction with Semantic Information Extraction and a Domain-specific Language Model

    This paper addresses a critical challenge in cybersecurity: the gap between vulnerability information represented by Common Vulnerabilities and Exposures (CVEs) and the resulting cyberattack actions. CVEs provide insights into vulnerabilities but often lack details on potential threat actions (tactics, techniques, and procedures, or TTPs) within the ATT&CK framework. This gap hinders accurate CVE categorization and proactive countermeasure initiation. The paper introduces the TTPpredictor tool, which uses innovative techniques to analyze CVE descriptions and infer plausible TTP attacks resulting from CVE exploitation. TTPpredictor overcomes challenges posed by limited labeled data and semantic disparities between CVE and TTP descriptions. It initially extracts threat actions from unstructured cyber threat reports using Semantic Role Labeling (SRL) techniques. These actions, along with their contextual attributes, are correlated with MITRE's attack functionality classes. This automated correlation facilitates the creation of labeled data, essential for categorizing novel threat actions into threat functionality classes and TTPs. The paper presents an empirical assessment demonstrating TTPpredictor's effectiveness, with accuracy of approximately 98% and F1-scores ranging from 95% to 98% in classifying CVEs to ATT&CK techniques. TTPpredictor outperforms state-of-the-art language model tools like ChatGPT. Overall, this paper offers a robust solution for linking CVEs to potential attack techniques, enhancing cybersecurity practitioners' ability to proactively identify and mitigate threats.
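    The abstract does not give TTPpredictor's implementation details, but the classification step it describes (mapping CVE description text to ATT&CK technique labels once labeled data exists) can be illustrated with a deliberately simplified baseline. The sketch below uses TF-IDF features and a linear classifier instead of the paper's SRL pipeline and domain-specific language model; the CVE texts and technique IDs are illustrative stand-ins, not data from the paper.

```python
# Minimal baseline sketch of the CVE -> ATT&CK technique classification step.
# Not the paper's method: TTPpredictor uses SRL and a domain-specific LM.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical labeled examples: CVE descriptions paired with technique IDs.
cve_texts = [
    "Buffer overflow allows remote attackers to execute arbitrary code",
    "Improper authentication lets attackers obtain valid account credentials",
    "SQL injection permits exfiltration of database contents",
]
technique_ids = ["T1203", "T1078", "T1005"]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                    LogisticRegression(max_iter=1000))
clf.fit(cve_texts, technique_ids)

print(clf.predict(["Heap overflow enables remote code execution via crafted request"]))
```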

    The anatomy of a vulnerability database: A systematic mapping study

    Software vulnerabilities pose major risks, including the loss and manipulation of private data. The software engineering research community has been contributing to the body of knowledge by proposing several empirical studies on vulnerabilities and automated techniques to detect and remove them from source code. The reliability and generalizability of the findings heavily depend on the quality of the information mineable from publicly available datasets of vulnerabilities, as well as on the availability and suitability of those databases. In this paper, we seek to understand the anatomy of the currently available vulnerability databases through a systematic mapping study in which we analyze (1) which vulnerability databases are popularly adopted; (2) what the goals for adoption are; (3) what other sources of information are adopted; (4) what methods and techniques are used; and (5) which tools are proposed. An improved understanding of these aspects might not only allow researchers to make informed decisions on the databases to consider when doing research, but also help practitioners establish reliable sources of information to inform their security policies and standards.
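    As a concrete example of mining one of the databases such studies examine, the sketch below queries the NVD's public CVE REST API (v2.0) for entries matching a keyword. The endpoint and field names reflect the API as commonly documented, but treat the exact response layout as an assumption and consult the official documentation before relying on it.

```python
# Sketch: pull a few CVE records from the NVD REST API (v2.0) and print
# their IDs with the English description. Response layout is assumed.
import requests

NVD_API = "https://services.nvd.nist.gov/rest/json/cves/2.0"

resp = requests.get(
    NVD_API,
    params={"keywordSearch": "buffer overflow", "resultsPerPage": 5},
    timeout=30,
)
resp.raise_for_status()

for item in resp.json().get("vulnerabilities", []):
    cve = item["cve"]
    desc = next((d["value"] for d in cve["descriptions"] if d["lang"] == "en"), "")
    print(cve["id"], "-", desc[:80])
```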

    A Case Study on Software Vulnerability Coordination

    Context: Coordination is a fundamental tenet of software engineering. Coordination is also required for identifying discovered and disclosed software vulnerabilities with Common Vulnerabilities and Exposures (CVEs). Motivated by recent practical challenges, this paper examines the coordination of CVEs for open source projects through a public mailing list. Objective: The paper observes the historical time delays between the assignment of CVEs on a mailing list and their later appearance in the National Vulnerability Database (NVD). Drawing from research on software engineering coordination, software vulnerabilities, and bug tracking, the delays are modeled through three dimensions: social networks and communication practices, tracking infrastructures, and the technical characteristics of the CVEs coordinated. Method: For the period between 2008 and 2016, a sample of over five thousand CVEs is used to model the delays with nearly fifty explanatory metrics. Regression analysis is used for the modeling. Results: The results show that the CVE coordination delays are affected by different abstractions for noise and prerequisite constraints. These abstractions convey effects from the social network and infrastructure dimensions. Particularly strong effect sizes are observed for annual and monthly control metrics, a control metric for weekends, the degrees of the nodes in the CVE coordination networks, and the number of references given in NVD for the CVEs archived. Smaller but visible effects are present for metrics measuring the entropy of the emails exchanged, traces to bug tracking systems, and other related aspects. The empirical signals are weaker for the technical characteristics. Conclusion: [...]
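    As a hedged illustration of the regression-analysis step, the sketch below fits an ordinary least squares model of the coordination delay on a few stand-in metrics (coordination-network node degree, NVD reference count, a weekend indicator). The data are synthetic and the coefficients invented; the paper models nearly fifty metrics on real CVE data.

```python
# Illustrative OLS sketch: delay between mailing-list CVE assignment and
# NVD appearance, regressed on a handful of stand-in explanatory metrics.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "node_degree": rng.poisson(5, n),        # degree in the coordination network
    "nvd_references": rng.poisson(8, n),     # references listed in NVD
    "assigned_on_weekend": rng.integers(0, 2, n),
})
# Synthetic delay in days; effect directions are invented for illustration.
df["delay_days"] = (30 - 1.5 * df["node_degree"] + 0.8 * df["nvd_references"]
                    + 4 * df["assigned_on_weekend"] + rng.normal(0, 5, n))

X = sm.add_constant(df[["node_degree", "nvd_references", "assigned_on_weekend"]])
model = sm.OLS(df["delay_days"], X).fit()
print(model.summary())
```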

    Large Language Models for Software Engineering: A Systematic Literature Review

    Large Language Models (LLMs) have significantly impacted numerous domains, notably including Software Engineering (SE). Nevertheless, a well-rounded understanding of the application, effects, and possible limitations of LLMs within SE is still in its early stages. To bridge this gap, our systematic literature review takes a deep dive into the intersection of LLMs and SE, with a particular focus on understanding how LLMs can be exploited in SE to optimize processes and outcomes. Through a comprehensive review approach, we collect and analyze a total of 229 research papers from 2017 to 2023 to answer four key research questions (RQs). In RQ1, we categorize and provide a comparative analysis of the different LLMs that have been employed in SE tasks, laying out their distinctive features and uses. For RQ2, we detail the methods involved in data collection, preprocessing, and application in this realm, shedding light on the critical role of robust, well-curated datasets for successful LLM implementation. RQ3 examines the specific SE tasks where LLMs have shown remarkable success, illuminating their practical contributions to the field. Finally, RQ4 investigates the strategies employed to optimize and evaluate the performance of LLMs in SE, as well as the common techniques related to prompt optimization. Armed with insights drawn from addressing these RQs, we sketch a picture of the current state of the art, pinpointing trends, identifying gaps in existing research, and flagging promising areas for future study.

    Intrusion Prediction System for Cloud Computing and Network Based Systems

    Cloud computing offers cost-effective computational and storage services with on-demand scalable capacities according to customers' needs. These properties encourage organisations and individuals from different disciplines to migrate from classical computing to cloud computing. Although cloud computing is a trendy technology that opens the horizons for many businesses, it is a new paradigm that exploits already existing computing technologies in a new framework rather than being a novel technology. This means that cloud computing inherits classical computing problems that are still challenging. Cloud computing security is considered one of the major problems, requiring strong security systems to protect the system and the valuable data stored and processed in it. Intrusion detection systems are one of the important security components and defence layers that detect cyber-attacks and malicious activities in cloud and non-cloud environments. However, such systems have limitations; for example, attacks are often detected only once their damage is already done. In recent years, cyber-attacks have increased rapidly in volume and diversity. In 2013, for example, over 552 million customers' identities and crucial information were revealed through data breaches worldwide [3]. These growing threats are further demonstrated in the 50,000 daily attacks on the London Stock Exchange [4]. It has been predicted that the economic impact of cyber-attacks will cost the global economy $3 trillion on aggregate by 2020 [5]. This thesis focuses on proposing an Intrusion Prediction System capable of sensing an attack before it happens in cloud or non-cloud environments. The proposed solution is based on assessing the host system's vulnerabilities and monitoring the network traffic for attack preparations. It has three main modules. The monitoring module observes the network for any intrusion preparations. This thesis proposes a new dynamic-selective statistical algorithm for detecting scan activities, which are part of reconnaissance, an essential step in network attack preparation. The proposed method performs a statistical selective analysis of network traffic, searching for indications of an attack or intrusion. This is achieved by exploring and applying different statistical and probabilistic methods that deal with scan detection. The second module of the prediction system is vulnerability assessment, which evaluates the weaknesses and faults of the system and measures the probability of the system falling victim to a cyber-attack. Finally, the third module is the prediction module, which combines the output of the two other modules and performs a risk assessment of the system's security for intrusion prediction. The results of the conducted experiments showed that the suggested system outperforms analogous methods in network scan detection performance, which accordingly means a significant improvement to the security of the targeted system. The scanning detection algorithm achieved high detection accuracy with 0% false negatives and 50% false positives. In terms of performance, the detection algorithm consumed only 23% of the data needed for analysis compared to the best-performing rival detection method.
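    The abstract does not detail the dynamic-selective statistical algorithm, so the sketch below shows only a generic statistical scan detector in the same spirit: a source IP is flagged when its count of distinct destination ports within an analysis window is a statistical outlier. The mean-plus-sigma threshold and all data are illustrative assumptions, not the thesis's method.

```python
# Generic statistical port-scan detection sketch (not the thesis's algorithm):
# flag sources whose distinct-destination-port count is an outlier.
from collections import defaultdict
import statistics

def detect_scanners(events, sigma=3.0):
    """events: (timestamp, src_ip, dst_port) tuples assumed to fall within one
    analysis window; returns source IPs whose distinct-port count exceeds
    mean + sigma * population stddev over all sources."""
    ports_by_src = defaultdict(set)
    for _ts, src, port in events:
        ports_by_src[src].add(port)
    counts = {src: len(ports) for src, ports in ports_by_src.items()}
    if len(counts) < 2:
        return []
    mean = statistics.mean(counts.values())
    spread = statistics.pstdev(counts.values())
    threshold = mean + sigma * spread
    return [src for src, c in counts.items() if c > threshold]

# One source probing 200 distinct ports among ten ordinary clients.
events = [(0.0, "10.0.0.5", p) for p in range(200)]
events += [(float(i), f"192.168.1.{i}", 80) for i in range(10)]
print(detect_scanners(events))  # expect ['10.0.0.5']
```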

    Guiding Quality Assurance Through Context Aware Learning

    Software testing is a quality control activity that, in addition to finding flaws or bugs, provides confidence in the software's correctness. The quality of the developed software depends on the strength of its test suite. Mutation testing has been shown to effectively guide improvement of a test suite's strength. Mutation is a test adequacy criterion in which test requirements are represented by mutants. Mutants are slight syntactic modifications of the original program that aim to introduce semantic deviations (from the original program), requiring testers to design tests that kill these mutants, i.e., that distinguish the observable behavior of a mutant from that of the original program. This process of designing tests to kill a mutant is performed iteratively for the entire mutant set, which augments the test suite and hence improves its strength. Although mutation testing is empirically validated, a key issue is that its application is expensive due to the large number of low-utility mutants it introduces. Some mutants cannot even be killed, as they are functionally equivalent to the original program. To reduce the application cost, it is imperative to limit the mutants to those that are actually useful. Since identifying such mutants requires manual analysis and test executions, there is a lack of an effective solution to the problem, and it remains unclear how to mutate and test code efficiently. On the other hand, with the advancement of deep learning, several recent works in the literature have focused on applying it to source code to automate many nontrivial tasks, including bug fixing, producing code comments, code completion, and program repair. The increasing utilization of deep learning is due to a combination of factors. The first is the vast availability of data to learn from, specifically source code in open-source repositories. The second is the availability of inexpensive hardware able to efficiently run deep learning infrastructures. The third, and most compelling, is its ability to automatically learn the categorization of data by learning the code context through its hidden-layer architecture, making it especially proficient at identifying features. Thus, we explore the possibility of employing deep learning to identify only useful mutants, in order to achieve a good trade-off between the invested effort and test effectiveness. Hence, as our first contribution, this dissertation proposes Cerebro, a deep learning approach to statically select subsuming mutants based on the mutants' surrounding code context. As subsuming mutants reside at the top of the subsumption hierarchy, test cases designed to kill only this minimal subset of mutants kill all the remaining mutants. Our evaluation of Cerebro demonstrates that it preserves the benefits of mutation testing while limiting the application cost, i.e., reducing all cost factors such as equivalent mutants, mutant executions, and the mutants requiring analysis. Apart from improving test suite strength, mutation testing has proven useful for inferring software specifications. Software specifications aim at describing the software's intended behavior and can be used to distinguish correct from incorrect software behaviors. Specification inference techniques aim at inferring assertions by generating and filtering candidate assertions through dynamic test executions and mutation testing.
    Due to the large number of mutants introduced during mutation testing, such techniques are also computationally expensive, establishing a need to select the mutants that fit best for assertion inference. We refer to such mutants as Assertion Inferring Mutants. In our analysis, we find that Assertion Inferring Mutants are significantly different from subsuming mutants. Thus, we explored the employability of deep learning to identify Assertion Inferring Mutants. Hence, as our second contribution, this dissertation proposes Seeker, a deep learning approach to statically select Assertion Inferring Mutants. Our evaluation demonstrates that Seeker enables an assertion inference capability comparable to full mutation analysis while significantly limiting the execution cost. In addition to testing software in general, a few works in the literature attempt to employ mutation testing to tackle security-related issues, due to the fault-based nature of the technique. These works propose mutation operators to convert non-vulnerable code to vulnerable code by mimicking common security bugs. However, these pattern-based approaches have two major limitations. Firstly, the design of security-specific mutation operators is not trivial; it requires manual analysis and comprehension of the vulnerability classes. Secondly, these mutation operators can alter the program semantics in a manner that is not convincing for developers and is perceived as unrealistic, thereby hindering the usability of the method. On the other hand, with the release of powerful language models trained on large code corpora, e.g., CodeBERT, a new family of mutation testing tools has arisen with the promise of generating natural mutants. We study the extent to which the mutants produced by language models can semantically mimic the behavior of vulnerabilities, i.e., vulnerability-mimicking mutants. Test cases designed to be failed by these mutants will also tackle the mimicked vulnerabilities. In our analysis, we found that only a very small subset of mutants is vulnerability-mimicking; even so, this set mimics more than half of the vulnerabilities in our dataset. Due to the absence of any defined features for identifying vulnerability-mimicking mutants, as our third contribution, this dissertation introduces Mystique, a deep learning approach that automatically extracts features to identify vulnerability-mimicking mutants. Despite their scarcity, Mystique predicts vulnerability-mimicking mutants with high prediction performance, demonstrating that their features can be learned automatically by deep learning models to predict these mutants statically, without investing any effort in defining features. Since our vulnerability-mimicking mutants cannot mimic all the vulnerabilities, we perceive that these mutants are not a complete representation of all vulnerabilities, and there remains a need for actual vulnerability prediction approaches. Although many such approaches exist in the literature, their performance is limited by a few factors. Firstly, vulnerabilities are fewer in number than software bugs, limiting the information one can learn from, which affects prediction performance. Secondly, the existing approaches learn from both vulnerable and supposedly non-vulnerable components. This introduces unavoidable noise in the training data, i.e., components with no reported vulnerability are considered non-vulnerable during training, and hence existing approaches perform poorly.
    We employed deep learning to automatically capture features related to vulnerabilities and explored whether we can avoid learning from supposedly non-vulnerable components. Hence, as our final contribution, this dissertation proposes TROVON, a deep learning approach that learns only from components known to be vulnerable, thereby making no assumptions and bypassing the key problem faced by previous techniques. Our comparison of TROVON with existing techniques on security-critical open-source systems with historical vulnerabilities reported in the National Vulnerability Database (NVD) demonstrates that its prediction capability significantly outperforms the existing techniques.
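    For readers unfamiliar with mutation testing, the toy sketch below shows the core mechanic the dissertation builds on: a mutation operator syntactically perturbs a program (here, '+' becomes '-' via Python's ast module), and a test "kills" the mutant by exposing the behavioral difference. This illustrates mutation itself, not Cerebro's, Seeker's, or Mystique's selection models; the example function is invented.

```python
# Toy mutation-testing sketch: apply an arithmetic-operator mutation to a
# function's AST, then kill the mutant with a test input that exposes it.
import ast
import copy

source = "def price_with_tax(price, tax):\n    return price + tax\n"
orig_tree = ast.parse(source)

class AddToSub(ast.NodeTransformer):
    """Mutation operator: replace '+' with '-' in binary operations."""
    def visit_BinOp(self, node):
        self.generic_visit(node)
        if isinstance(node.op, ast.Add):
            node.op = ast.Sub()
        return node

mutant_tree = ast.fix_missing_locations(AddToSub().visit(copy.deepcopy(orig_tree)))

orig_ns, mut_ns = {}, {}
exec(compile(orig_tree, "<original>", "exec"), orig_ns)
exec(compile(mutant_tree, "<mutant>", "exec"), mut_ns)

# A test input on which the mutant's behavior deviates "kills" it.
assert orig_ns["price_with_tax"](100, 5) == 105
assert mut_ns["price_with_tax"](100, 5) == 95
print("mutant killed: behaviors differ on (100, 5)")
```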

    An Integrated Cybersecurity Risk Management (I-CSRM) Framework for Critical Infrastructure Protection

    Risk management plays a vital role in tackling cyber threats within Cyber-Physical Systems (CPS) and ensuring overall system resilience. It enables identifying critical assets, vulnerabilities, and threats, and determining suitable proactive control measures to tackle the risks. However, due to the increased complexity of CPS, cyber-attacks nowadays are more sophisticated and less predictable, which makes the risk management task more challenging. This research aims for an effective Cyber Security Risk Management (CSRM) practice using asset criticality, prediction of risk types, and evaluation of the effectiveness of existing controls. The proposed unified approach combines a number of techniques, including fuzzy set theory for asset criticality, machine learning classifiers for risk prediction, and the Comprehensive Assessment Model (CAM) for evaluating the effectiveness of the existing controls. The proposed approach considers relevant CSRM concepts such as threat actor attack patterns, Tactics, Techniques and Procedures (TTPs), controls, and assets, and maps these concepts to the VERIS community dataset (VCDB) features for the purpose of risk prediction. A supporting tool (i-CSRMT) serves as an additional component of the proposed framework, enabling asset criticality, risk, and control effectiveness calculation for continuous risk assessment. Lastly, the thesis employs a case study to validate the proposed i-CSRM framework and i-CSRMT in terms of applicability. Stakeholder feedback is collected and evaluated using criteria such as ease of use, relevance, and usability. The analysis results illustrate the validity and acceptability of both the framework and the tool for effective risk management practice within a real-world environment. The experimental results reveal that using fuzzy set theory to assess assets' criticality supports stakeholders in effective risk management practice. Furthermore, the results demonstrate that the machine learning classifiers show exemplary performance in predicting different risk types, including denial of service, cyber espionage, and crimeware. Accurate prediction can help organisations model uncertainty with machine learning classifiers, detect frequent cyber-attacks, affected assets, and risk types, and employ the necessary corrective actions for mitigation. Lastly, to evaluate the effectiveness of the existing controls, the CAM approach is used, and the results show that some controls, such as network intrusion detection, authentication, and anti-virus, are highly effective at controlling or reducing risks. Evaluating control effectiveness helps organisations know how effective the controls are at reducing or preventing any form of risk before an attack occurs, and lets them implement new controls earlier. The main advantage of the CAM approach is that the parameters used are objective, consistent, and applicable to CPS.
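    As a rough illustration of the fuzzy-set idea the abstract mentions for asset criticality, the sketch below grades a 0-10 impact score against triangular membership functions for low, medium, and high criticality. The function shapes and breakpoints are invented for illustration; the thesis's actual fuzzy model and rule base are not given in the abstract.

```python
# Illustrative fuzzy-set sketch of asset criticality scoring (breakpoints
# are assumptions, not the thesis's membership functions).
def triangular(x, a, b, c):
    """Triangular membership: rises from a to a peak at b, falls to c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def criticality_memberships(impact_score):
    """Map an impact score on a 0-10 scale to fuzzy membership degrees."""
    return {
        "low":    triangular(impact_score, -1, 0, 5),
        "medium": triangular(impact_score, 2, 5, 8),
        "high":   triangular(impact_score, 5, 10, 11),
    }

print(criticality_memberships(6.5))
# roughly: low 0.0, medium 0.5, high 0.3
```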

    Optimising a defence-aware threat modelling diagram incorporating a defence-in-depth approach for the internet-of-things

    Modern technology has proliferated into just about every aspect of life while improving its quality. IoT technology, for instance, offers significant improvements over traditional systems, making life easier, saving time and money, and providing security benefits. However, security weaknesses associated with IoT technology can pose a significant threat to the people who use it. For instance, smart doorbells can make household life easier, save time and money, and provide surveillance security; nevertheless, security weaknesses in smart doorbells could be exploited by a criminal, endangering the life and finances of the household. In addition, IoT technology is constantly advancing and expanding and is rapidly becoming ubiquitous in modern society. This increased usage and technological advancement create security weaknesses that attract cybercriminals looking to satisfy their agendas. Perfect security solutions do not exist in the real world because modern systems are continuously evolving and intruders frequently attempt various techniques to discover security flaws and bypass existing security controls. Threat modelling is therefore a great starting point for understanding a system's threat landscape and its weaknesses. Accordingly, the threat modelling field in computer science has been significantly improved by the implementation of various frameworks that identify and address threats in order to mitigate them. However, most mature threat modelling frameworks were designed for traditional IT systems; they consider only software-related weaknesses and do not address physical attributes. This approach may not be practical for IoT technology, which inherits both software and physical security weaknesses. Scholars have nonetheless employed mature threat modelling frameworks such as STRIDE on IoT technology, because mature frameworks still include security concepts that are significant for modern technology. Mature frameworks therefore cannot be ignored, but they are not efficient at addressing the threats associated with modern systems. As a solution, this research study aims to extract the significant security concepts of mature threat modelling frameworks and utilise them to implement a robust IoT threat modelling framework. This study selected fifteen threat modelling frameworks from the research literature, together with the defence-in-depth security concept, from which to extract threat modelling techniques. Subsequently, this research study conducted three independent reviews to discover valuable threat modelling concepts and their usefulness for IoT technology. The first study deduced that integrating the software-centric, asset-centric, attacker-centric, and data-centric threat modelling approaches with defence-in-depth is valuable and delivers distinct benefits; PASTA and TRIKE demonstrated all four threat modelling approaches under the classification scheme. The second study deduced the features of a threat modelling framework that achieve a high satisfaction level toward defence-in-depth security architecture; under the evaluation criteria, the PASTA framework scored the highest satisfaction value. Finally, the third study deduced systematic IoT threat modelling techniques based on recent research studies; the STRIDE framework was identified as the most popular, and other frameworks demonstrated effective capabilities valuable to IoT technology.
    Accordingly, this study introduces Defence-aware Threat Modelling (DATM), an IoT threat modelling framework based on the findings on threat modelling and defence-in-depth security concepts. The steps involved in the DATM framework are further described with figures for better understanding. Subsequently, a smart doorbell case study is threat-modelled using the DATM framework for validation, and the outcome of the case study is assessed against the findings of the three research studies, further validating the DATM framework. The outcome of this thesis is helpful for researchers who want to conduct threat modelling in IoT environments and design novel threat modelling frameworks suitable for IoT technology.
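    The DATM framework itself is defined in the thesis, not in this abstract, so the following is only a hypothetical sketch of how a defence-aware model entry could be represented in code: components carry STRIDE threats, each tagged with the defence-in-depth layers expected to mitigate it, so unmitigated threats stand out. All names (Threat, Component, the layer strings, the doorbell example) are invented for illustration.

```python
# Hypothetical data-structure sketch of a defence-aware threat model entry
# (not DATM's actual notation): STRIDE threats mapped to defence layers.
from dataclasses import dataclass, field

STRIDE = ("Spoofing", "Tampering", "Repudiation",
          "Information disclosure", "Denial of service",
          "Elevation of privilege")

@dataclass
class Threat:
    category: str                                          # one of STRIDE
    mitigating_layers: list = field(default_factory=list)  # e.g. perimeter, network, host

@dataclass
class Component:
    name: str
    threats: list = field(default_factory=list)

    def unmitigated(self):
        """Threat categories with no defence-in-depth layer assigned."""
        return [t.category for t in self.threats if not t.mitigating_layers]

doorbell = Component("smart doorbell", [
    Threat("Spoofing", ["device authentication (host layer)"]),
    Threat("Tampering", []),  # physical attribute: no layer assigned yet
])
print(doorbell.unmitigated())  # ['Tampering']
```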