485 research outputs found

    A Survey on Automated Software Vulnerability Detection Using Machine Learning and Deep Learning

    Full text link
    Software vulnerability detection is critical in software security because it identifies potential bugs in software systems, enabling immediate remediation and mitigation measures to be implemented before they may be exploited. Automatic vulnerability identification is important because it can evaluate large codebases more efficiently than manual code auditing. Many Machine Learning (ML) and Deep Learning (DL) based models for detecting vulnerabilities in source code have been presented in recent years. However, a survey that summarises, classifies, and analyses the application of ML/DL models for vulnerability detection is missing. It may be difficult to discover gaps in existing research and potential for future improvement without a comprehensive survey. This could result in essential areas of research being overlooked or under-represented, leading to a skewed understanding of the state of the art in vulnerability detection. This work address that gap by presenting a systematic survey to characterize various features of ML/DL-based source code level software vulnerability detection approaches via five primary research questions (RQs). Specifically, our RQ1 examines the trend of publications that leverage ML/DL for vulnerability detection, including the evolution of research and the distribution of publication venues. RQ2 describes vulnerability datasets used by existing ML/DL-based models, including their sources, types, and representations, as well as analyses of the embedding techniques used by these approaches. RQ3 explores the model architectures and design assumptions of ML/DL-based vulnerability detection approaches. RQ4 summarises the type and frequency of vulnerabilities that are covered by existing studies. Lastly, RQ5 presents a list of current challenges to be researched and an outline of a potential research roadmap that highlights crucial opportunities for future work

    Collecting Vulnerable Source Code from Open-Source Repositories for Dataset Generation

    Get PDF
    [EN] Different Machine Learning techniques to detect software vulnerabilities have emerged in scientific and industrial scenarios. Different actors in these scenarios aim to develop algorithms for predicting security threats without requiring human intervention. However, these algorithms require data-driven engines based on the processing of huge amounts of data, known as datasets. This paper introduces the SonarCloud Vulnerable Code Prospector for C (SVCP4C). This tool aims to collect vulnerable source code from open source repositories linked to SonarCloud, an online tool that performs static analysis and tags the potentially vulnerable code. The tool provides a set of tagged files suitable for extracting features and creating training datasets for Machine Learning algorithms. This study presents a descriptive analysis of these files and overviews current status of C vulnerabilities, specifically buffer overflow, in the reviewed public repositoriesSIThis work has been partially funded by the Addendum no. 4 to the Universidad de León-Instituto Nacional de Ciberseguridad (INCIBE) Convention Framework on the “Detection of new threats and unknown patterns”, by the Consejería de Educación de la Junta de Castilla y León through the Project LE028P17 on the “Development of reusable software components based on machine learning for the cybersecurity of autonomous robots” and by the Ministerio de Ciencia, Innovación y Universidades through the Project RTI2018-100683-B-10

    Defense against buffer overflow attack by software design diversity

    Full text link
    A buffer overflow occurs during program execution when a fixed-size buffer has had too much data copied into it. This causes the data to overwrite into adjacent memory locations, and, depending on what is stored there, the behavior of the program itself might be affected; Attackers can select the value to place in the location in order to redirect execution to the location of their choice. If it contains machine code, the attacker causes the program to execute any arbitrary set of instructions---essentially taking control of the process. Successfully modifying the function return address allows the attacker to execute instructions with the same privileges as that of the attacked program; In this thesis, we propose to design software with multiple variants of the modules/functions. It can provide strong defense against the buffer overflow attack. A way can be provided to select a particular variant (implementation) of the module randomly when software is executed. This proves to be useful when an attacker designs the attack for a particular variant/implementation which may not be chosen in the random selection process during execution. It would be much difficult for the attacker to design an attack because of the different memory (stack-frame) layout the software could have every time it is executed

    Towards the automation of vulnerability detection in source code

    Get PDF
    Software vulnerability detection, which involves security property specification and verification, is essential in assuring the software security. However, the process of vulnerability detection is labor-intensive, time-consuming and error-prone if done manually. In this thesis, we present a hybrid approach, which utilizes the power of static and dynamic analysis for performing vulnerability detection in a systematic way. The key contributions of this thesis are threefold. first, a vulnerability detection framework, which supports security property specification, potential vulnerability detection, and dynamic verification, is proposed. Second, an investigation of test data generation for dynamic verification is conducted. Third, the concept of reducing security property verification to reachability is introduced

    Detecting non-secure memory deallocation with CBMC

    Get PDF
    2021 Fall.Includes bibliographical references.Scrubbing sensitive data before releasing memory is a widely recommended but often ignored programming practice for developing secure software. Consequently, sensitive data such as cryptographic keys, passwords, and personal data, can remain in memory indefinitely, thereby increasing the risk of exposure to hackers who can retrieve the data using memory dumps or exploit vulnerabilities such as Heartbleed and Etherleak. We propose an approach for detecting a specific memory safety bug called Improper Clearing of Heap Memory Before Release, referred to as Common Weakness Enumeration 244. The CWE-244 bug in a program allows the leakage of confidential information when a variable is not wiped before heap memory is freed. Our approach uses the CBMC model checker to detect this weakness and is based on instrumenting the program using (1) global variable declarations that track and monitor the state of the program variables relevant for CWE-244, and (2) assertions that help CBMC to detect unscrubbed memory. We develop a tool, SecMD-Checker, implementing our instrumentation based algorithm, and we provide experimental validation on the Juliet Test Suite that the tool is able to detect all the CWE-244 instances present in the test suite. The proposed approach has the potential to work with other model checkers and can be extended for detecting other weaknesses that require variable tracking and monitoring, such as CWE-226, CWE-319, and CWE-1239

    Towards A Verified Complex Protocol Stack in a Production Kernel: Methodology and Demonstration

    Get PDF
    Any useful computer system performs communication and any communication must be parsed before it is computed upon. Given their importance, one might expect parsers to receive a significant share of attention from the security community. This is, however, not the case: bugs in parsers continue to account for a surprising portion of reported and exploited vulnerabilities. In this thesis, I propose a methodology for supporting the development of software that depends on parsers---such as anything connected to the Internet---to safely support any reasonably designed protocol: data structures to describe protocol messages; validation routines that check that data received from the wire conforms to the rules of the protocol; systems that allow a defender to inject arbitrary, crafted input so as to explore the effectiveness of the parser; and systems that allow for the observation of the parser code while it is being explored. Then, I describe principled method of producing parsers that automatically generates the myriad parser-related software from a description of the protocol. This has many significant benefits: it makes implementing parsers simpler, easier, and faster; it reduces the trusted computing base to the description of the protocol and the program that compiles the description to runnable code; and it allows for easier formal verification of the generated code. I demonstrate the merits of the proposed methodology by creating a description of the USB protocol using a domain-specific language (DSL) embedded in Haskell and integrating it with the FreeBSD operating system. Using the industry-standard umap test-suite, I measure the performance and efficacy of the generated parser. I show that it is stable, that it is effective at protecting a system from both accidentally and maliciously malformed input, and that it does not incur unreasonable overhead

    Intrusion detection and management over the world wide web

    Get PDF
    As the Internet and society become ever more integrated so the number of Internet users continues to grow. Today there are 1.6 billion Internet users. They use its services to work from home, shop for gifts, socialise with friends, research the family holiday and manage their finances. Through generating both wealth and employment the Internet and our economies have also become interwoven. The growth of the Internet has attracted hackers and organised criminals. Users are targeted for financial gain through malware and social engineering attacks. Industry has responded to the growing threat by developing a range defences: antivirus software, firewalls and intrusion detection systems are all readily available. Yet the Internet security problem continues to grow and Internet crime continues to thrive. Warnings on the latest application vulnerabilities, phishing scams and malware epidemics are announced regularly and serve to heighten user anxiety. Not only are users targeted for attack but so too are businesses, corporations, public utilities and even states. Implementing network security remains an error prone task for the modern Internet user. In response this thesis explores whether intrusion detection and management can be effectively offered as a web service to users in order to better protect them and heighten their awareness of the Internet security threat

    Coding policies for secure web applications

    Get PDF
    corecore