1 research outputs found

    Collecting Vulnerable Source Code from Open-Source Repositories for Dataset Generation

    Get PDF
    [EN] Different Machine Learning techniques to detect software vulnerabilities have emerged in scientific and industrial scenarios. Different actors in these scenarios aim to develop algorithms for predicting security threats without requiring human intervention. However, these algorithms require data-driven engines based on the processing of huge amounts of data, known as datasets. This paper introduces the SonarCloud Vulnerable Code Prospector for C (SVCP4C). This tool aims to collect vulnerable source code from open source repositories linked to SonarCloud, an online tool that performs static analysis and tags the potentially vulnerable code. The tool provides a set of tagged files suitable for extracting features and creating training datasets for Machine Learning algorithms. This study presents a descriptive analysis of these files and overviews current status of C vulnerabilities, specifically buffer overflow, in the reviewed public repositoriesSIThis work has been partially funded by the Addendum no. 4 to the Universidad de León-Instituto Nacional de Ciberseguridad (INCIBE) Convention Framework on the “Detection of new threats and unknown patterns”, by the Consejería de Educación de la Junta de Castilla y León through the Project LE028P17 on the “Development of reusable software components based on machine learning for the cybersecurity of autonomous robots” and by the Ministerio de Ciencia, Innovación y Universidades through the Project RTI2018-100683-B-10
    corecore