Static analysis-based approaches for secure software development
Software security is a matter of major concern for software development enterprises that wish to deliver highly secure software products to their customers. Static analysis is considered one of the most effective mechanisms for adding security to software products. The multitude of static analysis tools that are available provide a large number of raw results that may contain security-relevant information, which may be useful for the production of secure software. Several mechanisms that can facilitate the production of both secure and reliable software applications have been proposed over the years. In this paper, two such mechanisms, namely the vulnerability prediction models (VPMs) and the optimum checkpoint recommendation (OCR) mechanisms, are theoretically examined, while their potential improvement by using static analysis is also investigated. In particular, we review the most significant contributions regarding these mechanisms, identify their most important open issues, and propose directions for future research, emphasizing the potential adoption of static analysis for addressing the identified open issues. Hence, this paper can act as a reference for researchers who wish to contribute to these subfields, helping them gain a solid understanding of the existing solutions and of the open issues that require further research.
Wasmizer: Curating WebAssembly-driven Projects on GitHub
WebAssembly has attracted great attention as a portable compilation target
for programming languages. To facilitate in-depth studies about this
technology, we have deployed Wasmizer, a tool that regularly mines GitHub
projects and makes an up-to-date dataset of WebAssembly sources and their
binaries publicly available. Presently, we have collected 2 540 C and C++
projects that are highly related to WebAssembly, and built a dataset of 8 915
binaries that are linked to their source projects. To demonstrate an
application of this dataset, we have investigated the presence of eight
WebAssembly compilation smells in the wild.
Comment: 11 pages + 1 page of references. Preprint of MSR'23 publication
The FormAI Dataset: Generative AI in Software Security Through the Lens of Formal Verification
This paper presents the FormAI dataset, a large collection of 112,000
AI-generated compilable and independent C programs with vulnerability
classification. We introduce a dynamic zero-shot prompting technique
constructed to spawn diverse programs utilizing Large Language Models (LLMs).
The dataset is generated by GPT-3.5-turbo and comprises programs with varying
levels of complexity. Some programs handle complicated tasks like network
management, table games, or encryption, while others deal with simpler tasks
like string manipulation. Every program is labeled with the vulnerabilities
found within the source code, indicating the type, line number, and vulnerable
function name. This is accomplished by employing a formal verification method
using the Efficient SMT-based Bounded Model Checker (ESBMC), which uses model
checking, abstract interpretation, constraint programming, and satisfiability
modulo theories to reason over safety/security properties in programs. This
approach definitively detects vulnerabilities and offers a formal model known
as a counterexample, thus eliminating the possibility of generating false
positive reports. We have associated the identified vulnerabilities with Common
Weakness Enumeration (CWE) numbers. We make the source code available for the
112,000 programs, accompanied by a separate file containing the
vulnerabilities detected in each program, making the dataset ideal for training
LLMs and machine learning algorithms. Our study revealed that, according to
ESBMC, 51.24% of the programs generated by GPT-3.5 contained vulnerabilities,
thereby presenting considerable risks to software safety and security.
Comment: https://github.com/FormAI-Datase
Deep security analysis of program code: a systematic literature review
Due to the continuous digitalization of our society, distributed and web-based applications have become omnipresent, and making them more secure has gained paramount relevance. Deep learning (DL) and its representation learning approach are increasingly being proposed for program code analysis, potentially providing a powerful means of making software systems less vulnerable. This systematic literature review (SLR) aims at a thorough analysis and comparison of 32 primary studies on DL-based vulnerability analysis of program code. We found a rich variety of proposed analysis approaches, code embeddings and network topologies. We discuss these techniques and alternatives in detail. By compiling commonalities and differences in the approaches, we identify the current state of research in this area and discuss future directions. We also provide an overview of publicly available datasets in order to foster stronger benchmarking of approaches. This SLR provides an overview and starting point for researchers interested in deep vulnerability analysis of program code.