
18 research outputs found

SoK: Prudent Evaluation Practices for Fuzzing

Author: Ale-Ebrahim Arash
Bars Nils
Bernhard Lukas
Bissantz Nicolai
Crump Addison
Holz Thorsten
Muench Marius
Scharnowski Tobias
Schiller Nico
Schloegel Moritz
Publication venue: IEEE
Publication date: 23/05/2024
Field of study

Fuzzing has proven to be a highly effective approach to uncover software bugs over the past decade. After AFL popularized the groundbreaking concept of lightweight coverage feedback, the field of fuzzing has seen a vast amount of scientific work proposing new techniques, improving methodological aspects of existing strategies, or porting existing methods to new domains. All such work must demonstrate its merit by showing its applicability to a problem, measuring its performance, and often showing its superiority over existing works in a thorough, empirical evaluation. Yet, fuzzing is highly sensitive to its target, environment, and circumstances, e.g., randomness in the testing process. After all, relying on randomness is one of the core principles of fuzzing, governing many aspects of a fuzzer's behavior. Combined with an environment that is often very difficult to control, this makes the reproducibility of experiments a crucial concern that requires a prudent evaluation setup. To address these threats to validity, several works, most notably Evaluating Fuzz Testing by Klees et al., have outlined how a carefully designed evaluation setup should be implemented, but it remains unknown to what extent their recommendations have been adopted in practice. In this work, we systematically analyze the evaluation of 150 fuzzing papers published at the top venues between 2018 and 2023. We study how existing guidelines are implemented and observe potential shortcomings and pitfalls. We find a surprising disregard of the existing guidelines regarding statistical tests and systematic errors in fuzzing evaluations. For example, when investigating reported bugs, we find that the search for vulnerabilities in real-world software leads to authors requesting and receiving CVEs of questionable quality. Extending our literature analysis to the practical domain, we attempt to reproduce the claims of eight fuzzing papers.
These case studies allow us to assess the practical reproducibility of fuzzing research and identify archetypal pitfalls in the evaluation design. Unfortunately, our reproduced results reveal several deficiencies in the studied papers, and we are unable to fully support and reproduce the respective claims. To help the field of fuzzing move toward a scientifically reproducible evaluation strategy, we propose updated guidelines for conducting a fuzzing evaluation that future work should follow.
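The abstract singles out statistical testing as a widely disregarded guideline. As a minimal sketch of the kind of analysis Klees et al. recommend (many repeated trials per fuzzer, compared with a Mann-Whitney U test), the Python code below compares per-trial results of two fuzzers. It reports the U statistic, a two-sided p-value from the normal approximation (no tie or continuity correction), and the Vargha-Delaney A12 effect size. The function names, the trial numbers, and the choice of A12 as the effect size are illustrative assumptions, not artifacts of the paper; for real evaluations a vetted implementation such as `scipy.stats.mannwhitneyu` is preferable.

```python
import math
from typing import List, Sequence, Tuple

def rank_with_ties(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks

def mann_whitney_u(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """U statistic of sample a versus b and a two-sided normal-approximation p-value."""
    n1, n2 = len(a), len(b)
    ranks = rank_with_ties(list(a) + list(b))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u1, p

def vargha_delaney_a12(a: Sequence[float], b: Sequence[float]) -> float:
    """Probability that a random trial of a beats a random trial of b (ties count half)."""
    wins = sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)
    return wins / (len(a) * len(b))

if __name__ == "__main__":
    # Hypothetical branch-coverage counts from ten repeated trials per fuzzer.
    fuzzer_a = [5120, 5230, 5075, 5310, 5190, 5260, 5150, 5205, 5280, 5240]
    fuzzer_b = [5010, 5100, 4980, 5150, 5060, 5120, 5030, 5090, 5000, 5070]
    u, p = mann_whitney_u(fuzzer_a, fuzzer_b)
    print(f"U={u}, p={p:.4f}, A12={vargha_delaney_a12(fuzzer_a, fuzzer_b):.2f}")
```

Reporting the effect size alongside the p-value matters: with enough trials, even a negligible difference can become statistically significant, while A12 says how often one fuzzer actually wins.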

University of Birmingham Research Portal

Generating software tests to check for flaws and functionalities

Author: Araújo Francisco João Guimarães de Almeida
Publication venue
Publication date: 01/01/2019
Field of study

Master's thesis, Computer Engineering (Software Engineering), Universidade de Lisboa, Faculdade de Ciências, 2019. The rapid growth of software complexity, combined with our heavy day-to-day dependence on software, has created a need to test it in order to guarantee a certain level of quality, correct functioning, and security. For example, both the car we drive today and the refrigerator we use to keep our food at the desired temperature require software of such complexity that, when placed under high stress, it could exhibit some kind of bug. If that bug is a vulnerability and can therefore be exploited, it could put lives at risk and even cause financial damage worth millions of euros. Such a vulnerability could, for example, allow an attacker to take control of the car or, in the case of the refrigerator, raise the temperature so that the food spoils. Moreover, once these vulnerabilities have been discovered, the software must be corrected, which costs time and money. Software complexity grows further when application variants must be built from diverse software components, as happens in embedded systems. Such complexity makes it harder to test and validate the software against the functionality it was designed for, and may also increase the number of security vulnerabilities. These vulnerabilities can remain hidden in any program for several years, regardless of how many tests were run to try to ensure its quality and security. This is due both to the effectiveness of these tests, which may be of limited quality, and to the short time available to ensure correct functionality. An external attacker, by contrast, has theoretically unlimited time to explore the software once it is on the market.
Vulnerabilities are the main cause of security problems and the main focus of attackers trying to exploit a system. They can also cause various kinds of damage to the system and to the application's stakeholders, such as its owner and its users. An important distinction is that not all bugs are vulnerabilities. A vulnerability must be exploitable in a way that corrupts the program's normal behavior, leading it into an erroneous state. To take advantage of a piece of software, external attackers need to find only one vulnerability, whereas the tests developed by those responsible for security quality must find many. As a result, companies today spend considerable money and time improving their software verification and validation processes in order to guarantee the desired level of quality and security in all of their products. However, as mentioned above, testing resources and time are limited, so several bugs and vulnerabilities may go undetected by these tests and remain in the final products. Although automatic security validation tools already exist, there is no tool that allows test results to be reused across versions of an application in order to validate those versions and variants as efficiently as possible. Software validation is the process of establishing, with a certain level of confidence, that the software meets the user's expectations and needs and works as intended, with no behavioral inconsistencies and as few bugs as possible. In this context, each test examines the behavior of the software under test to check all of the conditions mentioned above and helps increase confidence in the system itself.
Normally, this verification is done with prior knowledge of the program being tested. This, however, is a very slow process and is subject to human error and to assumptions about the program under test, especially if it is carried out by the same person who wrote the program. There are several techniques for testing software quickly, automatically, and efficiently, such as fuzzers. Fuzzing is a popular technique for finding software bugs in which the system under test is run with many semi-valid inputs generated by the fuzzer, that is, inputs valid enough to be processed by the program but which may trigger errors. While the program is subjected to all these tests, it is monitored for bugs that make it crash on the given input. Initially, fuzzers did not take the program under test into account, treating it as a black box with no knowledge of its code. The focus was therefore only on quickly generating random inputs and monitoring the program's execution on them. However, such fuzzers could take a long time to find bugs that are reachable only after certain logical conditions are satisfied, since random inputs are unlikely to satisfy them. To solve this problem, a second kind of fuzzer was developed: whitebox fuzzers, which use inputs of a known format to execute the program under test symbolically, recording every logical condition along an input's execution path, then solving these conditions one by one and creating new inputs from their solutions. However, symbolic execution is quite slow, and recording every condition leads to an explosion of conditions to be solved, on which much time is wasted.
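To illustrate why black-box fuzzing struggles with bugs that sit behind specific logical conditions, here is a minimal Python sketch. The toy target `parse_header`, its `b"FUZZ"` magic value, and the byte-overwrite mutator are hypothetical; a real black-box fuzzer would run the target as a separate process and watch for crashes rather than catching Python exceptions.

```python
import random
from typing import List

def parse_header(data: bytes) -> int:
    """Hypothetical target: the simulated crash sits behind a 4-byte magic check."""
    if len(data) < 8:
        raise ValueError("input too short")  # rejected input, not a crash
    if data[:4] == b"FUZZ":
        # Deep code, reachable only when all four magic bytes match.
        if data[4] > 200:
            raise IndexError("simulated out-of-bounds read")
    return len(data)

def mutate(seed: bytes, rng: random.Random) -> bytes:
    """Black-box mutation: overwrite one to four random positions with random bytes."""
    buf = bytearray(seed)
    for _ in range(rng.randint(1, 4)):
        buf[rng.randrange(len(buf))] = rng.randrange(256)
    return bytes(buf)

def blackbox_fuzz(seed: bytes, iterations: int, rng_seed: int = 0) -> List[bytes]:
    """Run the target on mutated inputs and collect those that 'crash' it."""
    rng = random.Random(rng_seed)
    crashes = []
    for _ in range(iterations):
        candidate = mutate(seed, rng)
        try:
            parse_header(candidate)
        except ValueError:
            pass  # input rejected by the parser; not treated as a crash
        except IndexError:
            crashes.append(candidate)  # the monitored failure
    return crashes

if __name__ == "__main__":
    print(len(blackbox_fuzz(b"ABCD\x00\x00\x00\x00", 5000)))  # seed misses the magic
    print(len(blackbox_fuzz(b"FUZZ\x00\x00\x00\x00", 5000)))  # semi-valid seed
```

With the first seed, five bytes would have to change at once but at most four are mutated per input, so the crash is never reached; the second, semi-valid seed already passes the magic check and only one byte needs to change. This is the gap that whitebox (and later greybox) techniques try to close.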
To address these problems with whitebox fuzzers, greybox fuzzers were created: a mix of the two kinds of fuzzer described above that uses lightweight instrumentation to gain insight into the program's structure without requiring prior analysis. This instrumentation takes considerable time, but the cost is offset by the coverage feedback it returns. However, there is no tool, or fuzzer, that can take advantage of information obtained from testing older versions of a given piece of software to improve the results of testing a newer version of the same software. Today, two products that share functionality implemented in a similar or even identical way are tested individually, repeating all the tests already performed on the other program. This clearly represents a lack of efficiency, wasting time and money on repeated tests while other functionality, where undiscovered vulnerabilities probably still exist, remains untested. This work proposes an approach that makes it possible to test variants that have not yet been tested using the results of those that have already been evaluated. The approach was implemented in the PandoraFuzzer tool, which is based on the American Fuzzy Lop (AFL) fuzzing application, and was validated with a set of programs in different versions. The experimental results showed that the tool improves on the results of AFL. The first stage consists of understanding the various common vulnerabilities in programs written in C/C++ and the most common ways to detect and fix them. The second stage of this project is the implementation and validation of the tool. The tool is built on top of an existing coverage-guided fuzzer, AFL, and follows a similar principle.
The third stage of this project consists of evaluating the tool itself using various comparison measures; it was validated with a set of programs in different versions. The experimental results showed that the tool improves on the results of AFL.
Industrial products, like vehicles and trains, integrate embedded systems implementing diverse and complicated functionalities. Such functionalities are programmable by software containing a multitude of parameters necessary for their configuration, whose number has been increasing due to market diversification and customer demand. However, the increasing functionality and complexity of such systems make the validation and testing of the software highly complex. The complexity inherent to software nowadays is directly related to the rising number of vulnerabilities found in the software itself, due to the increased attack surface. A vulnerability is defined as a weakness in the application that, if exploitable, can cause serious damage and great financial impact. Products with such variability need to be tested adequately, looking for security flaws to guarantee public safety and quality assurance of the application. While efficient automated testing techniques such as fuzzing already exist, no tool is able to use the results of testing a previous program to test more efficiently the next piece of software that shares certain functionality. The objective of this dissertation is to implement such a tool: one that can skip functionality already covered and tested in a previously tested program, give more importance to code blocks that have yet to be tested, detect security vulnerabilities, and avoid unnecessary repeated work, thereby increasing speed and coverage on the new program. The approach was implemented in a tool based on the American Fuzzy Lop (AFL) fuzzing application and was validated with a set of programs in different versions.
The experimental results showed that the tool can perform better than AFL.
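The central idea of reusing results from an already-tested version can be sketched as a seed-prioritization rule: blocks covered while testing the previous version count as already tested, so inputs that reach not-yet-tested blocks of the new version are scheduled first. This is a hypothetical Python illustration, not PandoraFuzzer's actual implementation; the function names and block identifiers are invented, and block identifiers are assumed to be comparable across versions (for example, matched by source location), which in practice is the hard part.

```python
from typing import Dict, List, Set

def novelty_score(seed_cov: Set[str], previously_tested: Set[str]) -> int:
    """Number of blocks this seed reaches that were not tested in the old version."""
    return len(seed_cov - previously_tested)

def prioritize(seeds: Dict[str, Set[str]], previously_tested: Set[str]) -> List[str]:
    """Order seed names so that seeds reaching untested blocks come first.

    Ties are broken by name so that the ordering is deterministic.
    """
    return sorted(seeds, key=lambda name: (-novelty_score(seeds[name], previously_tested), name))

if __name__ == "__main__":
    old_version_cov = {"parse", "check_len", "copy"}  # hypothetical block IDs
    seeds = {
        "a.bin": {"parse", "check_len", "copy"},     # only old behaviour
        "b.bin": {"parse", "new_codec"},             # reaches one new block
        "c.bin": {"parse", "new_codec", "new_crc"},  # reaches two new blocks
    }
    print(prioritize(seeds, old_version_cov))  # ['c.bin', 'b.bin', 'a.bin']
```

Deprioritizing rather than discarding seeds like `a.bin` is a deliberate choice in this sketch: shared code may still behave differently in the new version, so it keeps being fuzzed, just less often.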

Universidade de Lisboa: Repositório.UL

Software testing or the bugs’ nightmare

Author: Menéndez H.
Publication venue: Endless Science Ltd
Publication date: 01/01/2021
Field of study

Software development is not error-free. For decades, bugs (including physical ones) have been a significant development problem requiring major maintenance efforts. In some cases, fixing bugs has even introduced new ones. One of the main reasons bugs are so prevalent is their ability to hide. Finding them is difficult and costly in terms of time and resources. However, software testing has made significant progress in identifying them by using different strategies that combine knowledge from every part of the program. This paper humbly reviews several software testing approaches that discover bugs automatically and presents some state-of-the-art methods and tools currently used in this area. It covers three testing strategies: search-based methods, symbolic execution, and fuzzers. It also provides some insight into the application of diversity in these areas, as well as common and future challenges in automatic test generation that still need to be addressed.

Middlesex University Research Repository


Greybox Fuzzing and Its Applications

Author: Rong Yuyang
Publication venue: eScholarship, University of California
Publication date: 01/01/2024
Field of study

Reliable software is vital to society. Much effort has been spent to ensure the robustness and reliability of software, including unit testing, model checking, static analysis, etc. However, these approaches do not scale well. Greybox fuzzing can test software with little or no human intervention. A greybox fuzzer uses a mutator to automatically generate inputs to test the program. Unlike a random input generator, a greybox fuzzer also monitors the program's behavior to determine whether a generated input triggers a new behavior. Inputs that trigger new behaviors are saved for future mutation. This monitoring is simple yet effective in practice. As a result, much work has focused on different parts of the fuzzer to improve its overall performance and broaden its applications. Despite its popularity, some aspects of greybox fuzzing and its applications have not been thoroughly studied. In this thesis, we cover three aspects of greybox fuzzing. First, many fuzzers aim to increase branch coverage. However, covering a branch is only a necessary condition for triggering a bug, not a sufficient one. We revisit some designs of the fuzzing process to increase the likelihood of finding bugs. We first design a tool called Integrity. Integrity sanitizes integer operations within the program; integer errors are harder to spot than memory errors. Integrity has discovered eight new integer errors in open-source programs. While randomized fuzzers excel at increasing branch coverage, they struggle to solve the predicates set by Integrity. To trigger bugs more effectively, we propose a deterministic fuzzer, Valkyrie. Valkyrie uses principled approaches, such as gradient descent and compressed branch coverage, to eliminate the randomness in fuzzers while increasing throughput. Our evaluation shows that Valkyrie can find bugs faster than the state of the art in many cases. Second, generic fuzzing is often less effective than specialized fuzzing.
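The greybox loop described above (mutate an input, monitor the program, keep inputs that trigger new behavior) can be sketched in a few lines of Python. Here "behavior" is approximated by the set of line numbers executed inside a hypothetical target, collected with the standard `sys.settrace` hook; real greybox fuzzers such as AFL instead rely on compile-time instrumentation and an edge-coverage bitmap. The target, its `b"bug"` trigger, and the single-byte mutator are assumptions for illustration.

```python
import random
import sys
from typing import FrozenSet, List, Tuple

def target(data: bytes) -> None:
    """Hypothetical program under test: the bug hides behind three nested checks."""
    if len(data) > 3 and data[0] == ord("b"):
        if data[1] == ord("u"):
            if data[2] == ord("g"):
                raise RuntimeError("bug reached")

def run_with_coverage(data: bytes) -> Tuple[FrozenSet[int], bool]:
    """Run target on data; return (line numbers executed inside target, crashed?)."""
    covered = set()

    def tracer(frame, event, arg):
        # Record 'line' events, but only for frames executing target's code.
        if event == "line" and frame.f_code is target.__code__:
            covered.add(frame.f_lineno)
        return tracer

    sys.settrace(tracer)
    try:
        target(data)
        crashed = False
    except RuntimeError:
        crashed = True
    finally:
        sys.settrace(None)
    return frozenset(covered), crashed

def greybox_fuzz(seeds: List[bytes], iterations: int, rng_seed: int = 0):
    """Mutate corpus entries; keep mutants that execute previously unseen lines."""
    rng = random.Random(rng_seed)
    corpus = list(seeds)
    seen = set()
    for s in corpus:
        seen |= run_with_coverage(s)[0]
    crashes = []
    for _ in range(iterations):
        parent = rng.choice(corpus)  # seed scheduling: uniform choice
        child = bytearray(parent)
        child[rng.randrange(len(child))] = rng.randrange(256)  # one random byte
        child = bytes(child)
        cov, crashed = run_with_coverage(child)
        if crashed:
            crashes.append(child)
        elif not cov <= seen:  # new behaviour: save the input for future mutation
            seen |= cov
            corpus.append(child)
    return corpus, crashes

if __name__ == "__main__":
    corpus, crashes = greybox_fuzz([b"aaaa"], 20_000)
    print(len(corpus), "corpus entries;", len(crashes), "crashing inputs")
```

Because each nested check that passes executes a new line, the corpus keeps the partial progress (`b...`, then `bu..`), so the three bytes can be found one at a time instead of all at once, which is exactly the advantage over the blind random generation the abstract contrasts it with.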
By incorporating expert knowledge into the fuzzer, a specialized fuzzer can reach deeply nested code more quickly. We select the LLVM backend as a test bed to see whether a specialized strategy can find bugs in compilers. We develop IRFuzzer with a mutation and monitoring method tailored to the LLVM backend. We model the LLVM intermediate representation (IR) so that IRFuzzer is guaranteed to generate valid input for the LLVM backend. IRFuzzer monitors matcher table coverage to track “behavior” in a more fine-grained manner with little overhead. IRFuzzer has found 78 new bugs in upstream LLVM; 57 of them have been fixed, and five of those have been backported to LLVM 15. These findings demonstrate that specialized fuzzing provides useful, actionable insights to LLVM developers. Finally, fuzzers generate large quantities of inputs as a byproduct, which are often discarded after the fuzzing process is completed. These inputs trigger different behaviors of the program. We notice that these behaviors can be vital for training large language models (LLMs). With this observation, we propose using source code coupled with its test cases for LLM training, where each test case is composed of a fuzzer-generated input and its corresponding output. We first build a dataset on top of an existing one by pairing its code with test cases. Then, we develop methods to fine-tune a trained model and to pretrain a new model on this dataset. With this new training scheme, we contribute a new code understanding model, FuzzPretrain. Our evaluation shows that FuzzPretrain yields mean average precision (mAP) improvements of more than 6% and 19% on code search over baselines trained with only source code or only abstract syntax trees (ASTs), respectively.
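The training-data idea, pairing source code with test cases that each consist of a fuzzer-generated input and the program's output, can be sketched as follows. The record layout, the field names, the `load_function` helper, and the toy `clamp_add` snippet are hypothetical illustrations; the abstract does not specify FuzzPretrain's actual data format.

```python
import json
from typing import Any, Callable, Dict, List, Sequence, Tuple

# Hypothetical snippet standing in for a function from a code corpus.
SOURCE = """\
def clamp_add(a, b):
    return max(-128, min(127, a + b))
"""

def load_function(source: str, name: str) -> Callable[..., Any]:
    """Compile the snippet so the recorded source and the executed code match."""
    namespace: Dict[str, Any] = {}
    exec(source, namespace)
    return namespace[name]

def build_record(source: str, func: Callable[..., Any],
                 fuzzed_inputs: Sequence[Tuple[Any, ...]]) -> Dict[str, Any]:
    """Pair source code with test cases of the form (fuzzer input, observed output)."""
    test_cases: List[Dict[str, str]] = []
    for args in fuzzed_inputs:
        try:
            output = repr(func(*args))
        except Exception as exc:  # an exception is also observable behaviour
            output = f"{type(exc).__name__}: {exc}"
        test_cases.append({"input": repr(args), "output": output})
    return {"source": source, "test_cases": test_cases}

if __name__ == "__main__":
    clamp_add = load_function(SOURCE, "clamp_add")
    # Inputs a fuzzer might have produced; normally these come from its queue.
    record = build_record(SOURCE, clamp_add, [(1, 2), (100, 100), (-100, -100), (1, "x")])
    print(json.dumps(record, indent=2))
```

The saturating cases, `(100, 100)` giving `127` and `(-100, -100)` giving `-128`, show why such pairs can help: they expose behavior that is easy to miss when reading only the source text or its syntax tree.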

eScholarship - University of California


Adaptive and Effective Fuzzing: a Data-Driven Approach

Author: She Dongdong
Publication venue
Publication date: 01/01/2023
Field of study

Security vulnerabilities have a large real-world impact, from ransomware attacks costing billions of dollars every year to sensitive data breaches in government, military, and industry. Fuzzing is a popular technique to discover these vulnerabilities in an automated fashion. Industry has poured vast resources into building large-scale fuzzing factories (e.g., Google’s ClusterFuzz and Microsoft’s OneFuzz) to test products and make them more secure. Despite the wide application of fuzzing in industry, many issues still constrain its performance. One fundamental limitation is the rule-based design of fuzzers. Rule-based fuzzers rely heavily on a set of static rules or heuristics. These fixed rules are distilled from human experience and hence fail to generalize across a diverse set of programs. In this dissertation, we present an adaptive and effective fuzzing framework based on a data-driven approach. A data-driven fuzzer makes decisions based on the analysis of and reasoning about data rather than on static rules. Hence, it is more adaptive, effective, and flexible than a typical rule-based fuzzer. More interestingly, the data-driven approach can connect fuzzing to various data-centric domains (e.g., machine learning, optimization, and social networks), enabling sophisticated designs in the fuzzing framework. A general fuzzing framework consists of two major components: seed scheduling and seed mutation. The seed scheduling module selects a seed from a seed corpus that includes multiple test cases. The seed mutation module then perturbs the selected seed to generate a new test case. First, we present Neuzz, the first machine learning (ML) based general-purpose fuzzer, which applies ML to seed mutation and greatly improves fuzzing performance. Then we present MTFuzz, a follow-up to Neuzz that incorporates diverse data into ML to generate effective seed mutations.
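One way to picture "applying ML to seed mutation" is gradient-guided mutation: a differentiable model maps input bytes to predicted branch behavior, and the bytes with the largest gradient magnitude are mutated first, in the direction of the gradient's sign, since they most influence the targeted branch. The sketch below assumes the per-byte gradients are already available (the values are made up and no neural network is trained) and shows only the mutation step; it is an illustration of the idea, not Neuzz's implementation, and the function names are hypothetical.

```python
from typing import Iterator, List, Sequence

def top_k_bytes(gradient: Sequence[float], k: int) -> List[int]:
    """Indices of the k bytes with the largest |gradient| (ties broken by index)."""
    return sorted(range(len(gradient)), key=lambda i: (-abs(gradient[i]), i))[:k]

def gradient_mutations(seed: bytes, gradient: Sequence[float], k: int,
                       steps: Sequence[int]) -> Iterator[bytes]:
    """Yield one mutant per step size, moving the top-k bytes along their gradient sign."""
    hot = top_k_bytes(gradient, k)
    for step in steps:
        buf = bytearray(seed)
        for i in hot:
            if gradient[i] == 0:
                continue  # a zero gradient gives no direction to move in
            direction = 1 if gradient[i] > 0 else -1
            buf[i] = max(0, min(255, buf[i] + direction * step))  # clamp to a byte
        yield bytes(buf)

if __name__ == "__main__":
    seed = bytes([10, 200, 50, 0])
    grad = [0.1, -0.9, 0.0, 0.5]  # made-up gradients of one branch w.r.t. each byte
    for mutant in gradient_mutations(seed, grad, k=2, steps=[1, 100]):
        print(list(mutant))  # [10, 199, 50, 1] then [10, 100, 50, 100]
```

Compared with uniform random byte choice, this concentrates the mutation budget on a few positions the model considers relevant, which is the kind of data-driven decision the dissertation contrasts with fixed rules.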
Finally, we present K-Scheduler, a fuzzer-agnostic, data-driven seed scheduling algorithm. K-Scheduler leverages graph data (i.e., the inter-procedural control flow graph) and dynamic coverage data (i.e., the code coverage bitmap) to construct a dynamic graph and schedules seeds by their graph centrality scores on that graph. It significantly improves fuzzing performance over state-of-the-art seed schedulers on various fuzzers widely used in industry.
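Scheduling seeds by graph centrality can be sketched with a Katz-style centrality computed by fixed-point iteration on a small, hypothetical control flow graph. In this simplified version, which is an assumption and not K-Scheduler's exact construction, covered blocks are removed from the graph, centrality is computed over outgoing edges (so a block scores high when many uncovered blocks lie beyond it), and each seed's score is the summed centrality of the uncovered "frontier" blocks one edge away from blocks it covers. The real tool works on an inter-procedural CFG and a coverage bitmap, and its weighting may differ.

```python
from typing import Dict, List, Set, Tuple

def katz_outgoing(graph: Dict[str, List[str]], alpha: float = 0.5,
                  beta: float = 1.0, iterations: int = 100) -> Dict[str, float]:
    """Katz-style centrality in which each node accumulates its successors' scores.

    Iterates x[v] = beta + alpha * sum(x[w] for w in graph[v]); this converges
    when alpha is below 1 / (spectral radius of the adjacency matrix).
    Every successor must also appear as a key of graph.
    """
    x = {v: beta for v in graph}
    for _ in range(iterations):
        x = {v: beta + alpha * sum(x[w] for w in graph[v]) for v in graph}
    return x

def schedule(cfg: Dict[str, List[str]], covered: Set[str],
             seed_coverage: Dict[str, Set[str]]) -> Tuple[List[str], Dict[str, float]]:
    """Rank seeds by the centrality of the uncovered blocks adjacent to their coverage."""
    # Subgraph induced by blocks that no seed has covered yet.
    uncovered = {v: [w for w in succs if w not in covered]
                 for v, succs in cfg.items() if v not in covered}
    centrality = katz_outgoing(uncovered)
    scores: Dict[str, float] = {}
    for seed, blocks in seed_coverage.items():
        # Frontier: uncovered blocks one edge away from a block this seed covers.
        frontier = {w for b in blocks for w in cfg.get(b, []) if w not in covered}
        scores[seed] = sum(centrality[w] for w in frontier)
    ranking = sorted(scores, key=lambda s: (-scores[s], s))
    return ranking, scores

if __name__ == "__main__":
    cfg = {  # hypothetical CFG: block -> successor blocks
        "entry": ["a", "b"], "a": ["a1"], "b": ["b1", "b2"],
        "a1": [], "b1": ["b3"], "b2": [], "b3": [],
    }
    seed_coverage = {"s1": {"entry", "a"}, "s2": {"entry", "b"}}
    covered = set().union(*seed_coverage.values())
    print(schedule(cfg, covered, seed_coverage))
```

Here `s2` ranks first: its frontier (`b1`, `b2`) leads to more unexplored code than `s1`'s single frontier block `a1`, so mutating `s2` is the more promising use of fuzzing time.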

Columbia University Academic Commons

Ernst Denert Award for Software Engineering 2020

Author
Publication venue: Springer Science and Business Media LLC
Publication date: 15/03/2022
Field of study

This open access book provides an overview of the dissertations of the eleven nominees for the Ernst Denert Award for Software Engineering in 2020. The prize, kindly sponsored by the Gerlind & Ernst Denert Stiftung, is awarded for excellent work within the discipline of Software Engineering, which includes methods, tools and procedures for better and more efficient development of high-quality software. An essential requirement for the nominated work is its applicability and usability in industrial practice. The book contains eleven papers that describe the works by Jonathan Brachthäuser (EPFL Lausanne) entitled What You See Is What You Get: Practical Effect Handlers in Capability-Passing Style, Mojdeh Golagha’s (Fortiss, Munich) thesis How to Effectively Reduce Failure Analysis Time?, Nikolay Harutyunyan’s (FAU Erlangen-Nürnberg) work on Open Source Software Governance, Dominic Henze’s (TU Munich) research about Dynamically Scalable Fog Architectures, Anne Hess’s (Fraunhofer IESE, Kaiserslautern) work on Crossing Disciplinary Borders to Improve Requirements Communication, Istvan Koren’s (RWTH Aachen U) thesis DevOpsUse: A Community-Oriented Methodology for Societal Software Engineering, Yannic Noller’s (NU Singapore) work on Hybrid Differential Software Testing, Dominic Steinhöfel’s (TU Darmstadt) thesis entitled Ever Change a Running System: Structured Software Reengineering Using Automatically Proven-Correct Transformation Rules, Peter Wägemann’s (FAU Erlangen-Nürnberg) work Static Worst-Case Analyses and Their Validation Techniques for Safety-Critical Systems, Michael von Wenckstern’s (RWTH Aachen U) research on Improving the Model-Based Systems Engineering Process, and Franz Zieris’s (FU Berlin) thesis on Understanding How Pair Programming Actually Works in Industry: Mechanisms, Patterns, and Dynamics, which won the award.
The chapters describe key findings of the respective works, show their relevance and applicability to practice and industrial software engineering projects, and provide additional information and findings that were only discovered afterwards, e.g., when applying the results in industry. In this way, the book is of interest not only to other researchers but also to industrial software professionals who would like to learn about applying state-of-the-art methods in their daily work.

Directory of Open Access Books (DOAB)

Evolutionary Computation 2020

Author
Publication venue: MDPI AG
Publication date: 11/01/2022
Field of study

Intelligent optimization relies on computational intelligence to refine a suitable feature model, design an effective optimization algorithm, and then obtain an optimal or satisfactory solution to a complex problem. Intelligent algorithms are key tools for ensuring global optimization quality, fast optimization efficiency, and robust optimization performance. Intelligent optimization algorithms have been studied by many researchers, leading to improvements in the performance of algorithms such as the evolutionary algorithm, the whale optimization algorithm, the differential evolution algorithm, and particle swarm optimization. Studies in this arena have also produced breakthroughs in solving complex problems, including the green shop scheduling problem, the severely nonlinear problem of one-dimensional geodesic electromagnetic inversion, the problem of finding errors and bugs in software, the 0-1 knapsack problem, the traveling salesman problem, and the logistics distribution center siting problem. The editors are confident that this book can open a new avenue for further improvements and discoveries in the area of intelligent algorithms. The book is a valuable resource for researchers interested in understanding the principles and design of intelligent algorithms.

Directory of Open Access Books (DOAB)

Ernst Denert Award for Software Engineering 2020

Author
Publication venue: Springer Science and Business Media LLC
Publication date
Field of study

OAPEN Library

A Topological Model for Applications in Fuzzing

Author: Maggs Kelly
Publication venue
Publication date: 01/01/2021
Field of study

In this thesis, we introduce a topological model of dependencies motivated by applications in fuzzing. The model we define in Chapter 2 is framed within the language of finite topological spaces, partially ordered sets, and simplicial complexes. In Chapter 3, we extend the theory to address situations where information evolves dynamically and probabilistically, borrowing ideas from the general framework of persistence theory and Topological Data Analysis. In Chapter 4, we define algorithms to compute the relevant quantities and provide coarse bounds on their time complexity. Our model is applied to two problems in the fuzzing literature. First, in Chapter 5, we reformulate the challenges of coverage-based grey-box fuzzing as a question about dependency relationships. We define a high-level schematic for modifying the popular AFL algorithm and perform a case study that supports the use of our conceptual framework. Our second application, in Chapter 6, is the analysis of call-stacks generated by a fuzzing campaign. We examine the characteristics of a large data set of call-stacks and use our model to define a recovery scheme for call-stacks with obscured or incomplete information.
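One simple way to make "dependency relationships" between branches concrete, offered as an illustrative assumption rather than the thesis's actual definition: say branch b depends on branch a if every observed input that covers b also covers a. This relation is reflexive and transitive, i.e. a preorder, and preorders on a finite set correspond one-to-one to topologies on that set (the Alexandrov correspondence), which links coverage data to the finite-space language the thesis uses. The sketch computes the relation from hypothetical per-input coverage sets; the function names are invented.

```python
from typing import Dict, Iterable, Set, Tuple

def dependency_preorder(coverage: Dict[str, Set[str]]) -> Set[Tuple[str, str]]:
    """Pairs (b, a) meaning: every observed input that covers b also covers a."""
    branches: Set[str] = set().union(*coverage.values())
    relation: Set[Tuple[str, str]] = set()
    for b in branches:
        covering_b = [cov for cov in coverage.values() if b in cov]
        for a in branches:
            if all(a in cov for cov in covering_b):
                relation.add((b, a))
    return relation

def is_preorder(relation: Set[Tuple[str, str]], elements: Iterable[str]) -> bool:
    """Check reflexivity and transitivity of a relation given as a set of pairs."""
    reflexive = all((x, x) in relation for x in elements)
    transitive = all((x, z) in relation
                     for (x, y) in relation for (y2, z) in relation if y == y2)
    return reflexive and transitive

if __name__ == "__main__":
    # Hypothetical campaign: branch C is only ever reached through A and B.
    coverage = {"in1": {"A"}, "in2": {"A", "B"}, "in3": {"A", "B", "C"}}
    relation = dependency_preorder(coverage)
    print(sorted(p for p in relation if p[0] != p[1]))
```

The relation is only as good as the observed inputs: a dependency inferred from a short campaign may disappear once an input reaches the branch by another path, which is one reason the thesis's extension to dynamically evolving, probabilistic information is relevant.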

The Australian National University