
    Efficient Windows Application Fuzzing with Fork-server

    Fuzzing is an effective technique for automatically uncovering bugs in software. Since it was introduced, it has found thousands of vulnerabilities, and it is now an indispensable tool in security researchers' arsenal. Unfortunately, most fuzzing research has concentrated on Linux systems, and Windows fuzzing has been largely neglected by the fuzzing community. Windows systems still represent a large market share of desktop computers, and as end-user systems they are valuable targets for attackers. Windows fuzzing remains difficult to set up, slow, and generally troublesome. There is a chicken-and-egg problem: because Windows fuzzing is challenging, little effort is invested in it; yet because little effort is invested, Windows fuzzing remains challenging. We aim to break this cycle by attacking one of the root problems blocking easy and effective Windows fuzzing. A key difference between Linux and Windows systems for fuzzing is the lack of fork() functionality on Windows. Without a suitable fork() API, a fuzzer cannot quickly and reliably clone processes, an operation that fuzzing relies on heavily. Existing Windows fuzzers such as WinAFL rely on persistent-mode fuzzing as a workaround for the lack of fast process cloning, unlike Linux fuzzers, which rely on a fork-server. In this work, we developed a fork() implementation that provides the necessary fast process-cloning machinery and built a working fork-server on top of it. We integrated this fork-server into WinAFL and applied several other key improvements and insights to bypass the difficulties of fuzzing typical Windows applications. In our evaluation, we ran our fuzzer against 59 fuzzing harnesses for 37 applications and found 61 new bugs. Comparing the performance of our fork() implementation against other similar APIs on Windows, we found that our implementation was the most suitable and efficient. We believe that this marks the first Windows fork() implementation suitable for fuzzing.
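    To make the fork-server idea concrete, below is a minimal sketch of the Linux-style loop that the paper ports to Windows: the fuzzer asks for one execution, the server fork()s a copy of the already-initialized target, the child runs a single test case, and the parent reports the exit status back. It is written in Python for brevity and relies on the POSIX os.fork() that Windows lacks; run_target and the length-prefixed pipe protocol are illustrative stand-ins, not taken from WinAFL or the paper.

```python
import os
import struct

def run_target(data: bytes) -> None:
    """Stand-in for the instrumented target's entry point (hypothetical)."""
    if data.startswith(b"FUZZ"):
        raise RuntimeError("simulated crash")

def fork_server(ctl_r: int, st_w: int) -> None:
    """Minimal Linux-style fork-server loop.

    For each request the server fork()s a fresh copy of the warmed-up
    process, runs one test case in the child, and reports the child's
    wait() status back to the fuzzer.
    """
    while True:
        hdr = os.read(ctl_r, 4)                    # 4-byte length prefix per test case
        if len(hdr) < 4:
            break                                  # fuzzer closed the pipe: shut down
        size, = struct.unpack("<I", hdr)
        data = os.read(ctl_r, size)
        pid = os.fork()                            # cheap clone, no re-initialization
        if pid == 0:                               # child: exactly one execution
            try:
                run_target(data)
                os._exit(0)
            except Exception:
                os._exit(139)                      # report a "crash" via the exit code
        _, status = os.waitpid(pid, 0)
        os.write(st_w, struct.pack("<I", status))  # tell the fuzzer how the run ended

if __name__ == "__main__":
    # Toy driver standing in for the fuzzer side.
    ctl_r, ctl_w = os.pipe()
    st_r, st_w = os.pipe()
    server_pid = os.fork()
    if server_pid == 0:                            # this child hosts the fork-server
        os.close(ctl_w)
        os.close(st_r)
        fork_server(ctl_r, st_w)
        os._exit(0)
    os.close(ctl_r)
    os.close(st_w)
    for case in (b"hello", b"FUZZ-me"):
        os.write(ctl_w, struct.pack("<I", len(case)) + case)
        status, = struct.unpack("<I", os.read(st_r, 4))
        print(case, "->", "crash" if status else "ok")
    os.close(ctl_w)                                # EOF tells the server to exit
    os.waitpid(server_pid, 0)
```

    The point of the pattern is that all expensive initialization happens once, before the loop, so each test case costs only a fork() and a wait(); persistent mode avoids the fork at the price of state leaking between runs.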

    BanditFuzz: Fuzzing SMT Solvers with Reinforcement Learning

    Satisfiability Modulo Theories (SMT) solvers are fundamental tools in the broad context of software engineering and security research. If SMT solvers are to continue to have an impact, it is imperative that we develop efficient and systematic testing methods for them. To this end, we present a reinforcement-learning-driven fuzzing system, BanditFuzz, that zeroes in on the grammatical constructs of well-formed solver inputs that are the root cause of performance or correctness issues in solvers under test. To the best of our knowledge, BanditFuzz is the first machine-learning-based fuzzer for SMT solvers. BanditFuzz takes as input a grammar G describing the well-formed inputs to a set of distinct solvers (say, P_1 and P_2) that implement the same specification, together with a fuzzing objective (e.g., maximize the relative performance difference between P_1 and P_2), and outputs a ranked list of grammatical constructs that are likely to maximize performance differences between P_1 and P_2 or are root causes of errors in these solvers. Typically, mutation fuzzing is implemented as a set of random mutations applied to a given input. By contrast, the key innovation behind BanditFuzz is the modeling of a grammar-preserving fuzzing mutator as a reinforcement learning (RL) agent that, via blackbox interactions with programs under test, learns which grammatical constructs are most likely the cause of an error or performance issue. Using BanditFuzz, we discovered 1700 syntactically unique inputs resulting in inconsistent answers across the state-of-the-art SMT solvers Z3, CVC4, Colibri, MathSAT, and Z3str3 over the floating-point and string SMT theories. Further, using BanditFuzz, we constructed two benchmark suites (with 400 floating-point and 110 string instances) that expose performance issues in all considered solvers. We also compared BanditFuzz against random, mutation, and evolutionary fuzzing methods. We observed up to a 31% improvement in performance fuzzing and up to an 81% improvement in the number of bugs found by BanditFuzz relative to these other methods for the same amount of time provided to all methods.
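    As a rough illustration of the idea (not the paper's implementation), the sketch below models a grammar-preserving mutator as an epsilon-greedy multi-armed bandit: each grammatical construct is an arm, and the reward fed back after each run is the observed fuzzing objective. The construct list, the toy mutation, and run_solvers are hypothetical stand-ins; BanditFuzz's actual grammar handling and RL formulation are considerably richer.

```python
import random

# Hypothetical grammatical constructs a mutator might insert into an
# SMT-LIB term; the real BanditFuzz grammar is far larger.
CONSTRUCTS = ["fp.add", "fp.mul", "fp.div", "fp.sqrt", "fp.fma", "fp.rem"]

class EpsilonGreedyMutator:
    """One bandit arm per grammatical construct.

    After each fuzzing round the observed objective (e.g., the relative
    runtime difference between two solvers on the mutated input) is fed
    back as the arm's reward, so constructs that keep causing trouble
    get selected more often.
    """

    def __init__(self, arms, epsilon=0.1):
        self.epsilon = epsilon
        self.counts = {a: 0 for a in arms}
        self.values = {a: 0.0 for a in arms}

    def select(self):
        if random.random() < self.epsilon:             # explore a random construct
            return random.choice(list(self.counts))
        return max(self.values, key=self.values.get)   # exploit the best one so far

    def update(self, arm, reward):
        self.counts[arm] += 1
        # incremental mean keeps a running estimate of the arm's value
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

def run_solvers(term):
    """Stand-in for running P_1 and P_2; returns a made-up performance gap."""
    return random.random() * term.count("fp.div")      # pretend fp.div slows one solver

if __name__ == "__main__":
    bandit = EpsilonGreedyMutator(CONSTRUCTS)
    seed = "(assert (fp.eq x y))"
    for _ in range(200):
        construct = bandit.select()
        mutated = seed.replace("fp.eq", construct, 1)  # toy mutation, not grammar-aware
        bandit.update(construct, run_solvers(mutated))
    ranking = sorted(bandit.values, key=bandit.values.get, reverse=True)
    print("constructs ranked by estimated impact:", ranking)
```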

    A Comparison of Reinforcement Learning Frameworks for Software Testing Tasks

    Software testing activities scrutinize the artifacts and the behavior of a software product to find possible defects and ensure that the product meets its expected requirements. Recently, Deep Reinforcement Learning (DRL) has been successfully employed in complex testing tasks such as game testing, regression testing, and test case prioritization to automate the process and provide continuous adaptation. Practitioners can employ DRL by implementing a DRL algorithm from scratch or by using a DRL framework. DRL frameworks offer well-maintained implementations of state-of-the-art DRL algorithms to facilitate and speed up the development of DRL applications. Developers have widely used these frameworks to solve problems in various domains, including software testing. However, to the best of our knowledge, no study has empirically evaluated the effectiveness and performance of the algorithms implemented in DRL frameworks. Moreover, the literature lacks guidelines that would help practitioners choose one DRL framework over another. In this paper, we empirically investigate the application of carefully selected DRL algorithms to two important software testing tasks: test case prioritization in the context of Continuous Integration (CI) and game testing. For the game testing task, we conduct experiments on a simple game and use DRL algorithms to explore the game to detect bugs. Results show that some of the selected DRL frameworks, such as Tensorforce, outperform recent approaches in the literature. To prioritize test cases, we run experiments on a CI environment where DRL algorithms from different frameworks are used to rank the test cases. Our results show that the performance difference between the implemented algorithms is in some cases considerable, motivating further investigation.
    Comment: Accepted for publication at EMSE (Empirical Software Engineering journal) 202
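    The sketch below is a deliberately simplified stand-in for the test case prioritization loop studied in the paper: an agent keeps a score per test case, proposes an ordering each CI cycle, and is rewarded when failing tests appear early in that ordering. The real study plugs DRL algorithms from frameworks such as Tensorforce into this loop; the tabular agent, the feature values, and the reward shape here are hypothetical.

```python
import random

# Hypothetical CI history: per test case, a recent failure rate.
TEST_CASES = {
    "test_login":   {"fail_rate": 0.30},
    "test_search":  {"fail_rate": 0.05},
    "test_payment": {"fail_rate": 0.50},
    "test_profile": {"fail_rate": 0.10},
}

class PrioritizationAgent:
    """Keeps a score per test case and nudges it toward observed rewards."""

    def __init__(self, names, lr=0.2):
        self.scores = {n: 0.0 for n in names}
        self.lr = lr

    def rank(self):
        # Action = an ordering; here simply sort by current score, best first.
        return sorted(self.scores, key=self.scores.get, reverse=True)

    def update(self, name, reward):
        self.scores[name] += self.lr * (reward - self.scores[name])

def run_ci_cycle(order):
    """Simulated CI run: reward test cases that fail and were scheduled early."""
    rewards = {}
    for position, name in enumerate(order):
        failed = random.random() < TEST_CASES[name]["fail_rate"]
        rewards[name] = (len(order) - position) / len(order) if failed else 0.0
    return rewards

if __name__ == "__main__":
    agent = PrioritizationAgent(TEST_CASES)
    for _ in range(500):                    # many simulated CI cycles
        order = agent.rank()
        for name, reward in run_ci_cycle(order).items():
            agent.update(name, reward)
    print("learned priority order:", agent.rank())
```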