7 research outputs found

    A Machine Learning-Driven Evolutionary Approach for Testing Web Application Firewalls

    Get PDF
    Web application firewalls (WAF) are an essential protection mechanism for online software systems. Because of the relentless flow of new kinds of attacks as well as their increased sophistication, WAFs have to be updated and tested regularly to prevent attackers from easily circumventing them. In this paper, we focus on testing WAFs for SQL injection attacks, but the general principles and strategy we propose can be adapted to other contexts. We present ML-Driven, an approach based on machine learning and an evolutionary algorithm to automatically detect holes in WAFs that let SQL injection attacks bypass them. Initially, ML-Driven automatically generates a diverse set of attacks and submit them to the system being protected by the target WAF. Then, ML-Driven selects attacks that exhibit patterns (substrings) associated with bypassing the WAF and evolve them to generate new successful bypassing attacks. Machine learning is used to incrementally learn attack patterns from previously generated attacks according to their testing results, i.e., if they are blocked or bypass the WAF. We implemented ML-Driven in a tool and evaluated it on ModSecurity, a widely used open-source WAF, and a proprietary WAF protecting a financial institution. Our empirical results indicate that ML-Driven is effective and efficient at generating SQL injection attacks bypassing WAFs and identifying attack patterns

    Creating Synthetic Attacks with Evolutionary Algorithms for Proactive Defense of Industrial Control Systems

    Get PDF
    Industrial control systems (ICS) play an important role in critical infrastructure. Cybersecurity defenders can use honeypots (decoy systems) to capture and study malicious ICS traffic. A problem with existing ICS honeypots is their low interactivity, causing intruders to quickly abandon the attack attempts. This research aims to improve ICS honeypots by feeding them realistic artificially generated packets and examining their behavior to proactively identify functional gaps in defenses. Our synthetic attack generator (SAGO) uses an evolutionary algorithm on known attack traffic to create new variants of Log4j exploits (CVE-2021-44228) and Industroyer2 malware. We tested over 5,200 and 256 unique Log4j and IEC 104 variations respectively, with success rates up to 70 percent for Log4j and 40 percent for IEC 104. We identified improvements to our honeypot’s interactivity based on its responses to these attacks. Our technique can aid defenders in hardening perimeter protection against new attack variants

    BanditFuzz: Fuzzing SMT Solvers with Reinforcement Learning

    Get PDF
    Satisfiability Modulo Theories (SMT) solvers are fundamental tools in the broad context of software engineering and security research. If SMT solvers are to continue to have an impact, it is imperative we develop efficient and systematic testing methods for them. To this end, we present a reinforcement learning driven fuzzing system BanditFuzz that zeroes in on the grammatical constructs of well-formed solver inputs that are the root cause of performance or correctness issues in solvers-under-test. To the best of our knowledge, BanditFuzz is the first machine-learning based fuzzer for SMT solvers. BanditFuzz takes as input a grammar G describing the well-formed inputs to a set of distinct solvers (say, P_1 and P_2) that implement the same specification and a fuzzing objective (e.g., maximize the relative performance difference between P_1 and P_2), and outputs a ranked list of grammatical constructs that are likely to maximize performance differences between P_1 and P_2 or are root causes of errors in these solvers. Typically, mutation fuzzing is implemented as a set of random mutations applied to a given input. By contrast, the key innovation behind BanditFuzz is the modeling of a grammar-preserving fuzzing mutator as a reinforcement learning (RL) agent that, via blackbox interactions with programs-under-test, learns which grammatical constructs are most likely the cause of an error or performance issue. Using BanditFuzz, we discovered 1700 syntactically unique inputs resulting in inconsistent answers across state-of-the-art SMT solvers Z3, CVC4, Colibri, MathSAT, and Z3str3 over the floating-point and string SMT theories. Further, using BanditFuzz, we constructed two benchmark suites (with 400 floating-point and 110 string instances) that expose performance issues in all considered solvers. We also performed a comparison of BanditFuzz against random, mutation, and evolutionary fuzzing methods. We observed up to a 31% improvement in performance fuzzing and up to 81% improvement in the number of bugs found by BanditFuzz relative to these other methods for the same amount of time provided to all methods

    CREATING SYNTHETIC ATTACKS WITH EVOLUTIONARY ALGORITHMS FOR INDUSTRIAL-CONTROL-SYSTEM SECURITY TESTING

    Get PDF
    Cybersecurity defenders can use honeypots (decoy systems) to capture and study adversarial activities. An issue with honeypots is obtaining enough data on rare attacks. To improve data collection, we created a tool that uses machine learning to generate plausible artificial attacks on two protocols, Hypertext Transfer Protocol (HTTP) and IEC 60870-5-104 (“IEC 104” for short, an industrial-control-system protocol). It uses evolutionary algorithms to create new variants of two cyberattacks: Log4j exploits (described in CVE-2021-44228 as severely critical) and the Industroyer2 malware (allegedly used in Russian attacks on Ukrainian power grids). Our synthetic attack generator (SAGO) effectively created synthetic attacks at success rates up to 70 and 40 percent for Log4j and IEC 104, respectively. We tested over 5,200 unique variations of Log4j exploits and 256 unique variations of the approach used by Industroyer2. Based on a power-grid honeypot’s response to these attacks, we identified changes to improve interactivity, which should entice intruders to mount more revealing attacks and aid defenders in hardening against new attack variants. This work provides a technique to proactively identify cybersecurity weaknesses in critical infrastructure and Department of Defense assets.Captain, United States Marine CorpsApproved for public release. Distribution is unlimited

    A Machine-Learning-Driven Evolutionary Approach for Testing Web Application Firewalls

    No full text

    A pattern-driven corpus to predictive analytics in mitigating SQL injection attack

    Get PDF
    The back-end database provides accessible and structured storage for each web application’s big data internet web traffic exchanges stemming from cloud-hosted web applications to the Internet of Things (IoT) smart devices in emerging computing. Structured Query Language Injection Attack (SQLIA) remains an intruder’s exploit of choice to steal confidential information from the database of vulnerable front-end web applications with potentially damaging security ramifications.Existing solutions to SQLIA still follows the on-premise web applications server hosting concept which were primarily developed before the recent challenges of the big data mining and as such lack the functionality and ability to cope with new attack signatures concealed in a large volume of web requests. Also, most organisations’ databases and services infrastructure no longer reside on-premise as internet cloud-hosted applications and services are increasingly used which limit existing Structured Query Language Injection (SQLI) detection and prevention approaches that rely on source code scanning. A bio-inspired approach such as Machine Learning (ML) predictive analytics provides functional and scalable mining for big data in the detection and prevention of SQLI in intercepting large volumes of web requests. Unfortunately, lack of availability of robust ready-made data set with patterns and historical data items to train a classifier are issues well known in SQLIA research applying ML in the field of Artificial Intelligence (AI). The purpose-built competition-driven test case data sets are antiquated and not pattern-driven to train a classifier for real-world application. Also, the web application types are so diverse to have an all-purpose generic data set for ML SQLIA mitigation.This thesis addresses the lack of pattern-driven data set by deriving one to predict SQLIA of any size and proposing a technique to obtain a data set on the fly and break the circle of relying on few outdated competitions-driven data sets which exist are not meant to benchmark real-world SQLIA mitigation. The thesis in its contributions derived pattern-driven data set of related member strings that are used in training a supervised learning model with validation through Receiver Operating Characteristic (ROC) curve and Confusion Matrix (CM) with results of low false positives and negatives. We further the evaluations with cross-validation to have obtained a low variance in accuracy that indicates of a successful trained model using the derived pattern-driven data set capable of generalisation of unknown data in the real-world with reduced biases. Also, we demonstrated a proof of concept with a test application by implementing an ML Predictive Analytics to SQLIA detection and prevention using this pattern-driven data set in a test web application. We observed in the experiments carried out in the course of this thesis, a data set of related member strings can be generated from a web expected input data and SQL tokens, including known SQLI signatures. The data set extraction ontology proposed in this thesis for applied ML in SQLIA mitigation in the context of emerging computing of big data internet, and cloud-hosted services set our proposal apart from existing approaches that were mostly on-premise source code scanning and queries structure comparisons of some sort
    corecore