2,120 research outputs found

    Verifying Robustness of Gradient Boosted Models

    Full text link
    Gradient boosted models are a fundamental machine learning technique. Robustness to small perturbations of the input is an important quality measure for machine learning models, but the literature lacks a method to prove the robustness of gradient boosted models. This work introduces VeriGB, a tool for quantifying the robustness of gradient boosted models. VeriGB encodes the model and the robustness property as an SMT formula, which enables state of the art verification tools to prove the model's robustness. We extensively evaluate VeriGB on publicly available datasets and demonstrate a capability for verifying large models. Finally, we show that some model configurations tend to be inherently more robust than others

    Robustness Verification of Tree-based Models

    Full text link
    We study the robustness verification problem for tree-based models, including decision trees, random forests (RFs) and gradient boosted decision trees (GBDTs). Formal robustness verification of decision tree ensembles involves finding the exact minimal adversarial perturbation or a guaranteed lower bound of it. Existing approaches find the minimal adversarial perturbation by a mixed integer linear programming (MILP) problem, which takes exponential time so is impractical for large ensembles. Although this verification problem is NP-complete in general, we give a more precise complexity characterization. We show that there is a simple linear time algorithm for verifying a single tree, and for tree ensembles, the verification problem can be cast as a max-clique problem on a multi-partite graph with bounded boxicity. For low dimensional problems when boxicity can be viewed as constant, this reformulation leads to a polynomial time algorithm. For general problems, by exploiting the boxicity of the graph, we develop an efficient multi-level verification algorithm that can give tight lower bounds on the robustness of decision tree ensembles, while allowing iterative improvement and any-time termination. OnRF/GBDT models trained on 10 datasets, our algorithm is hundreds of times faster than the previous approach that requires solving MILPs, and is able to give tight robustness verification bounds on large GBDTs with hundreds of deep trees.Comment: Hongge Chen and Huan Zhang contributed equall

    A Review of Formal Methods applied to Machine Learning

    Full text link
    We review state-of-the-art formal methods applied to the emerging field of the verification of machine learning systems. Formal methods can provide rigorous correctness guarantees on hardware and software systems. Thanks to the availability of mature tools, their use is well established in the industry, and in particular to check safety-critical applications as they undergo a stringent certification process. As machine learning is becoming more popular, machine-learned components are now considered for inclusion in critical systems. This raises the question of their safety and their verification. Yet, established formal methods are limited to classic, i.e. non machine-learned software. Applying formal methods to verify systems that include machine learning has only been considered recently and poses novel challenges in soundness, precision, and scalability. We first recall established formal methods and their current use in an exemplar safety-critical field, avionic software, with a focus on abstract interpretation based techniques as they provide a high level of scalability. This provides a golden standard and sets high expectations for machine learning verification. We then provide a comprehensive and detailed review of the formal methods developed so far for machine learning, highlighting their strengths and limitations. The large majority of them verify trained neural networks and employ either SMT, optimization, or abstract interpretation techniques. We also discuss methods for support vector machines and decision tree ensembles, as well as methods targeting training and data preparation, which are critical but often neglected aspects of machine learning. Finally, we offer perspectives for future research directions towards the formal verification of machine learning systems

    Verifiable Boosted Tree Ensembles

    Full text link
    Verifiable learning advocates for training machine learning models amenable to efficient security verification. Prior research demonstrated that specific classes of decision tree ensembles -- called large-spread ensembles -- allow for robustness verification in polynomial time against any norm-based attacker. This study expands prior work on verifiable learning from basic ensemble methods (i.e., hard majority voting) to advanced boosted tree ensembles, such as those trained using XGBoost or LightGBM. Our formal results indicate that robustness verification is achievable in polynomial time when considering attackers based on the L∞L_\infty-norm, but remains NP-hard for other norm-based attackers. Nevertheless, we present a pseudo-polynomial time algorithm to verify robustness against attackers based on the LpL_p-norm for any p∈NâˆȘ{0}p \in \mathbb{N} \cup \{0\}, which in practice grants excellent performance. Our experimental evaluation shows that large-spread boosted ensembles are accurate enough for practical adoption, while being amenable to efficient security verification.Comment: 15 pages, 3 figure

    Genetic Adversarial Training of Decision Trees

    Full text link
    We put forward a novel learning methodology for ensembles of decision trees based on a genetic algorithm which is able to train a decision tree for maximizing both its accuracy and its robustness to adversarial perturbations. This learning algorithm internally leverages a complete formal verification technique for robustness properties of decision trees based on abstract interpretation, a well known static program analysis technique. We implemented this genetic adversarial training algorithm in a tool called Meta-Silvae (MS) and we experimentally evaluated it on some reference datasets used in adversarial training. The experimental results show that MS is able to train robust models that compete with and often improve on the current state-of-the-art of adversarial training of decision trees while being much more compact and therefore interpretable and efficient tree models

    Probabilistically Robust Recourse: Navigating the Trade-offs between Costs and Robustness in Algorithmic Recourse

    Full text link
    As machine learning models are increasingly being employed to make consequential decisions in real-world settings, it becomes critical to ensure that individuals who are adversely impacted (e.g., loan denied) by the predictions of these models are provided with a means for recourse. While several approaches have been proposed to construct recourses for affected individuals, the recourses output by these methods either achieve low costs (i.e., ease-of-implementation) or robustness to small perturbations (i.e., noisy implementations of recourses), but not both due to the inherent trade-offs between the recourse costs and robustness. Furthermore, prior approaches do not provide end users with any agency over navigating the aforementioned trade-offs. In this work, we address the above challenges by proposing the first algorithmic framework which enables users to effectively manage the recourse cost vs. robustness trade-offs. More specifically, our framework Probabilistically ROBust rEcourse (\texttt{PROBE}) lets users choose the probability with which a recourse could get invalidated (recourse invalidation rate) if small changes are made to the recourse i.e., the recourse is implemented somewhat noisily. To this end, we propose a novel objective function which simultaneously minimizes the gap between the achieved (resulting) and desired recourse invalidation rates, minimizes recourse costs, and also ensures that the resulting recourse achieves a positive model prediction. We develop novel theoretical results to characterize the recourse invalidation rates corresponding to any given instance w.r.t. different classes of underlying models (e.g., linear models, tree based models etc.), and leverage these results to efficiently optimize the proposed objective. Experimental evaluation with multiple real world datasets demonstrate the efficacy of the proposed framework
    • 

    corecore