Search CORE

7 research outputs found

Formal Verification of Input-Output Mappings of Tree Ensembles

Author: Nadjm-Tehrani Simin
Törnblom John
Publication venue: 'Elsevier BV'
Publication date: 01/01/2020
Field of study

Recent advances in machine learning and artificial intelligence are now being considered in safety-critical autonomous systems where software defects may cause severe harm to humans and the environment. Design organizations in these domains are currently unable to provide convincing arguments that their systems are safe to operate when machine learning algorithms are used to implement their software. In this paper, we present an efficient method to extract equivalence classes from decision trees and tree ensembles, and to formally verify that their input-output mappings comply with requirements. The idea is that, given that safety requirements can be traced to desirable properties on system input-output patterns, we can use positive verification outcomes in safety arguments. This paper presents the implementation of the method in the tool VoTE (Verifier of Tree Ensembles), and evaluates its scalability on two case studies presented in current literature. We demonstrate that our method is practical for tree ensembles trained on low-dimensional data with up to 25 decision trees and tree depths of up to 20. Our work also studies the limitations of the method with high-dimensional data and preliminarily investigates the trade-off between large number of trees and time taken for verification

arXiv.org e-Print Archive

Publikationer från Linköpings universitet

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Genetic Adversarial Training of Decision Trees

Author: Ranzato Francesco
Zanella Marco
Publication venue
Publication date: 21/12/2020
Field of study

We put forward a novel learning methodology for ensembles of decision trees based on a genetic algorithm which is able to train a decision tree for maximizing both its accuracy and its robustness to adversarial perturbations. This learning algorithm internally leverages a complete formal verification technique for robustness properties of decision trees based on abstract interpretation, a well known static program analysis technique. We implemented this genetic adversarial training algorithm in a tool called Meta-Silvae (MS) and we experimentally evaluated it on some reference datasets used in adversarial training. The experimental results show that MS is able to train robust models that compete with and often improve on the current state-of-the-art of adversarial training of decision trees while being much more compact and therefore interpretable and efficient tree models

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Università di Padova

Verifiable Learning for Robust Tree Ensembles

Author: Calzavara S.
Cazzaro L.
Pibiri G. E.
Prezza N.
Publication venue: Association for Computing Machinery, Inc
Publication date: 01/01/2023
Field of study

Verifying the robustness of machine learning models against evasion attacks at test time is an important research problem. Unfortunately, prior work established that this problem is NP-hard for decision tree ensembles, hence bound to be intractable for specific inputs. In this paper, we identify a restricted class of decision tree ensembles, called large-spread ensembles, which admit a security verification algorithm running in polynomial time. We then propose a new approach called verifiable learning, which advocates the training of such restricted model classes which are amenable for efficient verification. We show the benefits of this idea by designing a new training algorithm that automatically learns a large-spread decision tree ensemble from labelled data, thus enabling its security verification in polynomial time. Experimental results on public datasets confirm that large-spread ensembles trained using our algorithm can be verified in a matter of seconds, using standard commercial hardware. Moreover, large-spread ensembles are more robust than traditional ensembles against evasion attacks, at the cost of an acceptable loss of accuracy in the non-adversarial setting

Archivio istituzionale della ricerca - Università degli Studi di Venezia Ca' Foscari

Improving Machine Learning Pipeline Creation using Visual Programming and Static Analysis

Author: David João Pedro Vieira
Publication venue
Publication date: 01/01/2021
Field of study

Tese de mestrado, Engenharia Informática (Engenharia de Software), Universidade de Lisboa, Faculdade de Ciências, 2021ML pipelines are composed of several steps that load data, clean it, process it, apply learning algorithms and produce either reports or deploy inference systems into production. In real-world scenarios, pipelines can take days, weeks, or months to train with large quantities of data. Unfortunately, current tools to design and orchestrate ML pipelines are oblivious to the semantics of each step, allowing developers to easily introduce errors when connecting two components that might not work together, either syntactically or semantically. Data scientists and engineers often find these bugs during or after the lengthy execution, which decreases their productivity. We propose a Visual Programming Language (VPL) enriched with semantic constraints regarding the behavior of each component and a verification methodology that verifies entire pipelines to detect common ML bugs that existing visual and textual programming languages do not. We evaluate this methodology on a set of six bugs taken from a data science company focused on preventing financial fraud on big data. We were able detect these data engineering and data balancing bugs, as well as detect unnecessary computation in the pipelines

Universidade de Lisboa: Repositório.UL

How to Certify Machine Learning Based Safety-critical Systems? A Systematic Literature Review

Author: An Le
Antoniol Giulio
Khomh Foutse
Laberge Gabriel
Laviolette François
Merlo Ettore
Mindom Paulina Stevia Nouwou
Nikanjam Amin
Pequignot Yann
Tambon Florian
Publication venue
Publication date: 01/12/2021
Field of study

Context: Machine Learning (ML) has been at the heart of many innovations over the past years. However, including it in so-called 'safety-critical' systems such as automotive or aeronautic has proven to be very challenging, since the shift in paradigm that ML brings completely changes traditional certification approaches. Objective: This paper aims to elucidate challenges related to the certification of ML-based safety-critical systems, as well as the solutions that are proposed in the literature to tackle them, answering the question 'How to Certify Machine Learning Based Safety-critical Systems?'. Method: We conduct a Systematic Literature Review (SLR) of research papers published between 2015 to 2020, covering topics related to the certification of ML systems. In total, we identified 217 papers covering topics considered to be the main pillars of ML certification: Robustness, Uncertainty, Explainability, Verification, Safe Reinforcement Learning, and Direct Certification. We analyzed the main trends and problems of each sub-field and provided summaries of the papers extracted. Results: The SLR results highlighted the enthusiasm of the community for this subject, as well as the lack of diversity in terms of datasets and type of models. It also emphasized the need to further develop connections between academia and industries to deepen the domain study. Finally, it also illustrated the necessity to build connections between the above mention main pillars that are for now mainly studied separately. Conclusion: We highlighted current efforts deployed to enable the certification of ML based software systems, and discuss some future research directions.Comment: 60 pages (92 pages with references and complements), submitted to a journal (Automated Software Engineering). Changes: Emphasizing difference traditional software engineering / ML approach. Adding Related Works, Threats to Validity and Complementary Materials. Adding a table listing papers reference for each section/subsection

arXiv.org e-Print Archive

PolyPublie