ANTONIO: Towards a Systematic Method of Generating NLP Benchmarks for Verification

Arnaboldi, Luca; Casadio, Marco; Daggitt, Matthew L.; Dinkar, Tanvi; Isac, Omri; Kienitz, Daniel; Komendantskaya, Ekaterina; Rieser, Verena


Publication date: 15 August 2023

Abstract

Verification of machine learning models used in Natural Language Processing (NLP) is known to be a hard problem. In particular, many neural network verification methods that work for computer vision and other numeric datasets do not work for NLP. Here, we study the technical reasons that underlie this problem. Based on this analysis, we propose practical methods and heuristics for preparing NLP datasets and models in a way that renders them amenable to known verification methods based on abstract interpretation. We implement these methods as a Python library called ANTONIO that links to the neural network verifiers ERAN and Marabou. We evaluate the tool on the NLP dataset R-U-A-Robot, which has been suggested as a benchmark for verifying legally critical NLP applications. We hope that, thanks to its general applicability, this work will open new possibilities for including NLP verification problems in neural network verification competitions and will popularise NLP problems within this community.

Comment: To appear in the proceedings of the 6th Workshop on Formal Methods for ML-Enabled Autonomous Systems (affiliated with CAV 2023).
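The abstract refers to verification methods based on abstract interpretation without showing what such a check computes. The sketch below illustrates the simplest abstract domain, intervals (boxes), in pure Python. It assumes, for illustration only, that a verification region is the axis-aligned bounding box of the embeddings of several perturbed versions of one sentence, and that the classifier is a small fully connected ReLU network given as plain lists of weights. The helper names (`bounding_box`, `affine`, `relu`, `verify_class`) and the example numbers are hypothetical; they are not the API of ANTONIO, ERAN or Marabou, which use more precise abstractions or complete, solver-based search.

```python
"""Interval (box) abstract interpretation for a small ReLU classifier.

Hypothetical illustration only: not the API of ANTONIO, ERAN or Marabou.
"""
from typing import List, Sequence, Tuple

Box = Tuple[List[float], List[float]]  # (lower bounds, upper bounds), one entry per dimension
Layer = Tuple[List[List[float]], List[float]]  # (weight matrix rows, bias vector)


def bounding_box(points: Sequence[Sequence[float]]) -> Box:
    """Smallest axis-aligned box containing every given embedding vector."""
    lower = [min(column) for column in zip(*points)]
    upper = [max(column) for column in zip(*points)]
    return lower, upper


def affine(box: Box, weights: List[List[float]], bias: List[float]) -> Box:
    """Propagate a box through y = W x + b with interval arithmetic.

    A positive weight attains its minimum at the input's lower bound and a
    negative weight at the upper bound, so the result encloses every output.
    """
    lower, upper = box
    out_lower, out_upper = [], []
    for row, b in zip(weights, bias):
        lo = b + sum(w * (l if w >= 0 else u) for w, l, u in zip(row, lower, upper))
        hi = b + sum(w * (u if w >= 0 else l) for w, l, u in zip(row, lower, upper))
        out_lower.append(lo)
        out_upper.append(hi)
    return out_lower, out_upper


def relu(box: Box) -> Box:
    """ReLU is monotone, so applying it to both bounds is exact per dimension."""
    lower, upper = box
    return [max(0.0, v) for v in lower], [max(0.0, v) for v in upper]


def verify_class(box: Box, layers: List[Layer], target: int) -> bool:
    """Return True if every input in `box` is provably classified as `target`.

    Sound but incomplete: False means "not proven", not "counterexample found".
    """
    for index, (weights, bias) in enumerate(layers):
        box = affine(box, weights, bias)
        if index < len(layers) - 1:  # no ReLU after the output layer
            box = relu(box)
    lower, upper = box
    return all(lower[target] > upper[j] for j in range(len(lower)) if j != target)


if __name__ == "__main__":
    # Hypothetical 2-D embeddings of three perturbed versions of one sentence.
    perturbed = [[0.9, 0.1], [1.0, 0.2], [0.8, 0.15]]
    region = bounding_box(perturbed)  # ([0.8, 0.1], [1.0, 0.2])
    # Hypothetical two-layer network; outputs 0 and 1 are the two classes.
    net = [
        ([[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0]),
        ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),
    ]
    print(verify_class(region, net, target=0))  # True
    print(verify_class(([0.0, 0.0], [1.0, 1.0]), net, target=0))  # False: box too wide
```

The check is sound but incomplete: `True` means every point in the box is provably assigned the target class, while `False` only means the interval bounds were too loose to prove it. A box in embedding space may also contain vectors that correspond to no real sentence, so how such regions are constructed matters when applying these methods to NLP.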

Available Versions

arXiv.org e-Print Archive

oai:arXiv.org:2305.04003

Last updated on 12/05/2023