Verification of machine learning models used in Natural Language Processing
(NLP) is known to be a hard problem. In particular, many known neural network
verification methods that work for computer vision and other numeric datasets
do not work for NLP. Here, we study technical reasons that underlie this
problem. Based on this analysis, we propose practical methods and heuristics
for preparing NLP datasets and models in a way that renders them amenable to
known verification methods based on abstract interpretation. We implement these
methods as a Python library called ANTONIO that links to the neural network
verifiers ERAN and Marabou. We perform evaluation of the tool using an NLP
dataset R-U-A-Robot suggested as a benchmark for verifying legally critical NLP
applications. We hope that, thanks to its general applicability, this work will
open novel possibilities for including NLP verification problems into neural
network verification competitions, and will popularise NLP problems within this
community.Comment: To appear in proceedings of 6th Workshop on Formal Methods for
ML-Enabled Autonomous Systems (Affiliated with CAV 2023