Inverse Abstraction of Neural Networks Using Symbolic Interpolation
Neural networks in real-world applications have to satisfy critical properties such as safety and reliability. The analysis of such properties typically requires extracting information by computing pre-images of the network transformations, but it is well known that explicit computation of pre-images is intractable. We introduce new methods for computing compact symbolic abstractions of pre-images by computing their overapproximations and underapproximations through all layers. The abstraction of pre-images enables formal analysis and knowledge extraction without affecting standard learning algorithms. We use inverse abstractions to automatically extract simple control laws and compact representations for pre-images corresponding to unsafe outputs. We show that the extracted abstractions are interpretable and can be used for analyzing complex properties.
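To make "pre-image" and "underapproximation" concrete, here is a minimal sketch for a one-hidden-layer ReLU network f(x) = W2 relu(W1 x + b1) + b2 and an output halfspace {y : c . y <= d}. It does not implement the symbolic interpolation method described above. Instead it uses plain activation-pattern enumeration: once each ReLU is fixed as on or off, the network is affine, so the inputs with that pattern that satisfy the output constraint form a polyhedron A x <= rhs. The union over all 2^n patterns is the exact pre-image, and any subset of patterns gives an underapproximation. The exponential number of patterns illustrates why explicit pre-image computation does not scale. All function names (`piece_for_pattern`, `preimage_pieces`, `in_polyhedron`, `forward`) are hypothetical.

```python
import itertools
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Vector = List[float]
Polyhedron = Tuple[Matrix, Vector]  # represents {x : A x <= rhs}


def forward(W1: Matrix, b1: Vector, W2: Matrix, b2: Vector, x: Vector) -> Vector:
    """Evaluate f(x) = W2 relu(W1 x + b1) + b2."""
    h = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    return [sum(w * hi for w, hi in zip(row, h)) + b for row, b in zip(W2, b2)]


def piece_for_pattern(W1, b1, W2, b2, c, d, pattern: Sequence[bool]) -> Polyhedron:
    """Inputs x that induce the given ReLU activation pattern AND satisfy
    c . f(x) <= d. On such a region relu is linear, so the set is a polyhedron."""
    A: Matrix = []
    rhs: Vector = []
    for row, b, active in zip(W1, b1, pattern):
        if active:  # need W1[i] x + b1[i] >= 0, i.e. -W1[i] x <= b1[i]
            A.append([-w for w in row])
            rhs.append(b)
        else:       # need W1[i] x + b1[i] <= 0, i.e. W1[i] x <= -b1[i]
            A.append(list(row))
            rhs.append(-b)
    # Effective weight on hidden unit i: g_i = p_i * sum_j c_j W2[j][i].
    g = [sum(cj * W2[j][i] for j, cj in enumerate(c)) if active else 0.0
         for i, active in enumerate(pattern)]
    # c . f(x) = (sum_i g_i W1[i]) . x + sum_i g_i b1[i] + c . b2 <= d
    n_in = len(W1[0])
    A.append([sum(gi * W1[i][k] for i, gi in enumerate(g)) for k in range(n_in)])
    const = sum(gi * bi for gi, bi in zip(g, b1)) + sum(cj * bj for cj, bj in zip(c, b2))
    rhs.append(d - const)
    return A, rhs


def preimage_pieces(W1, b1, W2, b2, c, d, patterns=None) -> List[Polyhedron]:
    """Union over ALL patterns = exact pre-image; any subset = underapproximation."""
    if patterns is None:
        patterns = itertools.product([False, True], repeat=len(W1))
    return [piece_for_pattern(W1, b1, W2, b2, c, d, p) for p in patterns]


def in_polyhedron(poly: Polyhedron, x: Vector, tol: float = 1e-9) -> bool:
    """Check A x <= rhs row by row, with a small tolerance for rounding."""
    A, rhs = poly
    return all(sum(a * xi for a, xi in zip(row, x)) <= r + tol for row, r in zip(A, rhs))


# Usage: a 2-input, 2-hidden-unit, 1-output network and the set {x : f(x) <= 1}.
W1, b1 = [[1.0, -1.0], [1.0, 1.0]], [0.0, -1.0]
W2, b2 = [[1.0, 2.0]], [0.0]
pieces = preimage_pieces(W1, b1, W2, b2, [1.0], 1.0)
print(any(in_polyhedron(p, [1.0, 0.0]) for p in pieces))  # True, since f([1, 0]) = [1.0]
```

Passing only some patterns (for example `patterns=[(True, True)]`) keeps a subset of the polyhedra, which is a sound but incomplete underapproximation of the pre-image.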
Scalable Inference of Symbolic Adversarial Examples
We present a novel method for generating symbolic adversarial examples: input regions guaranteed to contain only adversarial examples for the given neural network. Because these regions summarize trillions of adversarial examples, they can be used to generate real-world adversarial examples. We theoretically show that computing optimal symbolic adversarial examples is computationally expensive. We present a method for approximating optimal symbolic adversarial examples in a scalable manner. Our method first selectively uses adversarial attacks to generate a candidate region and then prunes this region with hyperplanes that fit points obtained via specialized sampling. It iterates until it arrives at a symbolic adversarial example for which it can prove, via state-of-the-art convex relaxation techniques, that the region contains only adversarial examples. Our experimental results demonstrate that our method is practically effective: it needs only a few thousand attacks to infer symbolic summaries guaranteed to contain adversarial examples.
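The key guarantee above is a proof that every point in a region is misclassified. The following is a minimal sketch of such a check under simplifying assumptions: the region is an axis-aligned box rather than a hyperplane-pruned polytope, the network has one hidden ReLU layer, and the verifier is plain interval bound propagation, a much looser relaxation swapped in for the convex relaxation techniques the abstract mentions. The check is sound but incomplete: if the lower bound of y_j - y_true is positive over the whole box for some label j, every point is adversarial; if not, nothing is concluded. The `shrink_until_certified` loop, which halves the box around a known adversarial point, is a hypothetical stand-in for the attack-and-prune procedure and is not the authors' algorithm. All names are illustrative.

```python
from typing import List, Optional, Tuple

Vector = List[float]
Matrix = List[List[float]]


def affine_bounds(W: Matrix, b: Vector, lo: Vector, hi: Vector) -> Tuple[Vector, Vector]:
    """Interval bounds of W x + b over the box lo <= x <= hi."""
    out_lo, out_hi = [], []
    for row, bias in zip(W, b):
        out_lo.append(bias + sum(w * (lo[k] if w >= 0 else hi[k]) for k, w in enumerate(row)))
        out_hi.append(bias + sum(w * (hi[k] if w >= 0 else lo[k]) for k, w in enumerate(row)))
    return out_lo, out_hi


def certify_all_adversarial(W1, b1, W2, b2, lo, hi, true_label) -> Optional[int]:
    """Return a label j != true_label whose logit provably exceeds the true
    label's logit everywhere in the box, or None if interval bounds cannot
    prove it. Sound but incomplete: None does not mean a benign point exists."""
    z_lo, z_hi = affine_bounds(W1, b1, lo, hi)
    h_lo = [max(0.0, v) for v in z_lo]  # ReLU is monotone, so bounds map through
    h_hi = [max(0.0, v) for v in z_hi]
    for j in range(len(W2)):
        if j == true_label:
            continue
        # Bound the logit difference y_j - y_true directly (tighter than
        # bounding y_j and y_true separately).
        diff_w = [a - t for a, t in zip(W2[j], W2[true_label])]
        lower = b2[j] - b2[true_label] + sum(
            w * (h_lo[i] if w >= 0 else h_hi[i]) for i, w in enumerate(diff_w))
        if lower > 0:
            return j
    return None


def shrink_until_certified(W1, b1, W2, b2, center, radius, true_label, max_iters=20):
    """Hypothetical stand-in for region pruning: halve a box around a known
    adversarial point until the verifier accepts it (or give up)."""
    r = radius
    for _ in range(max_iters):
        lo = [c - r for c in center]
        hi = [c + r for c in center]
        if certify_all_adversarial(W1, b1, W2, b2, lo, hi, true_label) is not None:
            return lo, hi
        r /= 2
    return None


# Usage: a toy 2-class network; (0, 1) is classified as label 1, so it is
# adversarial when the true label is 0.
W1, b1 = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
W2, b2 = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
print(shrink_until_certified(W1, b1, W2, b2, [0.0, 1.0], 2.0, true_label=0))
# ([-0.25, 0.75], [0.25, 1.25])
```

Bounding the logit difference, rather than each logit on its own, is a cheap way to tighten the relaxation; stronger relaxations would certify larger regions.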