    Maximum Resilience of Artificial Neural Networks

    The deployment of Artificial Neural Networks (ANNs) in safety-critical applications poses a number of new verification and certification challenges. In particular, for ANN-enabled self-driving vehicles it is important to establish properties about the resilience of ANNs to noisy or even maliciously manipulated sensory input. We address these challenges by defining the resilience of an ANN-based classifier as the maximal amount of input or sensor perturbation that it still tolerates. The problem of computing maximal perturbation bounds for ANNs is then reduced to solving mixed integer optimization problems (MIP). A number of MIP encoding heuristics are developed that drastically reduce MIP-solver runtimes, and in our experiments parallelizing the MIP solvers yields an almost linear speed-up in the number of computing cores, up to a certain limit. We demonstrate the effectiveness and scalability of our approach by computing maximal resilience bounds for a number of ANN benchmark sets ranging from typical image recognition scenarios to the autonomous maneuvering of robots. Comment: Timestamp research work conducted in the project. version 2: fix some typos, rephrase the definition, and add some more existing wor

    Towards Fast Computation of Certified Robustness for ReLU Networks

    Verifying the robustness property of a general Rectified Linear Unit (ReLU) network is an NP-complete problem [Katz, Barrett, Dill, Julian and Kochenderfer CAV17]. Although finding the exact minimum adversarial distortion is hard, giving a certified lower bound of the minimum distortion is possible. Currently available methods for computing such a bound are either time-consuming or deliver bounds that are too loose to be useful. In this paper, we exploit the special structure of ReLU networks and provide two computationally efficient algorithms, Fast-Lin and Fast-Lip, that are able to certify non-trivial lower bounds of minimum distortions, by bounding the ReLU units with appropriate linear functions (Fast-Lin), or by bounding the local Lipschitz constant (Fast-Lip). Experiments show that (1) our proposed methods deliver bounds close to (the gap is 2-3X) the exact minimum distortion found by Reluplex in small MNIST networks while our algorithms are more than 10,000 times faster; (2) our methods deliver similar quality of bounds (the gap is within 35% and usually around 10%; sometimes our bounds are even better) for larger networks compared to the methods based on solving linear programming problems but our algorithms are 33-14,000 times faster; (3) our method is capable of solving large MNIST and CIFAR networks up to 7 layers with more than 10,000 neurons within tens of seconds on a single CPU core. In addition, we show that there is no polynomial time algorithm that can approximately find the minimum $\ell_1$ adversarial distortion of a ReLU network with a $0.99\ln n$ approximation ratio unless $\mathsf{NP}=\mathsf{P}$, where $n$ is the number of neurons in the network. Comment: Tsui-Wei Weng and Huan Zhang contributed equall

    Mathematical optimization in deep learning

    Mathematical Optimization plays a fundamental role in Machine Learning (ML). Neural Networks (NN) are amongst the most popular and effective ML architectures and are the subject of very intense investigation. They have also proven immensely powerful at solving prediction tasks in areas such as speech recognition, image classification, robotics and quantum physics. In this work we present the problem of training a Deep Neural Network (DNN), specifically the continuous optimization problem arising in Feed-Forward Networks with Rectified Linear Unit (ReLU) activation. We then discuss the inverse problem, presenting a model for a trained DNN as a 0-1 Mixed Integer Linear Program (MILP). Some applications, such as feature visualization and the construction of adversarial examples, are outlined. Computational experiments are reported for both the direct and the inverse problem. The remainder of the text contains the AMPL codes used for solving the posed problems. Universidad de Sevilla. Doble Grado en Física y Matemática
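
As an illustration of how a trained ReLU unit can be written with 0-1 variables, here is a minimal sketch of one common big-M encoding. It is an assumption for illustration and not necessarily the formulation used in the work above: the pre-activation `a` is split as `a = y - s` with `y, s >= 0`, and a binary `z` forces at most one of `y` and `s` to be non-zero. The names `relu_bigm_feasible`, `encode_relu` and the bound `M` are hypothetical.

```python
def relu_bigm_feasible(a, y, s, z, M, tol=1e-9):
    """Check the big-M constraints that tie y to max(a, 0).

    Constraints (assumed encoding, valid when |a| <= M):
        a = y - s,  y >= 0,  s >= 0,  z in {0, 1},
        y <= M * z,  s <= M * (1 - z)
    """
    return (
        abs(a - (y - s)) <= tol
        and y >= 0
        and s >= 0
        and z in (0, 1)
        and y <= M * z
        and s <= M * (1 - z)
    )

def encode_relu(a, M):
    """Return a feasible (y, s, z) for pre-activation a, with y = max(a, 0)."""
    if abs(a) > M:
        raise ValueError("big-M bound too small for this pre-activation")
    y = max(a, 0.0)       # ReLU output
    s = max(-a, 0.0)      # slack absorbing the negative part
    z = 1 if a > 0 else 0 # 1 when the unit is active
    return y, s, z
```

With `z = 1` the constraint `s <= 0` forces `y = a`, and with `z = 0` the constraint `y <= 0` forces `y = 0`, so every feasible point satisfies `y = max(a, 0)`. Stacking these constraints layer by layer, with the input as a decision variable, gives a MILP over which objectives such as adversarial perturbation size can be optimized.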

    Beating the Perils of Non-Convexity: Guaranteed Training of Neural Networks using Tensor Methods

    Training neural networks is a challenging non-convex optimization problem, and backpropagation or gradient descent can get stuck in spurious local optima. We propose a novel algorithm based on tensor decomposition for guaranteed training of two-layer neural networks. We provide risk bounds for our proposed method, with a polynomial sample complexity in the relevant parameters, such as input dimension and number of neurons. While learning arbitrary target functions is NP-hard, we provide transparent conditions on the function and the input for learnability. Our training method is based on tensor decomposition, which provably converges to the global optimum under a set of mild non-degeneracy conditions. It consists of simple, embarrassingly parallel linear and multi-linear operations, and is competitive with standard stochastic gradient descent (SGD) in terms of computational complexity. Thus, we propose a computationally efficient method with guaranteed risk bounds for training neural networks with one hidden layer. Comment: The tensor decomposition analysis is expanded, and the analysis of ridge regression is added for recovering the parameters of last layer of neural networ

    On the Universal Approximation Property and Equivalence of Stochastic Computing-based Neural Networks and Binary Neural Networks

    Large-scale deep neural networks are both memory-intensive and computation-intensive, thereby posing stringent requirements on the computing platforms. Hardware acceleration of deep neural networks has been extensively investigated in both industry and academia. Specific forms of binary neural networks (BNNs) and stochastic computing based neural networks (SCNNs) are particularly appealing for hardware implementations since they can be implemented almost entirely with binary operations. Despite the obvious advantages in hardware implementation, these approximate computing techniques are questioned by researchers in terms of accuracy and universal applicability. It is also important to understand the relative pros and cons of SCNNs and BNNs in theory and in actual hardware implementations. To address these concerns, in this paper we prove that the "ideal" SCNNs and BNNs satisfy the universal approximation property with probability 1 (due to the stochastic behavior). The proof is conducted by first proving the property for SCNNs from the strong law of large numbers, and then using SCNNs as a "bridge" to prove it for BNNs. Based on the universal approximation property, we further prove that SCNNs and BNNs exhibit the same energy complexity. In other words, they have the same asymptotic energy consumption as the network size grows. We also provide a detailed analysis of the pros and cons of SCNNs and BNNs for hardware implementations and conclude that SCNNs are more suitable for hardware. Comment: 9 pages, 3 figure
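
The role of the law of large numbers in stochastic computing can be illustrated with a minimal sketch. It assumes the unipolar encoding, in which a value p in [0, 1] is represented by a random bitstream whose bits are 1 with probability p. The abstract does not state which encoding the paper uses, so this choice, the function names `to_bitstream` and `sc_multiply`, and the stream length are all assumptions.

```python
import numpy as np

def to_bitstream(p, length, rng):
    """Encode p in [0, 1] as a random bitstream with P(bit = 1) = p."""
    return rng.random(length) < p

def sc_multiply(p, q, length, rng):
    """Estimate p * q with a bitwise AND of two independent bitstreams.

    For independent streams, P(a_i AND b_i) = p * q, so the fraction of
    ones in the output converges to p * q as the stream length grows.
    """
    a = to_bitstream(p, length, rng)
    b = to_bitstream(q, length, rng)
    return float(np.mean(a & b))

# Fixed seed so the illustration is reproducible.
sc_rng = np.random.default_rng(0)
sc_estimate = sc_multiply(0.3, 0.5, 200_000, sc_rng)
```

A single AND gate per bit replaces a full multiplier, which is the kind of binary-operation saving the abstract refers to. The price is that accuracy improves only with the square root of the stream length.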