
1,835 research outputs found

Output Reachable Set Estimation and Verification for Multi-Layer Neural Networks

Author: Xiang Weiming
Tran Hoang-Dung
Johnson Taylor T.
Publication date: 31/08/2004

In this paper, the output reachable set estimation and safety verification problems for multi-layer perceptron neural networks are addressed. First, a concept called maximum sensitivity is introduced and, for a class of multi-layer perceptrons whose activation functions are monotonic, the maximum sensitivity can be computed by solving convex optimization problems. Then, using a simulation-based method, the output reachable set estimation problem for neural networks is formulated as a chain of optimization problems. Finally, an automated safety verification method is developed based on the output reachable set estimation result. An application to the safety verification of a robotic arm model with two joints is presented to show the effectiveness of the proposed approaches. Comment: 8 pages, 9 figures, to appear in TNNL

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

On the distance between two neural networks and the stability of learning

Author: Bernstein Jeremy
Liu Ming-Yu
Vahdat Arash
Yue Yisong
Publication date: 09/02/2020

How far apart are two neural networks? This is a foundational question in their theory. We derive a simple and tractable bound that relates distance in function space to distance in parameter space for a broad class of nonlinear compositional functions. The bound distills a clear dependence on depth of the composition. The theory is of practical relevance since it establishes a trust region for first-order optimisation. In turn, this suggests an optimiser that we call Frobenius matched gradient descent---or Fromage. Fromage involves a principled form of gradient rescaling and enjoys guarantees on stability of both the spectra and Frobenius norms of the weights. We find that the new algorithm increases the depth at which a multilayer perceptron may be trained as compared to Adam and SGD and is competitive with Adam for training generative adversarial networks. We further verify that Fromage scales up to a language transformer with over 10⁸ parameters. Please find code & reproducibility instructions at: https://github.com/jxbz/fromag
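The abstract does not spell out the update rule, so the following is only a minimal sketch of what a "Frobenius matched" gradient step could look like for a single weight matrix. The function names, the learning rate, and in particular the final division by sqrt(1 + lr^2) are assumptions made for illustration; the authors' repository should be consulted for the actual optimiser.

```python
import math

def frobenius_norm(matrix):
    """Frobenius norm of a matrix stored as a list of rows."""
    return math.sqrt(sum(v * v for row in matrix for v in row))

def fromage_style_step(weights, grads, lr=0.01):
    """One illustrative update (assumed form, not taken from the abstract).

    The gradient is rescaled so that the raw step has Frobenius norm
    lr * ||W||_F, then the result is shrunk by 1 / sqrt(1 + lr^2) to
    counteract growth of the weight norm.
    """
    w_norm = frobenius_norm(weights)
    g_norm = frobenius_norm(grads)
    if g_norm == 0.0:
        # No gradient signal: return an unchanged copy.
        return [row[:] for row in weights]
    scale = lr * w_norm / g_norm
    shrink = 1.0 / math.sqrt(1.0 + lr * lr)
    return [
        [(w - scale * g) * shrink for w, g in zip(w_row, g_row)]
        for w_row, g_row in zip(weights, grads)
    ]

# Hypothetical 2x2 example with a gradient orthogonal to the weights.
W = [[1.0, 0.0], [0.0, 0.0]]
G = [[0.0, 3.0], [0.0, 0.0]]
W_new = fromage_style_step(W, G, lr=0.1)
```

In this example the gradient is orthogonal to the weights, so under the assumed update the Frobenius norm of the weight matrix is left unchanged by the step, which is the kind of norm stability the abstract refers to.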

Revealing quantum chaos with machine learning

Author: Fedorov A. K.
Karazeev A. A.
Kharkov Y. A.
Kiktenko E. O.
Sotskov V. E.
Publication venue: 'American Physical Society (APS)'
Publication date: 01/01/2020

Understanding properties of quantum matter is an outstanding challenge in science. In this paper, we demonstrate how machine-learning methods can be successfully applied to the classification of various regimes in single-particle and many-body systems. We realize neural network algorithms that classify regular and chaotic behavior in quantum billiard models with remarkably high accuracy. We use the variational autoencoder for autosupervised classification of regular/chaotic wave functions, and demonstrate that variational autoencoders can be used as a tool for detecting anomalous quantum states, such as quantum scars. Taking this method further, we show that machine learning techniques allow us to pin down the transition from integrability to many-body quantum chaos in Heisenberg XXZ spin chains. For both cases, we confirm the existence of universal W shapes that characterize the transition. Our results pave the way for exploring the power of machine learning tools for revealing exotic phenomena in quantum many-body systems. Comment: 12 pages, 12 figure

arXiv.org e-Print Archive

NeuralSens: Sensitivity Analysis of Neural Networks

Author: Muñoz A.
Pizarroso J.
Portela J.
Publication date: 08/02/2021

Neural networks are important tools for data-intensive analysis and are commonly applied to model non-linear relationships between dependent and independent variables. However, neural networks are usually seen as "black boxes" that offer minimal information about how the input variables are used to predict the response in a fitted model. This article describes the NeuralSens package, which can be used to perform sensitivity analysis of neural networks using the partial derivatives method. Functions in the package can be used to obtain the sensitivities of the output with respect to the input variables, evaluate variable importance based on sensitivity measures and characterize relationships between input and output variables. Methods to calculate sensitivities are provided for objects from common neural network packages in R, including neuralnet, nnet, RSNNS, h2o, neural, forecast and caret. The article presents an overview of the techniques for obtaining information from neural network models, a theoretical foundation of how the partial derivatives of the output with respect to the inputs of a multi-layer perceptron model are calculated, a description of the package structure and functions, and applied examples that compare NeuralSens functions with analogous functions from other available R packages. Comment: 28 pages, 12 figures, submitted to Journal of Statistical Software (JSS) https://www.jstatsoft.org/inde
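As a rough illustration of the partial derivatives method mentioned in the abstract, the sketch below computes the sensitivity of a multi-layer perceptron's output to each input by the chain rule. It is written in plain Python rather than R and does not use or reproduce the NeuralSens API; the network shape (one tanh hidden layer, one linear output), the function names and the weights are all hypothetical.

```python
import math

def mlp_forward(x, W1, b1, W2, b2):
    """Single-hidden-layer MLP: tanh hidden units, one linear output."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(W1, b1)
    ]
    output = sum(w * h for w, h in zip(W2, hidden)) + b2
    return output, hidden

def input_sensitivities(x, W1, b1, W2, b2):
    """Partial derivatives d(output)/d(x_i), obtained with the chain rule."""
    _, hidden = mlp_forward(x, W1, b1, W2, b2)
    # d tanh(z)/dz = 1 - tanh(z)^2, evaluated at each hidden unit.
    return [
        sum(W2[j] * (1.0 - hidden[j] ** 2) * W1[j][i] for j in range(len(W2)))
        for i in range(len(x))
    ]

# Hypothetical weights for a network with 2 inputs and 2 hidden units.
W1 = [[0.5, -1.0], [1.5, 0.25]]
b1 = [0.0, -0.5]
W2 = [2.0, -1.0]
b2 = 0.1
x = [0.3, -0.2]

sens = input_sensitivities(x, W1, b1, W2, b2)
```

Evaluating such sensitivities at every sample of a dataset and summarising them (for example by their mean and spread) is one way a variable importance measure could then be built, although the specific measures used by the package are not stated in the abstract.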

arXiv.org e-Print Archive

Journal of Statistical Software

Ordered Counterfactual Explanation by Mixed-Integer Linear Optimization

Author: Arimura Hiroki
Ike Yuichi
Kanamori Kentaro
Kobayashi Ken
Takagi Takuya
Uemura Kento
Publication venue: 'Japanese Society for Artificial Intelligence'
Publication date: 14/03/2021

Post-hoc explanation methods for machine learning models have been widely used to support decision-making. One popular method is Counterfactual Explanation (CE), also known as Actionable Recourse, which provides a user with a perturbation vector of features that alters the prediction result. Given a perturbation vector, a user can interpret it as an "action" for obtaining a desired decision result. In practice, however, showing only a perturbation vector is often insufficient for users to execute the action. The reason is that if there is an asymmetric interaction among features, such as causality, the total cost of the action is expected to depend on the order in which the features are changed. Therefore, practical CE methods are required to provide an appropriate order of changing features in addition to a perturbation vector. For this purpose, we propose a new framework called Ordered Counterfactual Explanation (OrdCE). We introduce a new objective function that evaluates a pair of an action and an order based on feature interaction. To extract an optimal pair, we propose a mixed-integer linear optimization approach with our objective function. Numerical experiments on real datasets demonstrated the effectiveness of our OrdCE in comparison with unordered CE methods. Comment: 20 pages, 5 figures, to appear in the 35th AAAI Conference on Artificial Intelligence (AAAI 2021

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Multilayer optical learning networks

Author: Psaltis Demetri
Wagner Kelvin
Publication venue: Optical Society of America
Publication date: 01/12/1987

A new approach to learning in a multilayer optical neural network based on holographically interconnected nonlinear devices is presented. The proposed network can learn the interconnections that form a distributed representation of a desired pattern transformation operation. The interconnections are formed in an adaptive and self-aligning fashion as volume holographic gratings in photorefractive crystals. Parallel arrays of globally space-integrated inner products diffracted by the interconnecting hologram illuminate arrays of nonlinear Fabry-Perot etalons for fast thresholding of the transformed patterns. A phase conjugated reference wave interferes with a backward propagating error signal to form holographic interference patterns which are time integrated in the volume of a photorefractive crystal to modify slowly and learn the appropriate self-aligning interconnections. This multilayer system performs an approximate implementation of the backpropagation learning procedure in a massively parallel high-speed nonlinear optical network

Caltech Authors

ARCHITECTURE OPTIMIZATION, TRAINING CONVERGENCE AND NETWORK ESTIMATION ROBUSTNESS OF A FULLY CONNECTED RECURRENT NEURAL NETWORK

Author: Wang Xiaoyu
Publication venue: Clemson University Libraries
Publication date: 01/05/2010

Recurrent neural networks (RNN) have been rapidly developed in recent years. Applications of RNN can be found in system identification, optimization, image processing, pattern recognition, classification, clustering, memory association, etc. In this study, an optimized RNN is proposed to model nonlinear dynamical systems. First, a fully connected RNN is developed by modifying a fully forward connected neural network (FFCNN) to accommodate recurrent connections among its hidden neurons. In addition, a destructive structure optimization algorithm is applied and the extended Kalman filter (EKF) is adopted as the network's training algorithm. These two algorithms work together seamlessly to generate the optimized RNN. The enhancement of the modeling performance of the optimized network comes from three parts: 1) its prototype, the FFCNN, has advantages over the multilayer perceptron (MLP) network, the most widely used network, in terms of modeling accuracy and generalization ability; 2) the recurrence in the RNN makes it more capable of modeling non-linear dynamical systems; and 3) the structure optimization algorithm further improves the RNN's modeling performance in generalization ability and robustness. Performance studies of the proposed network focus on training convergence and robustness. For the training convergence study, the Lyapunov method is used to adapt some training parameters to guarantee training convergence, while the maximum likelihood method is used to estimate other parameters to accelerate the training process. In addition, robustness analysis is conducted to develop a robustness measure that considers uncertainty propagation through the RNN via the unscented transform. Two case studies, the modeling of a benchmark non-linear dynamical system and of tool wear progression in hard turning, are carried out to validate the developments in this dissertation.
The work detailed in this dissertation focuses on the creation of: (1) a new method to prove/guarantee the training convergence of RNN, and (2) a new method to quantify the robustness of RNN using uncertainty propagation analysis. With the proposed study, RNN and related algorithms are developed to model nonlinear dynamical systems, which can benefit future modeling applications such as condition monitoring studies in terms of robustness and accuracy

Clemson University: TigerPrints