135 research outputs found

When Deep Learning Meets Polyhedral Theory: A Survey

Author: Huchette Joey
Muñoz Gonzalo
Serra Thiago
Tsay Calvin
Publication date: 31/08/2023

In the past decade, deep learning became the prevalent methodology for predictive modeling thanks to the remarkable accuracy of deep neural networks in tasks such as computer vision and natural language processing. Meanwhile, the structure of neural networks converged back to simpler representations based on piecewise constant and piecewise linear functions such as the Rectified Linear Unit (ReLU), which became the most commonly used type of activation function in neural networks. That made certain types of network structure (such as the typical fully-connected feedforward neural network) amenable to analysis through polyhedral theory and to the application of methodologies such as Linear Programming (LP) and Mixed-Integer Linear Programming (MILP) for a variety of purposes. In this paper, we survey the main topics emerging from this fast-paced area of work, which bring a fresh perspective to understanding neural networks in more detail as well as to applying linear optimization techniques to train, verify, and reduce the size of such networks.

arXiv.org e-Print Archive

Invariance and Invertibility in Deep Neural Networks

Author: Zhang Han
Publication venue: VCU Scholars Compass
Publication date: 01/01/2020

Machine learning is concerned with computer systems that learn from data instead of being explicitly programmed to solve a particular task. One of the main approaches behind recent advances in machine learning involves neural networks with a large number of layers, often referred to as deep learning. In this dissertation, we study how to equip deep neural networks with two useful properties: invariance and invertibility. The first part of our work is focused on constructing neural networks that are invariant to certain transformations in the input, that is, some outputs of the network stay the same even if the input is altered. Furthermore, we want the network to learn the appropriate invariance from training data, instead of being explicitly constructed to achieve invariance to a pre-defined transformation type. The second part of our work is centered on two recently proposed types of deep networks: neural ordinary differential equations and invertible residual networks. These networks are invertible, that is, we can reconstruct the input from the output. However, there are some classes of functions that these networks cannot approximate. We show how to modify these two architectures to provably equip them with the capacity to approximate any smooth invertible function

VCU Scholars Compass

Generalized non-autonomous Cohen-Grossberg neural network model

Author: Elmwafy Ahmed
Oliveira José J.
Silva César M.
Publication date: 04/09/2023

In the present paper, we investigate both the global exponential stability and the existence of a periodic solution of a general differential equation with unbounded distributed delays. The main stability criterion depends on the dominance of the non-delay terms over the delay terms. The criterion for the existence of a periodic solution is obtained by applying the coincidence degree theorem. We use the main results to get criteria for the existence and global exponential stability of periodic solutions of a generalized higher-order periodic Cohen-Grossberg neural network model with discrete time-varying delays and infinite distributed delays. Additionally, we provide a comparison with the results in the literature and a numerical simulation to illustrate the effectiveness of some of our results. Comment: 30 pages

arXiv.org e-Print Archive

Deep Learning for Stable Monotone Dynamical Systems

Author: Gao Qitong
Pajic Miroslav
Wang Yu
Publication date: 11/06/2020

Monotone systems, originating from real-world (e.g., biological or chemical) applications, are a class of dynamical systems that preserve a partial order of system states over time. In this work, we introduce a method based on feedforward neural networks (FNNs) to learn the dynamics of unknown stable nonlinear monotone systems. We propose the use of nonnegative neural networks and batch normalization, which in general enables the FNNs to capture the monotonicity conditions without reducing the expressiveness. To concurrently ensure stability during training, we adopt an alternating learning method to simultaneously learn the system dynamics and the corresponding Lyapunov function, while exploiting monotonicity of the system. The combination of the monotonicity and stability constraints ensures that the learned dynamics preserve both properties, while significantly reducing learning errors. Finally, our techniques are evaluated on two complex biological and chemical systems.

arXiv.org e-Print Archive

Uses of Complex Wavelets in Deep Convolutional Neural Networks

Author: Cotter Fergal
Publication venue: University of Cambridge
Publication date: 16/08/2019

Image understanding has long been a goal for computer vision. It has proved to be an exceptionally difficult task due to the large amounts of variability that are inherent to objects in a scene. Recent advances in supervised learning methods, particularly convolutional neural networks (CNNs), have pushed forth the frontier of what we have been able to train computers to do. Despite their successes, the mechanics of how these networks are able to recognize objects are little understood, and the networks themselves are often very difficult and time-consuming to train. It is very important that we improve our current approaches in every way possible. A CNN is built from connecting many learned convolutional layers in series. These convolutional layers are fairly crude in terms of signal processing - they are arbitrary taps of a finite impulse response filter, learned through stochastic gradient descent from random initial conditions. We believe that if we reformulate the problem, we may achieve many insights and benefits in training CNNs. Noting that modern CNNs are mostly viewed from and analyzed in the spatial domain, this thesis aims to view the convolutional layers in the frequency domain (viewing things in the frequency domain has proved useful in the past for denoising, filter design, compression and many other tasks). In particular, we use complex wavelets (rather than the Fourier transform or the discrete wavelet transform) as basis functions to reformulate image understanding with deep networks. In this thesis, we explore the most popular and well-developed form of using complex wavelets in deep learning, the ScatterNet from Stephane Mallat. We explore its current limitations by building a DeScatterNet and find that while it has many nice properties, it may not be sensitive to the most appropriate shapes for understanding natural images.
We then develop a locally invariant convolutional layer, a combination of a complex wavelet transform, a modulus operation, and a learned mixing. To do this, we derive backpropagation equations and allow gradients to flow back through the (previously fixed) ScatterNet front end. Connecting several such locally invariant layers allows us to build a learnable ScatterNet, a more flexible and general form of the ScatterNet (while still maintaining its desired properties). We show that the learnable ScatterNet can provide significant improvements over the regular ScatterNet when used as a front end for a learning system. Additionally, we show that the locally invariant convolutional layer can directly replace convolutional layers in a deep CNN (and not just at the front end). The locally invariant convolutional layers naturally downsample the input (because of the complex modulus) while increasing the channel dimension (because of the multiple wavelet orientations used). This is an operation that often happens in a CNN through a combination of a pooling and a convolutional layer. It was at these locations in a CNN where the learnable ScatterNet performed best, implying it may be useful as a learnable pooling layer. Finally, we develop a system to learn complex weights that act directly on the wavelet coefficients of signals, in place of a convolutional layer. We call this layer the wavelet gain layer and show it can be used alongside convolutional layers. The network designer may then choose to learn in the pixel or wavelet domains. This layer shows a lot of promise and affords more control over what regions of the frequency space we want our layer to learn from. Our experiments show that it can improve on learning in the pixel domain for early layers of a CNN.

Apollo (Cambridge)

Structure-preserving deep learning

Author: Celledoni E.
Ehrhardt M. J.
Etmann C.
McLachlan R. I.
Owren B.
Schonlieb C. B.
Sherry F.
Publication venue: Cambridge University Press (CUP)
Publication date: 05/06/2020

Over the past few years, deep learning has risen to the foreground as a topic of massive interest, mainly as a result of successes obtained in solving large-scale image processing tasks. There are multiple challenging mathematical problems involved in applying deep learning: most deep learning methods require the solution of hard optimisation problems, and a good understanding of the tradeoff between computational effort, amount of data and model complexity is required to successfully design a deep learning approach for a given problem. A large amount of progress made in deep learning has been based on heuristic explorations, but there is a growing effort to mathematically understand the structure in existing deep learning methods and to systematically design new deep learning methods to preserve certain types of structure in deep learning. In this article, we review a number of these directions: some deep neural networks can be understood as discretisations of dynamical systems, neural networks can be designed to have desirable properties such as invertibility or group equivariance, and new algorithmic frameworks based on conformal Hamiltonian systems and Riemannian manifolds to solve the optimisation problems have been proposed. We conclude our review of each of these topics by discussing some open problems that we consider to be interesting directions for future research

arXiv.org e-Print Archive

OPUS

Recent Advances and Applications of Fractional-Order Neural Networks

Author: Benjapolakul Watit
Bingi Kishore
M Sunder
Maiti Monalisa
R Abishek
Shaik Nagoor Basha
Publication venue: Faculty of Engineering, Chulalongkorn University
Publication date: 31/07/2022

This paper focuses on the growth, development, and future of various forms of fractional-order neural networks. Multiple advances in structure, learning algorithms, and methods have been critically investigated and summarized. This also includes the recent trends in the dynamics of various fractional-order neural networks. The forms of fractional-order neural networks considered in this study are Hopfield, cellular, memristive, complex, and quaternion-valued networks. Further, the application of fractional-order neural networks in various computational fields such as system identification, control, optimization, and stability has been critically analyzed and discussed.

Engineering Journal (Faculty of Engineering, Chulalongkorn University, Bangkok)

Connecting mathematical models for image processing and neural networks

Author: Alt-Veit Tobias
Publication venue: Saarländische Universitäts- und Landesbibliothek
Publication date: 01/01/2022

This thesis deals with the connections between mathematical models for image processing and deep learning. While data-driven deep learning models such as neural networks are flexible and perform well, they are often used as a black box. This makes it hard to provide theoretical model guarantees and scientific insights. On the other hand, more traditional, model-driven approaches such as diffusion, wavelet shrinkage, and variational models offer a rich set of mathematical foundations. Our goal is to transfer these foundations to neural networks. To this end, we pursue three strategies. First, we design trainable variants of traditional models and reduce their parameter set after training to obtain transparent and adaptive models. Moreover, we investigate the architectural design of numerical solvers for partial differential equations and translate them into building blocks of popular neural network architectures. This yields criteria for stable networks and inspires novel design concepts. Lastly, we present novel hybrid models for inpainting that rely on our theoretical findings. These strategies provide three ways of combining the best of the two worlds of model-driven and data-driven approaches. Our work contributes to the overarching goal of closing the gap between these worlds that still exists in performance and understanding.
ERC Advanced Grant INCOVI

Universaar
