
    A survey on modern trainable activation functions

    In the neural networks literature, there is strong interest in identifying and defining activation functions which can improve neural network performance. In recent years the scientific community has shown renewed interest in activation functions that can be trained during the learning process, usually referred to as "trainable", "learnable" or "adaptable" activation functions. They appear to lead to better network performance. Diverse and heterogeneous models of trainable activation functions have been proposed in the literature. In this paper, we present a survey of these models. Starting from a discussion on the use of the term "activation function" in the literature, we propose a taxonomy of trainable activation functions, highlight common and distinctive properties of recent and past models, and discuss the main advantages and limitations of this type of approach. We show that many of the proposed approaches are equivalent to adding neuron layers which use fixed (non-trainable) activation functions together with a simple local rule that constrains the corresponding weight layers. Comment: Published in the "Neural Networks" journal (Elsevier).
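
    As a minimal illustration of the general idea (not any particular model from the survey), a PReLU-style unit makes the negative slope of a ReLU a learnable parameter trained alongside the weights; the class name and initial value below are assumptions for this sketch.

        import torch
        import torch.nn as nn

        class TrainablePReLU(nn.Module):
            # Toy trainable activation: the negative slope 'alpha' is a
            # parameter updated by backpropagation like any other weight.
            def __init__(self, init_alpha: float = 0.25):
                super().__init__()
                self.alpha = nn.Parameter(torch.tensor(init_alpha))

            def forward(self, x):
                # Identity on the positive part, learned slope on the negative part.
                return torch.clamp(x, min=0) + self.alpha * torch.clamp(x, max=0)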

    Universal Approximation of Parametric Optimization via Neural Networks with Piecewise Linear Policy Approximation

    Parametric optimization solves a family of optimization problems as a function of parameters. It is a critical component in situations where optimal decisions must be made repeatedly for updated parameter values, but computation becomes challenging when complex problems need to be solved in real time. In this study, we therefore present theoretical foundations for approximating the optimal policy of a parametric optimization problem with neural networks, and we derive conditions under which the Universal Approximation Theorem applies to parametric optimization problems by explicitly constructing a piecewise linear policy approximation. This study fills a gap by formally analyzing the constructed piecewise linear approximation in terms of feasibility and optimality, and shows that neural networks with ReLU activations are valid approximators for this approximation in terms of generalization and approximation error. Furthermore, based on the theoretical results, we propose a strategy to improve the feasibility of the approximated solution and discuss training with suboptimal solutions. Comment: 17 pages, 2 figures, preprint, under review.
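
    As a hedged sketch of the setting (not the paper's construction), the toy parametric problem min_x (x - theta)^2 subject to x >= 0 has the piecewise linear optimal policy x*(theta) = max(theta, 0); a small ReLU network can be fit to this parameter-to-solution map from sampled parameters. The network size, sampling range, and optimizer settings below are illustrative assumptions.

        import torch
        import torch.nn as nn

        torch.manual_seed(0)
        theta = torch.linspace(-2.0, 2.0, 400).unsqueeze(1)  # sampled parameter values
        x_star = torch.clamp(theta, min=0.0)                 # optimal policy x*(theta) = max(theta, 0)

        # Small ReLU network approximating the parameter-to-solution map.
        policy_net = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))
        optimizer = torch.optim.Adam(policy_net.parameters(), lr=1e-2)

        for step in range(500):
            optimizer.zero_grad()
            loss = nn.functional.mse_loss(policy_net(theta), x_star)
            loss.backward()
            optimizer.step()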

    Greedy Shallow Networks: An Approach for Constructing and Training Neural Networks

    We present a greedy approach to constructing an efficient single-hidden-layer neural network with ReLU activation that approximates a target function. In our approach we obtain a shallow network by applying a greedy algorithm to a prescribed dictionary built from the available training data and a set of possible inner weights. To facilitate the greedy selection process we employ an integral representation of the network, based on the ridgelet transform, that significantly reduces the cardinality of the dictionary and hence makes the greedy selection feasible. Our approach allows for the construction of efficient architectures which can be treated either as improved initializations to be used in place of random ones or, in certain cases, as fully trained networks, thus potentially removing the need for backpropagation training. Numerical experiments demonstrate the viability of the proposed concept and its advantages over conventional techniques for selecting architectures and initializations for neural networks.
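
    A rough sketch of a greedy construction under simplifying assumptions (a hand-picked finite dictionary instead of the paper's ridgelet-based one, and a sine target chosen only for illustration): at each step the ReLU atom most correlated with the current residual is selected, and the outer-layer coefficients are refit by least squares.

        import numpy as np

        xs = np.linspace(-1.0, 1.0, 200)
        target = np.sin(np.pi * xs)                      # target function to approximate

        # Dictionary of candidate hidden units relu(w*x + b) from assumed inner weights.
        ws = np.array([-2.0, -1.0, 1.0, 2.0])
        bs = np.linspace(-1.0, 1.0, 21)
        atoms = [np.maximum(w * xs + b, 0.0) for w in ws for b in bs]

        selected, residual = [], target.copy()
        for _ in range(8):                               # greedily add 8 hidden units
            scores = [abs(a @ residual) / (np.linalg.norm(a) + 1e-12) for a in atoms]
            selected.append(atoms[int(np.argmax(scores))])
            A = np.stack(selected, axis=1)               # design matrix of chosen atoms
            coef, *_ = np.linalg.lstsq(A, target, rcond=None)
            residual = target - A @ coef
        print("approximation error:", np.linalg.norm(residual))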