FReLU: Flexible Rectified Linear Units for Improving Convolutional Neural Networks
Rectified linear unit (ReLU) is a widely used activation function for deep
convolutional neural networks. However, because of its hard rectification at zero,
ReLU networks miss the benefits of negative values. In this paper, we propose
a novel activation function called \emph{flexible rectified linear unit
(FReLU)} to further explore the effects of negative values. By redesigning the
rectified point of ReLU as a learnable parameter, FReLU expands the states of
the activation output. When the network is successfully trained, FReLU tends to
converge to a negative value, which improves the expressiveness and thus the
performance. Furthermore, FReLU is designed to be simple and effective, avoiding
exponential functions to keep the computational cost low. Because it is
self-adaptive, FReLU does not rely on strict assumptions and can be easily used
in various network architectures. We evaluate FReLU on three standard image
classification datasets, including CIFAR-10, CIFAR-100, and ImageNet.
Experimental results show that the proposed method achieves fast convergence
and higher performance on both plain and residual networks.
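The activation described above can be sketched as ReLU plus a learnable offset that shifts the rectified point. A minimal PyTorch sketch, assuming one learnable offset per channel and an arbitrary negative initialization (both details are assumptions, not taken from the abstract):

```python
import torch
import torch.nn as nn

class FReLU(nn.Module):
    """Flexible ReLU sketch: relu(x) plus a learnable per-channel offset."""
    def __init__(self, num_channels: int):
        super().__init__()
        # One learnable offset per channel; the abstract notes it tends to
        # converge to a negative value, so start it slightly below zero.
        self.offset = nn.Parameter(torch.full((num_channels,), -0.25))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x is assumed to have shape (N, C, H, W); broadcast the offset over channels.
        return torch.relu(x) + self.offset.view(1, -1, 1, 1)
```

Because the offset is added after rectification, the output can take negative values while the activation stays piecewise linear and free of exponentials.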
Invariance of Weight Distributions in Rectified MLPs
An interesting approach to analyzing neural networks that has received
renewed attention is to examine the equivalent kernel of the neural network.
This is based on the fact that a fully connected feedforward network with one
hidden layer, a certain weight distribution, an activation function, and an
infinite number of neurons can be viewed as a mapping into a Hilbert space. We
derive the equivalent kernels of MLPs with ReLU or Leaky ReLU activations for
all rotationally-invariant weight distributions, generalizing a previous result
that required Gaussian weight distributions. Additionally, the Central Limit
Theorem is used to show that for certain activation functions, kernels
corresponding to layers with weight distributions having zero mean and finite
absolute third moment are asymptotically universal, and are well approximated
by the kernel corresponding to layers with spherical Gaussian weights. In deep
networks, as depth increases the equivalent kernel approaches a pathological
fixed point, which can be used to argue why training randomly initialized
networks can be difficult. Our results also have implications for weight
initialization.
Comment: ICML 2018
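For the Gaussian special case that this result generalizes, the equivalent kernel of an infinitely wide ReLU layer has a known closed form (the first-order arc-cosine kernel). A minimal NumPy sketch of that special case, assuming the normalization E[relu(w·x) relu(w·y)] with w drawn from a standard Gaussian (the normalization convention is an assumption; the paper's results cover general rotationally-invariant weights):

```python
import numpy as np

def relu_equivalent_kernel(x: np.ndarray, y: np.ndarray) -> float:
    """Equivalent kernel of an infinitely wide ReLU layer with standard
    Gaussian weights, i.e. E[relu(w.x) * relu(w.y)] for w ~ N(0, I)."""
    nx, ny = np.linalg.norm(x), np.linalg.norm(y)
    # Clip to guard against values just outside [-1, 1] due to rounding.
    cos_theta = np.clip(np.dot(x, y) / (nx * ny), -1.0, 1.0)
    theta = np.arccos(cos_theta)
    return (nx * ny / (2.0 * np.pi)) * (np.sin(theta) + (np.pi - theta) * cos_theta)
```

For example, orthogonal unit vectors give theta = pi/2 and a kernel value of 1/(2*pi), while identical unit vectors give 1/2.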
Bottom-Up and Top-Down Reasoning with Hierarchical Rectified Gaussians
Convolutional neural nets (CNNs) have demonstrated remarkable performance in
recent history. Such approaches tend to work in a unidirectional bottom-up
feed-forward fashion. However, practical experience and biological evidence tell
us that feedback plays a crucial role, particularly for detailed spatial
understanding tasks. This work explores bidirectional architectures that also
reason with top-down feedback: neural units are influenced by both lower and
higher-level units.
We do so by treating units as rectified latent variables in a quadratic
energy function, which can be seen as a hierarchical Rectified Gaussian (RG)
model. We show that RGs can be optimized with a quadratic program (QP), which can
in turn be optimized with a recurrent neural network (with rectified linear
units). This allows RGs to be trained with GPU-optimized gradient descent. From
a theoretical perspective, RGs help establish a connection between CNNs and
hierarchical probabilistic models. From a practical perspective, RGs are well
suited for detailed spatial tasks that can benefit from top-down reasoning. We
illustrate them on the challenging task of keypoint localization under
occlusions, where local bottom-up evidence may be misleading. We demonstrate
state-of-the-art results on challenging benchmarks.
Comment: To appear in CVPR 2016
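The QP-to-recurrent-network connection described above can be illustrated with projected gradient descent on a non-negatively constrained quadratic energy: each iteration is a linear update followed by a rectification, i.e. a recurrent ReLU step. A hedged NumPy sketch in which the energy 0.5 * z^T W z - b^T z, the step size, and the iteration count are illustrative assumptions rather than the paper's exact formulation:

```python
import numpy as np

def rectified_qp_descent(W: np.ndarray, b: np.ndarray,
                         num_steps: int = 50, step_size: float = 0.1) -> np.ndarray:
    """Minimize E(z) = 0.5 * z^T W z - b^T z subject to z >= 0
    (W assumed symmetric positive semi-definite) by projected gradient
    descent; each step is a recurrent linear update passed through a ReLU."""
    z = np.zeros_like(b)
    for _ in range(num_steps):
        grad = W @ z - b                           # gradient of the quadratic energy
        z = np.maximum(z - step_size * grad, 0.0)  # ReLU = projection onto z >= 0
    return z
```

Unrolling the loop for a fixed number of steps yields exactly the kind of recurrent ReLU network that can be trained with GPU-optimized gradient descent.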