Dual Rectified Linear Units (DReLUs): A Replacement for Tanh Activation Functions in Quasi-Recurrent Neural Networks
In this paper, we introduce a novel type of Rectified Linear Unit (ReLU),
called a Dual Rectified Linear Unit (DReLU). A DReLU, which has both an
unbounded positive and an unbounded negative image, can be used as a drop-in
replacement for the tanh activation function in the recurrent step of
Quasi-Recurrent Neural Networks (QRNNs; Bradbury et al., 2017). Similar to
ReLUs, DReLUs are less
prone to the vanishing gradient problem, they are noise robust, and they induce
sparse activations.
We independently reproduce the QRNN experiments of Bradbury et al. (2017) and
compare our DReLU-based QRNNs with the original tanh-based QRNNs and Long
Short-Term Memory networks (LSTMs) on sentiment classification and word-level
language modeling. Additionally, we evaluate on character-level language
modeling, showing that we are able to stack up to eight QRNN layers with
DReLUs, and thus to improve on the current state of the art in character-level
language modeling set by shallow LSTM-based architectures.
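The listing above does not spell out the DReLU's functional form. As a minimal, hedged sketch, the two-input form below is an assumption consistent with the description of an unbounded positive and negative image, not a verbatim reproduction of the paper's definition:

```python
import numpy as np

def drelu(a, b):
    """Dual-rectified unit: illustrative sketch only (assumed form).

    Combines two rectified pre-activations so the output is unbounded in
    both the positive and the negative direction, unlike a single ReLU,
    making it a plausible drop-in for tanh in a recurrent step.
    """
    return np.maximum(a, 0.0) - np.maximum(b, 0.0)

# Toy usage with two pre-activation vectors from separate linear maps.
a = np.array([-1.0, 0.5, 2.0])
b = np.array([0.3, -2.0, 1.0])
print(drelu(a, b))  # [-0.3  0.5  1. ]
```

Like a ReLU, each rectified branch passes gradients through unattenuated where it is active and outputs exact zeros elsewhere, which is where the sparsity and vanishing-gradient claims in the abstract come from.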
Complex Unitary Recurrent Neural Networks using Scaled Cayley Transform
Recurrent neural networks (RNNs) have been successfully used on a wide range
of sequential data problems. A well known difficulty in using RNNs is the
\textit{vanishing or exploding gradient} problem. Recently, there have been
several different RNN architectures that try to mitigate this issue by
maintaining an orthogonal or unitary recurrent weight matrix. One such
architecture is the scaled Cayley orthogonal recurrent neural network (scoRNN)
which parameterizes the orthogonal recurrent weight matrix through a scaled
Cayley transform. This parametrization contains a diagonal scaling matrix with
entries equal to plus or minus one, which cannot be optimized by gradient
descent. The scaling matrix is therefore fixed before training, and a
hyperparameter is introduced to tune it for each particular task. In
this paper, we develop a unitary RNN architecture based on a complex scaled
Cayley transform. Unlike the real orthogonal case, the transformation uses a
diagonal scaling matrix consisting of entries on the complex unit circle which
can be optimized using gradient descent and no longer requires the tuning of a
hyperparameter. We also provide an analysis of a potential issue with the
modReLU activation function, which is used in our work and in several other
unitary RNNs.
In the experiments conducted, the scaled Cayley unitary recurrent neural
network (scuRNN) achieves comparable or better results than scoRNN and other
unitary RNNs without fixing the scaling matrix.
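For readers unfamiliar with the construction, the following sketch shows how a scaled Cayley transform of a skew-Hermitian matrix yields a unitary recurrent weight matrix with a trainable unit-circle diagonal. The factor ordering and parameter handling are assumptions made for illustration, not the paper's exact parametrization.

```python
import numpy as np

def scaled_cayley_unitary(A, theta):
    """Unitary matrix from a scaled Cayley transform (illustrative sketch).

    A is assumed skew-Hermitian (A^H = -A), so I + A is invertible and the
    Cayley transform (I + A)^{-1} (I - A) is unitary.  The diagonal scaling
    matrix D = diag(exp(i*theta)) has entries on the complex unit circle,
    so the product remains unitary while theta can be updated by gradient
    descent -- the key difference from a fixed +/-1 diagonal.
    """
    n = A.shape[0]
    I = np.eye(n, dtype=complex)
    D = np.diag(np.exp(1j * theta))
    return np.linalg.solve(I + A, I - A) @ D

# Toy usage: random skew-Hermitian A and random phases theta.
rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
A = (M - M.conj().T) / 2                  # skew-Hermitian part of M
theta = rng.uniform(0.0, 2 * np.pi, size=4)
W = scaled_cayley_unitary(A, theta)
print(np.allclose(W.conj().T @ W, np.eye(4)))  # True: W is unitary
```

Because every factor is unitary by construction, the recurrent matrix cannot amplify or shrink hidden-state norms, which is how this family of architectures addresses exploding and vanishing gradients.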
Deep Networks for Image Super-Resolution with Sparse Prior
Deep learning techniques have been successfully applied in many areas of
computer vision, including low-level image restoration problems. For image
super-resolution, several models based on deep neural networks have been
recently proposed, attaining superior performance that overshadows all
previous handcrafted models. The question then arises whether large-capacity
and data-driven models have become the dominant solution to the ill-posed
super-resolution problem. In this paper, we argue that domain expertise
represented by the conventional sparse coding model is still valuable, and it
can be combined with the key ingredients of deep learning to achieve further
improved results. We show that a sparse coding model particularly designed for
super-resolution can be incarnated as a neural network, and trained in a
cascaded structure from end to end. The interpretation of the network based on
sparse coding leads to much more efficient and effective training, as well as a
reduced model size. Our model is evaluated on a wide range of images, and shows
clear advantage over existing state-of-the-art methods in terms of both
restoration accuracy and human subjective quality.
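The abstract does not detail how the sparse coding model is turned into a network. One standard way to do so, sketched below under assumed names (W_e, S, and theta are hypothetical learned parameters), is to unroll a few iterations of iterative soft-thresholding into feed-forward layers in the spirit of LISTA; the actual architecture in the paper may differ.

```python
import numpy as np

def soft_threshold(x, theta):
    """Element-wise soft-thresholding, the sparsity-inducing step in ISTA."""
    return np.sign(x) * np.maximum(np.abs(x) - theta, 0.0)

def unrolled_sparse_codes(y, W_e, S, theta, n_layers=3):
    """Unrolled iterative soft-thresholding (LISTA-style sketch).

    y   : input feature vector, e.g. features of a low-resolution patch
    W_e : learned encoder matrix (hypothetical name)
    S   : learned "mutual inhibition" matrix (hypothetical name)
    Each layer is one thresholded update; stacking a few layers and training
    all parameters end to end turns sparse coding into a feed-forward
    network that can sit inside a larger super-resolution model.
    """
    b = W_e @ y
    z = soft_threshold(b, theta)
    for _ in range(n_layers - 1):
        z = soft_threshold(b + S @ z, theta)
    return z  # sparse code; a learned dictionary would map it to a HR patch

# Toy usage with random, untrained parameters.
rng = np.random.default_rng(1)
y = rng.standard_normal(16)
W_e = 0.1 * rng.standard_normal((32, 16))
S = 0.05 * rng.standard_normal((32, 32))
print(unrolled_sparse_codes(y, W_e, S, theta=0.1).round(2))
```

Because every operation here is differentiable, the thresholds and matrices can be trained jointly with the rest of the pipeline, which is what end-to-end cascaded training of a sparse-coding-based network amounts to.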