33,203 research outputs found
Gated Linear Networks
This paper presents a new family of backpropagation-free neural
architectures, Gated Linear Networks (GLNs). What distinguishes GLNs from
contemporary neural networks is the distributed and local nature of their
credit assignment mechanism; each neuron directly predicts the target, forgoing
the ability to learn feature representations in favor of rapid online learning.
Individual neurons can model nonlinear functions via the use of data-dependent
gating in conjunction with online convex optimization. We show that this
architecture gives rise to universal learning capabilities in the limit, with
effective model capacity increasing as a function of network size in a manner
comparable with deep ReLU networks. Furthermore, we demonstrate that the GLN
learning mechanism possesses extraordinary resilience to catastrophic
forgetting, performing comparably to an MLP with dropout and Elastic Weight
Consolidation on standard benchmarks. These desirable theoretical and empirical
properties position GLNs as a complementary technique to contemporary offline
deep learning methods.
Comment: arXiv admin note: substantial text overlap with arXiv:1712.0189
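The combination of data-dependent gating and online convex optimization described above can be illustrated with a single neuron. The sketch below assumes halfspace gating on side information z and a plain online gradient step on the log loss; all names and hyperparameters are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def logit(p):
    p = np.clip(p, 1e-6, 1.0 - 1e-6)
    return np.log(p / (1.0 - p))

class GLNNeuron:
    """One GLN-style neuron: a set of weight vectors, one per gating context,
    each trained by online convex optimization on the log loss."""

    def __init__(self, n_inputs, side_dim, n_hyperplanes=4, lr=0.01, seed=0):
        rng = np.random.default_rng(seed)
        # Fixed random hyperplanes define the halfspace gating contexts.
        self.hyperplanes = rng.standard_normal((n_hyperplanes, side_dim))
        self.weights = np.full((2 ** n_hyperplanes, n_inputs), 1.0 / n_inputs)
        self.lr = lr

    def _context(self, z):
        bits = (self.hyperplanes @ z > 0).astype(int)
        return int("".join(map(str, bits)), 2)

    def predict(self, p_in, z):
        # Geometric mixture of the input probabilities under the active context.
        c = self._context(z)
        return sigmoid(self.weights[c] @ logit(p_in)), c

    def update(self, p_in, z, target):
        # Gradient step on the log loss, which is convex in the weights,
        # so credit assignment stays local to this neuron.
        p_out, c = self.predict(p_in, z)
        self.weights[c] -= self.lr * (p_out - target) * logit(p_in)
        return p_out
```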
Extending Gated Linear Networks for Continual Learning
Incrementally learning multiple tasks from an indefinitely long stream of data
is a real challenge for traditional machine learning models. If not carefully
controlled, the learning of new knowledge strongly impacts a model's previously
learned abilities, making it forget how to solve past tasks.
Continual learning addresses this problem, called catastrophic forgetting, by
developing models able to continually learn new tasks and adapt to changes in
the data distribution.
In this dissertation, we consider the recently proposed family of continual
learning models called Gated Linear Networks (GLNs) and study two crucial
aspects that influence the amount of catastrophic forgetting in GLNs, namely
data standardization and the gating mechanism.
Data standardization is particularly challenging in the online/continual
learning setting because data from future tasks is not available beforehand.
The results obtained with an online standardization method show a considerably
higher amount of forgetting than an offline (static) standardization.
Interestingly, with the latter standardization, we observe that GLNs show
almost no forgetting on the considered benchmark datasets.
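To illustrate why online standardization is delicate, a running estimator such as the Welford-style sketch below must standardize each sample with statistics that are still evolving as new tasks arrive. This is a generic sketch, not necessarily the method evaluated in the dissertation.

```python
import numpy as np

class OnlineStandardizer:
    """Per-feature running mean/variance, updated one sample at a time,
    so no data from future tasks is needed."""

    def __init__(self, dim, eps=1e-8):
        self.n = 0
        self.mean = np.zeros(dim)
        self.m2 = np.zeros(dim)
        self.eps = eps

    def update(self, x):
        # Welford's online update of mean and sum of squared deviations.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def transform(self, x):
        var = self.m2 / max(self.n - 1, 1)
        return (x - self.mean) / np.sqrt(var + self.eps)
```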
Secondly, for GLNs to be effective, it is essential to tailor the
hyperparameters of the gating mechanism to the data distribution. In this
dissertation, we propose a gating strategy based on a set of prototypes and the
resulting Voronoi tessellation. The experimental assessment shows that, in an
ideal setting where the data distribution is known, the proposed approach is
more robust to different data standardizations than the original halfspace
gating mechanism and shows improved predictive performance.
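The prototype-based gating described above amounts to replacing halfspace membership with nearest-prototype (Voronoi cell) membership. The following is a minimal illustrative sketch; the prototype set, distance metric, and usage are assumptions.

```python
import numpy as np

def prototype_context(x, prototypes):
    """Return the gating context as the index of the nearest prototype,
    i.e. the Voronoi cell of x induced by the prototype set."""
    distances = np.linalg.norm(prototypes - x, axis=1)
    return int(np.argmin(distances))

# Hypothetical usage: 8 prototypes in a 2-D input space.
prototypes = np.random.default_rng(0).standard_normal((8, 2))
print(prototype_context(np.array([0.5, -0.2]), prototypes))
```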
Finally, we propose an adaptive mechanism for the choice of prototypes, which
expands and shrinks the set of prototypes in an online fashion, making the
model suitable for practical continual learning applications. The experimental
results show that the adaptive model's performance is close to the ideal scenario
where prototypes are directly sampled from the data distribution.
Compressing Recurrent Neural Network with Tensor Train
Recurrent Neural Networks (RNNs) are a popular choice for modeling temporal and
sequential tasks and achieve state-of-the-art performance on various complex
problems. However, most state-of-the-art RNNs have millions of parameters and
require substantial computational resources for training and for predicting on
new data. This paper proposes an alternative RNN model that reduces the number
of parameters significantly by representing the weight parameters in the Tensor
Train (TT) format. We implement the TT-format representation for several RNN
architectures, such as the simple RNN and the Gated Recurrent Unit (GRU), and
compare and evaluate our proposed RNN models against their uncompressed
counterparts on sequence classification and sequence prediction tasks. Our
proposed RNNs with TT-format preserve performance while reducing the number of
RNN parameters by up to 40 times.
Comment: Accepted at IJCNN 201
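The parameter saving comes from storing each weight matrix as a chain of small TT cores instead of a dense array. The sketch below is a generic TT-matrix reconstruction with made-up shapes and ranks, not the paper's exact factorization; it shows how a full matrix is recovered from its cores and how few parameters the cores need.

```python
import numpy as np

def tt_to_matrix(cores, row_modes, col_modes):
    """Rebuild a full weight matrix from TT-matrix cores.

    Each core has shape (r_prev, m_k, n_k, r_next); the boundary ranks are 1,
    and the full matrix has shape (prod(row_modes), prod(col_modes))."""
    full = cores[0]                                  # (1, m1, n1, r1)
    for core in cores[1:]:
        # Contract the shared TT rank, appending this core's mode indices.
        full = np.einsum("...a,abcd->...bcd", full, core)
    full = full[0, ..., 0]                           # drop the boundary ranks
    d = len(row_modes)
    # Axes alternate (m1, n1, m2, n2, ...); group row modes, then column modes.
    full = full.transpose(list(range(0, 2 * d, 2)) + list(range(1, 2 * d, 2)))
    return full.reshape(int(np.prod(row_modes)), int(np.prod(col_modes)))

# Hypothetical example: a 256x512 weight stored as two small cores.
cores = [np.random.randn(1, 16, 16, 4), np.random.randn(4, 16, 32, 1)]
W = tt_to_matrix(cores, row_modes=(16, 16), col_modes=(16, 32))
print(W.shape, sum(c.size for c in cores), 256 * 512)   # (256, 512) 3072 131072
```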
Improving speech recognition by revising gated recurrent units
Speech recognition is largely taking advantage of deep learning, showing that
substantial benefits can be obtained by modern Recurrent Neural Networks
(RNNs). The most popular RNNs are Long Short-Term Memory (LSTMs), which
typically reach state-of-the-art performance in many tasks thanks to their
ability to learn long-term dependencies and robustness to vanishing gradients.
Nevertheless, LSTMs have a rather complex design with three multiplicative
gates, that might impair their efficient implementation. An attempt to simplify
LSTMs has recently led to Gated Recurrent Units (GRUs), which are based on just
two multiplicative gates.
This paper builds on these efforts by further revising GRUs and proposing a
simplified architecture potentially more suitable for speech recognition. The
contribution of this work is two-fold. First, we suggest removing the reset
gate from the GRU design, resulting in a more efficient single-gate architecture.
Second, we propose replacing tanh with ReLU activations in the state update
equations. Results show that, in our implementation, the revised architecture
reduces the per-epoch training time by more than 30% and consistently
improves recognition performance across different tasks, input features, and
noisy conditions when compared to a standard GRU.
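Concretely, the revision described above keeps only the update gate and uses a ReLU candidate state. The single-step sketch below is a plain NumPy illustration of that recurrence, omitting normalization and other implementation details the paper may use; parameter names are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def revised_gru_step(x, h_prev, Wz, Uz, bz, Wh, Uh, bh):
    """One step of a single-gate GRU: no reset gate, ReLU candidate state."""
    z = sigmoid(Wz @ x + Uz @ h_prev + bz)               # update gate (the only gate)
    h_cand = np.maximum(0.0, Wh @ x + Uh @ h_prev + bh)  # ReLU instead of tanh
    return z * h_prev + (1.0 - z) * h_cand
```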
Deformable Object Tracking with Gated Fusion
The tracking-by-detection framework has received growing attention through its
integration with Convolutional Neural Networks (CNNs). Existing
tracking-by-detection based methods, however, fail to track objects with severe
appearance variations. This is because the traditional convolutional operation
is performed on fixed grids, and thus may not be able to find the correct
response while the object is changing pose or under varying environmental
conditions. In this paper, we propose a deformable convolution layer to enrich
the target appearance representations in the tracking-by-detection framework.
We aim to capture the target appearance variations via deformable convolution,
which adaptively enhances its original features. In addition, we also propose a
gated fusion scheme to control how the variations captured by the deformable
convolution affect the original appearance. The enriched feature representation
through deformable convolution facilitates the discrimination of the CNN
classifier on the target object and background. Extensive experiments on the
standard benchmarks show that the proposed tracker performs favorably against
state-of-the-art methods.
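One plausible form of the gated fusion described above is a sigmoid gate, computed from both feature maps, that scales the deformable-convolution branch before it is added back to the original features. The module below is a sketch under that assumption; the channel sizes, the 1x1 gate convolution, and the additive fusion are illustrative, not the authors' exact design.

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Sketch of gated fusion: a learned, per-location gate controls how much
    the deformation-enhanced features modify the original appearance features."""

    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, original, deformable):
        # Gate values in [0, 1] weight the deformable branch before fusion.
        g = self.gate(torch.cat([original, deformable], dim=1))
        return original + g * deformable
```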