648 research outputs found
A Neural Network model with Bidirectional Whitening
We present here a new model and algorithm which performs an efficient Natural
gradient descent for Multilayer Perceptrons. Natural gradient descent was
originally proposed from a point of view of information geometry, and it
performs the steepest descent updates on manifolds in a Riemannian space. In
particular, we extend an approach taken by the "Whitened neural networks"
model. We make the whitening process not only in feed-forward direction as in
the original model, but also in the back-propagation phase. Its efficacy is
shown by an application of this "Bidirectional whitened neural networks" model
to a handwritten character recognition data (MNIST data).Comment: 16page
IsoBN: Fine-Tuning BERT with Isotropic Batch Normalization
Fine-tuning pre-trained language models (PTLMs), such as BERT and its better
variant RoBERTa, has been a common practice for advancing performance in
natural language understanding (NLU) tasks. Recent advance in representation
learning shows that isotropic (i.e., unit-variance and uncorrelated) embeddings
can significantly improve performance on downstream tasks with faster
convergence and better generalization. The isotropy of the pre-trained
embeddings in PTLMs, however, is relatively under-explored. In this paper, we
analyze the isotropy of the pre-trained [CLS] embeddings of PTLMs with
straightforward visualization, and point out two major issues: high variance in
their standard deviation, and high correlation between different dimensions. We
also propose a new network regularization method, isotropic batch normalization
(IsoBN) to address the issues, towards learning more isotropic representations
in fine-tuning by dynamically penalizing dominating principal components. This
simple yet effective fine-tuning method yields about 1.0 absolute increment on
the average of seven NLU tasks.Comment: AAAI 202
High-dimensional sequence transduction
We investigate the problem of transforming an input sequence into a
high-dimensional output sequence in order to transcribe polyphonic audio music
into symbolic notation. We introduce a probabilistic model based on a recurrent
neural network that is able to learn realistic output distributions given the
input and we devise an efficient algorithm to search for the global mode of
that distribution. The resulting method produces musically plausible
transcriptions even under high levels of noise and drastically outperforms
previous state-of-the-art approaches on five datasets of synthesized sounds and
real recordings, approximately halving the test error rate
- …