Big Neural Networks Waste Capacity
This article exposes the failure of some big neural networks to leverage
added capacity to reduce underfitting. Past research suggests diminishing
returns when increasing the size of neural networks. Our experiments on
ImageNet LSVRC-2010 show that this may be because capacity yields sharply
diminishing returns in terms of training error, leading to underfitting.
This suggests that the optimization method, first-order gradient descent,
fails in this regime. Directly attacking this problem, either through the
optimization method or the choice of parametrization, may make it possible
to improve generalization error on large datasets, for which a large
capacity is required.
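
As a concrete illustration of the kind of measurement this abstract
describes, the following is a minimal, hypothetical sketch (not the authors'
setup): it trains multilayer perceptrons of increasing width with plain
first-order SGD on a fixed task and reports the final training error. The
synthetic data, widths, and hyperparameters are all illustrative assumptions.

    # Hypothetical sketch: measure training error as capacity grows under
    # plain first-order SGD. Dataset, widths, and hyperparameters are
    # assumptions; the paper's experiments use ImageNet LSVRC-2010.
    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    X = torch.randn(1024, 128)            # synthetic stand-in for a hard task
    y = torch.randint(0, 10, (1024,))

    for width in [32, 128, 512, 2048]:
        model = nn.Sequential(
            nn.Linear(128, width), nn.ReLU(),
            nn.Linear(width, width), nn.ReLU(),
            nn.Linear(width, 10),
        )
        opt = torch.optim.SGD(model.parameters(), lr=0.01)  # first-order method
        loss_fn = nn.CrossEntropyLoss()
        for _ in range(200):
            opt.zero_grad()
            loss_fn(model(X), y).backward()
            opt.step()
        train_err = (model(X).argmax(dim=1) != y).float().mean().item()
        # Under the abstract's hypothesis, this error shrinks far more slowly
        # than the added capacity would suggest.
        print(f"width={width:5d}  training error={train_err:.3f}")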
A Convolutional Encoder Model for Neural Machine Translation
The prevalent approach to neural machine translation relies on bi-directional
LSTMs to encode the source sentence. In this paper we present a faster and
simpler architecture based on a succession of convolutional layers. This
allows the entire source sentence to be encoded simultaneously, in contrast
to recurrent networks, whose computation is constrained by temporal
dependencies. On WMT'16 English-Romanian translation we achieve accuracy
competitive with the state of the art, and we outperform several recently
published results on the WMT'15 English-German task. Our models obtain
almost the same accuracy as a very deep LSTM setup on WMT'14 English-French
translation. Our convolutional encoder speeds up CPU decoding by more than a
factor of two at the same or higher accuracy than a strong bi-directional
LSTM baseline.
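
To make the architectural claim concrete, here is a minimal sketch of a
convolutional sentence encoder in PyTorch. It is an illustrative
reconstruction, not the paper's exact model; the class name, dimensions, and
layer count are assumptions. The key property is that all source positions
are computed in parallel, with no recurrence over time steps.

    # Minimal convolutional sentence encoder (illustrative reconstruction,
    # not the paper's exact architecture).
    import torch
    import torch.nn as nn

    class ConvEncoder(nn.Module):
        def __init__(self, vocab_size=10000, dim=256, layers=6, kernel=3):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, dim)
            # A succession of same-width convolutions with residual
            # connections; padding keeps the sequence length fixed.
            self.convs = nn.ModuleList(
                nn.Conv1d(dim, dim, kernel, padding=kernel // 2)
                for _ in range(layers)
            )

        def forward(self, tokens):                  # tokens: (batch, seq_len)
            x = self.embed(tokens).transpose(1, 2)  # -> (batch, dim, seq_len)
            for conv in self.convs:
                # All positions are computed at once: no temporal dependency.
                x = x + torch.relu(conv(x))
            return x.transpose(1, 2)                # (batch, seq_len, dim)

    enc = ConvEncoder()
    out = enc(torch.randint(0, 10000, (2, 15)))  # batch of 15-token sentences
    print(out.shape)                             # torch.Size([2, 15, 256])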
Zero-Shot Learning for Semantic Utterance Classification
We propose a novel zero-shot learning method for semantic utterance
classification (SUC). It learns a classifier for problems where
none of the semantic categories are present in the training set. The
framework uncovers the link between categories and utterances using a semantic
space. We show that this semantic space can be learned by deep neural networks
trained on large amounts of search engine query log data. More precisely, we
propose a novel method that can learn discriminative semantic features without
supervision. It uses the zero-shot learning framework to guide the learning of
the semantic features. We demonstrate the effectiveness of the zero-shot
semantic learning algorithm on the SUC dataset collected by Tur (2012).
Furthermore, we achieve state-of-the-art results by combining the semantic
features with a supervised method.
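
The scoring mechanism at the heart of the framework, comparing an utterance
to category labels in a shared semantic space, can be sketched as follows.
This is an illustrative reconstruction: the embed function below is a crude
bag-of-words stand-in, whereas the paper learns deep semantic features from
search engine query log data. Categories and utterance are made up.

    # Hypothetical zero-shot sketch: score an utterance against unseen
    # category labels in a shared semantic space. The embedding here is a
    # placeholder; the paper trains deep networks on query log data.
    import torch
    import torch.nn.functional as F

    def embed(texts, dim=64):
        # Placeholder semantic embedding: hash words into a bag-of-words
        # vector, then L2-normalize so dot products are cosine similarities.
        vecs = torch.zeros(len(texts), dim)
        for i, t in enumerate(texts):
            for w in t.lower().split():
                vecs[i, hash(w) % dim] += 1.0
        return F.normalize(vecs, dim=1)

    categories = ["flight booking", "restaurant reservation",
                  "weather forecast"]
    utterance = ["book a flight to boston"]

    # Cosine similarity between the utterance and each category label;
    # no labeled training example for any category is required.
    scores = embed(utterance) @ embed(categories).T
    print(categories[scores.argmax().item()])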