Recurrent Neural Network Training with Dark Knowledge Transfer
Recurrent neural networks (RNNs), particularly long short-term memory (LSTM),
have gained much attention in automatic speech recognition (ASR). Although some
success stories have been reported, training RNNs remains highly
challenging, especially with limited training data. Recent research found that
a well-trained model can be used as a teacher to train other child models, by
using the predictions generated by the teacher model as supervision. This
knowledge transfer learning has been employed to train simple neural nets with
a complex one, so that the final performance can reach a level that is
infeasible to obtain by regular training. In this paper, we employ the
knowledge transfer learning approach to train RNNs (specifically, LSTMs) using a
deep neural network (DNN) model as the teacher. This is different from most of
the existing research on knowledge transfer learning, since the teacher (DNN)
is assumed to be weaker than the child (RNN); however, our experiments on an
ASR task showed that it works fairly well: without applying any tricks to the
learning scheme, this approach can train RNNs successfully even with limited
training data. Comment: ICASSP 201
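At the heart of this teacher-student setup is a loss that uses the teacher's output posteriors as soft targets for the student. Below is a minimal plain-Python sketch. It assumes per-frame logits from a DNN teacher and an LSTM student over the same output classes. The function names and the optional `temperature` parameter are illustrative assumptions and do not come from the paper.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax; temperature > 1 flattens the distribution."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    log_total = peak + math.log(sum(math.exp(z - peak) for z in scaled))
    return [z - log_total for z in scaled]


def softmax(logits, temperature=1.0):
    """Posterior probabilities derived from the stable log-softmax."""
    return [math.exp(v) for v in log_softmax(logits, temperature)]


def soft_target_loss(student_logits, teacher_logits, temperature=1.0):
    """Cross-entropy of the student's posteriors against the teacher's posteriors.

    The teacher's predictions act as the supervision signal in place of hard labels.
    """
    targets = softmax(teacher_logits, temperature)
    student_log_probs = log_softmax(student_logits, temperature)
    return -sum(t * lp for t, lp in zip(targets, student_log_probs))


def batch_loss(student_frames, teacher_frames, temperature=1.0):
    """Average the per-frame soft-target loss over a sequence of frames."""
    losses = [soft_target_loss(s, t, temperature)
              for s, t in zip(student_frames, teacher_frames)]
    return sum(losses) / len(losses)
```

During training, this loss is minimized with respect to the student's parameters only, and the teacher stays fixed. The loss is smallest when the student's posteriors match the teacher's. At that point it equals the entropy of the teacher's distribution.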
Distilling Word Embeddings: An Encoding Approach
Distilling knowledge from a well-trained cumbersome network to a small one
has recently become a new research topic, as lightweight neural networks with
high performance are particularly needed in various resource-restricted
systems. This paper addresses the problem of distilling word embeddings for NLP
tasks. We propose an encoding approach to distill task-specific knowledge from
a set of high-dimensional embeddings, which can reduce model complexity by a
large margin as well as retain high accuracy, showing a good compromise between
efficiency and performance. Experiments on two tasks show that
distilling knowledge from cumbersome embeddings is better than directly
training neural networks with small embeddings. Comment: Accepted by CIKM-16 as a short paper, and by the Representation
Learning for Natural Language Processing (RL4NLP) Workshop @ACL-16 for
presentation
Conditional Teacher-Student Learning
Teacher-student (T/S) learning has been shown to be effective for a
variety of problems such as domain adaptation and model compression. One
shortcoming of T/S learning is that a teacher model, not always perfect,
sporadically produces wrong guidance in the form of posterior probabilities that
misleads the student model towards a suboptimal performance. To overcome this
problem, we propose a conditional T/S learning scheme, in which a "smart"
student model selectively chooses to learn from either the teacher model or the
ground truth labels conditioned on whether the teacher can correctly predict
the ground truth. Unlike a naive linear combination of the two knowledge
sources, the conditional learning is exclusively engaged with the teacher model
when the teacher model's prediction is correct, and otherwise backs off to the
ground truth. Thus, the student model is able to learn effectively from the
teacher and even potentially surpass the teacher. We examine the proposed
learning scheme on two tasks: domain adaptation on the CHiME-3 dataset and speaker
adaptation on the Microsoft short message dictation dataset. The proposed method
achieves 9.8% and 12.8% relative word error rate reductions, respectively, over
T/S learning for environment adaptation and over the speaker-independent model
for speaker adaptation. Comment: 5 pages, 1 figure, ICASSP 201
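The conditional rule above comes down to choosing a target for each sample. The sketch below assumes that the teacher's and the student's posteriors are already available as probability lists and that `label` is the ground-truth class index. All helper names are hypothetical. It is an illustration of the selection rule, not the authors' code.

```python
import math


def argmax(values):
    """Index of the largest value (the first one on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def conditional_target(teacher_posteriors, label):
    """Choose the supervision for one sample under conditional T/S learning.

    If the teacher's top prediction matches the ground truth, learn from the
    teacher's full posterior distribution; otherwise back off to a one-hot
    ground-truth target.
    """
    if argmax(teacher_posteriors) == label:
        return list(teacher_posteriors)
    one_hot = [0.0] * len(teacher_posteriors)
    one_hot[label] = 1.0
    return one_hot


def conditional_ts_loss(student_posteriors, teacher_posteriors, label):
    """Cross-entropy of the student against the conditionally chosen target."""
    target = conditional_target(teacher_posteriors, label)
    # Zero-probability target entries contribute nothing, so skip them.
    return -sum(t * math.log(s)
                for t, s in zip(target, student_posteriors) if t > 0)
```

A linear interpolation would mix the teacher and label targets on every sample. This rule instead applies exactly one of them per sample, which is the contrast the abstract draws.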
Compressing Recurrent Neural Network with Tensor Train
Recurrent Neural Networks (RNNs) are a popular choice for modeling temporal and
sequential tasks and achieve state-of-the-art performance on many
complex problems. However, most state-of-the-art RNNs have millions of
parameters and require substantial computational resources for training and for predicting
new data. This paper proposes an alternative RNN model to reduce the number of
parameters significantly by representing the weight parameters in the Tensor
Train (TT) format. In this paper, we implement the TT-format representation for
several RNN architectures, such as the simple RNN and the Gated Recurrent Unit (GRU). We
compare and evaluate our proposed RNN models against uncompressed RNN models on
sequence classification and sequence prediction tasks. Our proposed RNNs with
TT-format weights preserve performance while reducing the number of RNN
parameters by up to 40 times. Comment: Accepted at IJCNN 201
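Here is a concrete view of the TT format. Suppose a weight matrix's row and column sizes both factor into products of small "modes". The matrix can then be stored as a chain of 4-D cores, and each of its entries is a product of small core slices. The sketch makes a few assumptions. Cores are nested lists of shape `r[k-1] x m[k] x n[k] x r[k]`, the boundary ranks are 1, and index digits are ordered most-significant first. The function names, mode sizes, and ranks are hypothetical and do not reflect the paper's configuration.

```python
def tt_matrix_params(row_modes, col_modes, ranks):
    """Parameter count of a TT-matrix.

    Core k has shape ranks[k] x row_modes[k] x col_modes[k] x ranks[k + 1],
    with boundary ranks ranks[0] == ranks[-1] == 1.
    """
    if not (len(row_modes) == len(col_modes) == len(ranks) - 1):
        raise ValueError("need d row modes, d column modes and d + 1 ranks")
    return sum(ranks[k] * row_modes[k] * col_modes[k] * ranks[k + 1]
               for k in range(len(row_modes)))


def to_digits(index, modes):
    """Mixed-radix digits of a flat index, most significant digit first."""
    digits = []
    for m in reversed(modes):
        digits.append(index % m)
        index //= m
    return digits[::-1]


def tt_entry(cores, row_modes, col_modes, i, j):
    """Entry W[i, j] of a TT-matrix, computed without materializing W.

    cores[k][a][i_k][j_k][b] holds core G_k; the entry is the chain of
    small matrix products G_1[:, i_1, j_1, :] ... G_d[:, i_d, j_d, :].
    """
    ii = to_digits(i, row_modes)
    jj = to_digits(j, col_modes)
    vec = [1.0]  # row vector of length ranks[0] == 1
    for k, core in enumerate(cores):
        r_out = len(core[0][0][0])
        vec = [sum(vec[a] * core[a][ii[k]][jj[k]][b] for a in range(len(core)))
               for b in range(r_out)]
    return vec[0]  # length ranks[-1] == 1
```

As an example, take hypothetical modes of 4x4x4x4 on both sides and internal ranks of 3. Then `tt_matrix_params([4, 4, 4, 4], [4, 4, 4, 4], [1, 3, 3, 3, 1])` returns 384 parameters, compared with 256 x 256 = 65,536 dense weights. In an RNN, the input-to-hidden and hidden-to-hidden weight matrices are the natural candidates for this factorization.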
LoANs: Weakly Supervised Object Detection with Localizer Assessor Networks
Recently, deep neural networks have achieved remarkable performance on the
task of object detection and recognition. The reason for this success is mainly
grounded in the availability of large-scale, fully annotated datasets, but the
creation of such a dataset is a complicated and costly task. In this paper, we
propose a novel method for weakly supervised object detection that simplifies
the process of gathering data for training an object detector. We train an
ensemble of two models that work together in a student-teacher fashion. Our
student (localizer) is a model that learns to localize an object; the teacher
(assessor) assesses the quality of the localization and provides feedback to
the student. The student uses this feedback to learn how to localize objects
and is thus entirely supervised by the teacher, as we are using no labels for
training the localizer. In our experiments, we show that our model is very
robust to noise and reaches competitive performance compared to a
state-of-the-art fully supervised approach. We also show how simple it is to
create a new dataset based on a few videos (e.g., downloaded from YouTube)
and artificially generated data. Comment: To appear in AMV18. Code, datasets and models available at
https://github.com/Bartzi/loan
Bolt: Accelerated Data Mining with Fast Vector Compression
Vectors of data are at the heart of machine learning and data mining.
Recently, vector quantization methods have shown great promise in reducing both
the time and space costs of operating on vectors. We introduce a vector
quantization algorithm that can compress vectors over 12x faster than existing
techniques while also accelerating approximate vector operations such as
distance and dot product computations by up to 10x. Because it can encode over
2GB of vectors per second, it makes vector quantization cheap enough to employ
in many more circumstances. For example, using our technique to compute
approximate dot products in a nested loop can multiply matrices faster than a
state-of-the-art BLAS implementation, even when our algorithm must first
compress the matrices.
In addition to showing the above speedups, we demonstrate that our approach
can accelerate nearest neighbor search and maximum inner product search by over
100x compared to floating point operations and up to 10x compared to other
vector quantization methods. Our approximate Euclidean distance and dot product
computations are not only faster than those of related algorithms with slower
encodings, but also faster than Hamming distance computations, which have
direct hardware support on the tested platforms. We also assess the errors of
our algorithm's approximate distances and dot products, and find that it is
competitive with existing, slower vector quantization algorithms. Comment: Research track paper at KDD 201
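Speedups like those described above come from replacing floating-point arithmetic with table lookups on compressed codes. The sketch below uses generic product quantization (PQ) for approximate dot products and is not Bolt's encoder. Bolt's own encoding and lookup-table quantization differ and are not reproduced here. The codebooks are assumed to be given (for example, learned by k-means), and all function names are hypothetical.

```python
def split(vec, num_subspaces):
    """Split a vector into equal-length contiguous subvectors."""
    if len(vec) % num_subspaces:
        raise ValueError("dimension must be divisible by the number of subspaces")
    size = len(vec) // num_subspaces
    return [vec[m * size:(m + 1) * size] for m in range(num_subspaces)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def encode(vec, codebooks):
    """Compress a vector to one centroid index per subspace.

    codebooks[m][k] is the k-th centroid (a subvector) for subspace m.
    """
    subs = split(vec, len(codebooks))
    return [min(range(len(cb)), key=lambda k: sq_dist(sub, cb[k]))
            for sub, cb in zip(subs, codebooks)]


def build_lut(query, codebooks):
    """Precompute the query's dot product with every centroid in every subspace."""
    subs = split(query, len(codebooks))
    return [[dot(sub, c) for c in cb] for sub, cb in zip(subs, codebooks)]


def approx_dot(lut, code):
    """Approximate <query, x> from x's code: one table lookup per subspace."""
    return sum(lut[m][k] for m, k in enumerate(code))
```

Each database vector is encoded once. After that, a query costs one table build plus one lookup per subspace per stored vector, regardless of the original dimensionality. The result equals the dot product with the vector's centroid reconstruction, which is where the approximation error comes from.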