Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning
Deep learning tools have gained tremendous attention in applied machine
learning. However, such tools for regression and classification do not capture
model uncertainty. In comparison, Bayesian models offer a mathematically
grounded framework to reason about model uncertainty, but usually come with a
prohibitive computational cost. In this paper we develop a new theoretical
framework casting dropout training in deep neural networks (NNs) as approximate
Bayesian inference in deep Gaussian processes. A direct result of this theory
gives us tools to model uncertainty with dropout NNs -- extracting information
from existing models that has been thrown away so far. This mitigates the
problem of representing uncertainty in deep learning without sacrificing either
computational complexity or test accuracy. We perform an extensive study of the
properties of dropout's uncertainty. Various network architectures and
non-linearities are assessed on tasks of regression and classification, using
MNIST as an example. We show a considerable improvement in predictive
log-likelihood and RMSE compared to existing state-of-the-art methods, and
finish by using dropout's uncertainty in deep reinforcement learning.
Comment: 12 pages, 6 figures; fixed a mistake with standard error and added a new table with updated results (marked "Update [October 2016]"); Published in ICML 201
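The test-time recipe this theory licenses (often called MC dropout) can be sketched in a few lines: keep dropout active at prediction time, average several stochastic forward passes, and read the spread across passes as a model-uncertainty estimate. The tiny network and its weights below are hypothetical stand-ins, not the paper's models.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tiny regression network with fixed random weights,
# used only to illustrate the test-time procedure.
W1 = rng.normal(scale=0.5, size=(1, 64)); b1 = np.zeros(64)
W2 = rng.normal(scale=0.5, size=(64, 1)); b2 = np.zeros(1)

def stochastic_forward(x, p=0.5):
    """One forward pass with dropout kept ACTIVE at test time."""
    h = np.maximum(x @ W1 + b1, 0.0)          # ReLU hidden layer
    mask = rng.random(h.shape) >= p           # Bernoulli dropout mask
    h = h * mask / (1.0 - p)                  # inverted-dropout scaling
    return h @ W2 + b2

def mc_dropout_predict(x, T=200):
    """Average T stochastic passes: the mean is the prediction, and the
    spread across passes estimates model uncertainty."""
    samples = np.stack([stochastic_forward(x) for _ in range(T)])
    return samples.mean(axis=0), samples.std(axis=0)

x = np.array([[0.3]])
pred_mean, pred_std = mc_dropout_predict(x)
```

Note that no retraining is needed: any dropout-trained network already supports this procedure, which is the "extracting information that has been thrown away" point made above.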
Synthesizing Neural Network Controllers with Probabilistic Model based Reinforcement Learning
We present an algorithm for rapidly learning controllers for robotic
systems. The algorithm follows the model-based reinforcement learning paradigm,
and improves upon existing algorithms; namely Probabilistic learning in Control
(PILCO) and a sample-based version of PILCO with neural network dynamics
(Deep-PILCO). We propose training a neural network dynamics model using
variational dropout with truncated Log-Normal noise. This allows us to obtain a
dynamics model with calibrated uncertainty, which can be used to simulate
controller executions via rollouts. We also describe a set of techniques,
inspired by viewing PILCO as a recurrent neural network model, that are crucial
to improve the convergence of the method. We test our method on a variety of
benchmark tasks, demonstrating data-efficiency that is competitive with PILCO,
while being able to optimize complex neural network controllers. Finally, we
assess the performance of the algorithm for learning motor controllers for a
six-legged autonomous underwater vehicle, demonstrating the algorithm's
potential to scale to higher-dimensional, larger-data, and more complex
control tasks.
Comment: 8 pages, 7 figure
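The core idea of uncertainty-aware rollouts can be sketched as follows. This is not the paper's variational posterior; it only illustrates the two ingredients named above: multiplicative noise drawn from a truncated log-normal, and repeated rollouts whose disagreement reflects model uncertainty. The dynamics model, noise parameters, and state dimension are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def truncated_lognormal(mu, sigma, size, lo=0.0, hi=1.0):
    """Rejection-sample a log-normal restricted to [lo, hi]."""
    n = int(np.prod(size))
    out = np.empty(0)
    while out.size < n:
        z = rng.lognormal(mu, sigma, size=2 * n)
        out = np.concatenate([out, z[(z >= lo) & (z <= hi)]])
    return out[:n].reshape(size)

# Hypothetical one-layer dynamics model predicting a next-state delta.
W = rng.normal(scale=0.1, size=(2, 2))

def rollout(s0, steps=10):
    """Simulate one trajectory; each step samples fresh multiplicative
    noise on the weights, so repeated rollouts spread out according to
    the dynamics model's uncertainty."""
    s, traj = s0, [s0]
    for _ in range(steps):
        noise = truncated_lognormal(-0.5, 0.5, W.shape)
        s = s + s @ (W * noise)
        traj.append(s)
    return np.stack(traj)

trajs = np.stack([rollout(np.array([1.0, 0.0])) for _ in range(20)])
spread = trajs[:, -1, :].std(axis=0)   # disagreement across rollouts
```

A controller would then be optimized against the expected cost over many such stochastic rollouts rather than a single deterministic trajectory.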
Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks
We present weight normalization: a reparameterization of the weight vectors
in a neural network that decouples the length of those weight vectors from
their direction. By reparameterizing the weights in this way we improve the
conditioning of the optimization problem and we speed up convergence of
stochastic gradient descent. Our reparameterization is inspired by batch
normalization but does not introduce any dependencies between the examples in a
minibatch. This means that our method can also be applied successfully to
recurrent models such as LSTMs and to noise-sensitive applications such as deep
reinforcement learning or generative models, for which batch normalization is
less well suited. Although our method is much simpler, it still provides much
of the speed-up of full batch normalization. In addition, the computational
overhead of our method is lower, permitting more optimization steps to be taken
in the same amount of time. We demonstrate the usefulness of our method on
applications in supervised image recognition, generative modelling, and deep
reinforcement learning.
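The reparameterization itself is one line, w = g · v / ‖v‖, which makes the vector's length g and direction v/‖v‖ independent parameters. The sketch below also includes the backpropagation identities for the new parameters; the variable names are ours.

```python
import numpy as np

rng = np.random.default_rng(0)

# Direction v and scalar length g are the trainable parameters; the
# effective weight vector is w = g * v / ||v||, so ||w|| = g always.
v = rng.normal(size=(5,))
g = 2.0
w = g * v / np.linalg.norm(v)

def wn_grads(grad_w, v, g):
    """Backprop through the reparameterization:
    grad_g = grad_w . v / ||v||
    grad_v = (g / ||v||) grad_w - (g * grad_g / ||v||^2) v
    """
    norm = np.linalg.norm(v)
    grad_g = grad_w @ v / norm
    grad_v = g / norm * grad_w - grad_g * g / norm**2 * v
    return grad_g, grad_v

grad_w = rng.normal(size=(5,))
grad_g, grad_v = wn_grads(grad_w, v, g)
```

A useful property visible in the formulas: grad_v is always orthogonal to v, so gradient steps rotate the direction while g alone controls scale, which is the conditioning improvement described above.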
A Flexible Approach to Automated RNN Architecture Generation
The process of designing neural architectures requires expert knowledge and
extensive trial and error. While automated architecture search may simplify
these requirements, the recurrent neural network (RNN) architectures generated
by existing methods are limited in both flexibility and components. We propose
a domain-specific language (DSL) for use in automated architecture search which
can produce novel RNNs of arbitrary depth and width. The DSL is flexible enough
to define standard architectures such as the Gated Recurrent Unit and Long
Short-Term Memory and allows the introduction of non-standard RNN components
such as trigonometric curves and layer normalization. Using two different
candidate generation techniques, random search with a ranking function and
reinforcement learning, we explore the novel architectures produced by the RNN
DSL for language modeling and machine translation domains. The resulting
architectures do not follow human intuition yet perform well on their targeted
tasks, suggesting the space of usable RNN architectures is far larger than
previously assumed.
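To make the DSL idea concrete, here is a minimal sketch of our own devising (the operator set and representation are assumptions, not the paper's grammar): an RNN cell is an expression tree over the inputs x and h, and an interpreter evaluates it, allocating a weight matrix per MM node.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 4

def evaluate(node, x, h, params):
    """Evaluate a cell expressed as nested tuples (op, *children);
    'x' and 'h' are leaf inputs, 'MM' applies a learned matrix that is
    shared across timesteps (keyed by the node's identity)."""
    if node == 'x':
        return x
    if node == 'h':
        return h
    op, *kids = node
    if op == 'MM':
        key = id(node)
        if key not in params:
            params[key] = rng.normal(scale=0.3, size=(DIM, DIM))
        return evaluate(kids[0], x, h, params) @ params[key]
    vals = [evaluate(k, x, h, params) for k in kids]
    if op == 'Add':
        return vals[0] + vals[1]
    if op == 'Mult':
        return vals[0] * vals[1]
    if op == 'Tanh':
        return np.tanh(vals[0])
    if op == 'Sigmoid':
        return 1 / (1 + np.exp(-vals[0]))
    raise ValueError(op)

# A vanilla RNN cell written in the mini-DSL: tanh(W_x x + W_h h)
cell = ('Tanh', ('Add', ('MM', 'x'), ('MM', 'h')))
params = {}
h = np.zeros(DIM)
for _ in range(3):                      # unroll three timesteps
    h = evaluate(cell, np.ones(DIM), h, params)
```

A search procedure would then generate candidate trees in this grammar, which is how arbitrary-depth, arbitrary-width cells beyond the GRU/LSTM templates become expressible.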
Input-to-Output Gate to Improve RNN Language Models
This paper proposes a reinforcing method that refines the output layers of
existing Recurrent Neural Network (RNN) language models. We refer to our
proposed method as Input-to-Output Gate (IOG). IOG has an extremely simple
structure, and thus, can be easily combined with any RNN language models. Our
experiments on the Penn Treebank and WikiText-2 datasets demonstrate that IOG
consistently boosts the performance of several different types of current
topline RNN language models.
Comment: Accepted as a conference paper in IJCNLP 201
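The gate's simplicity is the point: a sigmoid gate computed solely from the current input word's embedding multiplies the base model's output scores elementwise. The sketch below is a plausible reading of that mechanism with hypothetical shapes and parameter names, not the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, EMB = 10, 8

# Hypothetical parameters: IOG adds only an input embedding lookup
# and a single projection to vocabulary size on top of any base LM.
E   = rng.normal(scale=0.1, size=(VOCAB, EMB))   # input embeddings
W_g = rng.normal(scale=0.1, size=(EMB, VOCAB))
b_g = np.zeros(VOCAB)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def iog_adjust(base_logits, input_word):
    """Gate the base LM's output scores with a sigmoid gate computed
    from the current input word alone, then renormalize."""
    gate = 1 / (1 + np.exp(-(E[input_word] @ W_g + b_g)))
    return softmax(gate * base_logits)

probs = iog_adjust(rng.normal(size=VOCAB), input_word=3)
```

Because the gate depends only on the input word, it composes with any base RNN LM without modifying it, which is what makes the method easy to bolt on.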
Large-Scale Visual Active Learning with Deep Probabilistic Ensembles
Annotating the right data for training deep neural networks is an important
challenge. Active learning using uncertainty estimates from Bayesian Neural
Networks (BNNs) could provide an effective solution to this. Despite being
theoretically principled, BNNs require approximations to be applied to
large-scale problems, where both performance and uncertainty estimation are
crucial. In this paper, we introduce Deep Probabilistic Ensembles (DPEs), a
scalable technique that uses a regularized ensemble to approximate a deep BNN.
We conduct a series of large-scale visual active learning experiments to
evaluate DPEs on classification with the CIFAR-10, CIFAR-100 and ImageNet
datasets, and semantic segmentation with the BDD100k dataset. Our models
require significantly less training data to achieve competitive performances,
and steadily improve upon strong active learning baselines as the annotation
budget is increased.
Comment: arXiv admin note: text overlap with arXiv:1811.0264
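The active learning loop around an ensemble can be sketched generically: score each unlabeled pool point by the predictive entropy of the mean ensemble prediction and request labels for the most uncertain ones. The random "ensemble predictions" below stand in for K trained DPE members; the regularization that ties the ensemble to a BNN is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)

def entropy(p, axis=-1):
    """Predictive entropy of a categorical distribution."""
    return -(p * np.log(p + 1e-12)).sum(axis=axis)

# Hypothetical stand-in for K trained ensemble members: softmax
# predictions over an unlabeled pool (K x pool_size x classes).
K, POOL, CLASSES = 5, 100, 3
logits = rng.normal(size=(K, POOL, CLASSES))
probs = np.exp(logits) / np.exp(logits).sum(-1, keepdims=True)

def acquire(probs, budget=10):
    """Select the pool points whose mean ensemble prediction is most
    uncertain (highest predictive entropy)."""
    mean_p = probs.mean(axis=0)
    return np.argsort(-entropy(mean_p))[:budget]

picked = acquire(probs)
```

The selected indices are then annotated, added to the training set, and the ensemble is retrained, repeating until the labeling budget is exhausted.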
Deep Reinforcement Learning with Pre-training for Time-efficient Training of Automatic Speech Recognition
Deep reinforcement learning (deep RL) combines deep learning with
reinforcement learning principles to create efficient methods that learn by
interacting with their environment. This has led to breakthroughs in many complex
tasks, such as playing the game "Go", that were previously difficult to solve.
However, deep RL requires significant training time making it difficult to use
in various real-life applications such as Human-Computer Interaction (HCI). In
this paper, we study pre-training in deep RL to reduce the training time and
improve the performance of Speech Recognition, a popular application of HCI. To
evaluate the performance improvement in training we use the publicly available
"Speech Command" dataset, which contains utterances of 30 command keywords
spoken by 2,618 speakers. Results show that pre-training with deep RL offers
faster convergence compared to non-pre-trained RL while achieving improved
speech recognition accuracy.
Comment: arXiv admin note: substantial text overlap with arXiv:1910.1125
Multi-Hop Knowledge Graph Reasoning with Reward Shaping
Multi-hop reasoning is an effective approach for query answering (QA) over
incomplete knowledge graphs (KGs). The problem can be formulated in a
reinforcement learning (RL) setup, where a policy-based agent sequentially
extends its inference path until it reaches a target. However, in an incomplete
KG environment, the agent receives low-quality rewards corrupted by false
negatives in the training data, which harms generalization at test time.
Furthermore, since no golden action sequence is used for training, the agent
can be misled by spurious search trajectories that incidentally lead to the
correct answer. We propose two modeling advances to address both issues: (1) we
reduce the impact of false negative supervision by adopting a pretrained
one-hop embedding model to estimate the reward of unobserved facts; (2) we
counter the sensitivity to spurious paths of on-policy RL by forcing the agent
to explore a diverse set of paths using randomly generated edge masks. Our
approach significantly improves over existing path-based KGQA models on several
benchmark datasets and is comparable to or better than embedding-based models.
Comment: Accepted to EMNLP 2018, 12 page
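The first modeling advance, reward shaping, reduces to a small rule: an observed fact gets the full hard reward, while an unobserved candidate answer gets the pretrained one-hop model's soft score instead of a flat zero. The DistMult-style scorer, entities, and embeddings below are hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy KG: one observed fact; everything else is "unobserved".
observed = {("paris", "capital_of", "france")}

# Hypothetical pretrained one-hop embedding model (DistMult-style
# score squashed to (0, 1) by a sigmoid).
dim = 8
emb_e = {e: rng.normal(size=dim) for e in ["paris", "france", "spain"]}
emb_r = {"capital_of": rng.normal(size=dim)}

def one_hop_score(s, r, t):
    raw = np.sum(emb_e[s] * emb_r[r] * emb_e[t])
    return 1 / (1 + np.exp(-raw))

def shaped_reward(s, r, answer):
    """Hard reward for facts present in the KG; otherwise fall back to
    the pretrained model's soft plausibility score, so false negatives
    in the training data are not punished with zero reward."""
    if (s, r, answer) in observed:
        return 1.0
    return float(one_hop_score(s, r, answer))

r_hit  = shaped_reward("paris", "capital_of", "france")
r_miss = shaped_reward("paris", "capital_of", "spain")
```

The second advance, random edge masks, would complement this during training by hiding some outgoing edges per episode so the agent cannot rely on any single spurious path.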
Improving Robustness of Neural Dialog Systems in a Data-Efficient Way with Turn Dropout
Neural network-based dialog models often lack robustness to anomalous,
out-of-domain (OOD) user input which leads to unexpected dialog behavior and
thus considerably limits such models' usage in mission-critical production
environments. The problem is especially relevant in the setting of dialog
system bootstrapping with limited training data and no access to OOD examples.
In this paper, we explore the robustness of such systems to anomalous input
and the associated trade-off in accuracy on seen and
unseen data. We present a new dataset for studying the robustness of dialog
systems to OOD input, which is bAbI Dialog Task 6 augmented with OOD content in
a controlled way. We then present turn dropout, a simple yet efficient negative
sampling-based technique for improving robustness of neural dialog models. We
demonstrate its effectiveness applied to Hybrid Code Network-family models
(HCNs) which reach state-of-the-art results on our OOD-augmented dataset as
well as the original one. Specifically, an HCN trained with turn dropout
achieves state-of-the-art performance of more than 75% per-utterance accuracy
on the augmented dataset's OOD turns and 74% F1-score as an OOD detector.
Furthermore, we introduce a Variational HCN enhanced with turn dropout which
achieves more than 56.5% accuracy on the original bAbI Task 6 dataset, thus
outperforming the initially reported HCN's result.
Comment: NeurIPS 2018 workshop on Conversational A
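As a negative-sampling technique, turn dropout admits a very small sketch: during training, randomly replace some user turns with a placeholder and relabel their targets as the fallback action, so the model learns a sensible reaction to unfamiliar input without ever seeing real OOD data. The placeholder token, action label, and dialog representation below are our assumptions, not the paper's exact scheme.

```python
import random

random.seed(0)

UNK = "<unk_turn>"

def turn_dropout(dialog, p=0.3, rng=random):
    """With probability p per turn, replace the user utterance with a
    placeholder and retarget the system action to a fallback, giving
    the model synthetic 'unknown input' training signal."""
    out = []
    for user_turn, system_act in dialog:
        if rng.random() < p:
            out.append((UNK, "fallback"))
        else:
            out.append((user_turn, system_act))
    return out

dialog = [("hi", "greet"),
          ("book a table", "ask_size"),
          ("for two", "confirm")]
augmented = turn_dropout(dialog)
```

Applied afresh each epoch, this costs no extra data collection, which is what makes it attractive in the limited-data bootstrapping setting described above.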
Deep Residual Output Layers for Neural Language Generation
Many tasks, including language generation, benefit from learning the
structure of the output space, particularly when the space of output labels is
large and the data is sparse. State-of-the-art neural language models
indirectly capture the output space structure in their classifier weights since
they lack parameter sharing across output labels. Learning shared output label
mappings helps, but existing methods have limited expressivity and are prone to
overfitting. In this paper, we investigate the usefulness of more powerful
shared mappings for output labels, and propose a deep residual output mapping
with dropout between layers to better capture the structure of the output space
and avoid overfitting. Evaluations on three language generation tasks show that
our output label mapping can match or improve state-of-the-art recurrent and
self-attention architectures, and suggest that the classifier does not
necessarily need to be high-rank to better model natural language if it is
better at capturing the structure of the output space.
Comment: To appear in ICML 201
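The shape of such a mapping can be sketched as follows: instead of using the output label embedding matrix E directly as softmax weights, pass it through a residual block with dropout between layers and use the mapped embeddings to score the decoder state. The depth, dimensions, and parameter names here are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, DIM = 20, 16

E  = rng.normal(scale=0.1, size=(VOCAB, DIM))   # output label embeddings
W1 = rng.normal(scale=0.1, size=(DIM, DIM))
W2 = rng.normal(scale=0.1, size=(DIM, DIM))

def residual_output_map(E, p=0.5, train=True):
    """One residual block over the label embeddings, with dropout
    between layers; the residual connection keeps the identity mapping
    available, and dropout curbs overfitting of the shared mapping."""
    h = np.maximum(E @ W1, 0.0)                  # ReLU
    if train:
        h = h * (rng.random(h.shape) >= p) / (1 - p)
    return E + h @ W2                            # residual connection

def logits(hidden, train=False):
    """Score a decoder state against the mapped label embeddings."""
    return hidden @ residual_output_map(E, train=train).T

out = logits(rng.normal(size=DIM))
```

Because the same mapping is shared across all output labels, structure learned from frequent labels transfers to rare ones, which is the parameter-sharing benefit motivating the approach.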