Do Deep Nets Really Need to be Deep?
Currently, deep neural networks are the state of the art on problems such as
speech recognition and computer vision. In this extended abstract, we show that
shallow feed-forward networks can learn the complex functions previously
learned by deep nets and achieve accuracies previously only achievable with
deep models. Moreover, in some cases the shallow neural nets can learn these
deep functions using a total number of parameters similar to the original deep
model. We evaluate our method on the TIMIT phoneme recognition task and are
able to train shallow fully-connected nets that perform similarly to complex,
well-engineered, deep convolutional architectures. Our success in training
shallow neural nets to mimic deeper models suggests that there probably exist
better algorithms for training shallow feed-forward nets than those currently
available.
Comment: final revision coming soon
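A minimal sketch of the mimic objective, assuming the shallow student is trained to regress on the teacher's pre-softmax logits with an L2 loss (names and shapes here are illustrative):

    import numpy as np

    def mimic_loss(student_logits, teacher_logits):
        # L2 regression on the teacher's unnormalized log-probabilities
        # (logits), rather than on the hard labels.
        return 0.5 * np.mean(np.sum((student_logits - teacher_logits) ** 2, axis=1))

    # Illustrative shapes: a batch of 32 examples, 10 classes.
    rng = np.random.default_rng(0)
    teacher_logits = rng.normal(size=(32, 10))  # from the trained deep net
    student_logits = rng.normal(size=(32, 10))  # from the shallow net being trained
    print(mimic_loss(student_logits, teacher_logits))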
Actor-Mimic: Deep Multitask and Transfer Reinforcement Learning
The ability to act in multiple environments and transfer previous knowledge
to new situations can be considered a critical aspect of any intelligent agent.
Towards this goal, we define a novel method of multitask and transfer learning
that enables an autonomous agent to learn how to behave in multiple tasks
simultaneously, and then generalize its knowledge to new domains. This method,
termed "Actor-Mimic", exploits the use of deep reinforcement learning and model
compression techniques to train a single policy network that learns how to act
in a set of distinct tasks by using the guidance of several expert teachers. We
then show that the representations learnt by the deep policy network are
capable of generalizing to new tasks with no prior expert guidance, speeding up
learning in novel environments. Although our method can in general be applied
to a wide range of problems, we use Atari games as a testing environment to
demonstrate these methods.
Comment: Accepted as a conference paper at ICLR 2016
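A sketch of the policy-regression part of such an objective, assuming the student matches each expert's temperature-softened action distribution with a cross-entropy loss (the paper also describes a feature-regression variant; names here are illustrative):

    import numpy as np

    def softmax(logits, tau=1.0):
        z = logits / tau
        z -= z.max(axis=-1, keepdims=True)  # numerical stability
        e = np.exp(z)
        return e / e.sum(axis=-1, keepdims=True)

    def policy_regression_loss(student_logits, expert_logits, tau=1.0):
        # Cross-entropy between the expert's softened policy over actions
        # and the student's policy, averaged over states in the batch.
        target = softmax(expert_logits, tau)
        log_pred = np.log(softmax(student_logits) + 1e-12)
        return -np.mean(np.sum(target * log_pred, axis=-1))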
Learning Wake-Sleep Recurrent Attention Models
Despite their success, convolutional neural networks are computationally
expensive because they must examine all image locations. Stochastic
attention-based models have been shown to improve computational efficiency at
test time, but they remain difficult to train because of intractable posterior
inference and high variance in the stochastic gradient estimates. Borrowing
techniques from the literature on training deep generative models, we present
the Wake-Sleep Recurrent Attention Model, a method for training stochastic
attention networks which improves posterior inference and which reduces the
variability in the stochastic gradients. We show that our method can greatly
speed up the training time for stochastic attention networks in the domains of
image classification and caption generation.
Comment: To appear in NIPS 2015
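One ingredient borrowed from that literature is multi-sample importance reweighting of the gradient; a sketch, assuming log joint and proposal densities are available for K sampled glimpse sequences (illustrative interface, not the paper's exact estimator):

    import numpy as np

    def normalized_importance_weights(log_p_joint, log_q):
        # log_p_joint[k] = log p(y, z_k | x), log_q[k] = log q(z_k | x, y)
        # for K glimpse sequences z_k sampled from the attention network.
        # The normalized weights reweight each sample's gradient, reducing
        # variance relative to a single-sample estimator.
        log_w = log_p_joint - log_q
        log_w -= log_w.max()  # numerical stability
        w = np.exp(log_w)
        return w / w.sum()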
Reversible Recurrent Neural Networks
Recurrent neural networks (RNNs) provide state-of-the-art performance in
processing sequential data but are memory intensive to train, limiting the
flexibility of RNN models which can be trained. Reversible RNNs---RNNs for
which the hidden-to-hidden transition can be reversed---offer a path to reduce
the memory requirements of training, as hidden states need not be stored and
instead can be recomputed during backpropagation. We first show that perfectly
reversible RNNs, which require no storage of the hidden activations, are
fundamentally limited because they cannot forget information from their hidden
state. We then provide a scheme for storing a small number of bits in order to
allow perfect reversal with forgetting. Our method achieves comparable
performance to traditional models while reducing the activation memory cost by
a factor of 10--15. We extend our technique to attention-based
sequence-to-sequence models, where it maintains performance while reducing
activation memory cost by a factor of 5--10 in the encoder, and a factor of
10--15 in the decoder.
Comment: Published as a conference paper at NIPS 2018
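A sketch of the reversal idea using RevNet-style additive coupling, in which the hidden state is split into two halves so each update can be undone exactly (the paper's RNNs instead use gated updates plus a small buffer of stored bits so that forgetting remains possible):

    import numpy as np

    def forward_step(h1, h2, x, f, g):
        # Additive coupling: each half is updated using only the other
        # half (and the input), so the step is exactly invertible.
        h1 = h1 + f(h2, x)
        h2 = h2 + g(h1, x)
        return h1, h2

    def reverse_step(h1, h2, x, f, g):
        # Recompute the previous hidden state during backpropagation
        # instead of storing it.
        h2 = h2 - g(h1, x)
        h1 = h1 - f(h2, x)
        return h1, h2

    # Illustrative f and g; any functions of (h, x) work.
    f = g = lambda h, x: np.tanh(h + x)
    h1, h2, x = np.ones(4), np.zeros(4), np.full(4, 0.5)
    assert np.allclose(reverse_step(*forward_step(h1, h2, x, f, g), x, f, g), (h1, h2))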
Classifying and Segmenting Microscopy Images Using Convolutional Multiple Instance Learning
Convolutional neural networks (CNNs) have achieved state-of-the-art
performance on both classification and segmentation tasks. Applying CNNs to
microscopy images is challenging due to the lack of datasets labeled at the
single cell level. We extend the application of CNNs to microscopy image
classification and segmentation using multiple instance learning (MIL). We
present the adaptive Noisy-AND MIL pooling function, a new MIL operator that is
robust to outliers. Combining CNNs with MIL enables training CNNs using full
resolution microscopy images with global labels. We base our approach on the
similarity between the aggregation function used in MIL and pooling layers used
in CNNs. We show that training MIL CNNs end-to-end outperforms several previous
methods on both mammalian and yeast microscopy images without requiring any
segmentation steps.
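A sketch of Noisy-AND pooling for a single bag and class, following the form given in the paper: the bag-level probability rises steeply once the mean instance probability passes a threshold b, with slope controlled by a (written here in plain NumPy; in the model a is fixed and b is learned per class):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def noisy_and(instance_probs, a=10.0, b=0.5):
        # instance_probs: per-instance (e.g. per-pixel) class probabilities
        # for one image. The pooled value is a rescaled sigmoid of the
        # mean instance probability, robust to a few outlier instances.
        p_mean = instance_probs.mean()
        num = sigmoid(a * (p_mean - b)) - sigmoid(-a * b)
        den = sigmoid(a * (1.0 - b)) - sigmoid(-a * b)
        return num / den

    print(noisy_and(np.array([0.9, 0.8, 0.1, 0.05])))  # bag-level probability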
Generating Images from Captions with Attention
Motivated by the recent progress in generative models, we introduce a model
that generates images from natural language descriptions. The proposed model
iteratively draws patches on a canvas, while attending to the relevant words in
the description. After training on Microsoft COCO, we compare our model with
several baseline generative models on image generation and retrieval tasks. We
demonstrate that our model produces higher quality samples than other
approaches and generates images with novel scene compositions corresponding to
previously unseen captions in the dataset.
Comment: Published as a conference paper at ICLR 2016
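A sketch of the word-attention step, shown here as generic dot-product soft attention (the paper learns an alignment function over bidirectional-LSTM word representations; names and shapes are illustrative):

    import numpy as np

    def attend_to_words(draw_state, word_vecs):
        # Score every caption word against the current generative state,
        # then form a convex combination of word vectors as the context
        # that conditions the next patch drawn on the canvas.
        scores = word_vecs @ draw_state              # (num_words,)
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        return weights @ word_vecs, weights          # context, alignment

    state = np.random.randn(8)
    words = np.random.randn(5, 8)                    # 5 words, 8-dim each
    context, alignment = attend_to_words(state, words)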
Layer Normalization
Training state-of-the-art, deep neural networks is computationally expensive.
One way to reduce the training time is to normalize the activities of the
neurons. A recently introduced technique called batch normalization uses the
distribution of the summed input to a neuron over a mini-batch of training
cases to compute a mean and variance which are then used to normalize the
summed input to that neuron on each training case. This significantly reduces
the training time in feed-forward neural networks. However, the effect of batch
normalization is dependent on the mini-batch size and it is not obvious how to
apply it to recurrent neural networks. In this paper, we transpose batch
normalization into layer normalization by computing the mean and variance used
for normalization from all of the summed inputs to the neurons in a layer on a
single training case. Like batch normalization, we also give each neuron its
own adaptive bias and gain which are applied after the normalization but before
the non-linearity. Unlike batch normalization, layer normalization performs
exactly the same computation at training and test times. It is also
straightforward to apply to recurrent neural networks by computing the
normalization statistics separately at each time step. Layer normalization is
very effective at stabilizing the hidden state dynamics in recurrent networks.
Empirically, we show that layer normalization can substantially reduce the
training time compared with previously published techniques.
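A minimal sketch of the computation for one layer, assuming a holds the summed inputs to the layer's neurons:

    import numpy as np

    def layer_norm(a, gain, bias, eps=1e-5):
        # Normalize over the hidden units of one layer for a single case,
        # then apply the per-neuron adaptive gain and bias before the
        # non-linearity. No statistics are shared across the mini-batch,
        # so training- and test-time behaviour are identical.
        mu = a.mean(axis=-1, keepdims=True)
        sigma = a.std(axis=-1, keepdims=True)
        return gain * (a - mu) / (sigma + eps) + bias

    h = np.random.randn(3, 16)                 # 3 cases, 16 hidden units
    out = layer_norm(h, np.ones(16), np.zeros(16))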
ACTRCE: Augmenting Experience via Teacher's Advice For Multi-Goal Reinforcement Learning
Sparse reward is one of the most challenging problems in reinforcement
learning (RL). Hindsight Experience Replay (HER) attempts to address this issue
by converting a failed experience to a successful one by relabeling the goals.
Despite its effectiveness, HER has limited applicability because it lacks a
compact and universal goal representation. We present Augmenting experienCe via
TeacheR's adviCE (ACTRCE), an efficient reinforcement learning technique that
extends the HER framework using natural language as the goal representation. We
first analyze the differences among goal representations and show that ACTRCE
can efficiently solve difficult reinforcement learning problems in challenging
3D navigation tasks, whereas HER with non-language goal representation failed
to learn. We also show that with language goal representations, the agent can
generalize to unseen instructions, and even generalize to instructions with
unseen lexicons. We further demonstrate that it is crucial to use hindsight
advice to solve challenging tasks, and that even a small amount of advice is
sufficient for the agent to achieve good performance.
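A sketch of the relabeling step, assuming a teacher that can describe in language the goal a failed trajectory actually achieved (the describe interface and Step record are hypothetical, not the paper's API):

    from collections import namedtuple

    Step = namedtuple("Step", "state goal reward")

    def relabel_with_advice(trajectory, describe):
        # Replace the original (failed) language goal with the teacher's
        # description of what was actually achieved, and mark the final
        # step as successful -- HER-style relabeling with language goals.
        achieved = describe(trajectory[-1].state)
        last = len(trajectory) - 1
        return [Step(s.state, achieved, 1.0 if i == last else 0.0)
                for i, s in enumerate(trajectory)]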
Exploring Model-based Planning with Policy Networks
Model-based reinforcement learning (MBRL) with model-predictive control or
online planning has shown great potential for locomotion control tasks in terms
of both sample efficiency and asymptotic performance. Despite their initial
successes, the existing planning methods search from candidate sequences
randomly generated in the action space, which is inefficient in complex
high-dimensional environments. In this paper, we propose a novel MBRL
algorithm, model-based policy planning (POPLIN), that combines policy networks
with online planning. More specifically, we formulate action planning at each
time-step as an optimization problem using neural networks. We experiment with
both optimization w.r.t. the action sequences initialized from the policy
network, and also online optimization directly w.r.t. the parameters of the
policy network. We show that POPLIN obtains state-of-the-art performance in the
MuJoCo benchmarking environments, being about 3x more sample efficient than the
state-of-the-art algorithms, such as PETS, TD3 and SAC. To explain the
effectiveness of our algorithm, we show that the optimization surface in
parameter space is smoother than in action space. Furthermore, we find that
the distilled policy network can be applied effectively without the expensive
model predictive control at test time in some environments, such as Cheetah.
Code is released at https://github.com/WilsonWangTHU/POPLIN.
Comment: 8 pages, 7 figures
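A sketch of the action-space variant (POPLIN-A-style), assuming callables for the policy, a learned dynamics model, and a reward function: the policy's rollout seeds a CEM search over action sequences, and only the first action is executed before replanning, as in model-predictive control (the paper additionally optimizes directly in the policy network's parameter space):

    import numpy as np

    def rollout_return(state, actions, dynamics, reward):
        total = 0.0
        for a in actions:
            total += reward(state, a)
            state = dynamics(state, a)
        return total

    def plan(state, policy, dynamics, reward,
             horizon=10, pop=500, n_elite=50, iters=5):
        # Seed the search distribution by unrolling the policy network
        # through the learned dynamics model.
        mean, s = [], state
        for _ in range(horizon):
            a = policy(s)
            mean.append(a)
            s = dynamics(s, a)
        mean, std = np.stack(mean), 0.1
        for _ in range(iters):
            seqs = mean + std * np.random.randn(pop, *mean.shape)
            returns = np.array([rollout_return(state, q, dynamics, reward)
                                for q in seqs])
            elite = seqs[np.argsort(returns)[-n_elite:]]
            mean, std = elite.mean(axis=0), elite.std(axis=0)
        return mean[0]  # execute the first action, then replan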
Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation
In this work, we propose to apply trust region optimization to deep
reinforcement learning using a recently proposed Kronecker-factored
approximation to the curvature. We extend the framework of natural policy
gradient and propose to optimize both the actor and the critic using
Kronecker-factored approximate curvature (K-FAC) with trust region; hence we
call our method Actor Critic using Kronecker-Factored Trust Region (ACKTR). To
the best of our knowledge, this is the first scalable trust region natural
gradient method for actor-critic methods. It also learns non-trivial
continuous-control tasks, as well as discrete control policies, directly from
raw pixel inputs. We tested our approach across discrete domains
in Atari games as well as continuous domains in the MuJoCo environment. With
the proposed methods, we are able to achieve higher rewards and a 2- to 3-fold
improvement in sample efficiency on average, compared to previous
state-of-the-art on-policy actor-critic methods. Code is available at
https://github.com/openai/baselines
Comment: 14 pages, 9 figures; update github repo link
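A sketch of the Kronecker-factored preconditioning for one fully connected layer, assuming weight shape (out, in): the Fisher is approximated as A (x) G, with A the second moment of the layer's inputs and G that of the pre-activation gradients, so the natural-gradient step factorizes into two small inverses (ACKTR additionally rescales this update to respect a trust region):

    import numpy as np

    def kfac_step(grad_W, acts, pre_act_grads, damping=1e-3):
        # acts: (batch, in) layer inputs; pre_act_grads: (batch, out)
        # gradients w.r.t. pre-activations; grad_W: (out, in).
        # With F ~= A (x) G, the preconditioned gradient is G^-1 grad_W A^-1.
        n = len(acts)
        A = acts.T @ acts / n + damping * np.eye(acts.shape[1])
        G = (pre_act_grads.T @ pre_act_grads / n
             + damping * np.eye(pre_act_grads.shape[1]))
        return np.linalg.solve(G, grad_W) @ np.linalg.inv(A)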