10,081 research outputs found
Unsupervised Learning of Visual Structure using Predictive Generative Networks
The ability to predict future states of the environment is a central pillar
of intelligence. At its core, effective prediction requires an internal model
of the world and an understanding of the rules by which the world changes.
Here, we explore the internal models developed by deep neural networks trained
using a loss based on predicting future frames in synthetic video sequences,
using a CNN-LSTM-deCNN framework. We first show that this architecture can
achieve excellent performance in visual sequence prediction tasks, including
state-of-the-art performance in a standard 'bouncing balls' dataset (Sutskever
et al., 2009). Using a weighted mean-squared error and adversarial loss
(Goodfellow et al., 2014), the same architecture successfully extrapolates
out-of-the-plane rotations of computer-generated faces. Furthermore, despite
being trained end-to-end to predict only pixel-level information, our
Predictive Generative Networks learn a representation of the latent structure
of the underlying three-dimensional objects themselves. Importantly, we find
that this representation is naturally tolerant to object transformations, and
generalizes well to new tasks, such as classification of static images. Similar
models trained solely with a reconstruction loss fail to generalize as
effectively. We argue that prediction can serve as a powerful unsupervised loss
for learning rich internal representations of high-level object features.Comment: under review as conference paper at ICLR 201
- …