PredNet and Predictive Coding: A Critical Review
PredNet, a deep predictive coding network developed by Lotter et al.,
combines a biologically inspired architecture based on the propagation of
prediction error with self-supervised representation learning in video. While
the architecture has drawn considerable attention and various extensions of the
model exist, a critical analysis has been missing. We fill this gap by
evaluating PredNet both as an implementation of the predictive coding theory
and as a self-supervised video prediction model using a challenging video
action classification dataset. We design an extended model to test if
conditioning future frame predictions on the action class of the video improves
the model performance. We show that PredNet does not yet completely follow the
principles of predictive coding. The proposed top-down conditioning leads to a
performance gain on synthetic data, but does not scale up to the more complex
real-world action classification dataset. Our analysis is aimed at guiding
future research on similar architectures based on the predictive coding theory.
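The PredNet architecture described above is built around the propagation of prediction error between layers. As a rough illustration only, not Lotter et al.'s actual implementation, a PredNet-style error unit splits the difference between an observed frame and its prediction into rectified positive and negative channels; the function name and shapes here are assumptions:

```python
import numpy as np

def prediction_error(actual, predicted):
    """PredNet-style error unit: rectified positive and negative
    differences between an observed frame and its prediction,
    concatenated along the last (channel) axis."""
    pos = np.maximum(actual - predicted, 0.0)  # under-prediction
    neg = np.maximum(predicted - actual, 0.0)  # over-prediction
    return np.concatenate([pos, neg], axis=-1)

# This error signal is what gets passed upward; higher layers predict
# it in turn, so a perfect prediction yields an all-zero error tensor.
err = prediction_error(np.array([1.0, 0.0]), np.array([0.0, 1.0]))
print(err)  # [1. 0. 0. 1.]
```

Splitting the error into two rectified channels (rather than a signed difference) mirrors the on/off separation found in biological neurons, which is part of the architecture's biological motivation.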
Unsupervised Learning of Visual Structure using Predictive Generative Networks
The ability to predict future states of the environment is a central pillar
of intelligence. At its core, effective prediction requires an internal model
of the world and an understanding of the rules by which the world changes.
Here, we explore the internal models developed by deep neural networks trained
using a loss based on predicting future frames in synthetic video sequences,
using a CNN-LSTM-deCNN framework. We first show that this architecture can
achieve excellent performance in visual sequence prediction tasks, including
state-of-the-art performance in a standard 'bouncing balls' dataset (Sutskever
et al., 2009). Using a weighted mean-squared error and adversarial loss
(Goodfellow et al., 2014), the same architecture successfully extrapolates
out-of-the-plane rotations of computer-generated faces. Furthermore, despite
being trained end-to-end to predict only pixel-level information, our
Predictive Generative Networks learn a representation of the latent structure
of the underlying three-dimensional objects themselves. Importantly, we find
that this representation is naturally tolerant to object transformations, and
generalizes well to new tasks, such as classification of static images. Similar
models trained solely with a reconstruction loss fail to generalize as
effectively. We argue that prediction can serve as a powerful unsupervised loss
for learning rich internal representations of high-level object features.
Comment: under review as conference paper at ICLR 201
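The training objective described above combines a weighted mean-squared error with an adversarial loss (Goodfellow et al., 2014). A minimal sketch of such a composite generator objective, with illustrative weights and a scalar discriminator score standing in for a real discriminator network (all names and defaults here are assumptions, not the paper's implementation):

```python
import numpy as np

def generator_loss(pred_frames, target_frames, disc_score,
                   mse_weight=1.0, adv_weight=0.01):
    """Composite loss: weighted pixel-level MSE plus a non-saturating
    adversarial term that pushes the discriminator score toward 1."""
    mse = np.mean((pred_frames - target_frames) ** 2)
    adv = -np.log(disc_score + 1e-8)  # small epsilon for stability
    return mse_weight * mse + adv_weight * adv

# A perfect prediction that also fools the discriminator gives ~0 loss.
loss = generator_loss(np.zeros(4), np.zeros(4), disc_score=1.0)
```

The adversarial term is what lets the model produce sharp extrapolations (e.g. out-of-plane face rotations) where a pure MSE loss would regress toward blurry averages.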