Gated networks: an inventory
Gated networks are networks that contain gating connections, in which the
outputs of at least two neurons are multiplied. Initially, gated networks were
used to learn relationships between two input sources, such as pixels from two
images. More recently, they have been applied to learning activity recognition
or multi-modal representations. The aims of this paper are threefold: 1) to
explain the basic computations in gated networks to the non-expert, while
adopting a standpoint that insists on their symmetric nature; 2) to serve as a
quick reference guide to the recent literature, by providing an inventory of
applications of these networks, as well as recent extensions to the basic
architecture; 3) to suggest future research directions and applications.
Comment: Unpublished manuscript, 17 pages
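The gating connection described in this abstract, in which the outputs of two neurons are multiplied, can be illustrated with a minimal sketch. It assumes the common three-way formulation h[k] = sum over i, j of w[i][j][k] * x[i] * y[j]; the function name `gated_response`, the weight layout, and the toy numbers are illustrative assumptions, not taken from the paper.

```python
def gated_response(x, y, w):
    """Compute h[k] = sum_i sum_j w[i][j][k] * x[i] * y[j].

    x, y: lists of floats (the two input sources).
    w: nested list indexed as w[i][j][k] (hypothetical three-way weights).
    """
    n_out = len(w[0][0])
    h = [0.0] * n_out
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            # The product xi * yj is the gating connection: each input
            # scales (gates) the contribution of the other.
            for k in range(n_out):
                h[k] += w[i][j][k] * xi * yj
    return h


# Toy example: two inputs of size 2, one output unit.
x = [1.0, 2.0]
y = [3.0, 0.0]
w = [[[1.0], [1.0]],
     [[0.5], [1.0]]]
h = gated_response(x, y, w)
```

Because the inputs enter only through their product, setting either input to zero silences the output, and swapping `x` and `y` (with `w` transposed over its first two indices) gives the same result, which is one way to read the symmetry the abstract emphasizes.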
Action-Conditional Video Prediction using Deep Networks in Atari Games
Motivated by vision-based reinforcement learning (RL) problems, in particular
Atari games from the recent benchmark Arcade Learning Environment (ALE), we
consider spatio-temporal prediction problems where future (image-)frames are
dependent on control variables or actions as well as previous frames. While not
composed of natural scenes, frames in Atari games are high-dimensional in size,
can involve tens of objects with one or more objects being controlled by the
actions directly and many other objects being influenced indirectly, can
involve entry and departure of objects, and can involve deep partial
observability. We propose and evaluate two deep neural network architectures
that consist of encoding, action-conditional transformation, and decoding
layers based on convolutional neural networks and recurrent neural networks.
Experimental results show that the proposed architectures are able to generate
visually-realistic frames that are also useful for control over approximately
100-step action-conditional futures in some games. To the best of our
knowledge, this paper is the first to make and evaluate long-term predictions
on high-dimensional video conditioned by control inputs.
Comment: Published at NIPS 2015 (Advances in Neural Information Processing
Systems 28)
Representation Learning: A Review and New Perspectives
The success of machine learning algorithms generally depends on data
representation, and we hypothesize that this is because different
representations can entangle and hide more or less the different explanatory
factors of variation behind the data. Although specific domain knowledge can be
used to help design representations, learning with generic priors can also be
used, and the quest for AI is motivating the design of more powerful
representation-learning algorithms implementing such priors. This paper reviews
recent work in the area of unsupervised feature learning and deep learning,
covering advances in probabilistic models, auto-encoders, manifold learning,
and deep networks. This motivates longer-term unanswered questions about the
appropriate objectives for learning good representations, for computing
representations (i.e., inference), and the geometrical connections between
representation learning, density estimation and manifold learning.
"Mental Rotation" by Optimizing Transforming Distance
The human visual system is able to recognize objects despite transformations
that can drastically alter their appearance. To this end, much effort has been
devoted to the invariance properties of recognition systems. Invariance can be
engineered (e.g. convolutional nets), or learned from data explicitly (e.g.
temporal coherence) or implicitly (e.g. by data augmentation). One idea that
has not, to date, been explored is the integration of latent variables which
permit a search over a learned space of transformations. Motivated by evidence
that people mentally simulate transformations in space while comparing
examples, so-called "mental rotation", we propose a transforming distance.
Here, a trained relational model actively transforms pairs of examples so that
they are maximally similar in some feature space yet respect the learned
transformational constraints. We apply our method to nearest-neighbour problems
on the Toronto Face Database and NORB.
- …