
    Gated networks: an inventory

    Gated networks are networks that contain gating connections, in which the outputs of at least two neurons are multiplied. Initially, gated networks were used to learn relationships between two input sources, such as pixels from two images. More recently, they have been applied to activity recognition and to learning multi-modal representations. The aims of this paper are threefold: 1) to explain the basic computations in gated networks to the non-expert, while adopting a standpoint that insists on their symmetric nature; 2) to serve as a quick reference guide to the recent literature, by providing an inventory of applications of these networks, as well as of recent extensions to the basic architecture; and 3) to suggest future research directions and applications. Comment: Unpublished manuscript, 17 pages.
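
    The following sketch, in Python/NumPy, illustrates the multiplicative interaction described above with a factored gating unit: factor responses computed from each of the two inputs are multiplied element-wise and then pooled into mapping units. The layer sizes, weight names, and tanh pooling nonlinearity are assumptions made for this illustration, not details taken from the paper.

        import numpy as np

        def factored_gating_layer(x, y, W_x, W_y, W_h):
            """Illustrative factored gating unit: the two inputs enter symmetrically
            and their factor responses interact multiplicatively (a generic
            gated-network computation, not code from the paper)."""
            fx = W_x @ x                     # factor responses for input x
            fy = W_y @ y                     # factor responses for input y
            return np.tanh(W_h @ (fx * fy))  # element-wise product, then pooling

        # Toy usage with random weights; all dimensions are illustrative only.
        rng = np.random.default_rng(0)
        x, y = rng.normal(size=20), rng.normal(size=20)
        W_x = 0.1 * rng.normal(size=(32, 20))
        W_y = 0.1 * rng.normal(size=(32, 20))
        W_h = 0.1 * rng.normal(size=(16, 32))
        mapping_units = factored_gating_layer(x, y, W_x, W_y, W_h)

    Note that x and y play interchangeable roles in this computation, which reflects the symmetric standpoint the paper adopts.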

    Action-Conditional Video Prediction using Deep Networks in Atari Games

    Motivated by vision-based reinforcement learning (RL) problems, in particular Atari games from the recent benchmark Arcade Learning Environment (ALE), we consider spatio-temporal prediction problems where future (image-)frames depend on control variables or actions as well as on previous frames. While not composed of natural scenes, frames in Atari games are high-dimensional, can involve tens of objects with one or more objects being controlled directly by the actions and many other objects being influenced indirectly, can involve entry and departure of objects, and can involve deep partial observability. We propose and evaluate two deep neural network architectures that consist of encoding, action-conditional transformation, and decoding layers based on convolutional neural networks and recurrent neural networks. Experimental results show that the proposed architectures are able to generate visually realistic frames that are also useful for control over approximately 100-step action-conditional futures in some games. To the best of our knowledge, this paper is the first to make and evaluate long-term predictions on high-dimensional video conditioned by control inputs. Comment: Published at NIPS 2015 (Advances in Neural Information Processing Systems 28).
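
    As a rough sketch of the encode, action-conditional transform, decode pipeline described above, the Python/NumPy fragment below gates factor projections of the encoded frame with factor projections of the action. The fully connected weights, layer sizes, and tanh nonlinearity are assumptions made for this illustration; the paper's architectures use convolutional and recurrent encoders trained on Atari frames.

        import numpy as np

        def predict_next_frame(frame, action_onehot, W_enc, W_fh, W_fa, W_mix, W_dec):
            """Sketch of encode -> action-conditional transform -> decode
            (illustrative weights and shapes, not the paper's trained model)."""
            h = np.tanh(W_enc @ frame.ravel())                     # encode the frame
            h_act = W_mix @ ((W_fh @ h) * (W_fa @ action_onehot))  # gate features by the action
            return (W_dec @ h_act).reshape(frame.shape)            # decode to a predicted frame

        # Toy usage on a random 8x8 "frame" and a 4-action one-hot vector.
        rng = np.random.default_rng(0)
        frame = rng.normal(size=(8, 8))
        action = np.eye(4)[2]
        W_enc = 0.1 * rng.normal(size=(32, 64))
        W_fh  = 0.1 * rng.normal(size=(16, 32))
        W_fa  = 0.1 * rng.normal(size=(16, 4))
        W_mix = 0.1 * rng.normal(size=(32, 16))
        W_dec = 0.1 * rng.normal(size=(64, 32))
        predicted_frame = predict_next_frame(frame, action, W_enc, W_fh, W_fa, W_mix, W_dec)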

    Representation Learning: A Review and New Perspectives

    The success of machine learning algorithms generally depends on data representation, and we hypothesize that this is because different representations can entangle and hide more or less the different explanatory factors of variation behind the data. Although specific domain knowledge can be used to help design representations, learning with generic priors can also be used, and the quest for AI is motivating the design of more powerful representation-learning algorithms implementing such priors. This paper reviews recent work in the area of unsupervised feature learning and deep learning, covering advances in probabilistic models, auto-encoders, manifold learning, and deep networks. This motivates longer-term unanswered questions about the appropriate objectives for learning good representations, about computing representations (i.e., inference), and about the geometrical connections between representation learning, density estimation, and manifold learning.

    "Mental Rotation" by Optimizing Transforming Distance

    The human visual system is able to recognize objects despite transformations that can drastically alter their appearance. To this end, much effort has been devoted to the invariance properties of recognition systems. Invariance can be engineered (e.g. convolutional nets), or learned from data explicitly (e.g. temporal coherence) or implicitly (e.g. by data augmentation). One idea that has not, to date, been explored is the integration of latent variables which permit a search over a learned space of transformations. Motivated by evidence that people mentally simulate transformations in space while comparing examples, so-called "mental rotation", we propose a transforming distance: a trained relational model actively transforms pairs of examples so that they are maximally similar in some feature space yet respect the learned transformational constraints. We apply our method to nearest-neighbour problems on the Toronto Face Database and NORB.
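
    A minimal sketch of the "transforming distance" idea, assuming a toy transformation model: latent coefficients z mix a fixed basis of linear transforms applied to x, and z is optimized so that the transformed x matches y; the residual error serves as the distance. The linear basis, plain gradient descent, and pixel-space matching are placeholders; the paper instead optimizes the latent variables of a trained gated relational model and compares examples in a learned feature space.

        import numpy as np

        def transforming_distance(x, y, T, n_steps=100, lr=0.01):
            """Search over latent coefficients z so that sum_k z[k] * T[k] @ x
            best matches y, then report the residual distance (toy model only)."""
            Tx = T @ x                           # (K, dim): each basis transform applied to x
            z = np.zeros(T.shape[0])
            for _ in range(n_steps):
                residual = Tx.T @ z - y          # transformed x minus target
                z -= lr * 2.0 * (Tx @ residual)  # gradient step on the squared error
            return np.linalg.norm(Tx.T @ z - y)

        # Toy usage: distance between two random vectors under 8 random linear transforms.
        rng = np.random.default_rng(0)
        dim = 16
        T = 0.1 * rng.normal(size=(8, dim, dim))
        x, y = rng.normal(size=dim), rng.normal(size=dim)
        d = transforming_distance(x, y, T)

    Using such a quantity in place of a plain Euclidean distance for nearest-neighbour classification is the basic use case the abstract describes.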