3,589 research outputs found

    A Survey on Bayesian Deep Learning

    A comprehensive artificial intelligence system needs to not only perceive the environment with different `senses' (e.g., seeing and hearing) but also infer the world's conditional (or even causal) relations and the corresponding uncertainty. The past decade has seen major advances in many perception tasks, such as visual object recognition and speech recognition, using deep learning models. For higher-level inference, however, probabilistic graphical models, with their Bayesian nature, are still more powerful and flexible. In recent years, Bayesian deep learning has emerged as a unified probabilistic framework that tightly integrates deep learning and Bayesian models. In this general framework, the perception of text or images using deep learning can boost the performance of higher-level inference, and in turn, feedback from the inference process can enhance the perception of text or images. This survey provides a comprehensive introduction to Bayesian deep learning and reviews its recent applications to recommender systems, topic models, control, and other areas. We also discuss the relationship and differences between Bayesian deep learning and related topics such as the Bayesian treatment of neural networks.
    Comment: To appear in ACM Computing Surveys (CSUR) 202
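
    To make the contrast with the "Bayesian treatment of neural networks" concrete, here is a minimal, illustrative sketch (not from the survey) of weight uncertainty in a single linear layer: the weights carry a Gaussian variational posterior, sampled via the reparameterization trick, and training would maximize a one-sample ELBO estimate. All names and the toy data are assumptions for illustration.

        import numpy as np

        rng = np.random.default_rng(0)

        def sample_weights(mu, rho):
            # Draw W ~ N(mu, sigma^2), with sigma = softplus(rho) for positivity.
            sigma = np.log1p(np.exp(rho))
            eps = rng.standard_normal(mu.shape)
            return mu + sigma * eps, sigma

        def kl_gaussian(mu, sigma, prior_sigma=1.0):
            # KL( N(mu, sigma^2) || N(0, prior_sigma^2) ), summed over all weights.
            return np.sum(np.log(prior_sigma / sigma)
                          + (sigma ** 2 + mu ** 2) / (2 * prior_sigma ** 2) - 0.5)

        # Variational parameters for a toy 3 -> 1 linear layer.
        mu = np.zeros((3, 1))
        rho = np.full((3, 1), -3.0)                 # small initial sigma

        x = rng.standard_normal((8, 3))             # toy inputs
        y = x @ np.array([[1.0], [-2.0], [0.5]])    # toy targets

        W, sigma = sample_weights(mu, rho)
        nll = 0.5 * np.sum((x @ W - y) ** 2)        # Gaussian likelihood term
        elbo = -(nll + kl_gaussian(mu, sigma))      # one-sample ELBO estimate
        print("one-sample ELBO estimate:", elbo)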

    Learning object-centric representations

    Whenever an agent interacts with its environment, it has to take into account and interact with any objects present in that environment. Yet the majority of machine learning solutions either treat objects only implicitly or employ highly engineered solutions that account for objects through object detection algorithms. In this thesis, we explore supervised and unsupervised methods for learning object-centric representations from vision. We focus on end-to-end learning, where information about objects can be extracted directly from images, and where every object can be separately described by a single vector-valued variable. Specifically, we present three novel methods:
    • HART and MOHART, which track single and multiple objects in video, respectively, using RNNs with a hierarchy of differentiable attention mechanisms (a minimal sketch of such attention follows this list). These algorithms learn to anticipate future appearance changes and movement of tracked objects, thereby learning representations that describe every tracked object separately.
    • SQAIR, a VAE-based generative model of moving objects, which explicitly models the disappearance of objects and the appearance of new objects in the scene. It models every object with a separate latent variable, and disentangles the appearance, position, and scale of each object. Posterior inference in this model allows for unsupervised object detection and tracking.
    • SCAE, an unsupervised autoencoder with in-built knowledge of two-dimensional geometry and object-part decomposition, based on capsule networks. It learns to discover the parts present in an image and to group those parts into objects. Each object is modelled by a separate object capsule, whose activation probability is highly correlated with the object class, thereby allowing for state-of-the-art unsupervised image classification.
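
    As a rough illustration of the differentiable attention that HART and MOHART build on, the sketch below (not the thesis code; all names and parameters are illustrative) extracts a glimpse from an image using banks of 1-D Gaussian filters over rows and columns, so the crop stays differentiable with respect to the attention parameters.

        import numpy as np

        def gaussian_filters(img_size, n_points, center, sigma, stride):
            # n_points Gaussian filters laid out along one image axis,
            # centered at `center` and spaced by `stride`.
            idx = np.arange(n_points) - (n_points - 1) / 2.0
            means = center + idx * stride
            pos = np.arange(img_size)
            F = np.exp(-((pos[None, :] - means[:, None]) ** 2) / (2 * sigma ** 2))
            return F / (F.sum(axis=1, keepdims=True) + 1e-8)

        img = np.random.default_rng(0).random((64, 64))   # toy image
        Fy = gaussian_filters(64, 16, center=20.0, sigma=1.5, stride=1.0)
        Fx = gaussian_filters(64, 16, center=40.0, sigma=1.5, stride=1.0)
        glimpse = Fy @ img @ Fx.T   # 16x16 attended crop, fully differentiable
        print(glimpse.shape)        # (16, 16)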

    Occlusion resistant learning of intuitive physics from videos

    To reach human performance on complex tasks, a key ability for artificial systems is to understand physical interactions between objects and to predict the future outcomes of a situation. This ability, often referred to as intuitive physics, has recently received attention, and several methods have been proposed to learn these physical rules from video sequences. Yet most of these methods are restricted to the case where no, or only limited, occlusions occur. In this work, we propose a probabilistic formulation of learning intuitive physics in 3D scenes with significant inter-object occlusions. In our formulation, object positions are modeled as latent variables enabling the reconstruction of the scene. We then propose a series of approximations that make this problem tractable. Object proposals are linked across frames using a combination of a recurrent interaction network, modeling the physics in object space, and a compositional renderer, modeling the way objects project onto pixel space. We demonstrate significant improvements over the state of the art on the IntPhys intuitive physics benchmark. We apply our method to a second dataset with increasing levels of occlusions, showing that it realistically predicts segmentation masks up to 30 frames into the future. Finally, we also show results on predicting the motion of objects in real videos.
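
    The recurrent interaction network models physics in object space by updating each object's state from the aggregated pairwise effects of all other objects. The sketch below is a minimal toy version of that update, assuming fixed spring-like effects in place of the learned networks; it simply rolls a small system forward for 30 steps, matching the 30-frame prediction horizon above.

        import numpy as np

        def pairwise_effects(states):
            # diff[i, j] = states[j] - states[i]; summing over j aggregates
            # the (here spring-like) pull every other object exerts on i.
            diff = states[None, :, :] - states[:, None, :]
            return 0.05 * diff.sum(axis=1)

        def step(positions, velocities, dt=0.1):
            # One physics step in object space: effects act as accelerations.
            velocities = velocities + dt * pairwise_effects(positions)
            return positions + dt * velocities, velocities

        rng = np.random.default_rng(0)
        pos = rng.random((5, 2))          # 5 objects in 2-D
        vel = np.zeros((5, 2))
        for _ in range(30):               # 30-frame rollout
            pos, vel = step(pos, vel)
        print(pos.round(2))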

    Imitation Learning for Swarm Control using Variational Inference

    Swarms are groups of robots that can coordinate, cooperate, and communicate to achieve tasks that may be impossible for a single robot. These systems exhibit complex dynamical behavior, similar to that observed in physics, neuroscience, finance, biology, and social and communication networks. For instance, in biology, schools of fish, swarms of bacteria, and colonies of termites exhibit flocking behavior to achieve simple and complex tasks. Modeling the dynamics of flocking in animals is challenging, as we usually do not have full knowledge of the dynamics of the system or of how individual agents interact; the environment of swarms is also very noisy and chaotic, and we can usually only observe the individual trajectories of the agents. This work presents a technique for discovering and understanding the underlying governing dynamics of these systems, and how their agents interact, from observation data alone, using variational inference in an unsupervised manner. This is done by modeling the observed system dynamics as graphs and reconstructing the dynamics using variational autoencoders through multiple message-passing operations in the encoder and decoder. With this understanding of the complex behavior of animal swarms, we can apply it to robotic systems to imitate the flocking behavior of animals and perform decentralized control of robotic swarms. The approach relies on data-driven model discovery to learn local decentralized controllers that mimic the motion constraints and policies of animal flocks. To verify and validate this technique, experiments were conducted on observations from schools of fish and on synthetic data from the boids model.
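
    A minimal sketch of the node-to-edge-to-node message passing used inside such graph encoders and decoders appears below. The weights are random stand-ins for learned functions, and all names are illustrative assumptions rather than the paper's model.

        import numpy as np

        rng = np.random.default_rng(0)
        N, D, H = 4, 2, 8                 # agents, state dim, hidden dim
        W_edge = rng.standard_normal((2 * D, H)) * 0.1   # edge message fn
        W_node = rng.standard_normal((H, D)) * 0.1       # node update fn

        states = rng.standard_normal((N, D))             # observed agent states

        # Node -> edge: build a message for every ordered pair (i, j), i != j.
        send, recv = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
        mask = send != recv
        pairs = np.concatenate([states[send[mask]], states[recv[mask]]], axis=1)
        messages = np.tanh(pairs @ W_edge)               # (N*(N-1), H)

        # Edge -> node: each agent aggregates incoming messages, then updates.
        agg = np.zeros((N, H))
        np.add.at(agg, recv[mask], messages)
        next_states = states + agg @ W_node              # predicted next states
        print(next_states.round(2))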