3 research outputs found

    Occlusion resistant learning of intuitive physics from videos

    To reach human performance on complex tasks, a key ability for artificial systems is to understand physical interactions between objects and predict future outcomes of a situation. This ability, often referred to as intuitive physics, has recently received attention, and several methods have been proposed to learn these physical rules from video sequences. Yet most of these methods are restricted to the case where no, or only limited, occlusions occur. In this work we propose a probabilistic formulation of learning intuitive physics in 3D scenes with significant inter-object occlusions. In our formulation, object positions are modeled as latent variables, enabling the reconstruction of the scene. We then propose a series of approximations that make this problem tractable. Object proposals are linked across frames using a combination of a recurrent interaction network, modeling the physics in object space, and a compositional renderer, modeling the way in which objects project onto pixel space. We demonstrate significant improvements over the state of the art on the IntPhys intuitive physics benchmark. We apply our method to a second dataset with increasing levels of occlusion, showing that it realistically predicts segmentation masks up to 30 frames into the future. Finally, we also show results on predicting the motion of objects in real videos.

    Occlusion resistant learning of intuitive physics from videos

    To reach human performance on complex tasks, a key ability for artificial systems is to understand physical interactions between objects, and predict future outcomes of a situation. This ability, often referred to as intuitive physics, has recently received attention and several methods were proposed to learn these physical rules from video sequences. Yet, most of these methods are restricted to the case where no, or only limited, occlusions occur. In this work we propose a probabilistic formulation of learning intuitive physics in 3D scenes with significant inter-object occlusions. In our formulation, object positions are modelled as latent variables enabling the reconstruction of the scene. We then propose a series of approximations that make this problem tractable. Object proposals are linked across frames using a combination of a recurrent interaction network, modeling the physics in object space, and a compositional renderer, modeling the way in which objects project onto pixel space. We demonstrate significant improvements over state-of-the-art in the intuitive physics benchmark of Riochet et al. (2018). We apply our method to a second dataset with increasing levels of occlusions, showing it realistically predicts segmentation masks up to 30 frames in the future. Finally, we also show results on predicting motion of objects in real video.