89,149 research outputs found

    Occlusion resistant learning of intuitive physics from videos

    Get PDF
    To reach human performance on complex tasks, a key ability for artificial systems is to understand physical interactions between objects, and predict future outcomes of a situation. This ability, often referred to as intuitive physics, has recently received attention and several methods were proposed to learn these physical rules from video sequences. Yet, most of these methods are restricted to the case where no, or only limited, occlusions occur. In this work we propose a probabilistic formulation of learning intuitive physics in 3D scenes with significant inter-object occlusions. In our formulation, object positions are modeled as latent variables enabling the reconstruction of the scene. We then propose a series of approximations that make this problem tractable. Object proposals are linked across frames using a combination of a recurrent interaction network, modeling the physics in object space, and a compositional renderer, modeling the way in which objects project onto pixel space. We demonstrate significant improvements over state-of-the-art in the intuitive physics benchmark of IntPhys. We apply our method to a second dataset with increasing levels of occlusions, showing it realistically predicts segmentation masks up to 30 frames in the future. Finally, we also show results on predicting motion of objects in real videos

    Semantic multimedia remote display for mobile thin clients

    Get PDF
    Current remote display technologies for mobile thin clients convert practically all types of graphical content into sequences of images rendered by the client. Consequently, important information concerning the content semantics is lost. The present paper goes beyond this bottleneck by developing a semantic multimedia remote display. The principle consists of representing the graphical content as a real-time interactive multimedia scene graph. The underlying architecture features novel components for scene-graph creation and management, as well as for user interactivity handling. The experimental setup considers the Linux X windows system and BiFS/LASeR multimedia scene technologies on the server and client sides, respectively. The implemented solution was benchmarked against currently deployed solutions (VNC and Microsoft-RDP), by considering text editing and WWW browsing applications. The quantitative assessments demonstrate: (1) visual quality expressed by seven objective metrics, e.g., PSNR values between 30 and 42 dB or SSIM values larger than 0.9999; (2) downlink bandwidth gain factors ranging from 2 to 60; (3) real-time user event management expressed by network round-trip time reduction by factors of 4-6 and by uplink bandwidth gain factors from 3 to 10; (4) feasible CPU activity, larger than in the RDP case but reduced by a factor of 1.5 with respect to the VNC-HEXTILE
    • …
    corecore