Search CORE

1,538 research outputs found

Occlusion resistant learning of intuitive physics from videos

Author: Dupoux Emmanuel
Laptev Ivan
Riochet Ronan
Sivic Josef
Publication venue
Publication date: 30/04/2020
Field of study

To reach human performance on complex tasks, a key ability for artificial systems is to understand physical interactions between objects, and predict future outcomes of a situation. This ability, often referred to as intuitive physics, has recently received attention and several methods were proposed to learn these physical rules from video sequences. Yet, most of these methods are restricted to the case where no, or only limited, occlusions occur. In this work we propose a probabilistic formulation of learning intuitive physics in 3D scenes with significant inter-object occlusions. In our formulation, object positions are modeled as latent variables enabling the reconstruction of the scene. We then propose a series of approximations that make this problem tractable. Object proposals are linked across frames using a combination of a recurrent interaction network, modeling the physics in object space, and a compositional renderer, modeling the way in which objects project onto pixel space. We demonstrate significant improvements over state-of-the-art in the intuitive physics benchmark of IntPhys. We apply our method to a second dataset with increasing levels of occlusions, showing it realistically predicts segmentation masks up to 30 frames in the future. Finally, we also show results on predicting motion of objects in real videos

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

Pose Estimation and Segmentation of Multiple People in Stereoscopic Movies

Author: Alahari Karteek
Laptev Ivan
Seguin Guillaume
Sivic Josef
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/08/2015
Field of study

International audienceWe describe a method to obtain a pixel-wise segmentation and pose estimation of multiple people in stereoscopic videos. This task involves challenges such as dealing with unconstrained stereoscopic video, non-stationary cameras, and complex indoor and outdoor dynamic scenes with multiple people. We cast the problem as a discrete labelling task involving multiple person labels, devise a suitable cost function, and optimize it efficiently. The contributions of our work are two-fold: First, we develop a segmentation model incorporating person detections and learnt articulated pose segmentation masks, as well as colour, motion, and stereo disparity cues. The model also explicitly represents depth ordering and occlusion. Second, we introduce a stereoscopic dataset with frames extracted from feature-length movies "StreetDance 3D" and "Pina". The dataset contains 587 annotated human poses, 1158 bounding box annotations and 686 pixel-wise segmentations of people. The dataset is composed of indoor and outdoor scenes depicting multiple people with frequent occlusions. We demonstrate results on our new challenging dataset, as well as on the H2view dataset from (Sheasby et al. ACCV 2012)

CiteSeerX

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

Object Segmentation from Motion Discontinuities and Temporal Occlusions–A Biologically Inspired Model

Author: A Thielscher
AS Ogale
AS Ogale
AT Smith
C Beck
Cornelia Beck
CW Tyler
D Feldman
DC Van Essen
Ernest Greene
F Stein
FT Qiu
GT Chou
H Neumann
Heiko Neumann
J Bullier
JA Movshon
JM Hupé
JYA Wang
K Nakayama
M Mishkin
MJ Black
O Sporns
P Bayerl
P Bayerl
Q Ke
R Eckhorn
R v. d. Heydt
RT Born
S Eifuku
S Eifuku
SA Niyogi
T Hansen
Thilo Ognibeni
Y Weiss
Publication venue: Public Library of Science
Publication date: 27/11/2008
Field of study

BACKGROUND: Optic flow is an important cue for object detection. Humans are able to perceive objects in a scene using only kinetic boundaries, and can perform the task even when other shape cues are not provided. These kinetic boundaries are characterized by the presence of motion discontinuities in a local neighbourhood. In addition, temporal occlusions appear along the boundaries as the object in front covers the background and the objects that are spatially behind it. METHODOLOGY/PRINCIPAL FINDINGS: From a technical point of view, the detection of motion boundaries for segmentation based on optic flow is a difficult task. This is due to the problem that flow detected along such boundaries is generally not reliable. We propose a model derived from mechanisms found in visual areas V1, MT, and MSTl of human and primate cortex that achieves robust detection along motion boundaries. It includes two separate mechanisms for both the detection of motion discontinuities and of occlusion regions based on how neurons respond to spatial and temporal contrast, respectively. The mechanisms are embedded in a biologically inspired architecture that integrates information of different model components of the visual processing due to feedback connections. In particular, mutual interactions between the detection of motion discontinuities and temporal occlusions allow a considerable improvement of the kinetic boundary detection. CONCLUSIONS/SIGNIFICANCE: A new model is proposed that uses optic flow cues to detect motion discontinuities and object occlusion. We suggest that by combining these results for motion discontinuities and object occlusion, object segmentation within the model can be improved. This idea could also be applied in other models for object segmentation. In addition, we discuss how this model is related to neurophysiological findings. The model was successfully tested both with artificial and real sequences including self and object motion

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Amodal Instance Segmentation and Multi-Object Tracking with Deep Pixel Embedding

Author: Liu Yanfeng
Publication venue: DigitalCommons@University of Nebraska - Lincoln
Publication date: 05/12/2019
Field of study

This thesis extends upon the representational output of semantic instance segmentation by explicitly including both visible and occluded parts. A fully convolutional network is trained to produce consistent pixel-level embedding across two layers such that, when clustered, the results convey the full spatial extent and depth ordering of each instance. Results demonstrate that the network can accurately estimate complete masks in the presence of occlusion and outperform leading top-down bounding-box approaches. The model is further extended to produce consistent pixel-level embeddings across two consecutive image frames from a video to simultaneously perform amodal instance segmentation and multi-object tracking. No post-processing trackers or Hungarian Algorithm is needed to perform multi-object tracking. The advantages and disadvantages of such a bounding-box-free approach are studied thoroughly. Experiments show that the proposed method outperforms the state-of-the-art bounding-box based approach on tracking animated moving objects. Advisor: Eric T. Psota and Lance C. Pére

DigitalCommons@University of Nebraska