
    Revisiting Depth Layers from Occlusions

    In this work, we consider images of a scene with a moving object captured by a static camera. As the object (human or otherwise) moves about the scene, it reveals pairwise depth-ordering or occlusion cues. The goal of this work is to use these sparse occlusion cues, along with monocular depth-occlusion cues, to densely segment the scene into depth layers. We cast the problem of depth-layer segmentation as a discrete labeling problem on a spatio-temporal Markov Random Field (MRF) that uses the motion occlusion cues together with monocular cues and a smooth-motion prior for the moving object. We quantitatively show that the depth ordering produced by the proposed combination of depth cues from object motion and monocular occlusion cues is superior to using either feature independently, or to a naïve combination of the features.
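
    To make the formulation concrete, below is a minimal sketch (not the authors' code) of depth-layer labeling as a discrete MRF energy, minimized with iterated conditional modes on a toy 2D grid. The unary terms stand in for monocular cues, the pairwise terms for the smoothness prior, and the "ordering" list mimics a sparse motion-occlusion cue; the paper's MRF is spatio-temporal and uses stronger inference, and all names, sizes, and weights here are illustrative assumptions.

        import numpy as np

        H, W, L = 8, 8, 3                    # toy grid and number of depth layers
        rng = np.random.default_rng(0)
        unary = rng.random((H, W, L))        # stand-in monocular cue costs (lower = better)
        ordering = [((2, 2), (5, 5))]        # motion cue: pixel (2,2) is nearer than (5,5)
        lam, mu = 0.5, 2.0                   # assumed smoothness / ordering weights

        def energy(labels):
            # data term from monocular cues
            e = unary[np.arange(H)[:, None], np.arange(W)[None, :], labels].sum()
            # smoothness: penalize label differences between grid neighbors
            e += lam * (np.abs(np.diff(labels, axis=0)).sum()
                        + np.abs(np.diff(labels, axis=1)).sum())
            # occlusion ordering: smaller label = nearer layer
            for a, b in ordering:
                if labels[a] >= labels[b]:
                    e += mu
            return e

        labels = unary.argmin(axis=2)        # initialize from monocular cues alone
        for _ in range(5):                   # ICM sweeps
            for i in range(H):
                for j in range(W):
                    trials = []
                    for l in range(L):
                        t = labels.copy()
                        t[i, j] = l
                        trials.append(energy(t))
                    labels[i, j] = int(np.argmin(trials))
        print("final energy:", energy(labels))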

    iMapper: Interaction-guided Scene Mapping from Monocular Videos

    Next-generation smart and augmented-reality systems demand a computational understanding of monocular footage that captures humans in physical spaces, to reveal plausible object arrangements and human-object interactions. Despite recent advances in both scene-layout and human-motion analysis, this setting remains challenging to analyze due to the frequent occlusions between objects and moving humans. We observe that object arrangements and human actions are often strongly correlated, and hence can be used to help recover from these occlusions. We present iMapper, a data-driven method to identify such human-object interactions and utilize them to infer the layout of occluded objects. Starting from a monocular video with detected 2D human joint positions that are potentially noisy and occluded, we first introduce the notion of interaction saliency: space-time snapshots where informative human-object interactions happen. We then propose a global optimization that retrieves interactions from a database and fits them to the detected salient interactions so as to best explain the input video. We evaluate the approach extensively, both quantitatively against manually annotated ground truth and through a user study, and demonstrate that iMapper produces plausible scene layouts for scenes with medium to heavy occlusion. Code and data are available on the project page.
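
    As a rough illustration of the retrieval step, the sketch below flags "interaction-salient" frames from 2D joint tracks and retrieves the best-matching template from a toy interaction database. Saliency is approximated here by near-zero joint velocity (a person pausing near an object), and the database is synthetic; the actual iMapper pipeline solves a global space-time optimization, so every name and threshold below is an assumption for illustration only.

        import numpy as np

        rng = np.random.default_rng(1)
        T, J = 120, 15                                   # frames, joints
        joints = np.cumsum(rng.normal(0, 1, (T, J, 2)), axis=0)  # toy 2D joint tracks
        joints[40:60] = joints[40]                       # person holds still: candidate interaction

        # saliency proxy: mean joint speed per frame; low speed = salient
        speed = np.linalg.norm(np.diff(joints, axis=0), axis=2).mean(axis=1)
        salient = np.where(speed < 0.1 * speed.mean())[0]

        def descriptor(pose):
            # translation- and scale-normalized pose descriptor
            p = pose - pose.mean(axis=0)
            return (p / (np.linalg.norm(p) + 1e-8)).ravel()

        database = {"sit": descriptor(joints[50]),       # toy templates, not real data
                    "stand": descriptor(joints[0])}

        for t in salient[:3]:
            d = descriptor(joints[t])
            label = min(database, key=lambda k: np.linalg.norm(d - database[k]))
            print(f"frame {t}: retrieved interaction '{label}'")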

    Causal video object segmentation from persistence of occlusions

    [Figure caption fragment; abstract and symbols lost in extraction] On the far right, our algorithm correctly infers that the bag strap is in front of the woman's arm, which is in front of her trunk, which is in front of the background.

    Bilattice based Logical Reasoning for Automated Visual Surveillance and other Applications

    The primary objective of an automated visual surveillance system is to observe and understand human behavior and report unusual or potentially dangerous activities/events in a timely manner. Automatically understanding human behavior from visual input, however, is a challenging task. The research presented in this thesis focuses on designing a reasoning framework that can combine, in a principled manner, high-level contextual information with low-level image processing primitives to interpret visual information. The primary motivation for this work has been to design a reasoning framework that draws heavily upon human-like reasoning and reasons explicitly about visual as well as non-visual information to solve classification problems. Humans are adept at performing inference under uncertainty by combining evidence from multiple, noisy, and often contradictory sources. This thesis describes a logical reasoning approach in which logical rules encode high-level knowledge about the world and logical facts serve as input to the system from real-world observations. The reasoning framework supports encoding multiple rules for the same proposition, representing multiple lines of reasoning, and also supports rules that infer explicit negation and thereby potentially contradictory information. Uncertainties are associated both with the logical rules that guide reasoning and with the input facts. This framework has been applied to visual surveillance problems such as human activity recognition, identity maintenance, and human detection. Finally, we have also applied it to the problem of collaborative filtering, predicting movie ratings by explicitly reasoning about users' preferences.
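
    A minimal sketch of the underlying machinery, assuming the square bilattice of (belief, disbelief) pairs over [0,1]^2 that such frameworks typically use; this is illustrative, not the thesis implementation. Rules carry their own confidence, multiple rules for one proposition fuse their evidence componentwise, and explicit negation swaps belief and disbelief, so contradictory evidence can coexist. All rule names and confidences below are made up.

        from dataclasses import dataclass

        @dataclass(frozen=True)
        class TruthValue:
            belief: float
            disbelief: float

            def neg(self):                       # explicit negation swaps components
                return TruthValue(self.disbelief, self.belief)

        def join(u, v):
            # combine independent evidence componentwise (probabilistic sum)
            ps = lambda x, y: x + y - x * y
            return TruthValue(ps(u.belief, v.belief), ps(u.disbelief, v.disbelief))

        def apply_rule(confidence, *antecedents):
            # rule firing: product t-norm of rule confidence and antecedent beliefs
            b = confidence
            for a in antecedents:
                b *= a.belief
            return TruthValue(b, 0.0)

        # human(X) <- head_detected(X)    [0.7]
        # human(X) <- legs_detected(X)    [0.6]
        # !human(X) <- inside_vehicle(X)  [0.8]
        head = TruthValue(0.9, 0.0)
        legs = TruthValue(0.5, 0.0)
        inside_vehicle = TruthValue(0.4, 0.0)

        human = join(apply_rule(0.7, head), apply_rule(0.6, legs))
        human = join(human, apply_rule(0.8, inside_vehicle).neg())
        print(human)  # nonzero belief AND disbelief: contradiction is representable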

    Video-based Estimation of Activity Level for Assisted Living

    The continual increase in the population of older adults over the next 50 years implies a growing number of people dependent on family and government support. Assisted-living technologies are information and communication technologies that assist, improve, and monitor the daily living of older and vulnerable people, promoting greater independence and providing a safe and secure environment at reduced cost. Most assisted-living technologies are passive sensor-based solutions in which a number of embedded or body-worn sensors are connected over a network to recognize activities. The sensors are often obtrusive, and the solutions are extremely sensitive to sensor performance. Visual data is contextually richer than sensor-triggered firings, but it is also highly privacy-sensitive. Since visual data is intrusive, a qualitative study among older adults within the community was carried out to understand the privacy concerns around having a camera in an assisted-living environment. Building on the outcomes of the focus-group discussions, a novel monitoring framework is proposed. Within this framework, Activity Level is proposed as an effective metric to measure the amount of activity undertaken by an individual. Activity Level is estimated by extracting and classifying pixel-based and phase-based motion features. Experiments reveal that phase-based features perform better than pixel-based features. Experiments are carried out using the novel Sheffield Activities of Daily Living Dataset, which has been developed and made available for further computer-vision research in assisted living.
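
    As a concrete illustration of the pixel-based variant, the sketch below computes an activity level as the average fraction of pixels that change between consecutive frames. The thesis reports phase-based motion features as the stronger choice; this toy uses synthetic frames rather than the Sheffield Activities of Daily Living Dataset, and the threshold is an assumed value.

        import numpy as np

        def activity_level(frames, thresh=15):
            """frames: (T, H, W) uint8 grayscale clip; returns a value in [0, 1]."""
            diffs = np.abs(frames[1:].astype(np.int16) - frames[:-1].astype(np.int16))
            moving = (diffs > thresh).mean(axis=(1, 2))   # moving-pixel ratio per frame
            return float(moving.mean())

        rng = np.random.default_rng(2)
        still = rng.integers(0, 255, (30, 64, 64), dtype=np.uint8)
        still[:] = still[0]                               # static scene: no motion
        active = still.copy()
        active[:, 20:40, 20:40] = rng.integers(0, 255, (30, 20, 20), dtype=np.uint8)

        print("static clip:", activity_level(still))      # ~0.0
        print("active clip:", activity_level(active))     # clearly higher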

    Multi-view dynamic scene modeling

    Modeling dynamic scenes/events from multiple fixed-location vision sensors, such as video camcorders, infrared cameras, and Time-of-Flight sensors, is of broad interest in the computer vision community, with many applications including 3D TV, virtual reality, medical surgery, markerless motion capture, video games, and security surveillance. However, most existing multi-view systems are set up in a strictly controlled indoor environment, with fixed lighting conditions and simple background views. Many challenges prevent the technology from moving to outdoor natural environments, including varying sunlight, shadows, reflections, background motion, and visual occlusion. In this thesis, I address different aspects of overcoming all of the aforementioned difficulties, so as to reduce human preparation and manipulation and to make a robust outdoor system as automatic as possible. In particular, the main novel technical contributions of this thesis are as follows: a generic heterogeneous sensor-fusion framework for robust 3D shape estimation; a way to automatically recover the 3D shapes of static occluders from dynamic-object silhouette cues, which explicitly models static visual occlusion events along the viewing rays; a system to model the shapes of multiple dynamic objects and track their identities simultaneously, which explicitly models inter-occlusion events between dynamic objects; and a scheme to recover an object's dense 3D motion flow over time, without assuming any prior knowledge of the underlying structure of the dynamic object being modeled, which helps enforce the temporal consistency of natural motions and initializes more advanced shape learning and motion analysis. A unified automatic calibration algorithm for the heterogeneous network of conventional cameras/camcorders and new Time-of-Flight sensors is also proposed.
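
    To ground the silhouette-based shape reasoning, here is a minimal sketch of visual-hull-style voxel carving, the kind of silhouette cue such systems build on. Real systems use calibrated perspective cameras and additionally reason about static occluders; this toy uses three orthographic axis-aligned "views" and a synthetic object, so everything here is an illustrative assumption.

        import numpy as np

        N = 32
        grid = np.ones((N, N, N), dtype=bool)             # start fully occupied

        # synthetic ground-truth object, used only to render toy silhouettes
        z, y, x = np.mgrid[:N, :N, :N]
        ball = (x - N/2)**2 + (y - N/2)**2 + (z - N/2)**2 < (N/4)**2

        silhouettes = [ball.any(axis=a) for a in range(3)]  # one binary mask per view

        # carve: a voxel survives only if it projects inside every silhouette
        for axis, sil in enumerate(silhouettes):
            grid &= np.expand_dims(sil, axis)             # broadcast mask along viewing rays

        print("carved voxels:", int(grid.sum()), "true object voxels:", int(ball.sum()))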