488 research outputs found
Multi-view dynamic scene modeling
Modeling dynamic scenes and events from multiple fixed-location vision sensors, such as video camcorders, infrared cameras, and Time-of-Flight sensors, is of broad interest to the computer vision community, with many applications including 3D TV, virtual reality, medical surgery, markerless motion capture, video games, and security surveillance. However, most existing multi-view systems are set up in strictly controlled indoor environments, with fixed lighting conditions and simple backgrounds. Many challenges prevent the technology from extending to outdoor natural environments, including varying sunlight, shadows, reflections, background motion, and visual occlusion. In this thesis, I address each of these difficulties in order to reduce human preparation and manipulation, and to make a robust outdoor system as automatic as possible. In particular, the main novel technical contributions of this thesis are as follows: a generic heterogeneous sensor fusion framework for robust 3D shape estimation; a method to automatically recover the 3D shapes of static occluders from dynamic object silhouette cues, which explicitly models static occlusion events along the viewing rays; a system to model the shapes of multiple dynamic objects and track their identities simultaneously, which explicitly models inter-occlusion events between dynamic objects; and a scheme to recover an object's dense 3D motion flow over time, without assuming any prior knowledge of the underlying structure of the dynamic object being modeled, which helps to enforce temporal consistency of natural motions and initializes more advanced shape learning and motion analysis. A unified automatic calibration algorithm for the heterogeneous network of conventional cameras/camcorders and new Time-of-Flight sensors is also proposed.
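The silhouette-cue contributions above build on shape-from-silhouette reconstruction. As a minimal illustrative sketch of that starting point (not the thesis's actual fusion framework; the function name and the pinhole-projection setup are assumptions made here), voxel carving keeps only the voxels that project inside every camera's silhouette:

```python
import numpy as np

def carve_visual_hull(silhouettes, projections, grid_points):
    """Shape-from-silhouette: a voxel survives only if it projects
    inside the silhouette mask of every camera.

    silhouettes : list of HxW boolean masks, one per camera
    projections : list of 3x4 camera projection matrices
    grid_points : Nx3 array of voxel centres in world coordinates
    Returns an N-long boolean occupancy vector.
    """
    n = grid_points.shape[0]
    homog = np.hstack([grid_points, np.ones((n, 1))])  # Nx4 homogeneous
    occupied = np.ones(n, dtype=bool)
    for mask, P in zip(silhouettes, projections):
        h, w = mask.shape
        proj = homog @ P.T                  # Nx3 image-plane coordinates
        u = proj[:, 0] / proj[:, 2]
        v = proj[:, 1] / proj[:, 2]
        inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
        ui = np.clip(u.astype(int), 0, w - 1)
        vi = np.clip(v.astype(int), 0, h - 1)
        # Voxels projecting outside the image, or outside the mask, are carved.
        occupied &= inside & mask[vi, ui]
    return occupied
```

Real outdoor systems of the kind the thesis describes replace this hard intersection with probabilistic fusion, so that occlusion and sensor noise do not carve away true shape.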
3D occlusion recovery using few cameras
We present a practical framework for detecting and modeling 3D static occlusions in wide-baseline, multi-camera scenarios where the number of cameras is small. The framework consists of an iterative learning procedure in which, at each frame, the occlusion model is used to solve the voxel occupancy problem, and this solution is then used to update the occlusion model. Along with this iterative procedure, the proposed work makes two contributions: (1) a novel energy function (which can be minimized via graph cuts) specifically designed for use in this procedure, and (2) an application that incorporates our probabilistic occlusion model into a 3D tracking system. Both qualitative and quantitative results of the proposed algorithm, and of its incorporation with a 3D tracker, are presented in support.
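The paper's energy function is minimized with graph cuts; as a rough, library-free toy of the alternation it drives (unary votes only, no pairwise smoothness, hand-picked update rates, and all names invented here), one can alternate a per-voxel occupancy estimate with an occlusion-belief update:

```python
import numpy as np

def iterate_occupancy_occlusion(obs, occ_prob, n_iters=5):
    """Toy alternation between voxel occupancy and occlusion belief.

    obs      : CxN array; obs[c, v] = 1 if camera c saw foreground along
               the ray through voxel v, else 0.
    occ_prob : CxN array of occlusion probabilities (the chance that the
               ray from camera c to voxel v is blocked by a static
               occluder, making its observation uninformative).
    Returns (occupancy, occ_prob) after the alternation.
    """
    for _ in range(n_iters):
        # Occupancy step: a camera's vote counts only when not occluded.
        weight = 1.0 - occ_prob
        score = (weight * obs).sum(axis=0) / np.maximum(weight.sum(axis=0), 1e-9)
        occupancy = score > 0.5
        # Occlusion step: a camera that contradicts the consensus on an
        # occupied voxel is likely occluded there; nudge its belief up.
        disagree = (obs == 0) & occupancy[None, :]
        occ_prob = np.clip(occ_prob + 0.2 * disagree - 0.05 * ~disagree, 0.0, 1.0)
    return occupancy, occ_prob
```

The actual framework expresses the occupancy step as an energy minimized by graph cuts, which adds spatial smoothness this toy omits.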
Automatic visual detection of human behavior: a review from 2000 to 2014
Due to advances in information technology (e.g., digital video cameras, ubiquitous sensors), the automatic detection of human behaviors from video has become an active research topic. In this paper, we perform a systematic literature review of this topic from 2000 to 2014, covering a selection of 193 papers drawn from six major scientific publishers. The selected papers were classified into three main subjects: detection techniques, datasets, and applications. The detection techniques were divided into four categories (initialization, tracking, pose estimation, and recognition). The list of datasets includes eight examples (e.g., Hollywood action). Finally, several application areas were identified, including human detection, abnormal activity detection, action recognition, player modeling, and pedestrian detection. Our analysis provides a road map to guide future research in designing automatic visual human behavior detection systems. This work is funded by the Portuguese Foundation for Science and Technology (FCT - Fundação para a Ciência e a Tecnologia) under research grant SFRH/BD/84939/2012.
Articulated human tracking and behavioural analysis in video sequences
Recently, there has been a dramatic growth of interest in the observation and tracking
of human subjects through video sequences. Arguably, the principal impetus has come
from the perceived demand for technological surveillance, however applications in entertainment,
intelligent domiciles and medicine are also increasing. This thesis examines
human articulated tracking and the classification of human movement, first separately
and then as a sequential process.
First, this thesis considers the development and training of a 3D model of human body
structure and dynamics. To process video sequences, an observation model is also designed
with a multi-component likelihood based on edge, silhouette and colour. This is defined on
the articulated limbs, and visible from a single or multiple cameras, each of which may be
calibrated from that sequence. Second, for behavioural analysis, we develop a methodology
in which actions and activities are described by semantic labels generated from a Movement
Cluster Model (MCM). Third, a Hierarchical Partitioned Particle Filter (HPPF) was
developed for human tracking that allows multi-level parameter search consistent with the
body structure. This tracker relies on the articulated motion prediction provided by the
MCM at pose or limb level. Fourth, tracking and movement analysis are integrated to
generate a probabilistic activity description with action labels.
The implemented algorithms for tracking and behavioural analysis are tested extensively
and independently against ground truth on human tracking and surveillance
datasets. Dynamic models are shown to predict and generate synthetic motion, while
MCM recovers both periodic and non-periodic activities, defined either on the whole body
or at the limb level. Tracking results are comparable with the state of the art;
however, the integrated behaviour analysis adds to the value of the approach. Overseas Research Students Awards Scheme (ORSAS).
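The HPPF described above builds on the standard sequential importance resampling (SIR) cycle. A minimal sketch of one generic SIR step (the base cycle only, not the hierarchical partitioned variant of the thesis; all names here are illustrative) might look like:

```python
import numpy as np

def particle_filter_step(particles, weights, predict, likelihood, rng):
    """One sequential importance resampling (SIR) step, the basic cycle
    that hierarchical/partitioned trackers refine.

    particles  : NxD array of pose hypotheses
    weights    : N normalized importance weights
    predict    : fn(particles, rng) -> propagated particles (motion model)
    likelihood : fn(particles) -> N observation likelihoods
    """
    n = len(weights)
    # Resample hypotheses in proportion to their current weights.
    idx = rng.choice(n, size=n, p=weights)
    # Propagate through the motion model (the MCM's role in the thesis).
    particles = predict(particles[idx], rng)
    # Reweight by the observation likelihood and renormalize.
    w = likelihood(particles)
    w = w / w.sum()
    return particles, w
```

In the thesis, the prediction step is supplied by the MCM at pose or limb level, and the search is partitioned hierarchically over the body structure rather than run over the full pose vector at once.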
Temporally Coherent General Dynamic Scene Reconstruction
Existing techniques for dynamic scene reconstruction from multiple
wide-baseline cameras primarily focus on reconstruction in controlled
environments, with fixed calibrated cameras and strong prior constraints. This
paper introduces a general approach to obtain a 4D representation of complex
dynamic scenes from multi-view wide-baseline static or moving cameras without
prior knowledge of the scene structure, appearance, or illumination.
Contributions of the work are: An automatic method for initial coarse
reconstruction to initialize joint estimation; Sparse-to-dense temporal
correspondence integrated with joint multi-view segmentation and reconstruction
to introduce temporal coherence; and a general robust approach for joint
segmentation refinement and dense reconstruction of dynamic scenes by
introducing a shape constraint. Comparison with state-of-the-art approaches on a
variety of complex indoor and outdoor scenes demonstrates improved accuracy in
both multi-view segmentation and dense reconstruction. This paper demonstrates
unsupervised reconstruction of complete temporally coherent 4D scene models
with improved non-rigid object segmentation and shape reconstruction and its
application to free-viewpoint rendering and virtual reality. Comment: Submitted to IJCV 2019. arXiv admin note: substantial text overlap
with arXiv:1603.0338
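Sparse-to-dense temporal correspondence, reduced to its simplest form, spreads a few reliable matches into a dense field. The paper integrates this with joint multi-view segmentation and reconstruction; the toy sketch below uses plain inverse-distance weighting instead, and all names are illustrative:

```python
import numpy as np

def densify_flow(sparse_pts, sparse_flow, query_pts, power=2.0, eps=1e-9):
    """Spread sparse 2D correspondences to dense query pixels by
    inverse-distance weighting.

    sparse_pts  : Sx2 matched point locations
    sparse_flow : Sx2 displacement vectors at those points
    query_pts   : Qx2 pixel locations to fill in
    Returns a Qx2 dense flow field.
    """
    # Pairwise distances between queries and sparse anchors: Q x S.
    d = np.linalg.norm(query_pts[:, None, :] - sparse_pts[None, :, :], axis=2)
    w = 1.0 / (d ** power + eps)            # nearer anchors dominate
    w /= w.sum(axis=1, keepdims=True)       # normalize per query pixel
    return w @ sparse_flow
```

Unlike this sketch, the paper's densification respects segmentation boundaries, so flow from one object is not smeared onto another.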
4D Temporally Coherent Light-field Video
Light-field video has recently been used in virtual and augmented reality
applications to increase realism and immersion. However, existing light-field
methods are generally limited to static scenes due to the requirement to
acquire a dense scene representation. The large amount of data and the absence
of methods to infer temporal coherence pose major challenges in storage,
compression and editing compared to conventional video. In this paper, we
propose the first method to extract a spatio-temporally coherent light-field
video representation. A novel method to obtain Epipolar Plane Images (EPIs)
from a sparse light-field camera array is proposed. EPIs are used to constrain
scene flow estimation to obtain 4D temporally coherent representations of
dynamic light-fields. Temporal coherence is achieved on a variety of
light-field datasets. Evaluation of the proposed light-field scene flow against
existing multi-view dense correspondence approaches demonstrates a significant
improvement in the accuracy of temporal coherence. Comment: Published in 3D Vision (3DV) 201
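For a rectified, evenly spaced 1D camera array, an EPI is simply the same scanline stacked across views: a scene point then traces a line whose slope is its disparity, which is inversely proportional to depth. A minimal sketch of that slicing and of reading off a slope (hypothetical names; the paper's method for sparse arrays is considerably more involved):

```python
import numpy as np

def epipolar_plane_image(image_stack, row):
    """Slice an EPI from a rectified, evenly spaced 1D camera array.

    image_stack : S x H x W array (S views along the baseline)
    row         : scanline index v to slice
    Returns an S x W EPI.
    """
    return image_stack[:, row, :]

def epi_line_slope(epi):
    """Estimate the disparity (pixels shifted per view) of the single
    brightest feature in an EPI by tracking its argmax across views
    and fitting a least-squares line; the slope is the disparity."""
    s = np.arange(epi.shape[0])
    u = epi.argmax(axis=1)
    slope, _ = np.polyfit(s, u, 1)
    return slope
```

With dense camera arrays this slope can be read off almost directly; the paper's contribution is obtaining usable EPIs from a sparse array and using them to constrain scene flow.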
Vision-based techniques for gait recognition
Global security concerns have raised a proliferation of video surveillance
devices. Intelligent surveillance systems seek to discover possible threats
automatically and raise alerts. Being able to identify the surveyed object can
help determine its threat level. The current generation of devices provides
digital video data to be analysed for time varying features to assist in the
identification process. Commonly, people queue up to access a facility and
approach a video camera in full frontal view. In this environment, a variety of
biometrics are available - for example, gait, which includes temporal features
like stride period. Gait can be measured unobtrusively at a distance. The video
data will also include face features, which are short-range biometrics. In this
way, one can combine biometrics naturally using one set of data. In this paper
we survey current techniques of gait recognition and modelling with the
environment in which the research was conducted. We also discuss in detail the
issues arising from deriving gait data, such as perspective and occlusion
effects, together with the associated computer vision challenges of reliable
tracking of human movement. Then, after highlighting these issues and
challenges related to gait processing, we proceed to discuss the frameworks
combining gait with other biometrics. We then provide motivations for a novel
paradigm in biometrics-based human recognition, i.e. the use of the
fronto-normal view of gait as a far-range biometric combined with biometrics
operating at a near distance.
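A temporal gait feature such as the stride period can be estimated from any periodic 1D signal derived from the video, e.g. the width of the walker's silhouette over time. A simple autocorrelation-based sketch (illustrative only; the function name and signal choice are assumptions):

```python
import numpy as np

def stride_period(signal):
    """Estimate the stride period (in frames) of a periodic gait signal
    via the first non-trivial peak of its autocorrelation.

    signal : 1D array, e.g. silhouette width per frame
    Returns the lag of the first local autocorrelation maximum, or None.
    """
    x = signal - signal.mean()
    # One-sided autocorrelation, normalized to 1 at zero lag.
    ac = np.correlate(x, x, mode='full')[len(x) - 1:]
    ac /= ac[0]
    # First local maximum after the zero-lag peak is the period estimate.
    for lag in range(1, len(ac) - 1):
        if ac[lag] >= ac[lag - 1] and ac[lag] > ac[lag + 1]:
            return lag
    return None
```

Because it needs only a coarse periodic signal rather than precise limb localization, this kind of estimate is robust at the long ranges where gait is most useful as a biometric.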