Healthcare Event and Activity Logging.
The health of patients in the intensive care unit (ICU) can change frequently and inexplicably. Crucial events and activities responsible for these changes often go unnoticed. This paper introduces healthcare event and action logging (HEAL), which automatically and unobtrusively monitors and reports on events and activities that occur in a medical ICU room. HEAL uses a multimodal distributed camera network to monitor and identify ICU activities and estimate sanitation-event qualifiers. At its core is a novel approach to inferring person roles from semantic interactions, a critical requirement in many healthcare settings where individuals' identities must not be revealed. The proposed approach for activity representation identifies a basis of contextual aspects and estimates aspect weights for proper action representation and reconstruction. The flexibility of the proposed algorithms enables the identification of people's roles by associating them with inferred interactions and detected activities. A fully working prototype system was developed, tested in a mock ICU room, and then deployed in two ICU rooms at a community hospital, thus offering unique capabilities for data gathering and analytics. The proposed method achieves a role identification accuracy of 84% and a backtracking role identification accuracy of 79% for obscured roles using interaction and appearance features on real ICU data. Detailed experimental results are provided in the context of four event-sanitation qualifiers: clean, transmission, contamination, and unclean.
3D Object Reconstruction from Hand-Object Interactions
Recent advances have enabled 3D object reconstruction approaches using a
single off-the-shelf RGB-D camera. Although these approaches are successful for
a wide range of object classes, they rely on stable and distinctive geometric
or texture features. Many objects like mechanical parts, toys, household or
decorative articles, however, are textureless and characterized by minimalistic
shapes that are simple and symmetric. Existing in-hand scanning systems and 3D
reconstruction techniques fail for such symmetric objects in the absence of
highly distinctive features. In this work, we show that extracting 3D hand
motion for in-hand scanning effectively facilitates the reconstruction of even
featureless and highly symmetric objects and we present an approach that fuses
the rich additional information of hands into a 3D reconstruction pipeline,
significantly contributing to the state of the art of in-hand scanning.
Comment: International Conference on Computer Vision (ICCV) 2015,
http://files.is.tue.mpg.de/dtzionas/In-Hand-Scannin
Occlusion reasoning for multiple object visual tracking
Thesis (Ph.D.)--Boston University
Occlusion reasoning for visual object tracking in uncontrolled environments is a challenging problem. It becomes significantly more difficult when the scene contains dense groups of indistinguishable objects that cause frequent inter-object interactions and occlusions. We present several practical solutions that tackle inter-object occlusions for video surveillance applications.
In particular, this thesis proposes three methods. First, we propose "reconstruction-tracking," an online multi-camera spatial-temporal data association method for tracking large groups of objects imaged with low resolution. As a variant of the well-known Multiple-Hypothesis-Tracker, our approach localizes the positions of objects in 3D space with possibly occluded observations from multiple camera views and performs temporal data association in 3D. Second, we develop "track linking," a class of offline batch processing algorithms for long-term occlusions, where the decision has to be made based on the observations from the entire tracking sequence. We construct a graph representation to characterize occlusion events and propose an efficient graph-based/combinatorial algorithm to resolve occlusions.
Third, we propose a novel Bayesian framework where detection and data association are combined into a single module and solved jointly. Almost all traditional tracking systems address the detection and data association tasks separately in sequential order. Such a design implies that the output of the detector has to be reliable in order to make the data association work. Our framework takes advantage of the often complementary nature of the two subproblems, which not only avoids the error propagation issue from which traditional "detection-tracking approaches" suffer but also eschews common heuristics such as "nonmaximum suppression" of hypotheses by modeling the likelihood of the entire image.
The thesis describes a substantial number of experiments involving challenging and notably distinct simulated and real data, including infrared and visible-light data sets that we recorded ourselves or took from publicly available sources. In these videos, the number of objects ranges from a dozen to a hundred per frame in both monocular and multiple views. The experiments demonstrate that our approaches achieve results comparable to those of state-of-the-art approaches.
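As a rough illustration of what localizing an object in 3D from multiple camera views can involve, the sketch below applies midpoint triangulation to two viewing rays. This is a standard textbook technique used here as a stand-in, not the thesis's reconstruction-tracking method. The function name `triangulate_midpoint`, the ray parameterization, and the `eps` threshold are all assumptions made for this example.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _along(origin: Vec3, direction: Vec3, t: float) -> Vec3:
    return (origin[0] + t * direction[0],
            origin[1] + t * direction[1],
            origin[2] + t * direction[2])


def triangulate_midpoint(o1: Vec3, d1: Vec3, o2: Vec3, d2: Vec3,
                         eps: float = 1e-9) -> Tuple[Vec3, float]:
    """Estimate a 3D point from two viewing rays (hypothetical helper).

    Each ray is given by a camera centre `o` and a direction `d`.
    Returns the midpoint of the shortest segment joining the two lines
    and the length of that segment (a crude consistency measure).
    The sketch does not check that the point lies in front of both cameras.
    """
    w = _sub(o1, o2)
    a, b, c = _dot(d1, d1), _dot(d1, d2), _dot(d2, d2)
    d, e = _dot(d1, w), _dot(d2, w)
    denom = a * c - b * b
    if denom < eps:
        raise ValueError("rays are (nearly) parallel; no unique closest point")
    t1 = (b * e - c * d) / denom  # parameter of the closest point on ray 1
    t2 = (a * e - b * d) / denom  # parameter of the closest point on ray 2
    p1 = _along(o1, d1, t1)
    p2 = _along(o2, d2, t2)
    midpoint = ((p1[0] + p2[0]) / 2, (p1[1] + p2[1]) / 2, (p1[2] + p2[2]) / 2)
    gap = math.sqrt(_dot(_sub(p1, p2), _sub(p1, p2)))
    return midpoint, gap
```

A large gap between the two rays suggests that the two observations do not belong to the same object, which is one simple way such a measure could feed into data association across views.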
Tracking by Prediction: A Deep Generative Model for Mutli-Person localisation and Tracking
Current multi-person localisation and tracking systems have an over-reliance
on the use of appearance models for target re-identification, and almost no
approaches employ a complete deep learning solution for both objectives. We
present a novel, complete deep learning framework for multi-person localisation
and tracking. In this context we first introduce a lightweight sequential
Generative Adversarial Network architecture for person localisation, which
overcomes issues related to occlusions and noisy detections, typically found in
a multi-person environment. In the proposed tracking framework we build upon
recent advances in pedestrian trajectory prediction approaches and propose a
novel data association scheme based on predicted trajectories. This removes the
need for computationally expensive person re-identification systems based on
appearance features and generates human-like trajectories with minimal
fragmentation. The proposed method is evaluated on multiple public benchmarks
including both static and dynamic cameras and is capable of generating
outstanding performance, especially among other recently proposed deep neural
network-based approaches.
Comment: To appear in IEEE Winter Conference on Applications of Computer
Vision (WACV), 201
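To make the idea of data association from predicted trajectories more concrete, here is a minimal sketch under strong simplifying assumptions. A constant-velocity extrapolation stands in for the learned trajectory predictor described in the abstract, and matching is greedy nearest-neighbour with a distance gate. The names `predict_next`, `associate`, and `gate` are hypothetical, and none of this is the paper's actual scheme.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def predict_next(history: List[Point]) -> Point:
    """Constant-velocity extrapolation (stand-in for a learned predictor)."""
    if len(history) < 2:
        return history[-1]
    (x0, y0), (x1, y1) = history[-2], history[-1]
    return (2 * x1 - x0, 2 * y1 - y0)


def associate(tracks: Dict[int, List[Point]],
              detections: List[Point],
              gate: float) -> Dict[int, int]:
    """Match each track to at most one detection, using no appearance cues.

    Returns a mapping from track id to detection index. Pairs whose
    predicted-to-detected distance exceeds `gate` are never matched.
    """
    candidates = []
    for track_id, history in tracks.items():
        px, py = predict_next(history)
        for det_idx, (dx, dy) in enumerate(detections):
            dist = math.hypot(dx - px, dy - py)
            if dist <= gate:
                candidates.append((dist, track_id, det_idx))

    # Greedy assignment: accept the closest remaining pair first.
    candidates.sort()
    matches: Dict[int, int] = {}
    used_detections = set()
    for _, track_id, det_idx in candidates:
        if track_id in matches or det_idx in used_detections:
            continue
        matches[track_id] = det_idx
        used_detections.add(det_idx)
    return matches
```

Greedy matching keeps the sketch short. An optimal assignment solver could replace it without changing the interface, and unmatched detections would typically start new tracks.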
RGB-D-based Action Recognition Datasets: A Survey
Human action recognition from RGB-D (Red, Green, Blue and Depth) data has
attracted increasing attention since the first work reported in 2010. Over this
period, many benchmark datasets have been created to facilitate the development
and evaluation of new algorithms. This raises the question of which dataset to
select and how to use it in providing a fair and objective comparative
evaluation against state-of-the-art methods. To address this issue, this paper
provides a comprehensive review of the most commonly used action recognition
related RGB-D video datasets, including 27 single-view datasets, 10 multi-view
datasets, and 7 multi-person datasets. The detailed information and analysis of
these datasets is a useful resource in guiding insightful selection of datasets
for future research. In addition, the issues with current algorithm evaluation
vis-à-vis limitations of the available datasets and evaluation protocols
are also highlighted, resulting in a number of recommendations for collection
of new datasets and use of evaluation protocols.
3D Object Representations for Recognition.
Object recognition from images is a longstanding and challenging problem in computer vision. The main challenge is that the appearance of objects in images is affected by a number of factors, such as illumination, scale, camera viewpoint, intra-class variability, occlusion, truncation, and so on. How to handle all these factors in object recognition is still an open problem. In this dissertation, I present my efforts in building 3D object representations for object recognition. Compared to 2D appearance based object representations, 3D object representations can capture the 3D nature of objects and better handle viewpoint variation, occlusion and truncation in object recognition.
I introduce three new 3D object representations: the 3D aspect part representation, the 3D aspectlet representation and the 3D voxel pattern representation. These representations are built to handle different challenging factors in object recognition. The 3D aspect part representation is able to capture the appearance change of object categories due to viewpoint transformation. The 3D aspectlet representation and the 3D voxel pattern representation are designed to handle occlusions between objects in addition to viewpoint change. Based on these representations, we propose new object recognition methods and conduct experiments on benchmark datasets to verify the advantages of our methods.
Furthermore, we introduce PASCAL3D+, a new large-scale dataset for 3D object recognition built by aligning objects in images with 3D CAD models. We also propose two novel methods to tackle object co-detection and multiview object tracking using our 3D aspect part representation, and a novel Convolutional Neural Network-based approach for object detection using our 3D voxel pattern representation. In order to track multiple objects in videos, we introduce a new online multi-object tracking framework based on Markov Decision Processes. Lastly, I conclude the dissertation and discuss future steps for 3D object recognition.
PhD, Electrical Engineering: Systems, University of Michigan, Horace H. Rackham School of Graduate Studies
http://deepblue.lib.umich.edu/bitstream/2027.42/120836/1/yuxiang_1.pd
Capturing Hands in Action using Discriminative Salient Points and Physics Simulation
Hand motion capture is a popular research field, recently gaining more
attention due to the ubiquity of RGB-D sensors. However, even the most recent
approaches focus on the case of a single isolated hand. In this work, we focus
on hands that interact with other hands or objects and present a framework that
successfully captures motion in such interaction scenarios for both rigid and
articulated objects. Our framework combines a generative model with
discriminatively trained salient points to achieve a low tracking error and
with collision detection and physics simulation to achieve physically plausible
estimates even in case of occlusions and missing visual data. Since all
components are unified in a single objective function which is almost
everywhere differentiable, it can be optimized with standard optimization
techniques. Our approach works for monocular RGB-D sequences as well as setups
with multiple synchronized RGB cameras. For a qualitative and quantitative
evaluation, we captured 29 sequences with a large variety of interactions and
up to 150 degrees of freedom.
Comment: Accepted for publication by the International Journal of Computer
Vision (IJCV) on 16.02.2016 (submitted on 17.10.14). A combination into a
single framework of an ECCV'12 multicamera-RGB and a monocular-RGBD GCPR'14
hand tracking paper with several extensions, additional experiments and
detail
GANerated Hands for Real-time 3D Hand Tracking from Monocular RGB
We address the highly challenging problem of real-time 3D hand tracking based
on a monocular RGB-only sequence. Our tracking method combines a convolutional
neural network with a kinematic 3D hand model, such that it generalizes well to
unseen data, is robust to occlusions and varying camera viewpoints, and leads
to anatomically plausible as well as temporally smooth hand motions. For
training our CNN we propose a novel approach for the synthetic generation of
training data that is based on a geometrically consistent image-to-image
translation network. To be more specific, we use a neural network that
translates synthetic images to "real" images, such that the so-generated images
follow the same statistical distribution as real-world hand images. For
training this translation network we combine an adversarial loss and a
cycle-consistency loss with a geometric consistency loss in order to preserve
geometric properties (such as hand pose) during translation. We demonstrate
that our hand tracking system outperforms the current state-of-the-art on
challenging RGB-only footage.
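The combination of losses described for the translation network can be written schematically as follows. The abstract states only that an adversarial loss, a cycle-consistency loss, and a geometric consistency loss are combined. The weighted-sum form and the weights $\lambda_{\text{cyc}}$ and $\lambda_{\text{geo}}$ are assumptions made here for illustration.

```latex
\mathcal{L}_{\text{total}}
  = \mathcal{L}_{\text{adv}}
  + \lambda_{\text{cyc}}\,\mathcal{L}_{\text{cyc}}
  + \lambda_{\text{geo}}\,\mathcal{L}_{\text{geo}}
```

Under this reading, $\mathcal{L}_{\text{adv}}$ pushes translated images toward the distribution of real hand images, $\mathcal{L}_{\text{cyc}}$ requires that translating back recovers the input, and $\mathcal{L}_{\text{geo}}$ penalizes changes in geometric properties such as hand pose.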