710 research outputs found
Multigranularity Representations for Human Inter-Actions: Pose, Motion and Intention
Tracking people and their body pose in videos is a central problem in computer vision. Standard tracking representations reason about temporal coherence of detected people and body parts. They have difficulty tracking targets under partial occlusions or rare body poses, where detectors often fail, since the number of training examples is often too small to deal with the exponential variability of such configurations.
We propose tracking representations that track and segment people and their body pose in videos by exploiting information at multiple detection and segmentation granularities when available: whole body, parts, or point trajectories.
Detections and motion estimates provide contradictory information in case of false alarm detections or leaking motion affinities. We consolidate contradictory information via graph steering, an algorithm for simultaneous detection and co-clustering in a two-granularity graph of motion trajectories and detections, that corrects motion leakage between correctly detected objects, while being robust to false alarms or spatially inaccurate detections.
We first present a motion segmentation framework that exploits long range motion of point trajectories and large spatial support of image regions.
We show resulting video segments adapt to targets under partial occlusions and deformations.
Second, we augment motion-based representations with object detection for dealing with motion leakage. We demonstrate how to combine dense optical flow trajectory affinities with repulsions from confident detections to reach a global consensus of detection and tracking in crowded scenes.
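The idea of letting confident detections repel trajectories that motion affinities would otherwise merge can be illustrated with a deliberately simplified sketch. This is not the graph steering algorithm from the abstract: it only drops direct affinity links between trajectories claimed by different detections and then groups the rest with union-find. The function name, the `detection_of` mapping, and the `threshold` parameter are all hypothetical.

```python
def cluster_with_repulsion(n, affinities, detection_of, threshold=0.5):
    """Group n point trajectories and return one cluster label per trajectory.

    affinities: dict {(i, j): weight} of pairwise motion affinities in [0, 1].
    detection_of: dict {trajectory index: detection id} for trajectories that
        fall inside a confident detection (hypothetical input format).
    """
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for (i, j), weight in affinities.items():
        di, dj = detection_of.get(i), detection_of.get(j)
        # Repulsion: skip direct links between trajectories that belong to
        # different detections. Indirect paths through unclaimed trajectories
        # are NOT handled in this simplified sketch.
        if di is not None and dj is not None and di != dj:
            continue
        if weight >= threshold:
            parent[find(i)] = find(j)

    # Relabel roots as 0, 1, 2, ... in order of first appearance.
    labels, seen = [], {}
    for i in range(n):
        root = find(i)
        labels.append(seen.setdefault(root, len(seen)))
    return labels


# Trajectories 0-1 lie on one person, 2-3 on another; the (1, 2) affinity
# is "leaking" motion between the two.
leaky = {(0, 1): 0.9, (2, 3): 0.8, (1, 2): 0.7}
print(cluster_with_repulsion(4, leaky, {}))
print(cluster_with_repulsion(4, leaky, {0: "a", 1: "a", 2: "b", 3: "b"}))
```

Without detections the leaking affinity merges all four trajectories into one cluster; with the two detections supplied, the leak is cut and two clusters remain.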
Third, we study human motion and pose estimation.
We segment hard-to-detect, fast-moving body limbs from their surrounding clutter and match them against pose exemplars to detect body pose under fast motion. We employ on-the-fly human body kinematics to improve tracking of body joints under wide deformations.
We use motion segmentability of body parts for re-ranking a set of body joint candidate trajectories and jointly infer multi-frame body pose and video segmentation.
We show empirically that such multi-granularity tracking representations are worthwhile, obtaining significantly more accurate multi-object tracking and detailed body pose estimation on popular datasets.
Efficient Human Pose Estimation with Image-dependent Interactions
Human pose estimation from 2D images is one of the most challenging
and computationally demanding problems in computer vision. Standard
models such as Pictorial Structures consider interactions between
kinematically connected joints or limbs, leading to inference cost
that is quadratic in the number of pixels. As a result, researchers
and practitioners have restricted themselves to simple models which
only measure the quality of limb-pair possibilities by their 2D
geometric plausibility.
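The quadratic cost mentioned above can be made concrete with a minimal sketch of exact max-sum inference on a chain of parts. It assumes a generic pairwise score with no special structure; the function name, the toy scores, and the evaluation counter are illustrative and not taken from the abstract. Each edge between two connected parts requires one pairwise evaluation per pair of candidate states, so with one state per pixel the cost per edge grows with the square of the pixel count.

```python
def max_sum_chain(unary, pairwise):
    """Exact MAP inference on a chain of parts.

    unary[p][s]: score of placing part p at state s (e.g. a pixel location).
    pairwise(p, s_prev, s): score of jointly placing part p-1 at s_prev
        and part p at s.
    Returns (best_score, best_states, number_of_pairwise_evaluations).
    """
    n_states = len(unary[0])
    best = list(unary[0])
    backpointers = []
    evals = 0
    for p in range(1, len(unary)):
        new_best, pointers = [], []
        for s in range(n_states):
            candidates = []
            for s_prev in range(n_states):
                # One pairwise evaluation per (s_prev, s) pair:
                # n_states * n_states evaluations per edge.
                candidates.append(best[s_prev] + pairwise(p, s_prev, s))
                evals += 1
            k = max(range(n_states), key=candidates.__getitem__)
            new_best.append(candidates[k] + unary[p][s])
            pointers.append(k)
        best = new_best
        backpointers.append(pointers)

    # Backtrack from the best final state.
    s = max(range(n_states), key=best.__getitem__)
    score = best[s]
    states = [s]
    for pointers in reversed(backpointers):
        s = pointers[s]
        states.append(s)
    states.reverse()
    return score, states, evals


# Two parts, three candidate states each, and a toy geometric penalty.
unary = [[0, 3, 0], [0, 0, 2]]
score, states, evals = max_sum_chain(unary, lambda p, a, b: -abs(a - b))
print(score, states, evals)  # 4 [1, 2] 9
```

With three states the single edge costs 9 pairwise evaluations; doubling the number of states would quadruple it.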
In this talk, we propose novel methods which allow for efficient
inference in richer models with data-dependent interactions. First, we
introduce structured prediction cascades, a structured analog of
binary cascaded classifiers, which learn to focus computational effort
where it is needed, filtering out many states cheaply while ensuring
the correct output is unfiltered. Second, we propose a way to
decompose models of human pose with cyclic dependencies into a
collection of tree models, and provide novel methods to impose model
agreement. Finally, we develop a local linear approach that learns
bases centered around modes in the training data, giving us
image-dependent local models which are fast and accurate.
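A rough sketch of the filtering step in a cascade, under stated assumptions: the abstract does not give the pruning rule, so the threshold below (a convex combination of the best score and the mean max-marginal, controlled by a hypothetical `alpha`) is borrowed from the general structured prediction cascades literature, and the scores are fixed toy values instead of learned ones. The sketch computes max-marginals on a chain and discards every state whose max-marginal falls below the threshold.

```python
def max_marginals(unary, pairwise):
    """mm[p][s] = best total score of any chain labeling with part p at state s."""
    n, k = len(unary), len(unary[0])
    fwd = [[0.0] * k for _ in range(n)]  # best score of parts before p, given (p, s)
    bwd = [[0.0] * k for _ in range(n)]  # best score of parts after p, given (p, s)
    for p in range(1, n):
        for s in range(k):
            fwd[p][s] = max(
                fwd[p - 1][t] + unary[p - 1][t] + pairwise(t, s) for t in range(k)
            )
    for p in range(n - 2, -1, -1):
        for s in range(k):
            bwd[p][s] = max(
                bwd[p + 1][t] + unary[p + 1][t] + pairwise(s, t) for t in range(k)
            )
    return [[fwd[p][s] + unary[p][s] + bwd[p][s] for s in range(k)] for p in range(n)]


def filter_states(unary, pairwise, alpha=0.5):
    """Keep only states whose max-marginal reaches the threshold.

    Assumed threshold: alpha * best_score + (1 - alpha) * mean max-marginal.
    Since best_score >= mean, the highest-scoring labeling always survives
    for alpha in [0, 1].
    """
    mm = max_marginals(unary, pairwise)
    best = max(mm[0])
    flat = [value for row in mm for value in row]
    threshold = alpha * best + (1 - alpha) * sum(flat) / len(flat)
    return [[s for s, value in enumerate(row) if value >= threshold] for row in mm]


unary = [[0, 3, 0], [0, 0, 2]]
smooth = lambda a, b: -abs(a - b)
print(filter_states(unary, smooth, alpha=0.5))  # [[1], [2]]
print(filter_states(unary, smooth, alpha=0.0))  # [[1], [1, 2]]
```

A larger `alpha` prunes more aggressively; a later, richer model then only needs to score the surviving states.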
These techniques allow for sparse and efficient inference on the order
of minutes or seconds per image. As a result, we can afford to model
pairwise interaction potentials much more richly with data-dependent
features such as contour continuity, segmentation alignment, color
consistency, optical flow and multiple modes. We show empirically that
these richer models are worthwhile, obtaining significantly more
accurate pose estimation on popular datasets.
Computational Anatomy for Multi-Organ Analysis in Medical Imaging: A Review
The medical image analysis field has traditionally been focused on the
development of organ- and disease-specific methods. Recently, the interest in
the development of more comprehensive computational anatomical models has
grown, leading to the creation of multi-organ models. Multi-organ approaches,
unlike traditional organ-specific strategies, incorporate inter-organ relations
into the model, thus leading to a more accurate representation of the complex
human anatomy. Inter-organ relations are not only spatial, but also functional
and physiological. Over the years, the strategies proposed to efficiently
model multi-organ structures have evolved from simple global modeling to
more sophisticated approaches such as sequential, hierarchical, or machine
learning-based models. In this paper, we present a review of the state of the
art on multi-organ analysis and associated computational anatomy methodology. The
manuscript follows a methodology-based classification of the different
techniques available for the analysis of multi-organs and multi-anatomical
structures, from techniques using point distribution models to the most recent
deep learning-based approaches. With more than 300 papers included in this
review, we reflect on the trends and challenges of the field of computational
anatomy, the particularities of each anatomical region, and the potential of
multi-organ analysis to increase the impact of medical imaging applications
on the future of healthcare.
Comment: Paper under review
Holistic interpretation of visual data based on topology: semantic segmentation of architectural facades
The work presented in this dissertation is a step towards effectively incorporating contextual knowledge in the task of semantic segmentation. To date, with a few exceptions, the use of context has been confined to the genre of the scene, and research has been directed towards enhancing appearance descriptors. While this is unarguably important, recent studies show that computer vision reaches near-human performance with these descriptors when objects have stable, distinctive surface properties and imaging conditions are good. When these conditions are not met, humans exploit their knowledge of the intrinsic geometric layout of the scene to make local decisions. Computer vision lags behind in this respect. We therefore aim to bridge the gap by presenting algorithms for semantic segmentation of building facades that make use of the topological aspects of the scene. We provide a classification scheme that carries out segmentation and recognition simultaneously. The algorithm solves a single optimization function and yields a semantic interpretation of facades, relying on the modeling power of probabilistic graphs and efficient discrete combinatorial optimization tools. We also tackle the same problem of semantic facade segmentation with a neural network approach, attaining accuracy figures that are on par with the state of the art in a fully automated pipeline. Starting from pixelwise classifications obtained via Convolutional Neural Networks (CNN), the labels are structurally validated through a cascade of Restricted Boltzmann Machines (RBM) and Multi-Layer Perceptron (MLP) that regenerates the most likely layout. In the domain of architectural modeling, we also address geometric multi-model fitting: we introduce a novel guided sampling algorithm based on Minimum Spanning Trees (MST), which surpasses other propagation techniques in terms of robustness to noise.
We make a number of additional contributions, such as a measure of model deviation that captures variations among fitted models.