Model-Based High-Dimensional Pose Estimation with Application to Hand Tracking
This thesis presents novel techniques for computer vision based full-DOF human hand motion estimation. Our main contributions are: a robust skin color estimation approach; a novel resolution-independent and memory-efficient representation of hand pose silhouettes, which allows us to compute area-based similarity measures in near-constant time; a set of new segmentation-based similarity measures; a new class of similarity measures that work for nearly arbitrary input modalities; a novel edge-based similarity measure that avoids any problematic thresholding or discretization and can be computed very efficiently in Fourier space; a template hierarchy to minimize the number of similarity computations needed for finding the most likely hand pose observed; and finally, a novel image space search method, which we naturally combine with our hierarchy. Consequently, matching can efficiently be formulated as a simultaneous template tree traversal and function maximization.
Automated Markerless Extraction of Walking People Using Deformable Contour Models
We develop a new automated markerless motion capture system for the analysis of walking people. We employ global evidence gathering techniques guided by biomechanical analysis to robustly extract articulated motion. This forms a basis for new deformable contour models, using local image cues to capture shape and motion at a more detailed level. We extend the greedy snake formulation to include temporal constraints and occlusion modelling, increasing the capability of this technique when dealing with cluttered and self-occluding extraction targets. This approach is evaluated on a large database of indoor and outdoor video data, demonstrating fast and autonomous motion capture for walking people
On the merits of the Gaussian Mixture as a model for oriented edgel distributions
The aim of this report is to establish the credibility of the Gaussian Mixture Model (GMM) as a model for the distributions of oriented edgels of rigid and biological objects in noisy images. This is tackled in two stages: first, the response of the Sobel filter to noisy pixels is analysed to show that the result holds for smooth rigid objects. Second, arguments are presented to support the proposition that the model can also effectively capture the added uncertainty introduced by natural shape variation, as found in images of biological objects. The result has particular application in the extension of the Generalized Hough Transform (GHT) to deformable shapes; in particular, it offers a tailored and manipulable alternative to the non-parametric kernel density estimate used by Ecabert and Thiran.
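As a rough illustration of what an "oriented edgel" is in the abstract above, the following minimal sketch applies the standard 3x3 Sobel kernels to a small synthetic image and returns the position and gradient orientation of each pixel whose gradient magnitude exceeds a threshold. It is not taken from the report: the function names `filter3x3` and `oriented_edgels`, the threshold value, and the test image are all assumptions made for illustration, and no mixture model is fitted here.

```python
import numpy as np

# Standard 3x3 Sobel kernels for the horizontal and vertical derivatives.
SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T


def filter3x3(image, kernel):
    """Valid-mode 3x3 cross-correlation (hypothetical helper, no padding)."""
    h, w = image.shape
    out = np.zeros((h - 2, w - 2))
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * image[dy:dy + h - 2, dx:dx + w - 2]
    return out


def oriented_edgels(image, threshold):
    """Return (rows, cols, orientations) of pixels with a strong gradient.

    Hypothetical helper: the threshold is an illustrative assumption.
    """
    gx = filter3x3(image, SOBEL_X)
    gy = filter3x3(image, SOBEL_Y)
    magnitude = np.hypot(gx, gy)
    orientation = np.arctan2(gy, gx)  # gradient direction in radians
    rows, cols = np.nonzero(magnitude > threshold)
    # +1 maps indices of the valid region back to image coordinates.
    return rows + 1, cols + 1, orientation[rows, cols]


# Synthetic image with a vertical step edge: left half 0, right half 1.
image = np.zeros((8, 8))
image[:, 4:] = 1.0
rows, cols, theta = oriented_edgels(image, threshold=1.0)
```

For this noise-free step edge, the detected edgels lie in the two columns adjacent to the step and their gradient points along the positive x-axis. Adding pixel noise to `image` spreads the orientations around that direction, which is the kind of distribution the report proposes to model.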
Automatic Lumbar Vertebrae Segmentation in Fluoroscopic Images via Optimised Concurrent Hough Transform
Low back pain is a very common problem in the industrialised countries and its associated cost is enormous. Diagnosis of the underlying causes can be extremely difficult. Many studies have focused on mechanical disorders of the spine. Digital videofluoroscopy (DVF) was widely used to obtain images for motion studies. This can provide motion sequences of the lumbar spine, but the images obtained often suffer from noise, exacerbated by the very low radiation dosage. Thus determining vertebrae position within the image sequence presents a considerable challenge. In this paper, we show how our new approach can automatically detect the positions and borders of vertebrae concurrently, relieving many of the problems experienced in other approaches. First, we use phase congruency to relieve the difficulty associated with threshold selection in edge detection of the illumination-variant DVF images. Then, our new Hough transform approach is applied to determine the moving vertebrae concurrently. We include optimisation via a genetic algorithm, as without it the extraction of multiple moving vertebrae is computationally daunting. Our results show that this new approach can indeed provide extractions of position and rotation which appear to be of sufficient quality to aid therapy and diagnosis of spinal disorders.
Blending Learning and Inference in Structured Prediction
In this paper we derive an efficient algorithm to learn the parameters of
structured predictors in general graphical models. This algorithm blends the
learning and inference tasks, which results in a significant speedup over
traditional approaches, such as conditional random fields and structured
support vector machines. For this purpose we utilize the structures of the
predictors to describe a low dimensional structured prediction task which
encourages local consistencies within the different structures while learning
the parameters of the model. Convexity of the learning task provides the means
to enforce the consistencies between the different parts. The
inference-learning blending algorithm that we propose is guaranteed to converge
to the optimum of the low dimensional primal and dual programs. Unlike many of
the existing approaches, the inference-learning blending allows us to
efficiently learn high-order graphical models over regions of any size and with
a very large number of parameters. We demonstrate the effectiveness of our approach,
while presenting state-of-the-art results in stereo estimation, semantic
segmentation, shape reconstruction, and indoor scene understanding
ROAM: a Rich Object Appearance Model with Application to Rotoscoping
Rotoscoping, the detailed delineation of scene elements through a video shot,
is a painstaking task of tremendous importance in professional post-production
pipelines. While pixel-wise segmentation techniques can help for this task,
professional rotoscoping tools rely on parametric curves that offer artists
much better interactive control over the definition, editing and manipulation
of the segments of interest. Sticking to this prevalent rotoscoping paradigm,
we propose a novel framework to capture and track the visual aspect of an
arbitrary object in a scene, given a first closed outline of this object. This
model combines a collection of local foreground/background appearance models
spread along the outline, a global appearance model of the enclosed object and
a set of distinctive foreground landmarks. The structure of this rich
appearance model allows simple initialization, efficient iterative optimization
with exact minimization at each step, and on-line adaptation in videos. We
demonstrate qualitatively and quantitatively the merit of this framework
through comparisons with tools based on either dynamic segmentation with a
closed curve or pixel-wise binary labelling
A Combinatorial Solution to Non-Rigid 3D Shape-to-Image Matching
We propose a combinatorial solution for the problem of non-rigidly matching a
3D shape to 3D image data. To this end, we model the shape as a triangular mesh
and allow each triangle of this mesh to be rigidly transformed to achieve a
suitable matching to the image. By penalising the distance and the relative
rotation between neighbouring triangles our matching compromises between image
and shape information. In this paper, we resolve two major challenges: Firstly,
we address the resulting large and NP-hard combinatorial problem with a
suitable graph-theoretic approach. Secondly, we propose an efficient
discretisation of the unbounded 6-dimensional Lie group SE(3). To our knowledge
this is the first combinatorial formulation for non-rigid 3D shape-to-image
matching. In contrast to existing local (gradient descent) optimisation
methods, we obtain solutions that do not require a good initialisation and that
are within a bound of the optimal solution. We evaluate the proposed method on
the two problems of non-rigid 3D shape-to-shape and non-rigid 3D shape-to-image
registration and demonstrate that it provides promising results.
Comment: 10 pages, 7 figures