Model-Based High-Dimensional Pose Estimation with Application to Hand Tracking
This thesis presents novel techniques for computer-vision-based full-DOF human hand motion estimation. Our main contributions are: a robust skin color estimation approach; a novel resolution-independent and memory-efficient representation of hand pose silhouettes, which allows us to compute area-based similarity measures in near-constant time; a set of new segmentation-based similarity measures; a new class of similarity measures that works for nearly arbitrary input modalities; a novel edge-based similarity measure that avoids any problematic thresholding or discretization and can be computed very efficiently in Fourier space; a template hierarchy that minimizes the number of similarity computations needed to find the most likely observed hand pose; and finally, a novel image-space search method, which we naturally combine with our hierarchy. Consequently, matching can be formulated efficiently as a simultaneous template-tree traversal and function maximization.
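The idea of evaluating an edge-based similarity in Fourier space can be illustrated with a minimal sketch (illustrative only, not the thesis's actual measure): by the correlation theorem, a template can be compared against every image offset at once via the FFT, with no per-position thresholding or explicit sliding window.

```python
import numpy as np

def fourier_cross_correlation(template, image):
    """Cross-correlate a template with an image via the FFT.

    Evaluating the correlation in Fourier space costs O(N log N)
    instead of O(N^2) for a direct sliding-window comparison.
    """
    F_img = np.fft.fft2(image)
    F_tpl = np.fft.fft2(template, s=image.shape)  # zero-pad to image size
    # Correlation theorem: corr = IFFT(F_img * conj(F_tpl))
    return np.fft.ifft2(F_img * np.conj(F_tpl)).real

# Toy example: a 3x3 edge blob embedded in the image at offset (5, 7)
image = np.zeros((32, 32))
template = np.ones((3, 3))
image[5:8, 7:10] = 1.0

corr = fourier_cross_correlation(template, image)
dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
print(dy, dx)  # correlation peak recovers the offset: 5 7
```

The maximum of the correlation surface directly yields the best-matching template position, which is what makes such measures attractive inside a template hierarchy.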
Person Re-Identification Techniques for Intelligent Video Surveillance Systems
Nowadays, intelligent video surveillance is one of the most active research fields in computer vision and machine learning, providing useful tools for surveillance operators and forensic video investigators. Person re-identification is among these tools; it consists of recognizing whether an individual has already been observed over a network of cameras. It can be employed in various applications, e.g., off-line retrieval of all the video sequences showing an individual of interest whose image is given as a query, or on-line pedestrian tracking over multiple cameras. In off-line retrieval applications, one goal of a person re-identification system is to support video surveillance operators and forensic investigators in finding an individual of interest in videos acquired by a network of non-overlapping cameras. This is attained by sorting images of previously observed individuals by decreasing similarity to a given probe individual.
This task is typically achieved by exploiting clothing appearance, since classical biometric methods such as face recognition are impractical in real-world video surveillance scenarios because of the low quality of the acquired images. Existing clothing appearance descriptors, together with their similarity measures, are mostly aimed at improving ranking quality. They are usually built on a part-based body model, so that an image signature can be extracted and treated independently for different body parts (e.g., torso and legs). While a re-identification model must be robust and discriminative in recognizing the individual of interest, processing time can also be crucial for tackling this task in real-world scenarios. This issue can be viewed from two angles: the time needed to construct a model (descriptor generation), which can usually be done off-line, and the time needed to find the correct individual among a set of acquired video frames (descriptor matching), which is the real-time part of a re-identification system.
This thesis addresses the processing time of descriptor matching, rather than ranking quality, which is also relevant in practical applications involving interaction with human operators. It shows how a trade-off between processing time and ranking quality can be achieved, for any given descriptor, through a multi-stage ranking approach inspired by multi-stage approaches to classification problems in pattern recognition, adapted here to re-identification as a ranking problem. Design criteria for such multi-stage re-identification systems are discussed, and the proposed approach is evaluated on three benchmark data sets using four state-of-the-art descriptors. In addition, again concerning processing time, typical dimensionality reduction methods are studied as a means of reducing the matching time of descriptors that generate high-dimensional feature spaces; empirical results are presented for three well-known feature reduction methods applied to two state-of-the-art descriptors on two benchmark data sets.
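The multi-stage ranking idea can be sketched as follows. This is a hedged toy illustration, not the thesis's actual system: the descriptors are stand-ins, with a truncated feature vector playing the cheap first stage.

```python
import numpy as np

def multistage_rank(probe_cheap, gallery_cheap, probe_rich, gallery_rich, keep=10):
    """Two-stage re-identification ranking sketch.

    Stage 1: a cheap, low-dimensional descriptor ranks the full gallery.
    Stage 2: a richer (slower) descriptor re-ranks only the top `keep`
    candidates, trading a little ranking quality for matching speed.
    """
    # Stage 1: distances under the cheap descriptor over the whole gallery
    d1 = np.linalg.norm(gallery_cheap - probe_cheap, axis=1)
    shortlist = np.argsort(d1)[:keep]
    # Stage 2: distances under the rich descriptor, shortlist only
    d2 = np.linalg.norm(gallery_rich[shortlist] - probe_rich, axis=1)
    return shortlist[np.argsort(d2)]

rng = np.random.default_rng(0)
gallery_rich = rng.normal(size=(100, 64))
gallery_cheap = gallery_rich[:, :8]               # "cheap" = truncated features
probe_rich = gallery_rich[42] + 0.01 * rng.normal(size=64)
probe_cheap = probe_rich[:8]

ranking = multistage_rank(probe_cheap, gallery_cheap, probe_rich, gallery_rich)
print(ranking[0])  # the true match (gallery id 42) surfaces at rank 1
```

The expensive descriptor is evaluated on only `keep` of the 100 gallery entries, which is the source of the processing-time saving.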
Robust surface modelling of visual hull from multiple silhouettes
Reconstructing depth information from images is one of the most actively researched themes in computer vision, with applications spanning most vision research areas, from object recognition to realistic visualisation. Among vision-based reconstruction techniques, this thesis extensively investigates the visual hull (VH) concept for volume approximation and its robust surface modelling when various views of an object are available. Assuming that multiple images are captured under circular motion, projection matrices are generally parameterised in terms of a rotation angle from a reference position in order to facilitate multi-camera calibration. However, this assumption is often violated in practice: a pure planar rotation with an accurately known rotation angle is hardly realisable. To address this problem, this thesis first proposes a calibration method suited to approximate circular motion.
With these modified projection matrices, the resulting VH is represented by a hierarchical tree structure of voxels, from which surfaces are extracted by the Marching Cubes (MC) algorithm. However, the surfaces may exhibit unexpected artefacts caused by coarse volume reconstruction, the topological ambiguity of the MC algorithm, and imperfect image processing or calibration results. To avoid this sensitivity, this thesis proposes a robust surface construction algorithm that first classifies locally convex regions from imperfect MC vertices and then aggregates local surfaces constructed by a 3D convex hull algorithm. Furthermore, the thesis explores the use of wide-baseline images to refine a coarse VH using an affine-invariant region descriptor, which improves the quality of the VH when only a small number of initial views is given.
In conclusion, the proposed methods achieve 3D models with enhanced accuracy, and robust surface modelling is retained when silhouette images are degraded by practical noise.
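The visual hull concept itself can be sketched in a few lines: a voxel survives only if it projects inside the silhouette of every view. This is a toy with orthographic projections, not the thesis's hierarchical voxel tree or calibration scheme.

```python
import numpy as np

def visual_hull(silhouettes, projections, grid):
    """Carve a voxel grid with multiple silhouettes.

    A voxel belongs to the visual hull only if its projection falls
    inside the silhouette of *every* view (silhouette intersection).
    """
    keep = np.ones(len(grid), dtype=bool)
    for sil, project in zip(silhouettes, projections):
        uv = project(grid)                   # project voxel centres to pixels
        keep &= sil[uv[:, 0], uv[:, 1]]      # silhouette membership per voxel
    return grid[keep]

# Toy scene: a 2-voxel-wide cube seen from two orthographic views
xs = np.arange(4)
grid = np.array([(x, y, z) for x in xs for y in xs for z in xs])

sil_xy = np.zeros((4, 4), bool); sil_xy[1:3, 1:3] = True   # top view
sil_xz = np.zeros((4, 4), bool); sil_xz[1:3, 1:3] = True   # front view
proj_xy = lambda g: g[:, [0, 1]]
proj_xz = lambda g: g[:, [0, 2]]

hull = visual_hull([sil_xy, sil_xz], [proj_xy, proj_xz], grid)
print(len(hull))  # 2 x 2 x 2 = 8 voxels survive the carving
```

With perspective cameras the projection step would use the (calibrated) projection matrices discussed above, which is why their accuracy matters so much for VH quality.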
Toward Automated Aerial Refueling: Relative Navigation with Structure from Motion
The USAF's use of UAS has expanded from reconnaissance to hunter/killer missions. As the UAS mission further expands into aerial combat, better performance and larger payloads will come at the cost of range and loiter times. Additionally, the Air Force Future Operating Concept calls for "formations of uninhabited refueling aircraft...[that] enable refueling operations partway inside threat areas." However, a lack of accurate relative positioning information prevents a tanker and a UAS from safely maintaining close formation flight and contact. The inclusion of cutting-edge vision systems on present refueling platforms may provide the information necessary to support an AAR mission by estimating the position of a trailing aircraft, providing inputs to a UAS controller capable of maintaining a given position. This research examines the ability of structure from motion (SfM) to generate relative navigation information. Previous AAR research efforts have used differential GPS, LiDAR, and vision systems; this research aims to leverage current and future imaging technology to complement those solutions. The algorithm used in this thesis generates a point cloud by determining 3D structure from a sequence of 2D images, then uses PCA to register the point cloud to a reference model. The algorithm was tested in a real-world environment using a 1:7-scale F-15 model. Additionally, this thesis studies common 3D rigid registration algorithms in an effort to characterize their performance in the AAR domain; three algorithms are tested for runtime and registration accuracy on four data sets.
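A PCA-based registration step of the kind mentioned above can be sketched roughly as follows. This is an illustrative toy, not the thesis's implementation; note that PCA axes carry a sign ambiguity that a real system must resolve, so the usage below deliberately applies a pure translation, where the axes trivially agree.

```python
import numpy as np

def pca_register(source, target):
    """Coarse rigid registration by aligning centroids and PCA axes.

    The principal axes of each cloud come from the eigenvectors of its
    covariance; mapping the source frame onto the target frame gives a
    rotation, and the centroid offset gives the translation.
    """
    mu_s, mu_t = source.mean(0), target.mean(0)
    _, Vs = np.linalg.eigh(np.cov((source - mu_s).T))
    _, Vt = np.linalg.eigh(np.cov((target - mu_t).T))
    R = Vt @ Vs.T                  # rotate source axes onto target axes
    t = mu_t - R @ mu_s
    return R, t

rng = np.random.default_rng(1)
cloud = rng.normal(size=(200, 3)) * np.array([3.0, 2.0, 1.0])  # anisotropic
R, t = pca_register(cloud, cloud + np.array([0.5, -1.0, 2.0]))
print(np.allclose(R, np.eye(3)), np.allclose(t, [0.5, -1.0, 2.0]))
```

An anisotropic cloud is used because PCA axes are only well defined when the principal variances are distinct, one of the known practical limitations of PCA registration.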
Multimodal Three Dimensional Scene Reconstruction, The Gaussian Fields Framework
The focus of this research is on building 3D representations of real-world scenes and objects using different imaging sensors: primarily range acquisition devices (such as laser scanners and stereo systems) that allow the recovery of 3D geometry, together with multi-spectral image sequences, including visual and thermal IR images, that provide additional scene characteristics. The crucial technical challenge that we addressed is the automatic point-sets registration task. In this context our main contribution is the development of an optimization-based method at the core of which lies a unified criterion that solves simultaneously for the dense point correspondence and transformation recovery problems. The new criterion has a straightforward expression in terms of the datasets and the alignment parameters and was used primarily for 3D rigid registration of point-sets. However, it also proved useful for feature-based multimodal image alignment. We derived our method from simple Boolean matching principles by approximation and relaxation. One of the main advantages of the proposed approach, as compared to the widely used class of Iterative Closest Point (ICP) algorithms, is convexity in the neighborhood of the registration parameters and continuous differentiability, allowing for the use of standard gradient-based optimization techniques. Physically, the criterion is interpreted in terms of a Gaussian force field exerted by one point-set on the other. Such a formulation proved useful for controlling and increasing the region of convergence, and hence allowing for more autonomy in correspondence tasks. Furthermore, the criterion can be computed with linear complexity using recently developed Fast Gauss Transform numerical techniques. In addition, we also introduced a new local feature descriptor, derived from visual saliency principles, which significantly enhanced the performance of the registration algorithm.
The resulting technique was subjected to a thorough experimental analysis that highlighted its strengths and showed its limitations. Our current applications are in the field of 3D modeling for inspection, surveillance, and biometrics. However, since this matching framework can be applied to any type of data that can be represented as N-dimensional point-sets, the scope of the method reaches many more pattern analysis applications.
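A Gaussian-force-field style criterion can be sketched directly from its generic definition, E(θ) = Σᵢⱼ exp(−‖T_θ(pᵢ) − qⱼ‖²/σ²). This is a naive O(NM) double sum for a 2D rigid transform; the exact parameterisation used in the thesis and its linear-complexity Fast Gauss Transform evaluation are not reproduced here.

```python
import numpy as np

def gaussian_field_energy(theta, source, target, sigma=1.0):
    """Gaussian-field alignment criterion for a 2D rigid transform.

    The criterion is smooth and differentiable in theta, so standard
    gradient-based optimisers apply; maximising it pulls the
    transformed source points onto the target points.
    """
    angle, tx, ty = theta
    c, s = np.cos(angle), np.sin(angle)
    R = np.array([[c, -s], [s, c]])
    moved = source @ R.T + np.array([tx, ty])
    # Direct O(NM) double sum over all point pairs
    d2 = ((moved[:, None, :] - target[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / sigma**2).sum()

rng = np.random.default_rng(2)
pts = rng.normal(size=(50, 2))
aligned = gaussian_field_energy([0.0, 0.0, 0.0], pts, pts)
shifted = gaussian_field_energy([0.0, 3.0, 0.0], pts, pts)
print(aligned > shifted)  # the energy is maximal when the sets coincide
```

Because each term decays smoothly with distance rather than switching discretely, the criterion avoids the hard nearest-neighbour assignments that make ICP-style objectives non-differentiable.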
Multi-Surface Simplex Spine Segmentation for Spine Surgery Simulation and Planning
This research proposes to develop a knowledge-based multi-surface simplex deformable model for segmentation of healthy as well as pathological lumbar spine data. It aims to provide a more accurate and robust segmentation scheme for identification of intervertebral disc pathologies to assist with spine surgery planning. A robust technique that combines multi-surface and shape statistics-aware variants of the deformable simplex model is presented. Statistical shape variation within the dataset has been captured by application of principal component analysis and incorporated during the segmentation process to refine results. In the case where shape statistics hinder detection of the pathological region, user-assistance is allowed to disable the prior shape influence during deformation. Results have been validated against user-assisted expert segmentation
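The PCA shape-statistics step can be illustrated with a minimal point-distribution-model sketch (illustrative only; the landmark layout and training data are synthetic, not spine data).

```python
import numpy as np

def shape_model(training_shapes, n_modes=2):
    """Build a point-distribution shape model via PCA.

    Each training shape is a flattened vector of landmark coordinates;
    the leading right singular vectors of the centred data matrix are
    the principal shape modes.
    """
    mean = training_shapes.mean(0)
    _, _, Vt = np.linalg.svd(training_shapes - mean, full_matrices=False)
    return mean, Vt[:n_modes]

def constrain(shape, mean, modes):
    """Project a shape onto the span of the learned modes (plus mean),
    discarding deformation the training set never exhibited."""
    b = modes @ (shape - mean)
    return mean + modes.T @ b

rng = np.random.default_rng(3)
base = rng.normal(size=40)                         # mean landmark vector
mode = rng.normal(size=40); mode /= np.linalg.norm(mode)
train = base + rng.normal(size=(30, 1)) * mode     # 1-D shape variation
mean, modes = shape_model(train, n_modes=1)

noisy = base + 0.5 * mode + 0.1 * rng.normal(size=40)
clean = constrain(noisy, mean, modes)
resid = clean - base
off_mode = resid - (resid @ mode) * mode
print(np.linalg.norm(off_mode) < 1e-6)  # constrained shape stays on the learned mode
```

Disabling the prior shape influence, as the thesis allows for pathological regions, corresponds to skipping the `constrain` projection during deformation.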
Tensor Representations for Object Classification and Detection
A key problem in object recognition is finding a suitable object representation.
For historical and computational reasons, vector descriptions that encode particular
statistical properties of the data have been broadly applied. However, tensor
representations can describe the interactions of the multiple factors
inherent to image formation. One of the most convenient uses for tensors is to represent
complex objects in order to build a discriminative description.
This thesis has several main contributions, focusing on visual data detection (e.g. of heads or pedestrians) and classification (e.g. of head or human body orientation) in still images, and on machine learning techniques to analyse tensor data. These applications are among the most studied in computer vision and are typically formulated as binary or multi-class classification problems.
The application context of this thesis is video surveillance, where classification and detection tasks
can be very hard due to the low resolution and noise characterising
sensor data. Therefore, the main goal in this context is to design algorithms that can
characterise different objects of interest, especially when they are immersed in a cluttered
background and captured at low resolution.
Among the many machine learning approaches, ensembles of classifiers have demonstrated
excellent classification accuracy, good generalisation ability, and robustness to noisy data. For these
reasons, some approaches in that class have been adopted as basic classification
frameworks for building robust classifiers and detectors. Kernel machines
have also been exploited for classification purposes,
since they represent a natural learning framework for tensors.
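As one concrete example of a tensor-valued description (a region covariance descriptor, named plainly here as an illustration rather than as the exact descriptor of this thesis), a single second-order tensor can encode the pairwise interactions of several per-pixel cues:

```python
import numpy as np

def region_covariance(patch):
    """Region covariance descriptor for an image patch.

    Per-pixel features (x, y, intensity, |dI/dy|, |dI/dx|) are stacked,
    and their covariance forms one symmetric second-order tensor that
    summarises how the cues co-vary inside the region.
    """
    h, w = patch.shape
    ys, xs = np.mgrid[0:h, 0:w]
    gy, gx = np.gradient(patch.astype(float))
    feats = np.stack([xs, ys, patch, np.abs(gy), np.abs(gx)], -1).reshape(-1, 5)
    return np.cov(feats.T)

rng = np.random.default_rng(4)
patch = rng.random((16, 16))
C = region_covariance(patch)
print(C.shape)  # one 5x5 symmetric tensor summarises the whole patch
```

Such descriptors live on a matrix manifold rather than in a flat vector space, which is precisely why dedicated machine learning techniques for tensor data are needed.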
Object Tracking and Mensuration in Surveillance Videos
This thesis focuses on tracking and mensuration in surveillance videos. The first part of the thesis discusses several object tracking approaches based on the different properties of the tracking targets. For airborne videos, where the targets are usually small and of low resolution, an approach is proposed that builds motion models for the foreground and background, in which the foreground target is simplified as a rigid object. For relatively high-resolution targets, non-rigid models are applied instead. An active-contour-based algorithm is introduced that decomposes tracking into three parts: estimating the affine transform parameters between successive frames using particle filters; detecting contour deformation using a probabilistic deformation map; and regulating the deformation by projecting the updated model onto a trained shape subspace. The active appearance Markov chain (AAMC) model is then presented; it integrates a statistical model of shape, appearance, and motion. In the AAMC model, a Markov chain represents the switching of motion phases (poses), and several pairwise active appearance model (P-AAM) components characterize the shape, appearance, and motion information for the different motion phases. The second part of the thesis covers video mensuration, for which we propose a height-measuring algorithm with less human supervision, more flexibility, and improved robustness. From videos acquired by an uncalibrated stationary camera, we first recover the vanishing line and the vertical vanishing point of the scene. We then apply a single-view mensuration algorithm to each frame to obtain height measurements. Finally, we use the least median of squares (LMedS) as the cost function and the Robbins-Monro stochastic approximation (RMSA) technique to obtain the optimal estimate.
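The LMedS criterion can be sketched for the simplest case of estimating a constant height from noisy per-frame measurements. This is an illustrative toy only; the thesis couples LMedS with RMSA, which is not reproduced here.

```python
import numpy as np

def lmeds_height(measurements, n_trials=200, seed=0):
    """Least-median-of-squares estimate of a constant height.

    Each trial fits the model to one randomly chosen measurement and
    scores it by the *median* squared residual, so gross outlier frames
    (up to about half the data) cannot pull the estimate away, unlike a
    least-squares mean.
    """
    rng = np.random.default_rng(seed)
    best, best_score = None, np.inf
    for _ in range(n_trials):
        candidate = rng.choice(measurements)   # minimal sample: one point
        score = np.median((measurements - candidate) ** 2)
        if score < best_score:
            best, best_score = candidate, score
    return best

# 70% good frames around a 1.8 m person, 30% gross outlier measurements
rng = np.random.default_rng(5)
good = 1.8 + 0.02 * rng.normal(size=70)
bad = rng.uniform(0.5, 3.5, size=30)
est = lmeds_height(np.concatenate([good, bad]))
print(abs(est - 1.8) < 0.1)  # robust estimate stays near the true height
```

A plain mean over the same data would be dragged toward the outliers, which is why a robust cost function matters when per-frame measurements come from imperfect vanishing-geometry estimates.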