110 research outputs found

    Model-Based High-Dimensional Pose Estimation with Application to Hand Tracking

    This thesis presents novel techniques for computer-vision-based full-DOF human hand motion estimation. Our main contributions are: a robust skin color estimation approach; a novel resolution-independent and memory-efficient representation of hand pose silhouettes, which allows us to compute area-based similarity measures in near-constant time; a set of new segmentation-based similarity measures; a new class of similarity measures that work for nearly arbitrary input modalities; a novel edge-based similarity measure that avoids any problematic thresholding or discretization and can be computed very efficiently in Fourier space; a template hierarchy that minimizes the number of similarity computations needed to find the most likely observed hand pose; and finally, a novel image-space search method, which we naturally combine with our hierarchy. Consequently, matching can be efficiently formulated as a simultaneous template tree traversal and function maximization.
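The Fourier-space evaluation of an edge-based measure rests on the convolution theorem: correlating an edge map against every placement of a template costs a few FFTs instead of a sliding-window scan. A minimal sketch, using plain cross-correlation as a hypothetical stand-in for the thesis's actual measure:

```python
import numpy as np

def fourier_edge_similarity(edge_map, template):
    """Cross-correlate an edge map with a template via FFT.

    Illustrative stand-in: by the convolution theorem, multiplying the
    spectrum of the edge map with the conjugate spectrum of the
    (zero-padded) template and inverting gives the cross-correlation at
    every template placement at once.
    """
    F = np.fft.rfft2(edge_map)
    T = np.fft.rfft2(template, s=edge_map.shape)   # zero-pad template
    corr = np.fft.irfft2(F * np.conj(T), s=edge_map.shape)
    return corr                                    # peak = best placement

edges = np.zeros((64, 64))
edges[20:30, 20:30] = 1.0                          # synthetic edge blob
tmpl = np.ones((10, 10))
corr = fourier_edge_similarity(edges, tmpl)
peak = np.unravel_index(np.argmax(corr), corr.shape)
```

The correlation peak lands where the template best overlaps the edge content, here at the blob's top-left corner.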

    Person Re-Identification Techniques for Intelligent Video Surveillance Systems

    Nowadays, intelligent video surveillance is one of the most active research fields in computer vision and machine learning, providing useful tools for surveillance operators and forensic video investigators. Person re-identification is among these tools; it consists of recognizing whether an individual has already been observed over a network of cameras. It can be employed in various applications, e.g., off-line retrieval of all video sequences showing an individual of interest whose image is given as a query, or on-line pedestrian tracking over multiple cameras. For off-line retrieval applications, one goal of person re-identification systems is to help video surveillance operators and forensic investigators find an individual of interest in videos acquired by a network of non-overlapping cameras. This is attained by sorting images of previously observed individuals in decreasing order of their similarity to a given probe individual. The task is typically achieved by exploiting clothing appearance, since classical biometric methods such as face recognition are impractical in real-world video surveillance scenarios because of the low quality of the acquired images. Existing clothing appearance descriptors, together with their similarity measures, are mostly aimed at improving ranking quality. They are usually employed as part-based body models that extract an image signature treated independently in different body parts (e.g., torso and legs). While a re-identification model must be robust and discriminative in recognizing the individual of interest, processing time can also be crucial for tackling this task in real-world scenarios.
This issue can be seen from two points of view: the time to construct a model (descriptor generation), which can usually be done off-line, and the time to find the correct individual among the acquired video frames (descriptor matching), which is the real-time part of a re-identification system. This thesis addresses the processing time of descriptor matching, rather than ranking quality, which is also relevant in practical applications involving interaction with human operators. It shows how a trade-off between processing time and ranking quality can be achieved, for any given descriptor, through a multi-stage ranking approach inspired by multi-stage approaches to classification problems in the pattern recognition literature, adapted here to re-identification as a ranking problem. Design criteria for such multi-stage re-identification systems are discussed, and the proposed approach is evaluated on three benchmark data sets using four state-of-the-art descriptors. Additionally, with regard to processing time, typical dimensionality reduction methods are studied as a means of speeding up descriptors that generate high-dimensional feature spaces. Empirical results are presented in which three well-known feature reduction methods are applied to two state-of-the-art descriptors on two benchmark data sets.
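The multi-stage idea can be illustrated with a hypothetical two-stage instance (not the thesis's exact pipeline): a fast, low-dimensional descriptor ranks the whole gallery, and the full descriptor re-ranks only the surviving top-k candidates.

```python
import numpy as np

def two_stage_rank(probe_cheap, probe_full, gallery_cheap, gallery_full, k=10):
    """Hypothetical two-stage re-identification ranking sketch.

    Stage 1: a cheap descriptor ranks the entire gallery.
    Stage 2: the full (slower) descriptor re-ranks only the top-k
    shortlist, trading a small ranking-quality loss for far less
    matching time. Descriptors and distances are illustrative stand-ins.
    """
    d1 = np.linalg.norm(gallery_cheap - probe_cheap, axis=1)
    shortlist = np.argsort(d1)[:k]                 # stage-1 survivors
    d2 = np.linalg.norm(gallery_full[shortlist] - probe_full, axis=1)
    return shortlist[np.argsort(d2)]               # final top-k ranking

rng = np.random.default_rng(0)
gallery_full = rng.normal(size=(100, 256))
gallery_cheap = gallery_full[:, :16]               # cheap = truncated descriptor
probe_full = gallery_full[42] + 0.01 * rng.normal(size=256)
ranking = two_stage_rank(probe_full[:16], probe_full, gallery_cheap, gallery_full)
```

Only k full-descriptor distances are computed per probe instead of one per gallery entry, which is where the matching-time saving comes from.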

    Robust surface modelling of visual hull from multiple silhouettes

    Reconstructing depth information from images is one of the most actively researched themes in computer vision, and its applications involve most vision research areas, from object recognition to realistic visualisation. Amongst other useful vision-based reconstruction techniques, this thesis extensively investigates the visual hull (VH) concept for volume approximation and its robust surface modelling when various views of an object are available. Assuming that multiple images are captured from a circular motion, projection matrices are generally parameterised in terms of a rotation angle from a reference position in order to facilitate multi-camera calibration. However, this assumption is often violated in practice, since a pure rotation in a planar motion with an accurate rotation angle is hardly realisable. To address this problem, this thesis first proposes a calibration method for approximate circular motion. With these modified projection matrices, the resulting VH is represented by a hierarchical tree structure of voxels from which surfaces are extracted by the Marching Cubes (MC) algorithm. However, the surfaces may have unexpected artefacts caused by a coarse volume reconstruction, the topological ambiguity of the MC algorithm, and imperfect image processing or calibration results. To avoid this sensitivity, this thesis proposes a robust surface construction algorithm which initially classifies local convex regions from imperfect MC vertices and then aggregates local surfaces constructed by the 3D convex hull algorithm. Furthermore, this thesis also explores the use of wide-baseline images to refine a coarse VH using an affine-invariant region descriptor, which improves the quality of the VH when only a small number of initial views is given. In conclusion, the proposed methods achieve a 3D model with enhanced accuracy, and robust surface modelling is retained when silhouette images are degraded by practical noise.
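The space-carving step underlying a visual hull can be sketched as follows. This is an illustrative dense-grid version with toy orthographic projections; the thesis uses a hierarchical voxel tree and calibrated projection matrices.

```python
import numpy as np

def carve_visual_hull(voxels, silhouettes, projections):
    """Minimal visual-hull carving sketch (not the thesis's octree version).

    A voxel survives only if it projects inside every silhouette.
    `projections` are assumed functions mapping a 3D point to 2D pixel
    coordinates, one per calibrated view.
    """
    keep = np.ones(len(voxels), dtype=bool)
    for sil, proj in zip(silhouettes, projections):
        h, w = sil.shape
        uv = np.array([proj(p) for p in voxels]).round().astype(int)
        inside = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
        ok = np.zeros(len(voxels), dtype=bool)
        ok[inside] = sil[uv[inside, 1], uv[inside, 0]] > 0   # pixel on silhouette?
        keep &= ok
    return voxels[keep]

# Toy scene: 8x8x8 voxel grid, two orthographic views (top and front)
grid = np.array([[x, y, z] for x in range(8) for y in range(8) for z in range(8)], dtype=float)
sil = np.zeros((8, 8)); sil[2:6, 2:6] = 1.0                  # square silhouette
hull = carve_visual_hull(grid, [sil, sil],
                         [lambda p: (p[0], p[1]), lambda p: (p[0], p[2])])
```

With a square silhouette in both views, the carved hull is the 4x4x4 box where the two extruded squares intersect.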

    Toward Automated Aerial Refueling: Relative Navigation with Structure from Motion

    The USAF's use of UAS has expanded from reconnaissance to hunter/killer missions. As the UAS mission further expands into aerial combat, better performance and larger payloads will have a negative correlation with range and loiter times. Additionally, the Air Force Future Operating Concept calls for "formations of uninhabited refueling aircraft...[that] enable refueling operations partway inside threat areas." However, a lack of accurate relative positioning information prevents the ability to safely maintain close formation flight and contact between a tanker and a UAS. The inclusion of cutting-edge vision systems on present refueling platforms may provide the information necessary to support an AAR mission by estimating the position of a trailing aircraft, providing inputs to a UAS controller capable of maintaining a given position. This research examines the ability of SfM to generate relative navigation information. Previous AAR research efforts involved the use of differential GPS, LiDAR, and vision systems; this research aims to leverage current and future imaging technology to complement those solutions. The algorithm used in this thesis generates a point cloud by determining 3D structure from a sequence of 2D images, then utilizes PCA to register the point cloud to a reference model. The algorithm was tested in a real-world environment using a 1:7 scale F-15 model. Additionally, this thesis studies common 3D rigid registration algorithms in an effort to characterize their performance in the AAR domain. Three algorithms are tested for runtime and registration accuracy with four data sets.
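A PCA-based registration of a sensed point cloud to a reference model can be sketched as follows. This is an illustrative toy version assuming a small relative rotation; the thesis's registration step and the compared algorithms are more involved.

```python
import numpy as np

def pca_register(source, reference):
    """PCA-based rigid registration sketch (illustrative, not the thesis code).

    Aligns the centroid and principal axes of a sensed cloud to a
    reference model. Eigenvector signs are canonicalised by making each
    axis's largest component positive, which resolves the sign ambiguity
    only when the relative rotation is small.
    """
    def frame(pts):
        c = pts.mean(axis=0)
        _, _, vt = np.linalg.svd(pts - c, full_matrices=False)
        for i in range(3):                         # canonical axis signs
            if vt[i, np.argmax(np.abs(vt[i]))] < 0:
                vt[i] = -vt[i]
        return c, vt                               # rows of vt = principal axes
    cs, rs = frame(source)
    cr, rr = frame(reference)
    R = rr.T @ rs                # map source principal frame onto reference's
    t = cr - R @ cs
    return R, t

rng = np.random.default_rng(1)
reference = rng.normal(size=(500, 3)) * [5.0, 2.0, 1.0]     # anisotropic cloud
a = np.deg2rad(10)                                          # small true rotation
R_true = np.array([[np.cos(a), -np.sin(a), 0.0],
                   [np.sin(a),  np.cos(a), 0.0],
                   [0.0, 0.0, 1.0]])
source = reference @ R_true.T + np.array([1.0, -2.0, 0.5])
R, t = pca_register(source, reference)
recovered = source @ R.T + t
```

Because the sample covariance transforms exactly under a rigid motion, the toy cloud is recovered to machine precision; real sensed data would need the sign disambiguation and robustness the thesis studies.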

    Multimodal Three Dimensional Scene Reconstruction, The Gaussian Fields Framework

    The focus of this research is on building 3D representations of real-world scenes and objects using different imaging sensors: primarily range acquisition devices (such as laser scanners and stereo systems) that allow the recovery of 3D geometry, and multi-spectral image sequences, including visual and thermal IR images, that provide additional scene characteristics. The crucial technical challenge we addressed is the automatic point-set registration task. In this context, our main contribution is the development of an optimization-based method at the core of which lies a unified criterion that solves simultaneously for the dense point correspondence and transformation recovery problems. The new criterion has a straightforward expression in terms of the datasets and the alignment parameters, and was used primarily for 3D rigid registration of point-sets; however, it also proved useful for feature-based multimodal image alignment. We derived our method from simple Boolean matching principles by approximation and relaxation. One of the main advantages of the proposed approach, as compared to the widely used class of Iterative Closest Point (ICP) algorithms, is convexity in the neighborhood of the registration parameters and continuous differentiability, allowing for the use of standard gradient-based optimization techniques. Physically, the criterion is interpreted in terms of a Gaussian force field exerted by one point-set on the other. This formulation proved useful for controlling and increasing the region of convergence, and hence allowing for more autonomy in correspondence tasks. Furthermore, the criterion can be computed with linear complexity using recently developed Fast Gauss Transform numerical techniques. In addition, we also introduced a new local feature descriptor, derived from visual saliency principles, which significantly enhanced the performance of the registration algorithm.
The resulting technique was subjected to a thorough experimental analysis that highlighted its strengths and showed its limitations. Our current applications are in the field of 3D modeling for inspection, surveillance, and biometrics. However, since this matching framework can be applied to any type of data that can be represented as N-dimensional point-sets, the scope of the method is shown to reach many more pattern analysis applications.
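The Gaussian force-field criterion is simple to state: each point of one set exerts a Gaussian-weighted attraction on every point of the other, and registration maximises the total interaction energy. A brute-force sketch follows; the thesis evaluates this in linear time via the Fast Gauss Transform, whereas the direct double sum below is O(NM) and for illustration only.

```python
import numpy as np

def gaussian_field_energy(P, Q, sigma):
    """Gaussian-field registration criterion (brute-force sketch).

    E = sum_ij exp(-||p_i - q_j||^2 / sigma^2): a smooth, differentiable
    score that is largest when the two point-sets coincide, so it can be
    maximised over alignment parameters with gradient methods.
    """
    d2 = ((P[:, None, :] - Q[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / sigma**2).sum()

P = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
aligned = gaussian_field_energy(P, P, sigma=0.5)             # perfect overlap
shifted = gaussian_field_energy(P + [0.3, 0.0], P, sigma=0.5)  # misaligned
```

The energy drops smoothly as the sets are pulled apart, which is what gives the criterion its wide, tunable basin of convergence compared with ICP's hard nearest-neighbour assignments.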

    Multi-Surface Simplex Spine Segmentation for Spine Surgery Simulation and Planning

    This research proposes to develop a knowledge-based multi-surface simplex deformable model for segmentation of healthy as well as pathological lumbar spine data. It aims to provide a more accurate and robust segmentation scheme for identifying intervertebral disc pathologies to assist with spine surgery planning. A robust technique that combines multi-surface and shape-statistics-aware variants of the deformable simplex model is presented. Statistical shape variation within the dataset has been captured by applying principal component analysis and is incorporated during the segmentation process to refine results. In cases where shape statistics hinder detection of the pathological region, user assistance is allowed to disable the prior shape influence during deformation. Results have been validated against user-assisted expert segmentation.
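The shape-statistics constraint can be sketched as a projection onto a PCA shape subspace with bounded mode coefficients. This is an illustrative stand-in assuming orthonormal PCA modes, not the exact simplex-model update.

```python
import numpy as np

def constrain_to_shape_space(shape, mean, modes, stds, beta=3.0):
    """Shape-statistics constraint sketch (illustrative stand-in).

    Projects a deforming surface (flattened vertex vector) onto a PCA
    shape subspace (columns of `modes`, assumed orthonormal) and clips
    each mode coefficient to +/- beta standard deviations, keeping the
    segmentation near statistically plausible training shapes.
    """
    b = modes.T @ (shape - mean)                 # mode coefficients
    b = np.clip(b, -beta * stds, beta * stds)    # plausible range
    return mean + modes @ b

mean = np.zeros(4)
modes = np.eye(4)[:, :2]                         # two orthonormal toy modes
stds = np.array([1.0, 1.0])
shape = np.array([5.0, 0.5, 2.0, 0.0])           # mode-1 coefficient (5) implausible
constrained = constrain_to_shape_space(shape, mean, modes, stds)
```

The implausible coefficient is clipped to 3 standard deviations and the component outside the subspace is discarded, which is exactly the behaviour a user would disable when a genuine pathology falls outside the trained shape statistics.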

    Tensor Representations for Object Classification and Detection

    A key problem in object recognition is finding a suitable object representation. For historical and computational reasons, vector descriptions that encode particular statistical properties of the data have been broadly applied, yet tensor representations can describe the interactions of multiple factors inherent to image formation. One of the most convenient uses of tensors is to represent complex objects in order to build a discriminative description. This thesis makes several main contributions, focusing on visual data detection (e.g. of heads or pedestrians) and classification (e.g. of head or human body orientation) in still images, and on machine learning techniques to analyse tensor data. These applications are among the most studied in computer vision and are typically formulated as binary or multi-class classification problems. The applicative context of this thesis is video surveillance, where classification and detection tasks can be very hard due to the low resolution and the noise characterising sensor data. Therefore, the main goal in that context is to design algorithms that can characterise different objects of interest, especially when immersed in a cluttered background and captured at low resolution. Among the many machine learning approaches, ensembles of classifiers have demonstrated excellent classification accuracy, good generalisation ability, and robustness to noisy data. For these reasons, some approaches in that class have been adopted as the basic classification frameworks with which to build robust classifiers and detectors. Moreover, kernel machines have also been exploited for classification purposes, since they represent a natural learning framework for tensors.
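One widely used tensor representation of this kind is the region covariance descriptor; the sketch below is an illustrative example of the representation class, not the thesis's exact descriptor.

```python
import numpy as np

def region_covariance(region):
    """Region covariance descriptor sketch (illustrative example).

    Stacks per-pixel features (x, y, intensity, |Iy|, |Ix|) and returns
    their covariance matrix: a symmetric positive semi-definite tensor
    that captures feature interactions, is compact, and is independent
    of the number of pixels in the region.
    """
    h, w = region.shape
    ys, xs = np.mgrid[0:h, 0:w]
    gy, gx = np.gradient(region.astype(float))    # gradients along y, x
    feats = np.stack([xs, ys, region, np.abs(gy), np.abs(gx)], axis=0)
    return np.cov(feats.reshape(5, -1))           # 5x5 descriptor tensor

patch = np.outer(np.arange(8), np.ones(8))        # vertical intensity ramp
C = region_covariance(patch)
```

For the ramp patch, intensity grows with the y coordinate, so the y-intensity entry of the tensor is positive: the descriptor encodes how the factors co-vary, which a flat feature vector discards.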

    Object Tracking and Mensuration in Surveillance Videos

    This thesis focuses on tracking and mensuration in surveillance videos. The first part of the thesis discusses several object tracking approaches based on the different properties of the tracking targets. For airborne videos, where the targets are usually small and of low resolution, an approach is proposed that builds motion models for the foreground and background, in which the foreground target is simplified as a rigid object. For relatively high-resolution targets, non-rigid models are applied. First, an active contour-based algorithm is introduced, which decomposes tracking into three parts: estimating the affine transform parameters between successive frames using particle filters; detecting contour deformation using a probabilistic deformation map; and regulating the deformation by projecting the updated model onto a trained shape subspace. Second, an active appearance Markov chain (AAMC) is proposed, which integrates a statistical model of shape, appearance and motion. In the AAMC model, a Markov chain represents the switching of motion phases (poses), and several pairwise active appearance model (P-AAM) components characterize the shape, appearance and motion information for the different motion phases. The second part of the thesis covers video mensuration, for which we have proposed a height-measuring algorithm with less human supervision, more flexibility and improved robustness. From videos acquired by an uncalibrated stationary camera, we first recover the vanishing line and the vertical vanishing point of the scene. We then apply a single-view mensuration algorithm to each of the frames to obtain height measurements. Finally, the optimal estimate is obtained using the least median of squares (LMedS) as the cost function and the Robbins-Monro stochastic approximation (RMSA) technique.
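The per-frame height measurement follows standard single-view metrology: given the ground plane's vanishing line and the vertical vanishing point, the ratio of two vertical heights is a ratio of image cross-products. A sketch under an assumed synthetic camera (illustrative; the thesis additionally robustifies the per-frame estimates with LMedS and RMSA):

```python
import numpy as np

def height_ratio(b1, t1, b2, t2, v, l):
    """Ratio of world heights of two vertical segments from one image.

    b*, t* are homogeneous image points of segment bases and tops; v is
    the vertical vanishing point and l the ground-plane vanishing line
    (Criminisi-style single-view metrology).
    """
    def alpha_z(b, t):
        # quantity proportional to the segment's world height
        return np.linalg.norm(np.cross(b, t)) / (np.dot(l, b) * np.linalg.norm(np.cross(v, t)))
    return alpha_z(b1, t1) / alpha_z(b2, t2)

# Assumed synthetic camera: focal length f, height Hc above the ground,
# looking horizontally; world point (X, Y, Z) images to (f*X/Y, f*(Z-Hc)/Y).
f, Hc = 1000.0, 5.0
def img(X, Y, Z):
    return np.array([f * X / Y, f * (Z - Hc) / Y, 1.0])

v = np.array([0.0, 1.0, 0.0])   # vertical vanishing point (at infinity)
l = np.array([0.0, 1.0, 0.0])   # horizon line y = 0
person1 = (img(2, 10, 0), img(2, 10, 1.8))     # 1.8 m tall at depth 10
person2 = (img(-3, 20, 0), img(-3, 20, 1.5))   # 1.5 m tall at depth 20
ratio = height_ratio(*person1, *person2, v, l)
```

A known reference height in the scene turns these ratios into metric heights, which is how a single uncalibrated view can measure people in each frame.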