Multimodal headpose estimation and applications
This thesis presents new research into human head pose estimation and its applications
in multi-modal data. We develop new methods for head pose estimation
spanning RGB-D Human Computer Interaction (HCI) to distant, "in the wild"
surveillance-quality data. We present a state-of-the-art solution to both head
detection and head pose estimation through a new end-to-end Convolutional Neural
Network architecture that reuses all of the computation for detection and pose
estimation. In contrast to prior work, our method successfully spans close-up HCI
to low-resolution surveillance data and is cross-modality, operating on both RGB
and RGB-D data. We further address the problems of limited standard
data and annotations of varying quality through semi-supervised learning and novel
data augmentation. (This latter contribution also finds application in the domain
of the life sciences.)
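The idea of reusing one feature computation for both detection and pose estimation can be pictured with a toy sketch. This is not the thesis architecture: the shapes, random weights, and linear "backbone" below are all made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def shared_backbone_heads(x, W_shared, W_det, W_pose):
    """Compute shared features once, then feed two task heads:
    a detection head (head-present probability) and a pose head
    (yaw, pitch, roll). A toy stand-in for a shared CNN backbone."""
    feat = np.maximum(0.0, W_shared @ x)                # shared features (ReLU)
    det_score = 1.0 / (1.0 + np.exp(-(W_det @ feat)))   # sigmoid detection score
    pose = W_pose @ feat                                # pose regression output
    return float(det_score), pose

x = rng.normal(size=16)              # stand-in for image features
W_shared = rng.normal(size=(8, 16))  # "backbone" weights
W_det = rng.normal(size=8)           # detection head
W_pose = rng.normal(size=(3, 8))     # pose head
score, pose = shared_backbone_heads(x, W_shared, W_det, W_pose)
print(0.0 <= score <= 1.0, pose.shape)
```

The point of the sketch is that `feat` is computed once and consumed by both heads, so detection adds almost no cost on top of pose estimation.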
We report the highest accuracy by a large margin (a 60% improvement) and demonstrate
leading performance on multiple standardized datasets. In HCI we reduce
the angular error by 40% relative to the previously reported literature. Furthermore,
by defining a probabilistic spatial gaze model from the head pose, we show
applications in understanding human-human and human-scene interaction. We present
state-of-the-art results on the standard interaction datasets. A new metric to
model "social mimicry" through the temporal correlation of the head pose signal
is contributed and shown to be valid qualitatively and intuitively. As an application
in surveillance, we show that with the robust head pose signal as a prior,
state-of-the-art results in tracking under occlusion can be achieved using a
Kalman filter. This model, named the Intentional Tracker, improves visual
tracking metrics by up to 15%.
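A temporal-correlation mimicry metric can be sketched as the peak lagged correlation between two head-yaw time series. This is a hypothetical stand-in for illustration, not the thesis's exact definition; the lag window and the use of yaw alone are assumptions.

```python
import numpy as np

def mimicry_score(yaw_a, yaw_b, max_lag=10):
    """Peak Pearson correlation between two head-yaw signals over a
    range of temporal lags: high when one person echoes the other's
    head movements with a short delay."""
    best = -1.0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = yaw_a[lag:], yaw_b[:len(yaw_b) - lag]
        else:
            a, b = yaw_a[:lag], yaw_b[-lag:]
        n = min(len(a), len(b))
        if n > 1:
            best = max(best, float(np.corrcoef(a[:n], b[:n])[0, 1]))
    return best

t = np.linspace(0, 4 * np.pi, 200)
leader = np.sin(t)
follower = np.roll(leader, 5)   # follower copies the leader 5 frames later
print(round(mimicry_score(leader, follower), 2))
```

A delayed copy of the signal scores near 1, while unrelated signals would hover near 0, which matches the intuition the abstract appeals to.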
We also apply the ALICE loss, developed for end-to-end detection
and classification, to dense classification of underwater coral reef imagery. The
objective of this work is to solve the challenging task of recognizing and segmenting
underwater coral imagery in the wild with sparse, point-based ground-truth
labelling. To achieve this, we propose an integrated Fully Convolutional Neural
Network (FCNN) and Fully-Connected Conditional Random Field (CRF) based classification and segmentation algorithm. Our contributions lie in four major
areas. First, we show that multi-scale crop-based training is useful for learning
the initial weights in the canonical one-class classification problem. Second,
we propose a modified ALICE loss for training the FCNN on sparse labels with
class imbalance and establish its significance empirically. Third, we show that
by artificially enhancing the point labels to small regions based on a class distance
transform, we can improve the classification accuracy further. Fourth, we improve
the segmentation results using fully connected CRFs with a bilateral message-passing
prior. We improve upon state-of-the-art results on all publicly available
datasets by a significant margin.
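The point-label enhancement step can be pictured with a toy sketch. For simplicity this version grows each annotated point into a fixed-radius disk, rather than using the class distance transform the abstract describes; the radius and grid are made up.

```python
import numpy as np

def expand_point_labels(points, shape, radius=3):
    """Grow sparse point annotations into small labeled regions.

    points: list of (row, col, class_id); background stays 0.
    A simplified stand-in for distance-transform-based label
    enhancement: each point becomes a disk of the given radius."""
    dense = np.zeros(shape, dtype=np.int32)
    rows, cols = np.mgrid[:shape[0], :shape[1]]
    for r, c, cls in points:
        mask = (rows - r) ** 2 + (cols - c) ** 2 <= radius ** 2
        dense[mask] = cls   # later points overwrite on overlap
    return dense

labels = expand_point_labels([(5, 5, 1), (15, 15, 2)], (20, 20), radius=2)
print((labels == 1).sum(), (labels == 2).sum())   # 13 pixels per disk
```

Turning each point into a small region gives the FCNN many more supervised pixels per annotation, which is the effect the third contribution exploits.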
Optimal path planning for detection and classification of underwater targets using sonar
Spring 2021. Includes bibliographical references.
The work presented in this dissertation focuses on choosing an optimal path for performing sequential detection and classification state estimation to identify potential underwater targets using sonar imagery. The detection state estimation falls under the occupancy grid framework, modeling the relationship between the occupancy state of grid cells and sensor measurements, and allows for the consideration of statistical dependence between the occupancy states of grid cells in the map. This is in direct contrast to classical formulations of occupancy grid frameworks, in which the occupancy state of each grid cell is considered statistically independent. The new method provides more accurate estimates, and occupancy grids estimated with this method typically converge with fewer measurements. The classification state estimation utilises a Dirichlet-Categorical model and a one-step classifier to perform efficient updating of the classification state estimate for each grid cell. To show the performance capabilities of the developed sequential state estimation methods, they are applied to sonar systems in littoral areas in which targets lie on the seafloor and may be proud, partially buried, or fully buried. Additionally, a new approach to the active perception problem, which seeks to select a series of sensing actions that provide the maximal amount of information to the system, is developed. This approach leverages the aforementioned sequential state estimation techniques to develop a set of information-theoretic cost functions that can be used for optimal sensing action selection. A path planning cost function is developed, defined as the mutual information between the aforementioned state variables before and after a measurement. The cost function is expressed in closed form by considering the prior and posterior distributions of the state variables.
The choice of optimal sensing actions is performed by modeling the path planning as a Markov decision process and solving it with the rollout algorithm. This work, supported by the Office of Naval Research (ONR), is intended to develop a suite of interactive sensing algorithms to autonomously command an autonomous underwater vehicle (AUV) for the task of detecting and classifying underwater mines, while choosing an optimal navigation route that increases the quality of the detection and classification state estimates.
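Because the Dirichlet is conjugate to the Categorical, the one-step classification update per grid cell reduces to a count increment. A minimal sketch, assuming hard class observations (the actual system would weight updates by observation likelihoods):

```python
import numpy as np

def dirichlet_update(alpha, obs_class):
    """One-step conjugate update of a Dirichlet posterior over a grid
    cell's classification state, given one categorical observation."""
    alpha = alpha.copy()
    alpha[obs_class] += 1.0
    return alpha

def class_mean(alpha):
    """Posterior mean of the class probabilities."""
    return alpha / alpha.sum()

alpha = np.ones(3)           # uniform Dirichlet prior over 3 target classes
for obs in [1, 1, 2, 1]:     # four sequential sonar classifications
    alpha = dirichlet_update(alpha, obs)
print(class_mean(alpha))     # posterior mean favours class 1
```

The cheapness of this update is what makes sequential, per-cell classification state estimation tractable over a whole occupancy grid.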
Balance-guaranteed optimized tree with reject option for live fish recognition
This thesis investigates the computer vision application of live fish recognition, which
is needed in scenarios where there are too many underwater videos for manual
annotation to be affordable. Such a system can assist ecological surveillance
research, e.g. computing fish population statistics in the open sea. Some pre-processing
procedures are employed to improve the recognition accuracy, and then 69 types of
features are extracted. These features are a combination of colour, shape and texture
properties in different parts of the fish such as tail/head/top/bottom, as well as
the whole fish. Then, we present a novel Balance-Guaranteed Optimized Tree with
Reject option (BGOTR) for live fish recognition. It improves the normal hierarchical
method by arranging more accurate classifications at a higher level and keeping the
hierarchical tree balanced. BGOTR is automatically constructed based on inter-class
similarities. We apply a Gaussian Mixture Model (GMM) and Bayes' rule as a reject
option after the hierarchical classification, evaluating the posterior probability of
each species in order to filter out less confident decisions. This novel classification-rejection
method cleans up decisions and rejects unknown classes. After constructing the tree
architecture, a novel trajectory voting method is used to eliminate accumulated errors
during hierarchical classification and, therefore, achieves better performance. The proposed
BGOTR-based hierarchical classification method is applied to recognize the 15
major species in 24,150 manually labelled fish images and to detect new species in
an unrestricted natural environment recorded by underwater cameras in the south Taiwan
sea. It achieves significant improvements compared to state-of-the-art techniques.
Furthermore, the order of feature selection and multi-class SVM construction
is investigated. We propose that an Individual Feature Selection (IFS) procedure can
be applied directly to the binary One-versus-One SVMs before assembling the full
multiclass SVM. The IFS method selects a different subset of features for each
One-versus-One SVM inside the multiclass classifier, so that each vote is optimized to
discriminate the two specific classes. The proposed IFS method is tested on four different
datasets, comparing performance and time cost. Experimental results demonstrate
significant improvements compared to the normal Multiclass Feature Selection (MFS)
method on all datasets.
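The IFS idea of choosing a different feature subset per One-versus-One pair can be sketched with a simple Fisher-score criterion. The criterion, data, and top-k selection below are illustrative assumptions, not the thesis's actual selection procedure.

```python
import numpy as np

def fisher_scores(X, y, cls_a, cls_b):
    """Per-feature Fisher score for the binary problem cls_a vs cls_b:
    squared mean difference over summed variances. Higher means the
    feature separates this specific pair better."""
    Xa, Xb = X[y == cls_a], X[y == cls_b]
    return (Xa.mean(0) - Xb.mean(0)) ** 2 / (Xa.var(0) + Xb.var(0) + 1e-12)

def ifs_subsets(X, y, k):
    """Pick the top-k features independently for every one-vs-one pair,
    so each pairwise SVM would see only its own best features."""
    classes = np.unique(y)
    subsets = {}
    for i, a in enumerate(classes):
        for b in classes[i + 1:]:
            subsets[(a, b)] = np.argsort(fisher_scores(X, y, a, b))[::-1][:k]
    return subsets

rng = np.random.default_rng(0)
X = rng.normal(size=(90, 3))
y = np.repeat([0, 1, 2], 30)
X[y == 1, 0] += 5.0   # feature 0 separates classes 0 and 1
X[y == 2, 1] += 5.0   # feature 1 separates classes 0 and 2
subsets = ifs_subsets(X, y, k=1)
print(subsets[(0, 1)], subsets[(0, 2)])
```

Each pair ends up with a different subset, which is exactly the contrast with MFS, where one shared subset serves every binary classifier.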
Mapping of complex marine environments using an unmanned surface craft
Thesis (Ph.D.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2011. Includes bibliographical references (p. 185-199).
Recent technology has combined accurate GPS localization with mapping to build 3D maps in a diverse range of terrestrial environments, but the mapping of marine environments lags behind. This is particularly true in shallow water and coastal areas with man-made structures such as bridges, piers, and marinas, which can pose formidable challenges to autonomous underwater vehicle (AUV) operations. In this thesis, we propose a new approach for mapping shallow water marine environments, combining data from both above and below the water in a robust probabilistic state estimation framework. The ability to rapidly acquire detailed maps of these environments would have many applications, including surveillance, environmental monitoring, forensic search, and disaster recovery. Whereas most recent AUV mapping research has been limited to open waters, far from man-made surface structures, in our work we focus on complex shallow water environments, such as rivers and harbors, where man-made structures block GPS signals and pose hazards to navigation. Our goal is to enable an autonomous surface craft to combine data from the heterogeneous environments above and below the water surface, as if the water were drained and we had a complete integrated model of the marine environment with full visibility. To tackle this problem, we propose a new framework for 3D SLAM in marine environments that combines data obtained concurrently from above and below the water in a robust probabilistic state estimation framework. Our work makes systems, algorithmic, and experimental contributions in perceptual robotics for the marine environment.
We have created a novel Autonomous Surface Vehicle (ASV), equipped with substantial onboard computation and an extensive sensor suite that includes three SICK lidars, a Blueview MB2250 imaging sonar, a Doppler Velocity Log, and an integrated global positioning system/inertial measurement unit (GPS/IMU) device. The data from these sensors is processed in a hybrid metric/topological SLAM state estimation framework. A key challenge to mapping is extracting effective constraints from 3D lidar data despite GPS loss and reacquisition. This was achieved by developing a GPS trust engine that uses a semi-supervised learning classifier to ascertain the validity of GPS information for different segments of the vehicle trajectory. This eliminates the troublesome effects of multipath on the vehicle trajectory estimate, and provides cues for submap decomposition. Localization from lidar point clouds is performed using octrees combined with Iterative Closest Point (ICP) matching, which provides constraints between submaps both within and across different mapping sessions. Submap positions are optimized via least squares optimization of the graph of constraints, to achieve global alignment. The global vehicle trajectory is used for subsea sonar bathymetric map generation and for mesh reconstruction from lidar data for 3D visualization of above-water structures. We present experimental results in the vicinity of several structures spanning or along the Charles River between Boston and Cambridge, MA. The Harvard and Longfellow Bridges, three sailing pavilions and a yacht club provide structures of interest, having both extensive superstructure and subsurface foundations. To quantitatively assess the mapping error, we compare against a georeferenced model of the Harvard Bridge using blueprints from the Library of Congress. Our results demonstrate the potential of this new approach to achieve robust and efficient model capture for complex shallow-water marine environments. 
Future work aims to incorporate autonomy for path planning over a region of interest while performing collision avoidance, to enable fully autonomous surveys that achieve full sensor coverage of a complete marine environment.
by Jacques Chadwick Leedekerken. Ph.D.
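The submap alignment step, least-squares optimization over a graph of relative constraints, reduces in one dimension to a small linear system. A toy sketch under that simplification (not the actual 3D lidar/ICP pipeline; the constraint values are invented):

```python
import numpy as np

def align_submaps(n, constraints):
    """Least-squares global alignment of 1-D submap positions from
    relative constraints (i, j, measured offset x_j - x_i).
    Submap 0 is softly anchored at the origin, then every relative
    constraint contributes one row to an overdetermined system."""
    A, b = [], []
    row = np.zeros(n); row[0] = 1.0
    A.append(row); b.append(0.0)              # anchor submap 0 at x = 0
    for i, j, d in constraints:
        row = np.zeros(n)
        row[j], row[i] = 1.0, -1.0            # x_j - x_i = d
        A.append(row); b.append(d)
    x, *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
    return x

# A chain of three submaps plus one (slightly inconsistent) loop closure:
pos = align_submaps(3, [(0, 1, 2.0), (1, 2, 2.0), (0, 2, 4.2)])
print(np.round(pos, 2))
```

The solver spreads the 0.2 m disagreement between the chain and the loop-closure constraint across the graph, which is the essence of the global alignment described above.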
Learning from small and imbalanced dataset of images using generative adversarial neural networks.
The performance of deep learning models is unmatched by any other approach in supervised computer vision tasks such as image classification. However, training these models requires a lot of labeled data, which is not always available. Labelling a massive dataset is largely a manual and very demanding process, and this problem has led to the development of techniques that bypass the need for labelling at scale. Despite this, existing techniques such as transfer learning, data augmentation and semi-supervised learning have not lived up to expectations. Some of these techniques do not account for other classification challenges, such as the class-imbalance problem, and thus mostly underperform compared with fully supervised approaches. In this thesis, we propose new methods to train a deep model for image classification with a limited number of labeled examples. This is achieved by extending state-of-the-art generative adversarial networks with multiple fake classes and network switchers. These new features enable us to train a classifier on large amounts of unlabeled data while generating class-specific samples. The proposed model is label agnostic and is suitable for different classification scenarios, ranging from weakly supervised to fully supervised settings. It was used to address classification challenges with limited labeled data and a class-imbalance problem. Extensive experiments were carried out on different benchmark datasets. Firstly, the proposed approach was used to train a classification model, and our findings indicate that it achieved better classification accuracies, especially when the number of labeled samples is small. Secondly, the proposed approach was able to generate high-quality samples from class-imbalanced datasets. The quality of these samples is evident in the improved classification performance obtained when they were used to neutralise class imbalance.
The results are thoroughly analyzed and, overall, our method showed superior performance over a popular resampling technique and the AC-GAN model. Finally, we successfully applied the proposed approach as a new augmentation technique to two challenging real-world problems: faces with attributes and legacy engineering drawings. The results obtained demonstrate that the proposed approach is effective even in extreme cases.
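The multiple-fake-class idea can be sketched as a discriminator label layout with K real and K fake outputs. This is a sketch of the labeling scheme only, with an invented helper name and uniform logits; the full model also involves network switchers and a trained generator.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def two_k_target(label, is_fake, n_classes):
    """Target index for a discriminator with K real + K fake outputs:
    a real sample of class c maps to c, a generated sample of class c
    maps to c + K, so the discriminator learns class identity and
    real/fake status jointly."""
    return label + (n_classes if is_fake else 0)

K = 4
logits = np.zeros(2 * K)                       # untrained, uniform discriminator
t_real = two_k_target(2, is_fake=False, n_classes=K)
t_fake = two_k_target(2, is_fake=True, n_classes=K)
loss = -np.log(softmax(logits)[t_real])        # cross-entropy for the real sample
print(t_real, t_fake, round(loss, 3))
```

Because fake samples carry class labels of their own, the unlabeled and generated data both contribute gradient signal to the same 2K-way classifier.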
Machine Learning Approaches to Human Body Shape Analysis
Soft biometrics, the biomedical sciences, and many other fields of study pay particular attention to the geometric description of the human body and its variations. Despite numerous contributions, interest remains particularly high given the non-rigid nature of the human body, which is capable of assuming different poses and numerous shapes due to variable body composition. Unfortunately, a well-known and costly requirement in data-driven machine learning, and particularly in human-based analysis, is the availability of data in the form of geometric information (body measurements) with related vision information (natural images, 3D meshes, etc.). We introduce a computer graphics framework able to generate thousands of synthetic human body meshes, representing a population of individuals with stratified information: gender, Body Fat Percentage (BFP), anthropometric measurements, and pose. This contribution permits an extensive analysis of different bodies in different poses, avoiding a demanding and expensive acquisition process. We design a virtual environment able to take advantage of the generated bodies to infer the body surface area (BSA) from a single view. The framework permits simulating the acquisition process of newly introduced RGB-D devices, disentangling different noise components (sensor noise, optical distortion, body part occlusions). Common geometric descriptors in soft biometrics, as well as in the biomedical sciences, are based on body measurements. Unfortunately, as we prove, these descriptors are not pose invariant, which constrains their usability to controlled scenarios. We introduce a differential geometry approach that treats body pose variations as isometric transformations of the body surface, and body composition changes as covariant with the body surface area. This setting permits the use of the Laplace-Beltrami operator on the 2D body manifold, describing the body with a compact, efficient, and pose-invariant representation.
We design a neural network architecture able to infer important body semantics from spectral descriptors, closing the gap between abstract spectral features and traditional measurement-based indices. Studying the manifold of body shapes, we propose an innovative generative adversarial model able to learn body shapes. The method can generate new bodies with unseen geometries by walking the latent space, constituting a significant advantage over traditional generative methods.
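The invariance behind spectral descriptors can be illustrated on a graph Laplacian, a discrete cousin of the Laplace-Beltrami operator: relabeling vertices (a graph isometry) leaves the eigenvalue spectrum unchanged. A toy sketch on a 4-cycle "mesh" (not the thesis's actual mesh operator):

```python
import numpy as np

def laplacian_spectrum(edges, n_vertices, k=4):
    """Smallest k eigenvalues of the graph Laplacian built from an
    edge list: degree on the diagonal, -1 for each adjacency. The
    spectrum depends only on connectivity, not on vertex labels."""
    L = np.zeros((n_vertices, n_vertices))
    for i, j in edges:
        L[i, j] -= 1.0
        L[j, i] -= 1.0
        L[i, i] += 1.0
        L[j, j] += 1.0
    return np.linalg.eigvalsh(L)[:k]

square = [(0, 1), (1, 2), (2, 3), (3, 0)]          # a 4-cycle
relabeled = [(0, 2), (2, 1), (1, 3), (3, 0)]       # same cycle, vertices 1 and 2 swapped
print(np.allclose(laplacian_spectrum(square, 4),
                  laplacian_spectrum(relabeled, 4)))
```

This label independence is the discrete analogue of the pose invariance the thesis gets from the Laplace-Beltrami operator: isometric deformations of the body surface leave the descriptor essentially unchanged.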
Proceedings of the 2009 Joint Workshop of Fraunhofer IOSB and Institute for Anthropomatics, Vision and Fusion Laboratory
The joint workshop of the Fraunhofer Institute of Optronics, System Technologies and Image Exploitation IOSB, Karlsruhe, and the Vision and Fusion Laboratory (Institute for Anthropomatics, Karlsruhe Institute of Technology (KIT)) has been organized annually since 2005, with the aim of reporting on the latest research and development findings of the doctoral students of both institutions. This book provides a collection of 16 technical reports on the research results presented at the 2009 workshop.