Hybrid 2D and 3D face verification
Face verification is a challenging pattern recognition problem. The face is a biometric that we, as humans, know can be recognised. However, the face is highly deformable, and its appearance alters significantly when the pose, illumination or expression changes. These changes in appearance are most notable for texture images, or two-dimensional (2D) data. But the underlying structure of the face, or three-dimensional (3D) data, is not changed by pose or illumination variations.
Over the past five years, methods have been investigated to combine 2D and 3D face data to improve the accuracy and robustness of face verification. Much of this research has examined the fusion of a 2D verification system and a 3D verification system, known as multi-modal classifier score fusion. These verification systems usually compare two feature vectors (two image representations), a and b, using distance- or angular-based similarity measures. However, this does not provide the most complete description of the features being compared, as the distances describe at best the covariance of the data, or the second-order statistics (for instance, Mahalanobis-based measures).
A more complete description would be obtained by describing the distribution of the feature vectors. However, feature distribution modelling is rarely applied to face verification because a large number of observations is required to train the models. This amount of data is usually unavailable and so this research examines two methods for overcoming this data limitation:
1. the use of holistic difference vectors of the face, and
2. by dividing the 3D face into Free-Parts.
Permutations of the holistic difference vectors are formed so that more observations are obtained from a set of holistic features. On the other hand, by dividing the face into parts and considering each part separately, many observations are obtained from each face image; this approach is referred to as the Free-Parts approach. The extra observations from these two techniques are used to perform holistic feature distribution modelling and Free-Parts feature distribution modelling respectively. It is shown that the feature distribution modelling of these features leads to an improved 3D face verification system and an effective 2D face verification system. Using these two feature distribution techniques, classifier score fusion is then examined.
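The difference-vector idea can be sketched as follows. This is one plausible reading of the abstract, not the thesis's exact procedure: forming all ordered pairs (permutations) of holistic feature vectors multiplies the number of observations available for distribution modelling.

```python
from itertools import permutations

def difference_vectors(features):
    # all ordered pairs of holistic feature vectors; n vectors yield
    # n * (n - 1) difference observations for distribution modelling
    return [[x - y for x, y in zip(fa, fb)]
            for fa, fb in permutations(features, 2)]

feats = [[1.0, 2.0], [3.0, 5.0], [0.0, 1.0]]
print(len(difference_vectors(feats)))  # 6 observations from 3 vectors
```

The quadratic growth in observations is what makes distribution modelling feasible from a small set of holistic features, which is otherwise data-starved.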
This thesis also examines methods for performing classifier score fusion.
Classifier score fusion attempts to combine complementary information from multiple classifiers. This complementary information can be obtained in two ways: by using different algorithms (multi-algorithm fusion) to represent the same face data, for instance the 2D face data, or by capturing the face data with different sensors (multi-modal fusion), for instance capturing 2D and 3D face data. Multi-algorithm fusion is approached by combining verification systems that use holistic features and local features (Free-Parts), while multi-modal fusion examines the combination of 2D and 3D face data using all of the investigated techniques.
The results of the fusion experiments show that multi-modal fusion leads to a consistent improvement in performance. This is attributed to the fact that the data being fused is collected by two different sensors, a camera and a laser scanner. In deriving the multi-algorithm and multi-modal algorithms a consistent framework for fusion was developed.
The consistent fusion framework, developed from the multi-algorithm and multi-modal experiments, is used to combine multiple algorithms across multiple modalities. This fusion method, referred to as hybrid fusion, is shown to provide improved performance over either fusion system on its own. The experiments show that the final hybrid face verification system reduces the False Rejection Rate from 8.59% for the best 2D verification system and 4.48% for the best 3D verification system to 0.59% for the hybrid verification system, at a False Acceptance Rate of 0.1%.
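One common realisation of classifier score fusion, weighted-sum fusion, can be sketched as follows. The thesis's exact fusion rule is not specified in this abstract, and the scores and weights below are hypothetical.

```python
def fuse_scores(scores, weights):
    # weighted-sum fusion of per-classifier match scores,
    # normalised by the total weight
    assert len(scores) == len(weights)
    total = sum(weights)
    return sum(s * w for s, w in zip(scores, weights)) / total

# hypothetical match scores from a 2D and a 3D verifier
print(fuse_scores([0.8, 0.6], [0.4, 0.6]))  # 0.68
```

The fused score is then thresholded as in a single-classifier system; the weights let the stronger modality dominate while the weaker one still contributes complementary evidence.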
Multimodal Fusion Strategies for Outcome Prediction in Stroke
Data-driven methods are increasingly being adopted in the medical domain for clinical predictive modeling. Prediction of stroke outcome using machine learning could provide a decision support system for physicians to assist them in patient-oriented diagnosis and treatment. While patient-specific clinical parameters play an important role in outcome prediction, a multimodal fusion approach that integrates neuroimaging with clinical data has the potential to improve accuracy. This paper addresses two research questions: (a) does multimodal fusion aid in the prediction of stroke outcome, and (b) what fusion strategy is more suitable for the task at hand. The baselines for our experimental work are two unimodal neural architectures: a 3D Convolutional Neural Network for processing neuroimaging data, and a Multilayer Perceptron for processing clinical data. Using these unimodal architectures as building blocks, we propose two feature-level multimodal fusion strategies: 1) extracted features, where the unimodal architectures are trained separately and then fused, and 2) end-to-end, where the unimodal architectures are trained together. We show that integration of neuroimaging information with clinical metadata can potentially improve stroke outcome prediction. Additionally, experimental results indicate that the end-to-end fusion approach proves to be more robust.
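The feature-level fusion strategy described above can be sketched minimally: each unimodal network produces an embedding, and the two embeddings are concatenated into one joint vector for a shared prediction head. The embedding values below are hypothetical, and the paper's architectures are of course far richer than this sketch.

```python
def feature_level_fusion(imaging_features, clinical_features):
    # concatenate the two unimodal feature vectors into one joint
    # representation; a shared classifier head would operate on this
    return list(imaging_features) + list(clinical_features)

# hypothetical embeddings: 3D-CNN output and MLP output
joint = feature_level_fusion([0.1, 0.9], [0.3])
print(joint)  # [0.1, 0.9, 0.3]
```

The difference between the paper's two strategies is not in this concatenation itself but in training: "extracted features" freezes each unimodal network before fusing, while "end-to-end" backpropagates through both networks jointly.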
Design of a multimodal rendering system
This paper addresses the rendering of aligned regular multimodal
datasets. It presents a general framework of multimodal data fusion
that includes several data merging methods. We also analyze the
requirements of a rendering system able to provide these different
fusion methods. On the basis of these requirements, we propose a novel
design for a multimodal rendering system. The design has been
implemented and has proved to be efficient and flexible.
Rendering techniques for multimodal data
Many different direct volume rendering methods have been developed to visualize 3D scalar fields on uniform rectilinear grids. However, little work has been done on rendering simultaneously various properties of the same 3D region measured with different registration devices or at different instants of time. The demand for this type of visualization is rapidly increasing in scientific applications such as medicine, in which the visual integration of multiple modalities allows a better comprehension of the anatomy and a perception of its relationships with activity. This paper presents different strategies of Direct Multimodal Volume Rendering (DMVR). It is restricted to voxel models with a known 3D rigid alignment transformation. The paper evaluates at which steps of the rendering pipeline the data fusion must be realized in order to accomplish the desired visual integration and to provide fast re-renders when some fusion parameters are modified. In addition, it analyzes how existing monomodal visualization algorithms can be extended to multiple datasets, and it compares their efficiency and their computational cost.
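The question of where in the pipeline to fuse can be sketched with two illustrative merge points. This is a simplification of the paper's DMVR strategies, not a reproduction of them: fusing raw voxel values before the transfer-function lookup versus fusing the per-modality colours after it.

```python
def property_fusion(v1, v2, w=0.5):
    # fuse raw voxel values from two modalities *before* the
    # transfer-function lookup (early in the pipeline)
    return w * v1 + (1.0 - w) * v2

def colour_fusion(c1, c2, w=0.5):
    # fuse RGBA colours produced by each modality's own transfer
    # function (later in the pipeline)
    return tuple(w * a + (1.0 - w) * b for a, b in zip(c1, c2))

print(property_fusion(0.2, 0.8))            # 0.5
print(colour_fusion((1.0, 0.0, 0.0, 1.0),
                    (0.0, 0.0, 1.0, 1.0)))  # (0.5, 0.0, 0.5, 1.0)
```

The trade-off the paper evaluates follows directly from this choice: fusing earlier collapses the modalities into one classification pass, while fusing later preserves each modality's transfer function but costs an extra lookup per sample.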
Multimodal Polynomial Fusion for Detecting Driver Distraction
Distracted driving is deadly, claiming 3,477 lives in the U.S. in 2015 alone.
Although there has been a considerable amount of research on modeling the
distracted behavior of drivers under various conditions, accurate automatic
detection using multiple modalities and especially the contribution of using
the speech modality to improve accuracy has received little attention. This
paper introduces a new multimodal dataset for distracted driving behavior and
discusses automatic distraction detection using features from three modalities:
facial expression, speech and car signals. Detailed multimodal feature analysis
shows that adding more modalities monotonically increases the predictive
accuracy of the model. Finally, a simple and effective multimodal fusion
technique using a polynomial fusion layer shows superior distraction detection
results compared to the baseline SVM and neural network models.
Comment: INTERSPEECH 201
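A polynomial fusion layer can be sketched in its simplest second-order form: computing cross terms between the feature vectors of two modalities, on top of which a learned linear layer would assign fusion weights. The paper's actual layer is not specified in this abstract, so the sketch below is an illustrative assumption.

```python
def polynomial_fusion(f1, f2):
    # second-order cross terms (outer product, flattened) between two
    # modality feature vectors; captures multiplicative interactions
    # that simple concatenation cannot express
    return [x * y for x in f1 for y in f2]

# hypothetical features from two modalities (e.g. face and speech)
print(polynomial_fusion([1.0, 2.0], [3.0, 4.0]))  # [3.0, 4.0, 6.0, 8.0]
```

With three modalities (facial expression, speech, car signals) such cross terms can be formed pairwise, which is one way a polynomial layer lets each added modality interact with the others rather than merely being appended.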