Deep Multimodal Learning for Audio-Visual Speech Recognition
In this paper, we present methods in deep multimodal learning for fusing
speech and visual modalities for Audio-Visual Automatic Speech Recognition
(AV-ASR). First, we study an approach where uni-modal deep networks are trained
separately and their final hidden layers fused to obtain a joint feature space
in which another deep network is built. While the audio network alone achieves
a phone error rate (PER) of under the clean condition on the IBM large-vocabulary
audio-visual studio dataset, this fusion model achieves a PER of ,
demonstrating the tremendous value of the visual channel in phone
classification even in audio with a high signal-to-noise ratio. Second, we
present a new deep network architecture that uses a bilinear softmax layer to
account for class specific correlations between modalities. We show that
combining the posteriors from the bilinear networks with those from the fused
model mentioned above results in a further significant phone error rate
reduction, yielding a final PER of .
Comment: ICASSP 201
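The two fusion strategies described above can be sketched in a few lines of numpy. Everything here is illustrative, not taken from the paper: the dimensions, the single-layer joint network standing in for the deep network built on the fused features, and the simple averaging used to combine posteriors are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical dimensions (illustrative only, not from the paper).
d_a, d_v, n_classes = 64, 32, 10

a = rng.standard_normal(d_a)   # audio network's final hidden layer
v = rng.standard_normal(d_v)   # visual network's final hidden layer

# Feature-level fusion: concatenate the two final hidden layers and feed
# the joint vector to another network (a single layer in this sketch).
W_joint = 0.01 * rng.standard_normal((n_classes, d_a + d_v))
p_fused = softmax(W_joint @ np.concatenate([a, v]))

# Bilinear softmax: one weight matrix per class captures class-specific
# correlations between modalities, score_c = a^T W_c v.
W_bi = 0.01 * rng.standard_normal((n_classes, d_a, d_v))
p_bilinear = softmax(np.einsum('i,cij,j->c', a, W_bi, v))

# Combining the posteriors of the two models (a plain average here).
p_combined = 0.5 * (p_fused + p_bilinear)
```

The einsum contracts the audio vector, the per-class bilinear weights, and the visual vector into one score per class, so the softmax operates over class-specific audio-visual interactions rather than over a shared joint representation.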
A Novel Approach to Face Recognition using Image Segmentation based on SPCA-KNN Method
In this paper we propose a novel method for face recognition using a hybrid SPCA-KNN (SIFT-PCA-KNN) approach. The proposed method consists of three parts. The first part preprocesses face images using a graph-based algorithm and the SIFT (Scale Invariant Feature Transform) descriptor; the graph-based topology is used for matching two face images. In the second part, eigenvalues and eigenvectors are extracted from each input face image. The goal is to extract the important information from the face data and represent it as a set of new orthogonal variables called principal components. In the final part, a nearest-neighbor classifier is designed for classifying the face images based on the SPCA-KNN algorithm. The algorithm has been tested on 100 different subjects (15 images for each class). The experimental results show that the proposed method has a positive effect on overall face recognition performance and outperforms the other examined methods.
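A minimal stand-in for the PCA and nearest-neighbor stages of the pipeline above. The graph-based preprocessing and SIFT descriptors are omitted; the random "descriptors", the class count, and the subspace size are hypothetical choices made to keep the sketch self-contained.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-in for face descriptors (the paper uses SIFT features after
# graph-based preprocessing; random class-shifted vectors are used here).
n_per_class, n_classes, dim = 15, 4, 50
X = np.vstack([rng.standard_normal((n_per_class, dim)) + 3 * c
               for c in range(n_classes)])
y = np.repeat(np.arange(n_classes), n_per_class)

# PCA: project onto the top principal components, i.e. the leading
# right singular vectors of the mean-centred data matrix.
mean = X.mean(axis=0)
Xc = X - mean
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 10
Z = Xc @ Vt[:k].T          # training data in the PCA subspace

def classify(x):
    """1-nearest-neighbour classification in the PCA subspace."""
    z = (x - mean) @ Vt[:k].T
    return y[np.argmin(np.linalg.norm(Z - z, axis=1))]

# A lightly perturbed copy of a class-0 sample should map back to class 0.
pred = classify(X[0] + 0.1 * rng.standard_normal(dim))
```

Projecting both the gallery and the probe through the same mean and eigenvector basis is what makes the nearest-neighbor distances comparable.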
Deep Multi-Modal Classification of Intraductal Papillary Mucinous Neoplasms (IPMN) with Canonical Correlation Analysis
Pancreatic cancer has the poorest prognosis among all cancer types.
Intraductal Papillary Mucinous Neoplasms (IPMNs) are radiographically
identifiable precursors to pancreatic cancer; hence, early detection and
precise risk assessment of IPMN are vital. In this work, we propose a
Convolutional Neural Network (CNN) based computer aided diagnosis (CAD) system
to perform IPMN diagnosis and risk assessment by utilizing multi-modal MRI. In
our proposed approach, we use minimum and maximum intensity projections to ease
the annotation variations among different slices and type of MRIs. Then, we
present a CNN to obtain deep feature representation corresponding to each MRI
modality (T1-weighted and T2-weighted). At the final step, we employ canonical
correlation analysis (CCA) to perform a fusion operation at the feature level,
leading to discriminative canonical correlation features. Extracted features
are used for classification. Our results indicate significant improvements over
other potential approaches to this important problem. The proposed
approach does not require explicit sample balancing in cases of imbalance
between positive and negative examples. To the best of our knowledge, our study
is the first to automatically diagnose IPMN using multi-modal MRI.
Comment: Accepted for publication in the IEEE International Symposium on Biomedical Imaging (ISBI) 201
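The CCA-based feature-level fusion described above can be sketched as follows. The "deep features" for the two MRI modalities are random stand-ins with a planted shared latent structure, and all dimensions are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy stand-ins for CNN features of the two modalities (T1- and
# T2-weighted MRI), driven by a shared latent signal plus noise.
n, d1, d2, k = 200, 20, 15, 5
shared = rng.standard_normal((n, k))
X1 = shared @ rng.standard_normal((k, d1)) + 0.1 * rng.standard_normal((n, d1))
X2 = shared @ rng.standard_normal((k, d2)) + 0.1 * rng.standard_normal((n, d2))

def cca(X, Y, k):
    """Canonical correlation analysis via per-view whitening + SVD."""
    X = X - X.mean(0)
    Y = Y - Y.mean(0)
    Ux, _, _ = np.linalg.svd(X, full_matrices=False)   # whitened view 1
    Uy, _, _ = np.linalg.svd(Y, full_matrices=False)   # whitened view 2
    # Canonical correlations are the singular values of Ux^T Uy.
    U, S, Vt = np.linalg.svd(Ux.T @ Uy)
    # Canonical variates: maximally correlated projections of each view.
    return Ux @ U[:, :k], Uy @ Vt[:k].T, S[:k]

Zx, Zy, corrs = cca(X1, X2, k)
# Feature-level fusion: concatenate the canonical variates of both views
# to form the discriminative features fed to a classifier.
fused = np.concatenate([Zx, Zy], axis=1)
```

Because both views share a strong latent signal, the leading canonical correlations come out close to 1, and the concatenated canonical variates emphasize exactly that shared, discriminative variation.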
Common and Distinct Components in Data Fusion
In many areas of science multiple sets of data are collected pertaining to
the same system. Examples are food products which are characterized by
different sets of variables, bio-processes which are on-line sampled with
different instruments, or biological systems of which different genomics
measurements are obtained. Data fusion is concerned with analyzing such sets of
data simultaneously to arrive at a global view of the system under study. One
of the upcoming areas of data fusion is exploring whether the data sets have
something in common or not. This gives insight into common and distinct
variation in each data set, thereby facilitating understanding the
relationships between the data sets. Unfortunately, research on methods to
distinguish common and distinct components is fragmented, both in terminology
and in methods: there is no common ground, which hampers comparing
methods and understanding their relative merits. This paper provides a unifying
framework for this subfield of data fusion by using rigorous arguments from
linear algebra. The most frequently used methods for distinguishing common and
distinct components are explained in this framework and some practical examples
are given of these methods in the areas of (medical) biology and food science.
Comment: 50 pages, 12 figures
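One way to make the common/distinct distinction concrete is through principal angles between the column spaces of two data blocks measured on the same samples: a cosine near 1 marks a direction of common variation. In this toy sketch the block sizes, planted common subspace, and the 0.95 threshold are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

# Two data blocks on the same 100 samples: two planted common columns
# plus block-specific (distinct) random columns.
n = 100
common = rng.standard_normal((n, 2))
X1 = np.hstack([common, rng.standard_normal((n, 3))])  # common + distinct
X2 = np.hstack([common, rng.standard_normal((n, 4))])  # common + distinct

def principal_angle_cosines(X, Y):
    """Cosines of the principal angles between the column spaces of the
    mean-centred blocks X and Y (1 = fully shared direction)."""
    Qx, _ = np.linalg.qr(X - X.mean(0))
    Qy, _ = np.linalg.qr(Y - Y.mean(0))
    return np.linalg.svd(Qx.T @ Qy, compute_uv=False)

cos = principal_angle_cosines(X1, X2)
n_common = int(np.sum(cos > 0.95))   # estimated number of common components
```

The two planted common directions give cosines of essentially 1, while the distinct random subspaces produce much smaller cosines, so thresholding the cosines separates common from distinct variation in this idealized setting.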