Cross-Modal Message Passing for Two-stream Fusion
Processing and fusing information across multiple modalities is a very useful
technique for achieving high performance in many computer vision problems. In
order to tackle multi-modal information more effectively, we introduce a novel
framework for multi-modal fusion: Cross-modal Message Passing (CMMP).
Specifically, we propose a cross-modal message passing mechanism to fuse
a two-stream network for action recognition, which is composed of an
appearance-modality network (RGB image) and a motion-modality network (optical flow image). The
objectives of individual networks in this framework are two-fold: a standard
classification objective and a competing objective. The classification objective
ensures that each modal network predicts the true action category while the
competing objective encourages each modal network to outperform the other one.
We quantitatively show that the proposed CMMP fuses the traditional two-stream
network more effectively, and outperforms all existing two-stream fusion methods
on the UCF-101 and HMDB-51 datasets.
Comment: 2018 IEEE International Conference on Acoustics, Speech and Signal Processing
Robust Deep Multi-Modal Sensor Fusion using Fusion Weight Regularization and Target Learning
Sensor fusion has wide applications in many domains including health care and
autonomous systems. While the advent of deep learning has enabled promising
multi-modal fusion of high-level features and end-to-end sensor fusion
solutions, existing deep learning based sensor fusion techniques including deep
gating architectures are not always resilient, leading to the issue of fusion
weight inconsistency. We propose deep multi-modal sensor fusion architectures
with enhanced robustness, particularly in the presence of sensor failures. At
the core of our gating architectures are fusion weight regularization and
fusion target learning operating on auxiliary unimodal sensing networks
appended to the main fusion model. The proposed regularized gating
architectures outperform the existing deep learning architectures with and
without gating under both clean and corrupted sensory inputs resulting from
sensor failures. The demonstrated improvements are particularly pronounced when
one or more sensory modalities are corrupted.
Comment: 8 pages
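The gating idea in the abstract above can be illustrated with a minimal sketch. It assumes a softmax gate over per-sensor scores and a squared-error penalty that pulls the fusion weights toward target weights. Both forms, and the names gated_fusion and weight_regularizer, are hypothetical illustrations, not the paper's actual architecture or loss.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gated_fusion(features, gate_scores):
    """Fuse per-sensor feature vectors with softmax-normalized gate weights.

    features: one equal-length feature vector per sensor (hypothetical input).
    gate_scores: one raw score per sensor, as a gating network might emit.
    """
    weights = softmax(gate_scores)
    dim = len(features[0])
    fused = [sum(w * f[i] for w, f in zip(weights, features)) for i in range(dim)]
    return fused, weights


def weight_regularizer(weights, target_weights):
    """Assumed penalty: squared distance between fusion and target weights."""
    return sum((w - t) ** 2 for w, t in zip(weights, target_weights))


# Two sensors with equal gate scores contribute equally to the fused vector.
fused, weights = gated_fusion([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
```

In a setup like this, a corrupted sensor would ideally receive a low gate score, so its features contribute little to the fused vector.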
On Uni-Modal Feature Learning in Supervised Multi-Modal Learning
We abstract the features (i.e. learned representations) of multi-modal data
into 1) uni-modal features, which can be learned from uni-modal training, and
2) paired features, which can only be learned from cross-modal interactions.
Multi-modal models are expected to benefit from cross-modal interactions on the
basis of ensuring uni-modal feature learning. However, recent supervised
multi-modal late-fusion training approaches still suffer from insufficient
learning of uni-modal features on each modality. We prove that this phenomenon
does hurt the model's generalization ability. To this end, we propose to choose
a targeted late-fusion learning method for the given supervised multi-modal
task from Uni-Modal Ensemble (UME) and the proposed Uni-Modal Teacher (UMT),
according to the distribution of uni-modal and paired features. We demonstrate
that, under a simple guiding strategy, we can achieve comparable results to
other complex late-fusion or intermediate-fusion methods on various multi-modal
datasets, including VGG-Sound, Kinetics-400, UCF101, and ModelNet40.
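As a rough illustration of late fusion by ensembling uni-modal models, the sketch below averages class-probability vectors produced by independently trained models. The abstract does not specify how UME combines predictions, so the averaging rule and the function names are assumptions.

```python
def average_ensemble(prob_per_modality):
    """Average class-probability vectors from separately trained uni-modal models."""
    n = len(prob_per_modality)
    num_classes = len(prob_per_modality[0])
    return [sum(p[c] for p in prob_per_modality) / n for c in range(num_classes)]


def predict(prob_per_modality):
    """Return the index of the class with the highest averaged probability."""
    avg = average_ensemble(prob_per_modality)
    return max(range(len(avg)), key=avg.__getitem__)


# Hypothetical outputs of an audio model and a video model for two classes.
audio_probs = [0.6, 0.4]
video_probs = [0.1, 0.9]
label = predict([audio_probs, video_probs])
```

Because each model is trained on its own modality, this kind of ensemble cannot capture paired features, which is consistent with the abstract's point that the method should be chosen according to the task.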
Robust multi-modal and multi-unit feature level fusion of face and iris biometrics
Multi-biometrics has recently emerged as a means of more robust and efficient
personal verification and identification. By exploiting information from multiple
sources at various levels, i.e., feature, score, rank, or decision, the false acceptance
and rejection rates can be considerably reduced. Among these, feature-level fusion
is a relatively understudied problem. This paper addresses feature-level
fusion for multi-modal and multi-unit sources of information. For multi-modal
fusion the face and iris biometric traits are considered, while the multi-unit fusion
is applied to merge the data from the left and right iris images. The proposed
approach computes the SIFT features from both biometric sources, either multi-
modal or multi-unit. For each source, the extracted SIFT features are selected via
spatial sampling. Then these selected features are finally concatenated together
into a single feature super-vector using serial fusion. This concatenated feature
vector is used to perform classification.
Experimental results from face and iris standard biometric databases are
presented. The reported results clearly show the performance improvements in
classification obtained by applying feature level fusion for both multi-modal and
multi-unit biometrics in comparison to uni-modal classification and score level
fusion.
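The pipeline described above (select features by spatial sampling, then concatenate them serially into a super-vector) can be sketched as follows. The abstract does not say how spatial sampling is done, so the grid rule here (keep the first keypoint in each cell) is an assumption, and the descriptors are toy two-element lists rather than real 128-dimensional SIFT descriptors.

```python
def spatial_sample(keypoints, cell_size):
    """Keep the descriptor of the first keypoint found in each grid cell.

    keypoints: list of (x, y, descriptor) tuples (hypothetical layout).
    cell_size: side length of the square sampling grid cells, in pixels.
    """
    seen = set()
    kept = []
    for x, y, desc in keypoints:
        cell = (int(x // cell_size), int(y // cell_size))
        if cell not in seen:
            seen.add(cell)
            kept.append(desc)
    return kept


def serial_fusion(*descriptor_sets):
    """Concatenate all selected descriptors from all sources into one vector."""
    fused = []
    for descriptors in descriptor_sets:
        for d in descriptors:
            fused.extend(d)
    return fused


# Toy keypoints: two face keypoints share a grid cell, so one is dropped.
face = [(1, 1, [1, 2]), (2, 2, [3, 4]), (12, 1, [5, 6])]
iris = [(0, 0, [7, 8])]
super_vector = serial_fusion(spatial_sample(face, 10), spatial_sample(iris, 10))
```

A classifier needs fixed-length input, so a real system would have to ensure that each source yields the same number of selected descriptors for every sample. This sketch does not enforce that.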
Multi-modal Embedding Fusion-based Recommender
Recommendation systems have recently become popular worldwide, with primary
use cases in online interaction systems and a significant focus on e-commerce
platforms. We have developed a machine learning-based recommendation platform,
which can be easily applied to almost any items and/or actions domain. Contrary
to existing recommendation systems, our platform supports multiple types of
interaction data with multiple modalities of metadata natively. This is
achieved through multi-modal fusion of various data representations. We
deployed the platform into multiple e-commerce stores of different kinds, e.g.
food and beverages, shoes, fashion items, telecom operators. Here, we present
our system, its flexibility and performance. We also show benchmark results on
open datasets that significantly outperform state-of-the-art prior work.
Comment: 7 pages, 8 figures
Fuzzy Interval-Valued Multi Criteria Based Decision Making for Ranking Features in Multi-Modal 3D Face Recognition
Soodamani Ramalingam, 'Fuzzy interval-valued multi criteria based decision making for ranking features in multi-modal 3D face recognition', Fuzzy Sets and Systems, In Press version available online 13 June 2017. This is an Open Access paper, made available under the Creative Commons license CC BY 4.0 https://creativecommons.org/licenses/by/4.0/
This paper describes an application of multi-criteria decision making (MCDM) for multi-modal fusion of features in a 3D face recognition system. A decision making process is outlined that is based on the performance of multi-modal features in a face recognition task involving a set of 3D face databases. In particular, the fuzzy interval-valued MCDM technique called TOPSIS is applied for ranking and deciding on the best choice of multi-modal features at the decision stage. It provides a formal mechanism for benchmarking their performance against a set of criteria. The technique demonstrates its ability in scaling up the multi-modal features.
Peer reviewed
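For readers unfamiliar with TOPSIS, the sketch below implements the classical crisp version, not the fuzzy interval-valued variant used in the paper. It ranks alternatives by their relative closeness to an ideal solution. The criteria and numbers in the usage lines are invented for illustration only.

```python
import math


def topsis(matrix, weights, benefit):
    """Rank alternatives with classical (crisp) TOPSIS.

    matrix[i][j]: score of alternative i on criterion j.
    weights[j]: weight of criterion j.
    benefit[j]: True if larger is better, False if smaller is better.
    Returns one closeness coefficient per alternative (higher is better).
    Assumes no all-zero criterion column and at least two distinct alternatives.
    """
    n_crit = len(weights)
    # Vector-normalize each criterion column, then apply the criterion weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_crit)]
    v = [[weights[j] * row[j] / norms[j] for j in range(n_crit)] for row in matrix]
    cols = list(zip(*v))
    ideal = [max(c) if b else min(c) for c, b in zip(cols, benefit)]
    anti = [min(c) if b else max(c) for c, b in zip(cols, benefit)]
    scores = []
    for row in v:
        d_pos = math.dist(row, ideal)  # distance to the ideal solution
        d_neg = math.dist(row, anti)   # distance to the anti-ideal solution
        scores.append(d_neg / (d_pos + d_neg))
    return scores


# Hypothetical feature sets scored on recognition rate (benefit) and time (cost).
scores = topsis([[0.9, 2.0], [0.6, 5.0]], weights=[0.7, 0.3], benefit=[True, False])
best = max(range(len(scores)), key=scores.__getitem__)
```

In the interval-valued setting, each matrix entry would be an interval rather than a single number, and the distance computations would change accordingly.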