Search CORE

6 research outputs found

CentralNet: a Multilayer Approach for Multimodal Fusion

Authors: A Dhall, D Lahat, M Kang, N Neverova, PK Atrey, S Chandar, S Escalera, Y LeCun, Z Gu
Publication date: 22/08/2018

This paper proposes a novel multimodal fusion approach that aims to produce the best possible decisions by integrating information coming from multiple media. While most past multimodal approaches work either by projecting the features of different modalities into the same space or by coordinating the representations of each modality through constraints, our approach borrows from both visions. More specifically, assuming each modality can be processed by a separate deep convolutional network that can take decisions independently of the other modalities, we introduce a central network linking the modality-specific networks. This central network not only provides a common feature embedding but also regularizes the modality-specific networks through multi-task learning. The proposed approach is validated on four different computer vision tasks, on which it consistently improves on the accuracy of existing multimodal fusion approaches.

arXiv.org e-Print Archive

HAL - Normandie Université

Crossref
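The central-network idea in the CentralNet abstract can be sketched in a few dependency-free lines of Python. This is a minimal illustration under assumptions the abstract does not state: the central state at each depth is a weighted sum of the previous central state and every modality's hidden vector at that depth, with scalar weights (hypothetically named alpha_central and alpha_modalities), and the multi-task objective is the central loss plus weighted modality-specific losses (hypothetical weights lambdas). A real model would learn these weights and pass each fused vector through its own convolutional layer; that part is omitted here.

```python
"""Minimal sketch of CentralNet-style fusion under stated assumptions.

Assumed (not from the abstract): weighted-sum fusion with scalar weights,
and a loss = central loss + weighted per-modality losses.
"""
from typing import List, Sequence

Vector = List[float]


def weighted_sum(vectors: Sequence[Vector], weights: Sequence[float]) -> Vector:
    """Element-wise sum of equally sized vectors, each scaled by its weight."""
    if len(vectors) != len(weights):
        raise ValueError("need exactly one weight per vector")
    dim = len(vectors[0])
    out = [0.0] * dim
    for vec, w in zip(vectors, weights):
        if len(vec) != dim:
            raise ValueError("all vectors must have the same dimension")
        for j, x in enumerate(vec):
            out[j] += w * x
    return out


def central_layer(central: Vector, modality_hidden: Sequence[Vector],
                  alpha_central: float, alpha_modalities: Sequence[float]) -> Vector:
    """One fusion step: mix the previous central state with each modality's
    hidden vector at the same depth (hypothetical alpha_* weights)."""
    return weighted_sum([central, *modality_hidden],
                        [alpha_central, *alpha_modalities])


def multitask_loss(central_loss: float, modality_losses: Sequence[float],
                   lambdas: Sequence[float]) -> float:
    """Central loss plus weighted modality-specific losses; the modality
    terms are what regularize the modality-specific networks."""
    if len(modality_losses) != len(lambdas):
        raise ValueError("need exactly one lambda per modality loss")
    return central_loss + sum(w * l for w, l in zip(lambdas, modality_losses))


if __name__ == "__main__":
    # Two modalities (say audio and video) with 3-dimensional hidden vectors.
    fused = central_layer([0.0, 0.0, 0.0],
                          [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]],
                          alpha_central=1.0, alpha_modalities=[0.5, 0.5])
    print(fused)  # [2.0, 2.0, 2.0]
    print(multitask_loss(0.8, [0.4, 0.6], lambdas=[0.5, 0.5]))
```

Keeping one loss term per modality is what lets each modality-specific network still decide on its own, while the central branch learns the shared embedding.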

TCGM: An Information-Theoretic Framework for Semi-Supervised Multi-Modality Learning

Authors: A Achille, B Leskes, C Busso, DD Lewis, J Ashburner, M Studený, S Chandar, SL Huang, T Miyato, X Nguyen, Y Cheng
Publication date: 13/07/2020

Fusing data from multiple modalities provides more information to train machine learning systems. However, labeling a large amount of data for each modality is prohibitively expensive and time-consuming, which makes semi-supervised multi-modal learning a crucial problem. Existing methods suffer either from ineffective fusion across modalities or from a lack of theoretical guarantees under proper assumptions. In this paper, we propose a novel information-theoretic approach, Total Correlation Gain Maximization (TCGM), for semi-supervised multi-modal learning. It has two promising properties: (i) it can effectively use the information across the different modalities of unlabeled data points to facilitate training the classifier of each modality; and (ii) it is theoretically guaranteed to identify the Bayesian classifiers, i.e., the ground-truth posteriors of all modalities. Specifically, by maximizing a TC-induced loss (the TC gain) over the classifiers of all modalities, these classifiers cooperatively discover the equivalence class of the ground-truth classifiers, and then identify the unique ones by leveraging a limited percentage of labeled data. We apply our method to various tasks, including news classification, emotion recognition and disease prediction, and achieve state-of-the-art results.

Comment: ECCV 2020 (oral)

arXiv.org e-Print Archive

Crossref
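TCGM is named after total correlation (TC), the multivariate generalization of mutual information: TC(X_1, ..., X_M) = H(X_1) + ... + H(X_M) - H(X_1, ..., X_M). The Python sketch below computes only this standard quantity for a small discrete joint distribution, to make the "correlation" in the name concrete. It is not the paper's TC-gain estimator, which, per the abstract, is maximized over the classifiers of all modalities; the function names here are illustrative.

```python
"""Total correlation of a discrete joint distribution (standard definition).

TC = sum_i H(X_i) - H(X_1, ..., X_M), in nats. This is NOT the TCGM
estimator, which works on classifier outputs rather than a known joint pmf.
"""
import math
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Joint = Dict[Tuple[int, ...], float]


def entropy(probs: Iterable[float]) -> float:
    """Shannon entropy in nats, skipping zero-probability outcomes."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def total_correlation(joint: Joint) -> float:
    """TC of a joint pmf given as {outcome tuple: probability}."""
    n_vars = len(next(iter(joint)))
    # Accumulate each variable's marginal distribution.
    marginals = [defaultdict(float) for _ in range(n_vars)]
    for outcome, p in joint.items():
        for i, value in enumerate(outcome):
            marginals[i][value] += p
    sum_marginal_entropies = sum(entropy(m.values()) for m in marginals)
    return sum_marginal_entropies - entropy(joint.values())


if __name__ == "__main__":
    # Three identical copies of one fair coin: TC = 2 ln 2.
    copies = {(0, 0, 0): 0.5, (1, 1, 1): 0.5}
    print(round(total_correlation(copies), 4))  # 1.3863
    # Three independent fair coins: TC is 0 up to floating-point rounding.
    uniform = {(a, b, c): 0.125 for a in (0, 1) for b in (0, 1) for c in (0, 1)}
    print(total_correlation(uniform))
```

With two variables the same function returns the mutual information I(X_1; X_2), which is why TC is a natural measure of how much information the modalities share.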

Open Data Sets in Human Activity Recognition Research - Issues and Challenges: A Review

Authors: Alam Gulzar, McChesney Ian, Nicholl Peter, Rafferty Joseph
Publication date: 04/10/2023

Ulster University's Research Portal