Robust Latent Representations via Cross-Modal Translation and Alignment

Brutti, Alessio; Cavallaro, Andrea; Rajan, Vandana

Authors: Alessio Brutti, Andrea Cavallaro, Vandana Rajan
Publication date: 1 January 2021

Abstract

Multi-modal learning relates information across observation modalities of the same physical phenomenon to leverage complementary information. Most multi-modal machine learning methods require that all the modalities used for training are also available for testing. This is a limitation when the signals from some modalities are unavailable or severely degraded by noise. To address this limitation, we aim to improve the testing performance of uni-modal systems by using multiple modalities during training only. The proposed multi-modal training framework uses cross-modal translation and correlation-based latent space alignment to improve the representations of the weaker modalities. The translation from the weaker to the stronger modality generates a multi-modal intermediate encoding that is representative of both modalities. This encoding is then correlated with the stronger modality representations in a shared latent space. We validate the proposed approach on the AVEC 2016 dataset for continuous emotion recognition and show that it achieves state-of-the-art uni-modal performance for the weaker modalities.
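
The two training signals named in the abstract, cross-modal translation and correlation-based latent alignment, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the single-layer tanh encoders, the mean-squared-error translation loss, the per-dimension Pearson correlation, the weighting factor alpha, and all function and parameter names are hypothetical.

```python
import numpy as np


def correlation_alignment_loss(z_weak, z_strong, eps=1e-8):
    """Negative mean per-dimension Pearson correlation of two (batch, dim) arrays.

    Assumed form of the alignment term; the abstract only says the latent
    spaces are aligned through correlation.
    """
    zw = z_weak - z_weak.mean(axis=0, keepdims=True)
    zs = z_strong - z_strong.mean(axis=0, keepdims=True)
    cov = (zw * zs).sum(axis=0)
    denom = np.sqrt((zw ** 2).sum(axis=0) * (zs ** 2).sum(axis=0)) + eps
    return -float(np.mean(cov / denom))


def translation_loss(x_strong_pred, x_strong):
    """Mean squared error between translated and true strong-modality features (assumed)."""
    return float(np.mean((x_strong_pred - x_strong) ** 2))


def training_objective(x_weak, x_strong, w_enc, w_dec, w_strong, alpha=1.0):
    """Combine both terms for one batch; hypothetical linear/tanh layers."""
    # Intermediate encoding computed from the weaker modality only.
    z_weak = np.tanh(x_weak @ w_enc)
    # Translation from the weaker to the stronger modality.
    x_strong_pred = z_weak @ w_dec
    # Representation of the stronger modality in the shared latent space.
    z_strong = np.tanh(x_strong @ w_strong)
    return (translation_loss(x_strong_pred, x_strong)
            + alpha * correlation_alignment_loss(z_weak, z_strong))
```

In this sketch the stronger modality enters only through the loss terms, so at test time the weaker-modality encoder (here w_enc) could be used on its own, which mirrors the training-only use of multiple modalities described in the abstract.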

Available Versions

Archivio della ricerca - Fondazione Bruno Kessler

oai:cris.fbk.eu:11582/326867

Last updated on 11/07/2022