Diachronic cross-modal embeddings

Abstract

This work has been partially funded by the CMU Portugal research project GoLocal (Ref. CMUP-ERI/TIC/0046/2014), by the H2020 ICT project COGNITUS under grant agreement no. 687605, and by the FCT project NOVA LINCS (Ref. UID/CEC/04516/2019). We also gratefully acknowledge the support of NVIDIA Corporation with the donation of the GPUs used for this research.

Understanding the semantic shifts of multimodal information is only possible with models that capture cross-modal interactions over time. Under this paradigm, a new embedding is needed, one that structures visual-textual interactions along the temporal dimension and thus preserves the data's original temporal organisation. This paper introduces a novel diachronic cross-modal embedding (DCM), in which cross-modal correlations are represented in embedding space along the temporal dimension, preserving semantic similarity at each instant t. To achieve this, we trained a neural cross-modal architecture under a novel ranking loss strategy that, for each multimodal instance, enforces the temporal alignment of neighbouring instances through subspace structuring constraints based on a temporal alignment window. Experimental results show that our DCM embedding successfully organises instances over time. Quantitative experiments confirm that DCM preserves semantic cross-modal correlations at each instant t while also providing better alignment capabilities. Qualitative experiments unveil new ways to browse multimodal content and hint that multimodal understanding tasks can benefit from this new embedding.
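
To make the described training objective concrete, the sketch below illustrates one plausible reading of a ranking loss that combines a bidirectional cross-modal triplet term with a temporal alignment window term. This is a minimal PyTorch-style illustration under stated assumptions: the function name, the margin and window hyper-parameters, and the exact form of the temporal constraint are hypothetical and do not reproduce the authors' published formulation.

```python
import torch
import torch.nn.functional as F

def dcm_ranking_loss(img_emb, txt_emb, timestamps, margin=0.2, window=1.0):
    """Illustrative sketch (not the authors' loss): a bidirectional
    cross-modal ranking term plus a temporal alignment term that treats
    instances within `window` time units of the anchor as positives."""
    # Cosine similarities between every image and every text embedding.
    img_emb = F.normalize(img_emb, dim=1)
    txt_emb = F.normalize(txt_emb, dim=1)
    sim = img_emb @ txt_emb.t()                      # (B, B)
    pos = sim.diag().unsqueeze(1)                    # matching pairs
    mask = torch.eye(sim.size(0), dtype=torch.bool, device=sim.device)

    # Hinge ranking: a matching pair must beat non-matching pairs by `margin`,
    # in both image-to-text and text-to-image directions.
    cost_i2t = (margin + sim - pos).clamp(min=0).masked_fill(mask, 0)
    cost_t2i = (margin + sim - pos.t()).clamp(min=0).masked_fill(mask, 0)
    cross_modal = cost_i2t.mean() + cost_t2i.mean()

    # Temporal alignment window: instances whose timestamps fall within
    # `window` of the anchor should lie closer in the shared space than
    # instances outside the window.
    dt = (timestamps.unsqueeze(0) - timestamps.unsqueeze(1)).abs()   # (B, B)
    in_window = (dt <= window) & ~mask
    out_window = dt > window
    joint = 0.5 * (img_emb + txt_emb)                # shared-space representation
    dist = torch.cdist(joint, joint)                 # pairwise distances

    # Farthest in-window instance must be closer than the nearest
    # out-of-window instance by `margin` (only for anchors with both).
    far_pos = (dist * in_window.float()).max(dim=1).values
    near_neg = dist.masked_fill(~out_window, float("inf")).min(dim=1).values
    valid = in_window.any(dim=1) & out_window.any(dim=1)
    temporal = torch.tensor(0.0, device=dist.device)
    if valid.any():
        temporal = (margin + far_pos - near_neg).clamp(min=0)[valid].mean()

    return cross_modal + temporal
```

In this reading, the cross-modal term preserves semantic similarity between paired images and texts at each instant t, while the window term supplies the subspace structuring constraint that keeps temporally neighbouring instances aligned along the temporal dimension.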
