Jointly Modeling Embedding and Translation to Bridge Video and Language

Houqiang Li; Tao Mei; Ting Yao; Yingwei Pan; Yong Rui

Publication date: 4 June 2015

Abstract

Automatically describing video content with natural language is a fundamental challenge of multimedia. Recurrent Neural Networks (RNNs), which model sequence dynamics, have attracted increasing attention in visual interpretation. However, most existing approaches generate each word locally, given the previous words and the visual content, while the relationship between sentence semantics and visual content is not holistically exploited. As a result, the generated sentences may be contextually correct while their semantics (e.g., subjects, verbs or objects) are incorrect. This paper presents a novel unified framework, named Long Short-Term Memory with visual-semantic Embedding (LSTM-E), which simultaneously explores the learning of LSTM and visual-semantic embedding. The former aims to locally maximize the probability of generating the next word given the previous words and the visual content, while the latter creates a visual-semantic embedding space that enforces the relationship between the semantics of the entire sentence and the visual content. The proposed LSTM-E consists of three components: 2-D and/or 3-D deep convolutional neural networks for learning a powerful video representation, a deep RNN for generating sentences, and a joint embedding model for exploring the relationships between visual content and sentence semantics. Experiments on the YouTube2Text dataset show that LSTM-E achieves the best performance reported to date in generating natural sentences: 45.3% and 31.0% in terms of BLEU@4 and METEOR, respectively. We also demonstrate that LSTM-E outperforms several state-of-the-art techniques in predicting Subject-Verb-Object (SVO) triplets.
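The two learning signals described in the abstract can be illustrated with a small sketch. It is not the authors' implementation. It assumes the embedding term is a squared distance between a video vector and a sentence vector already mapped into a shared space, that the LSTM term is the negative log-likelihood of the reference words, and that the two are combined by a weighted sum. The function names and the `tradeoff` parameter are hypothetical and do not come from the abstract.

```python
import math


def relevance_loss(video_emb, sentence_emb):
    """Squared Euclidean distance between two vectors in the shared space.

    Assumption: both inputs are already projected into the same
    visual-semantic embedding space and have equal length.
    """
    return sum((v - s) ** 2 for v, s in zip(video_emb, sentence_emb))


def coherence_loss(next_word_probs):
    """Negative log-likelihood of the reference sentence.

    `next_word_probs` holds, for each time step, the probability the
    generator assigned to the correct next word (each value in (0, 1]).
    """
    return -sum(math.log(p) for p in next_word_probs)


def joint_loss(video_emb, sentence_emb, next_word_probs, tradeoff=0.5):
    """Weighted sum of the two terms.

    `tradeoff` is a hypothetical hyperparameter in [0, 1]:
    0 keeps only the embedding term, 1 keeps only the LSTM term.
    """
    if not 0.0 <= tradeoff <= 1.0:
        raise ValueError("tradeoff must lie in [0, 1]")
    rel = relevance_loss(video_emb, sentence_emb)
    coh = coherence_loss(next_word_probs)
    return (1.0 - tradeoff) * rel + tradeoff * coh


if __name__ == "__main__":
    # Toy inputs: a 3-dimensional shared space and a 4-word sentence.
    video = [0.2, 0.1, 0.7]
    sentence = [0.25, 0.05, 0.6]
    probs = [0.9, 0.6, 0.8, 0.7]
    print(joint_loss(video, sentence, probs, tradeoff=0.7))
```

With a weighting of this kind, a sentence that is locally fluent but far from the video in the shared space is still penalized. That is the failure mode the abstract attributes to purely local word generation.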

Available Versions

CiteSeerX

oai:CiteSeerX.psu:10.1.1.1055....

Last time updated on 07/12/2020

Crossref

info:doi/10.1109%2Fcvpr.2016.4...

Last time updated on 05/08/2021
