Dublin City University participated in the video-to-text caption generation task in TRECVid, and this paper describes the three approaches we took for our four submitted runs. The first approach extracts regularly-spaced keyframes from a video, generates a text caption for each keyframe, and then combines the keyframe captions into a single caption. The second approach detects image crops from those keyframes using a saliency map, so as to include as much of the visually salient part of each image as possible, generates a caption for each crop in each keyframe, and combines the captions into one. The third approach is an end-to-end deep learning system based on MS-COCO, an externally available set of training captions. The paper presents a description and the official results of each of these approaches.
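To make the first approach concrete, the following is a minimal sketch, not the implementation used in our runs: it samples regularly-spaced keyframes with OpenCV, captions each keyframe with a hypothetical `caption_image` stand-in for an image-captioning model, and merges the per-keyframe captions with a naive placeholder combination step.

```python
# Sketch of approach 1: regularly-spaced keyframes -> per-keyframe captions -> one caption.
# `caption_image` and the combination step are illustrative assumptions only.
import cv2  # pip install opencv-python


def extract_keyframes(video_path: str, num_frames: int = 5):
    """Return `num_frames` regularly-spaced frames (BGR arrays) from the video."""
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    frames = []
    for i in range(num_frames):
        idx = int(i * total / max(num_frames, 1))
        cap.set(cv2.CAP_PROP_POS_FRAMES, idx)
        ok, frame = cap.read()
        if ok:
            frames.append(frame)
    cap.release()
    return frames


def caption_image(frame) -> str:
    """Hypothetical placeholder for an image-captioning model applied to one keyframe."""
    raise NotImplementedError("plug in an image-captioning model here")


def caption_video(video_path: str) -> str:
    """Caption each regularly-spaced keyframe and merge the captions."""
    captions = [caption_image(f) for f in extract_keyframes(video_path)]
    # The combination of keyframe captions into a single caption is a simple
    # de-duplicated join here; the paper describes the actual strategy.
    return ". ".join(dict.fromkeys(captions))
```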