Semantic tuples for evaluation of image sentence generation
Paper presented at the Workshop on Vision and Language (VL'15), held in Lisbon (Portugal) on 18 September 2015.

The automatic generation of image captions has received considerable attention. The problem of evaluating caption generation systems, however, has been much less explored. We propose a novel evaluation approach based on comparing the underlying visual semantics of the candidate and ground-truth captions. With this goal in mind, we have defined a semantic representation for visually descriptive language and have augmented a subset of the Flickr-8K dataset with semantic annotations. Our evaluation metric (BAST) can be used not only to compare systems but also to perform error analysis and gain a better understanding of the types of mistakes a system makes. To compute BAST we need to predict the semantic representation for the automatically generated captions. We use the Flickr-ST dataset to train classifiers that predict STs so that evaluation can be fully automated.

This work was partly funded by the Spanish MINECO project RobInstruct TIN2014-58178-R and by the ERA-net CHISTERA project VISEN PCIN-2013-047. Peer Reviewed.
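The abstract describes scoring a generated caption by comparing its predicted semantic tuples against those of the ground-truth captions. As a minimal illustrative sketch only: the actual BAST metric and the Flickr-ST tuple format are defined in the paper itself; here we simply assume captions reduce to sets of (subject, predicate, object) triples and score their overlap with an F1 measure.

```python
# Hypothetical sketch: tuple-overlap scoring between a candidate caption's
# predicted semantic tuples and a reference caption's tuples.
# The real BAST metric may weight or match tuples differently.

def tuple_f1(candidate, reference):
    """F1 overlap between two collections of semantic tuples."""
    cand, ref = set(candidate), set(reference)
    if not cand or not ref:
        return 0.0
    overlap = len(cand & ref)          # tuples shared by both captions
    precision = overlap / len(cand)    # fraction of candidate tuples that are correct
    recall = overlap / len(ref)        # fraction of reference tuples that are covered
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Example: the candidate shares one of its two tuples with the reference.
cand = [("dog", "run", "beach"), ("dog", "chase", "ball")]
ref = [("dog", "run", "beach"), ("man", "throw", "ball")]
print(round(tuple_f1(cand, ref), 2))  # 0.5
```

A set-based F1 is the simplest way to reward both covering the reference semantics (recall) and avoiding hallucinated content (precision); per-tuple error inspection is what enables the error analysis the abstract mentions.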
Defining Visually Descriptive Language
In this paper, we introduce the notion of visually descriptive language (VDL), intuitively a text segment whose truth can be confirmed by visual sense alone. VDL can be exploited in many vision-based tasks, e.g. image interpretation and story illustration. In contrast to previous work requiring pre-aligned texts and images, we propose a broader definition of VDL that extends to a much larger range of texts without associated images. We also discuss possible VDL annotation tasks and make recommendations for difficult cases. Lastly, we demonstrate the viability of our definition via an annotation exercise across several text genres and analyse inter-annotator agreement. Results show that reasonably high levels of agreement between annotators can be reached.