4 research outputs found
Generating Diverse and Meaningful Captions: Unsupervised Specificity Optimization for Image Captioning
Image Captioning is a task that requires models to acquire a multi-modal understanding of the world and to express this understanding in natural language text. While the state-of-the-art for this task has rapidly improved in terms of n-gram metrics, these models tend to output the same generic captions for similar images. In this work, we address this limitation and train a model that generates more diverse and specific captions through an unsupervised training approach that incorporates a learning signal from an Image Retrieval model. We summarize previous results and improve the state-of-the-art on caption diversity and novelty.
We make our source code publicly available online: https://github.com/AnnikaLindh/Diverse_and_Specific_Image_Captionin
Comment: Accepted for presentation at The 27th International Conference on Artificial Neural Networks (ICANN 2018)
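The abstract describes using a learning signal from an Image Retrieval model: a caption is rewarded when it retrieves its own image rather than similar ones. The paper's actual training objective is not reproduced here; the following is a minimal NumPy sketch of one plausible reading, in which a hypothetical `retrieval_reward` scores a caption by how many distractor images its embedding outranks in a shared caption–image embedding space. All embeddings and dimensions are illustrative assumptions.

```python
import numpy as np

def retrieval_reward(caption_emb, image_emb, distractor_embs):
    """Hypothetical reward: fraction of distractor images that the
    caption's own image outranks under cosine similarity. A generic
    caption matches many images and earns a low reward; a specific
    caption singles out its own image and earns a high one."""
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    target_score = cos(caption_emb, image_emb)
    beaten = sum(target_score > cos(caption_emb, d) for d in distractor_embs)
    return beaten / len(distractor_embs)

# Toy joint-embedding space: the caption vector points toward its image.
rng = np.random.default_rng(0)
image = np.array([1.0, 0.0, 0.0])
caption = np.array([0.9, 0.1, 0.0])           # specific caption, close to its image
distractors = [rng.normal(size=3) for _ in range(20)]
reward = retrieval_reward(caption, image, distractors)
```

Such a reward is non-differentiable with respect to the generated words, which is presumably why an unsupervised (e.g. policy-gradient-style) training signal is needed rather than a standard cross-entropy loss.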
Entity-Grounded Image Captioning
A key limitation of current Image Captioning models is their tendency to produce generic captions that omit the interesting details that make each image unique. To address this limitation, we propose an approach that enforces a stronger alignment between image regions and specific segments of text. The model architecture is composed of a visual region proposer, a region-order planner and a region-guided caption generator. The region-guided caption generator incorporates a novel information gate which allows visual and textual inputs of different frequencies and dimensionalities within a Recurrent Neural Network.
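The abstract mentions an information gate that reconciles visual and textual inputs of different dimensionalities inside an RNN. The paper's exact formulation is not shown here; this is a minimal NumPy sketch of one plausible design, where both inputs are first projected to the hidden size and a learned sigmoid gate mixes them per unit before the recurrent update. The function name, weight shapes, and gating form are all assumptions for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def information_gate(h, visual, textual, Wg, Wv, Wt):
    """Hypothetical gate: projects inputs of different dimensionalities
    into the hidden size, then computes per-unit mixing weights g from
    the previous hidden state and both projected inputs."""
    v = Wv @ visual                               # visual features -> hidden size
    t = Wt @ textual                              # word features   -> hidden size
    g = sigmoid(Wg @ np.concatenate([h, v, t]))   # per-unit gate in (0, 1)
    return g * v + (1.0 - g) * t                  # gated fusion fed to the RNN step

hidden, vis_dim, txt_dim = 4, 6, 3
rng = np.random.default_rng(1)
fused = information_gate(
    h=rng.normal(size=hidden),
    visual=rng.normal(size=vis_dim),
    textual=rng.normal(size=txt_dim),
    Wg=rng.normal(size=(hidden, 3 * hidden)),
    Wv=rng.normal(size=(hidden, vis_dim)),
    Wt=rng.normal(size=(hidden, txt_dim)),
)
```

Because the gate is recomputed at every timestep, the model can lean on visual features when it generates a region-grounded word and on textual context otherwise, which matches the stated goal of handling inputs arriving at different frequencies.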