Image Representations and New Domains in Neural Image Captioning
We examine the possibility that recent promising results in automatic caption generation are due primarily to language models. By varying the quality of the image representations produced by a convolutional neural network, we find that a state-of-the-art neural captioning algorithm is able to produce quality captions even when provided with surprisingly poor image representations. We replicate this result in a new, fine-grained, transfer-learned captioning domain consisting of 66K recipe image/title pairs. We also report experiments on the appropriateness of datasets for automatic captioning, and find that having multiple captions per image is beneficial, but not an absolute requirement.
Comment: 11 pages, 5 images, to appear at EMNLP 2015's Vision + Learning workshop
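The abstract's core manipulation is feeding a captioner progressively degraded CNN features. The paper's actual degradation procedure is not given here, so the sketch below is only a minimal illustration of the idea, assuming degradation by blending features with Gaussian noise; the array shapes and noise schedule are hypothetical.

```python
# Minimal sketch (not the paper's code): degrade CNN image features to probe
# how much a captioning model actually relies on them. All names and shapes
# here are illustrative assumptions.
import numpy as np

def degrade_features(features: np.ndarray, noise_level: float) -> np.ndarray:
    """Blend feature vectors with Gaussian noise of matching scale.

    noise_level=0.0 returns the original features; noise_level=1.0
    replaces them entirely with noise.
    """
    rng = np.random.default_rng(0)
    noise = rng.normal(loc=features.mean(), scale=features.std(),
                       size=features.shape)
    return (1.0 - noise_level) * features + noise_level * noise

# Stand-in for a batch of 4096-d CNN features for 8 images.
features = np.random.default_rng(1).normal(size=(8, 4096)).astype(np.float32)

for level in (0.0, 0.5, 1.0):
    degraded = degrade_features(features, level)
    # Report how far each degraded representation drifts from the original;
    # a captioning model would be fed `degraded` in place of `features`.
    sim = np.mean(
        np.sum(features * degraded, axis=1)
        / (np.linalg.norm(features, axis=1) * np.linalg.norm(degraded, axis=1))
    )
    print(f"noise_level={level:.1f}  mean cosine similarity: {sim:.3f}")
```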
Distinctive-attribute Extraction for Image Captioning
Image captioning, an open research problem, has evolved with the progress of deep neural networks. Convolutional neural networks (CNNs) and recurrent neural networks (RNNs) are employed to compute image features and to generate natural language descriptions, respectively. In previous work, captions with semantic descriptions were generated by feeding additional information into the RNNs. Following this approach, we propose distinctive-attribute extraction (DaE), which explicitly encourages significant semantics so that the generated caption accurately describes the overall content of an image together with what makes it distinctive. Specifically, the captions of training images are analyzed with term frequency-inverse document frequency (TF-IDF), and the resulting semantic information is used to train the extraction of distinctive attributes for inferring captions. The proposed scheme is evaluated on a challenge dataset, where it improves objective performance while describing images in more detail.
Comment: 14 main pages, 4 supplementary pages