Image Representations and New Domains in Neural Image Captioning
We examine the possibility that recent promising results in automatic caption generation are due primarily to language models. By varying the quality of the image representations produced by a convolutional neural network, we find that a state-of-the-art neural captioning algorithm is able to produce quality captions even when provided with surprisingly poor image representations. We replicate this result in a new, fine-grained, transfer-learned captioning domain consisting of 66K recipe image/title pairs. We also report experiments on the appropriateness of datasets for automatic captioning, and find that having multiple captions per image is beneficial, but not an absolute requirement.
Comment: 11 pages, 5 images, to appear at EMNLP 2015's Vision + Learning workshop
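The abstract's core manipulation is feeding a captioner progressively degraded CNN features. The paper's actual degradation procedure is not given here, so the sketch below is only a minimal illustration of the idea, assuming degradation by blending features with Gaussian noise; the array shapes and noise schedule are hypothetical.

```python
# Minimal sketch (not the paper's code): degrade CNN image features to probe
# how much a captioning model actually relies on them. All names and shapes
# here are illustrative assumptions.
import numpy as np

def degrade_features(features: np.ndarray, noise_level: float) -> np.ndarray:
    """Blend feature vectors with Gaussian noise of matching scale.

    noise_level=0.0 returns the original features; noise_level=1.0
    replaces them entirely with noise.
    """
    rng = np.random.default_rng(0)
    noise = rng.normal(loc=features.mean(), scale=features.std(),
                       size=features.shape)
    return (1.0 - noise_level) * features + noise_level * noise

# Stand-in for a batch of 4096-d CNN features for 8 images.
features = np.random.default_rng(1).normal(size=(8, 4096)).astype(np.float32)

for level in (0.0, 0.5, 1.0):
    degraded = degrade_features(features, level)
    # Report how far each degraded representation drifts from the original;
    # a captioning model would be fed `degraded` in place of `features`.
    sim = np.mean(
        np.sum(features * degraded, axis=1)
        / (np.linalg.norm(features, axis=1) * np.linalg.norm(degraded, axis=1))
    )
    print(f"noise_level={level:.1f}  mean cosine similarity: {sim:.3f}")
```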
Distinctive-attribute Extraction for Image Captioning
Image captioning, an open research problem, has evolved with the progress of deep neural networks. Convolutional neural networks (CNNs) and recurrent neural networks (RNNs) are employed to compute image features and to generate natural language descriptions, respectively. In previous work, captions with semantic descriptions were generated by feeding additional information into the RNNs. Following this approach, we propose distinctive-attribute extraction (DaE), which explicitly encourages significant semantics so that the generated caption accurately describes the overall content of an image together with what makes it distinctive. Specifically, the captions of training images are analyzed with term frequency-inverse document frequency (TF-IDF), and the resulting semantic information is used to train the extraction of distinctive attributes for inferring captions. The proposed scheme is evaluated on a challenge dataset, where it improves objective performance while describing images in more detail.
Comment: 14 main pages, 4 supplementary pages