Image Representations and New Domains in Neural Image Captioning
We examine the possibility that recent promising results in automatic caption
generation are due primarily to language models. By varying image
representation quality produced by a convolutional neural network, we find that
a state-of-the-art neural captioning algorithm is able to produce quality
captions even when provided with surprisingly poor image representations. We
replicate this result in a new, fine-grained, transfer learned captioning
domain, consisting of 66K recipe image/title pairs. We also provide some
experiments regarding the appropriateness of datasets for automatic captioning,
and find that having multiple captions per image is beneficial, but not an
absolute requirement.
Comment: 11 Pages, 5 Images, To appear at EMNLP 2015's Vision + Learning
workshop
Domain Adaptation for Neural Networks by Parameter Augmentation
We propose a simple domain adaptation method for neural networks in a
supervised setting. Supervised domain adaptation is a way of improving the
generalization performance on the target domain by using the source domain
dataset, assuming that both of the datasets are labeled. Recently, recurrent
neural networks have been shown to be successful on a variety of NLP tasks such
as caption generation; however, the existing domain adaptation techniques are
limited to (1) tuning the model parameters on the target dataset after
training on the source dataset, or (2) designing the network to have dual outputs,
one for the source domain and the other for the target domain. Reformulating
the idea of the domain adaptation technique proposed by Daume (2007), we
propose a simple domain adaptation method, which can be applied to neural
networks trained with a cross-entropy loss. On captioning datasets, we show
performance improvements over other domain adaptation methods.
Comment: 9 pages. To appear in the first ACL Workshop on Representation
Learning for NLP
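The abstract above builds on the feature augmentation idea of Daume (2007). As a rough illustration of that background idea only (not of the paper's neural reformulation, whose details are not given here), the sketch below copies each input vector into a shared block plus a domain-specific block. The function name `augment`, the block ordering, and the toy vectors are assumptions made for this example.

```python
import numpy as np


def augment(x, domain):
    """Map x to [shared, source-only, target-only] blocks (assumed ordering)."""
    x = np.asarray(x, dtype=float)
    zeros = np.zeros_like(x)
    if domain == "source":
        return np.concatenate([x, x, zeros])
    if domain == "target":
        return np.concatenate([x, zeros, x])
    raise ValueError("domain must be 'source' or 'target'")


# Toy weights for a linear scorer over the augmented space.
w_shared = np.array([0.5, -1.0])
w_source = np.array([0.2, 0.3])
w_target = np.array([-0.4, 0.1])
w = np.concatenate([w_shared, w_source, w_target])

x = np.array([1.0, 2.0])

# Scoring an augmented source example equals scoring x with (shared + source)
# weights, so augmenting features is equivalent to augmenting parameters.
score_source = w @ augment(x, "source")
score_target = w @ augment(x, "target")
print(np.isclose(score_source, (w_shared + w_source) @ x))
print(np.isclose(score_target, (w_shared + w_target) @ x))
```

The equivalence checked in the last lines is what makes a parameter-side view possible: each domain effectively uses the sum of a shared weight vector and a domain-specific one.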
What is the Role of Recurrent Neural Networks (RNNs) in an Image Caption Generator?
In neural image captioning systems, a recurrent neural network (RNN) is
typically viewed as the primary `generation' component. This view suggests that
the image features should be `injected' into the RNN. This is in fact the
dominant view in the literature. Alternatively, the RNN can instead be viewed
as only encoding the previously generated words. This view suggests that the
RNN should only be used to encode linguistic features and that only the final
representation should be `merged' with the image features at a later stage.
This paper compares these two architectures. We find that, in general, late
merging outperforms injection, suggesting that RNNs are better viewed as
encoders, rather than generators.
Comment: Appears in: Proceedings of the 10th International Conference on
Natural Language Generation (INLG'17)
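To make the two views in the abstract above concrete, here is a minimal, untrained NumPy sketch of next-word prediction under each. It assumes one particular injection variant (the image vector initialises the RNN state) and a concatenation-based merge; the layer sizes, weight names, and helper functions are all hypothetical and are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, EMB, HID, IMG = 10, 8, 16, 12  # toy sizes, chosen arbitrarily

# Random, untrained weights: the point is the data flow, not caption quality.
E = rng.normal(size=(VOCAB, EMB)) * 0.1            # word embeddings
W_xh = rng.normal(size=(EMB, HID)) * 0.1           # input-to-hidden
W_hh = rng.normal(size=(HID, HID)) * 0.1           # hidden-to-hidden
W_img = rng.normal(size=(IMG, HID)) * 0.1          # image projection
W_out_inject = rng.normal(size=(HID, VOCAB)) * 0.1
W_out_merge = rng.normal(size=(2 * HID, VOCAB)) * 0.1


def softmax(z):
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()


def run_rnn(word_ids, h0):
    """Simple tanh RNN over the caption prefix; returns the final state."""
    h = h0
    for w in word_ids:
        h = np.tanh(E[w] @ W_xh + h @ W_hh)
    return h


def inject_next_word(image, word_ids):
    """Inject: the image conditions the RNN itself (here, as its initial state)."""
    h0 = np.tanh(image @ W_img)
    h = run_rnn(word_ids, h0)
    return softmax(h @ W_out_inject)


def merge_next_word(image, word_ids):
    """Merge: the RNN sees only words; the image joins after the RNN."""
    h = run_rnn(word_ids, np.zeros(HID))
    img = np.tanh(image @ W_img)
    return softmax(np.concatenate([h, img]) @ W_out_merge)


image = rng.normal(size=IMG)   # stand-in for CNN image features
prefix = [1, 4, 2]             # hypothetical word ids of the caption so far
print(inject_next_word(image, prefix).shape)
print(merge_next_word(image, prefix).shape)
```

In the merge function the recurrent state never depends on the image, which is the sense in which the RNN acts purely as an encoder of the words generated so far.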