129,972 research outputs found
Image Representations and New Domains in Neural Image Captioning
We examine the possibility that recent promising results in automatic caption
generation are due primarily to language models. By varying image
representation quality produced by a convolutional neural network, we find that
a state-of-the-art neural captioning algorithm is able to produce quality
captions even when provided with surprisingly poor image representations. We
replicate this result in a new, fine-grained, transfer learned captioning
domain, consisting of 66K recipe image/title pairs. We also provide some
experiments regarding the appropriateness of datasets for automatic captioning,
and find that having multiple captions per image is beneficial, but not an
absolute requirement.Comment: 11 Pages, 5 Images, To appear at EMNLP 2015's Vision + Learning
worksho
Modelling Local Deep Convolutional Neural Network Features to Improve Fine-Grained Image Classification
We propose a local modelling approach using deep convolutional neural
networks (CNNs) for fine-grained image classification. Recently, deep CNNs
trained from large datasets have considerably improved the performance of
object recognition. However, to date there has been limited work using these
deep CNNs as local feature extractors. This partly stems from CNNs having
internal representations which are high dimensional, thereby making such
representations difficult to model using stochastic models. To overcome this
issue, we propose to reduce the dimensionality of one of the internal fully
connected layers, in conjunction with layer-restricted retraining to avoid
retraining the entire network. The distribution of low-dimensional features
obtained from the modified layer is then modelled using a Gaussian mixture
model. Comparative experiments show that considerable performance improvements
can be achieved on the challenging Fish and UEC FOOD-100 datasets.Comment: 5 pages, three figure
Love Thy Neighbors: Image Annotation by Exploiting Image Metadata
Some images that are difficult to recognize on their own may become more
clear in the context of a neighborhood of related images with similar
social-network metadata. We build on this intuition to improve multilabel image
annotation. Our model uses image metadata nonparametrically to generate
neighborhoods of related images using Jaccard similarities, then uses a deep
neural network to blend visual information from the image and its neighbors.
Prior work typically models image metadata parametrically, in contrast, our
nonparametric treatment allows our model to perform well even when the
vocabulary of metadata changes between training and testing. We perform
comprehensive experiments on the NUS-WIDE dataset, where we show that our model
outperforms state-of-the-art methods for multilabel image annotation even when
our model is forced to generalize to new types of metadata.Comment: Accepted to ICCV 201
Predicting Motivations of Actions by Leveraging Text
Understanding human actions is a key problem in computer vision. However,
recognizing actions is only the first step of understanding what a person is
doing. In this paper, we introduce the problem of predicting why a person has
performed an action in images. This problem has many applications in human
activity understanding, such as anticipating or explaining an action. To study
this problem, we introduce a new dataset of people performing actions annotated
with likely motivations. However, the information in an image alone may not be
sufficient to automatically solve this task. Since humans can rely on their
lifetime of experiences to infer motivation, we propose to give computer vision
systems access to some of these experiences by using recently developed natural
language models to mine knowledge stored in massive amounts of text. While we
are still far away from fully understanding motivation, our results suggest
that transferring knowledge from language into vision can help machines
understand why people in images might be performing an action.Comment: CVPR 201
- …