Generative Image Modeling Using Spatial LSTMs
Modeling the distribution of natural images is challenging, partly because of
strong statistical dependencies which can extend over hundreds of pixels.
Recurrent neural networks have been successful in capturing long-range
dependencies in a number of problems but have only recently found their way
into generative image models. Here we introduce a recurrent image model based
on multi-dimensional long short-term memory units which are particularly suited
for image modeling due to their spatial structure. Our model scales to images
of arbitrary size and its likelihood is computationally tractable. We find that
it outperforms the state of the art in quantitative comparisons on several
image datasets and produces promising results when used for texture synthesis
and inpainting.
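The tractability claim can be made concrete with the chain-rule factorization that autoregressive image models typically use. The sketch below is an assumption about the general form, not a formula quoted from the abstract; the symbols $x_{ij}$ (pixel at row $i$, column $j$), $\mathbf{x}_{<ij}$ (the pixels that precede it in raster order), and $\theta$ (model parameters) are introduced here for illustration only.

```latex
p(\mathbf{x}; \theta) = \prod_{i,j} p\left(x_{ij} \mid \mathbf{x}_{<ij}; \theta\right),
\qquad
\log p(\mathbf{x}; \theta) = \sum_{i,j} \log p\left(x_{ij} \mid \mathbf{x}_{<ij}; \theta\right)
```

Under this form, evaluating the log-likelihood takes one conditional term per pixel, and nothing in the sum depends on a fixed image size.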
Show and Tell: A Neural Image Caption Generator
Automatically describing the content of an image is a fundamental problem in
artificial intelligence that connects computer vision and natural language
processing. In this paper, we present a generative model based on a deep
recurrent architecture that combines recent advances in computer vision and
machine translation and that can be used to generate natural sentences
describing an image. The model is trained to maximize the likelihood of the
target description sentence given the training image. Experiments on several
datasets show the accuracy of the model and the fluency of the language it
learns solely from image descriptions. Our model is often quite accurate, which
we verify both qualitatively and quantitatively. For instance, while the
current state-of-the-art BLEU-1 score (the higher the better) on the Pascal
dataset is 25, our approach yields 59, to be compared to human performance
around 69. We also show BLEU-1 score improvements on Flickr30k, from 56 to 66,
and on SBU, from 19 to 28. Lastly, on the newly released COCO dataset, we
achieve a BLEU-4 of 27.7, which is the current state of the art.
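Since the results above are reported as BLEU-1 scores, a minimal sketch of that metric may help in reading them. It assumes the standard definition (clipped unigram precision multiplied by a brevity penalty) with lowercased, whitespace-split tokens and a 0-100 scale; the function name `bleu1` is hypothetical, and the tokenization and evaluation scripts behind the reported numbers are not given in the abstract and may differ.

```python
import math
from collections import Counter


def bleu1(candidate, references):
    """Clipped unigram precision times brevity penalty, on a 0-100 scale.

    Illustrative only: tokenization is a plain lowercase whitespace split.
    """
    cand = candidate.lower().split()
    refs = [r.lower().split() for r in references]
    if not cand or not refs:
        return 0.0

    cand_counts = Counter(cand)

    # For each word, the largest count seen in any single reference.
    max_ref_counts = Counter()
    for ref in refs:
        for word, n in Counter(ref).items():
            max_ref_counts[word] = max(max_ref_counts[word], n)

    # A candidate word is credited at most as often as it appears in a reference.
    clipped = sum(min(n, max_ref_counts[w]) for w, n in cand_counts.items())
    precision = clipped / len(cand)

    # Brevity penalty uses the reference length closest to the candidate
    # length (ties go to the shorter reference).
    ref_len = min((len(r) for r in refs), key=lambda n: (abs(n - len(cand)), n))
    bp = 1.0 if len(cand) > ref_len else math.exp(1 - ref_len / len(cand))

    return 100.0 * bp * precision
```

A caption that exactly matches one of its references scores 100 under this sketch, while a caption that only repeats a single reference word is penalized by the clipping step.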