865 research outputs found
COMIC: Towards A Compact Image Captioning Model with Attention
Recent works in image captioning have shown very promising raw performance.
However, we realize that most of these encoder-decoder style networks with
attention do not scale naturally to large vocabulary size, making them
difficult to be deployed on embedded system with limited hardware resources.
This is because the size of word and output embedding matrices grow
proportionally with the size of vocabulary, adversely affecting the compactness
of these networks. To address this limitation, this paper introduces a brand
new idea in the domain of image captioning. That is, we tackle the problem of
compactness of image captioning models which is hitherto unexplored. We showed
that, our proposed model, named COMIC for COMpact Image Captioning, achieves
comparable results in five common evaluation metrics with state-of-the-art
approaches on both MS-COCO and InstaPIC-1.1M datasets despite having an
embedding vocabulary size that is 39x - 99x smaller. The source code and models
are available at:
https://github.com/jiahuei/COMIC-Compact-Image-Captioning-with-AttentionComment: Added source code link and new results in Table
Twin Networks: Matching the Future for Sequence Generation
We propose a simple technique for encouraging generative RNNs to plan ahead.
We train a "backward" recurrent network to generate a given sequence in reverse
order, and we encourage states of the forward model to predict cotemporal
states of the backward model. The backward network is used only during
training, and plays no role during sampling or inference. We hypothesize that
our approach eases modeling of long-term dependencies by implicitly forcing the
forward states to hold information about the longer-term future (as contained
in the backward states). We show empirically that our approach achieves 9%
relative improvement for a speech recognition task, and achieves significant
improvement on a COCO caption generation task.Comment: 12 pages, 3 figures, published at ICLR 201
- …