Search CORE

865 research outputs found

COMIC: Towards A Compact Image Captioning Model with Attention

Author: Chan Chee Seng
Chuah Joon Huang
Tan Jia Huei
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2019
Field of study

Recent works in image captioning have shown very promising raw performance. However, we realize that most of these encoder-decoder style networks with attention do not scale naturally to large vocabulary size, making them difficult to be deployed on embedded system with limited hardware resources. This is because the size of word and output embedding matrices grow proportionally with the size of vocabulary, adversely affecting the compactness of these networks. To address this limitation, this paper introduces a brand new idea in the domain of image captioning. That is, we tackle the problem of compactness of image captioning models which is hitherto unexplored. We showed that, our proposed model, named COMIC for COMpact Image Captioning, achieves comparable results in five common evaluation metrics with state-of-the-art approaches on both MS-COCO and InstaPIC-1.1M datasets despite having an embedding vocabulary size that is 39x - 99x smaller. The source code and models are available at: https://github.com/jiahuei/COMIC-Compact-Image-Captioning-with-AttentionComment: Added source code link and new results in Table

arXiv.org e-Print Archive

UM Digital Repository

Twin Networks: Matching the Future for Sequence Generation

Author: Bengio Yoshua
Ke Nan Rosemary
Pal Chris
Serdyuk Dmitriy
Sordoni Alessandro
Trischler Adam
Publication venue
Publication date: 01/01/2018
Field of study

We propose a simple technique for encouraging generative RNNs to plan ahead. We train a "backward" recurrent network to generate a given sequence in reverse order, and we encourage states of the forward model to predict cotemporal states of the backward model. The backward network is used only during training, and plays no role during sampling or inference. We hypothesize that our approach eases modeling of long-term dependencies by implicitly forcing the forward states to hold information about the longer-term future (as contained in the backward states). We show empirically that our approach achieves 9% relative improvement for a speech recognition task, and achieves significant improvement on a COCO caption generation task.Comment: 12 pages, 3 figures, published at ICLR 201

arXiv.org e-Print Archive

PolyPublie