7 research outputs found

Image Captioning According to Kuhn's and Popper's Scientific Revolution (Image Captioning menurut Scientific Revolution Kuhn dan Popper)

Author: Khodra Masayu Layla
Munir Rinaldi
Nursikuwagus Agus
Publication venue: 'Universitas Komputer Indonesia'
Publication date: 01/10/2020

Image captioning is an area of artificial intelligence that combines computer vision and natural language processing. The process centers on a multi-layer neural network architecture that identifies the objects in an image and generates a caption. The task of this architecture is to produce a caption from the objects detected in a single image. This paper explains the connection between scientific revolution and image captioning. Our methodology follows Kuhn's account of scientific revolution and relates it to Popper's philosophy of science. We conclude that image captioning is genuinely a science, because many researchers have contributed improvements in the search for an effective deep learning method. In terms of the philosophy of science, if the phenomena can be falsified, then image captioning is a science.

Open Journal - Universitas Komputer Indonesia

Non-Autoregressive Coarse-to-Fine Video Captioning

Author: Liu Fenglin
Yang Bang
Zhang Can
Zou Yuexian
Publication date: 24/03/2021

It is encouraged to see that progress has been made to bridge videos and natural language. However, mainstream video captioning methods suffer from slow inference speed due to the sequential manner of autoregressive decoding, and prefer generating generic descriptions due to the insufficient training of visual words (e.g., nouns and verbs) and inadequate decoding paradigm. In this paper, we propose a non-autoregressive decoding based model with a coarse-to-fine captioning procedure to alleviate these defects. In implementations, we employ a bi-directional self-attention based network as our language model for achieving inference speedup, based on which we decompose the captioning procedure into two stages, where the model has different focuses. Specifically, given that visual words determine the semantic correctness of captions, we design a mechanism of generating visual words to not only promote the training of scene-related words but also capture relevant details from videos to construct a coarse-grained sentence "template". Thereafter, we devise dedicated decoding algorithms that fill in the "template" with suitable words and modify inappropriate phrasing via iterative refinement to obtain a fine-grained description. Extensive experiments on two mainstream video captioning benchmarks, i.e., MSVD and MSR-VTT, demonstrate that our approach achieves state-of-the-art performance, generates diverse descriptions, and obtains high inference efficiency. Our code is available at https://github.com/yangbang18/Non-Autoregressive-Video-Captioning. Comment: 9 pages, 6 figures, to be published in AAAI2021.
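The coarse-to-fine idea described in this abstract can be illustrated with a small toy sketch. It is not the authors' implementation (their code is in the repository linked above); it only assumes a generic mask-and-repredict style of iterative refinement. The names `MASK`, `refine`, and `toy_predict`, the linear re-masking schedule, and the example sentence are all hypothetical and chosen for illustration.

```python
from typing import Callable, List, Tuple

MASK = "<mask>"  # hypothetical placeholder for an unfilled template slot
Predictor = Callable[[List[str]], List[Tuple[str, float]]]


def refine(template: List[str], predict: Predictor, iterations: int = 3) -> List[str]:
    """Fill a coarse template in parallel, then re-mask and re-predict the
    least confident filled slots for a few rounds (assumed schedule)."""
    # Slots that already hold a "visual word" are never overwritten.
    free = [i for i, tok in enumerate(template) if tok == MASK]
    tokens = list(template)
    for step in range(iterations):
        preds = predict(tokens)          # one parallel pass over all positions
        for i in free:
            tokens[i] = preds[i][0]
        # Linearly shrink the number of re-masked slots each round.
        n_remask = len(free) * (iterations - step - 1) // iterations
        if n_remask == 0:
            break
        worst = sorted(free, key=lambda i: preds[i][1])[:n_remask]
        for i in worst:
            tokens[i] = MASK
    return tokens


def toy_predict(tokens: List[str]) -> List[Tuple[str, float]]:
    """Stand-in for a trained bi-directional language model: it proposes a
    fixed sentence and is more confident when neighbours are unmasked."""
    target = ["a", "man", "is", "playing", "a", "guitar"]
    out = []
    for i, word in enumerate(target):
        neighbours = tokens[max(0, i - 1):i] + tokens[i + 1:i + 2]
        known = sum(t != MASK for t in neighbours)
        out.append((word, 0.5 + 0.25 * known))
    return out


# Coarse template: visual words (nouns and verbs) are set, the rest is masked.
template = [MASK, "man", MASK, "playing", MASK, "guitar"]
caption = refine(template, toy_predict)
print(" ".join(caption))
```

In this sketch every free slot is predicted in a single pass per round, so the number of model calls depends on the round count rather than the caption length, which is the general motivation for non-autoregressive decoding.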

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications