Improving Picture Captioning Using A Multi-Task Learning Method

Abstract

We present MLAIC, a multi-task learning approach to image captioning, motivated by the idea that individuals are naturally gifted in more than one area. The three main parts of MLAIC are as follows: (1) an image classification model that learns to use a convolutional neural network (CNN) to encode images with a lot of category awareness; (2) an image syntax generation model that learns to use a long short-term memory (LSTM) decoder to encode images with better syntax awareness; and (3) an image captioning model that uses its CNN encoder for object classification and its LSTM decoder for syntax generation. The extra information on syntax and object classification is very useful for the picture captioning model. Our model outperforms other formidable rivals, according to experimental findings on the MS-COCO dataset

Similar works

Full text

thumbnail-image

International Journal of Innovative Technology and Research (IJITR)

redirect
Last time updated on 11/09/2024

Having an issue?

Is data on this page outdated, violates copyrights or anything else? Report the problem now and we will take corresponding actions after reviewing your request.