Improving Picture Captioning Using A Multi-Task Learning Method

KUMAR, ROSHAN; BHUVAN RAG, VODNALA; ANUSHA, VINNAKOTA; INDRA SENA REDDY, BORAVELLI; MEDIDA, Dr. JAYAPAL

research article

oai:ojs.ijitr.com:article/2822

Authors: ROSHAN KUMAR
VODNALA BHUVAN RAG
VINNAKOTA ANUSHA
BORAVELLI INDRA SENA REDDY
Dr. JAYAPAL MEDIDA
Publication date: 5 April 2024
Publisher: International Journal of Innovative Technology and Research

Abstract

We present MLAIC, a multi-task learning approach to image captioning, motivated by the idea that people naturally possess abilities in more than one domain. MLAIC has three main components: (1) an image classification model that learns a category-aware image encoder based on a convolutional neural network (CNN); (2) a syntax generation model that learns a syntax-aware decoder based on a long short-term memory (LSTM) network; and (3) an image captioning model that shares its CNN encoder with the object classification task and its LSTM decoder with the syntax generation task. The additional object-category and syntax information benefits the image captioning model. Experimental results on the MS-COCO dataset show that our model outperforms strong competing methods.
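
The abstract does not say how the three objectives are combined during training. One common choice in multi-task learning is a weighted sum of the per-task losses, so that the shared encoder and decoder receive training signal from every task. The sketch below illustrates that idea only; the weighted-sum form, the weights w_class and w_syntax, all function names, and the toy scores are assumptions made for illustration and are not taken from the paper.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(scores, target):
    """Negative log-probability of the target index under softmax(scores)."""
    return -math.log(softmax(scores)[target])


def sequence_loss(step_scores, targets):
    """Mean per-step cross-entropy over a decoder output sequence."""
    losses = [cross_entropy(s, t) for s, t in zip(step_scores, targets)]
    return sum(losses) / len(losses)


def joint_loss(caption_loss, class_loss, syntax_loss, w_class=0.5, w_syntax=0.5):
    """Weighted sum of the three task losses.

    ASSUMPTION: the weighted-sum form and both weights are illustrative;
    the abstract does not specify them.
    """
    return caption_loss + w_class * class_loss + w_syntax * syntax_loss


# Toy scores, made up for illustration (not data from the paper).
caption = sequence_loss([[2.0, 0.0, 0.0], [0.0, 2.0, 0.0]], [0, 1])  # word scores per step
syntax = sequence_loss([[1.0, 0.0], [0.0, 1.0]], [0, 1])  # syntax-tag scores per step
category = cross_entropy([0.0, 3.0, 0.0], 1)  # object-category scores
total = joint_loss(caption, category, syntax)
print(round(total, 4))
```

In a real system the caption and category losses would both backpropagate into the shared CNN encoder, and the caption and syntax losses into the shared LSTM decoder; the weights would then control how strongly each auxiliary task influences the shared parameters.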

International Journal of Innovative Technology and Research (IJITR)

Last time updated on 11/09/2024

This paper was published in International Journal of Innovative Technology and Research (IJITR).
