International Journal of Innovative Technology and Research
Abstract
We present MLAIC, a multi-task learning approach to image captioning, motivated by the observation that humans are naturally skilled in more than one area. MLAIC has three main components: (1) an image classification model that uses a convolutional neural network (CNN) to learn a category-aware image encoder; (2) a syntax generation model that learns a syntax-aware long short-term memory (LSTM) decoder; and (3) an image captioning model that shares its CNN encoder with the object classification task and its LSTM decoder with the syntax generation task. The additional knowledge of object categories and syntax benefits the image captioning model. Experimental results on the MS-COCO dataset show that our model outperforms strong competing methods.
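The parameter sharing described above can be illustrated with a minimal sketch. It is not the MLAIC implementation: the abstract does not say how the three objectives are combined, so the weighted sum, the `task_weights` values, and all class and function names below are assumptions made for illustration. A single linear layer with ReLU stands in for the CNN encoder, and a single tanh step stands in for the LSTM decoder.

```python
import math


def linear(weights, x):
    """Matrix-vector product; weights is a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, target):
    """Negative log-likelihood of the target index."""
    return -math.log(probs[target])


class SharedEncoder:
    """Toy stand-in for the CNN encoder (hypothetical: linear map + ReLU)."""

    def __init__(self, weights):
        self.weights = weights

    def __call__(self, image):
        return [max(0.0, h) for h in linear(self.weights, image)]


class SharedDecoder:
    """Toy stand-in for the LSTM decoder (hypothetical: one tanh step)."""

    def __init__(self, weights):
        self.weights = weights

    def __call__(self, features):
        return [math.tanh(h) for h in linear(self.weights, features)]


def joint_loss(encoder, decoder, heads, image, targets, task_weights):
    """Assumed joint objective: weighted sum of the three task losses."""
    features = encoder(image)   # shared by classification and captioning
    state = decoder(features)   # shared by syntax generation and captioning
    inputs = {"classification": features, "syntax": state, "caption": state}
    losses = {
        task: cross_entropy(softmax(linear(heads[task], inputs[task])), targets[task])
        for task in inputs
    }
    total = sum(task_weights[task] * losses[task] for task in losses)
    return total, losses


# Toy usage: a 3-value "image", 2-dimensional features and decoder state.
encoder = SharedEncoder([[0.1, 0.2, 0.3], [-0.3, 0.1, 0.2]])
decoder = SharedDecoder([[0.5, -0.5], [0.3, 0.8]])
heads = {
    "classification": [[0.2, -0.1], [0.4, 0.3], [-0.2, 0.5]],      # 3 object classes
    "syntax": [[0.1, 0.6], [-0.4, 0.2]],                           # 2 syntax tags
    "caption": [[0.3, 0.1], [-0.2, 0.4], [0.5, -0.3], [0.0, 0.2]],  # 4 words
}
targets = {"classification": 1, "syntax": 0, "caption": 2}
task_weights = {"classification": 0.5, "syntax": 0.5, "caption": 1.0}  # assumed values

total, losses = joint_loss(encoder, decoder, heads, [0.5, -1.0, 2.0], targets, task_weights)
```

Because the same `encoder` and `decoder` objects feed all three losses, a gradient step on `total` would update the shared parameters using signals from classification and syntax generation as well as captioning, which is the effect the abstract attributes to multi-task learning.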