
    Multiā€task learning for captioning images with novel words

    Recent captioning models are limited in their ability to describe concepts that do not appear in paired image–sentence training data. This study presents a multi-task learning framework for describing novel words absent from existing image-captioning datasets. The authors' framework takes advantage of external sources: labelled images from image-classification datasets and semantic knowledge extracted from annotated text. They propose minimising a joint objective that can learn from these diverse data sources and leverage distributional semantic embeddings. At inference time, they modify the beam-search step to consider both the caption model and a language model, enabling the model to generalise to novel words outside image-captioning datasets. They demonstrate that, within this framework, adding annotated text data helps the image-captioning model describe images with the correct novel words. Extensive experiments are conducted on the AI Challenger and Microsoft COCO (MSCOCO) image-captioning datasets, which cover two different languages, demonstrating the framework's ability to describe novel words such as scenes and objects.
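The abstract's key inference-time idea is a beam search whose candidate scores combine the caption model with an external language model, so that words well supported by text data can surface even when the caption model has never seen them paired with images. The following is a minimal, self-contained sketch of that score fusion under assumed toy models: the two lookup tables, the vocabulary, and the fusion weight `lm_weight` are all illustrative inventions, not the authors' actual models or settings.

```python
import math

def caption_logprob(prefix, token):
    # Toy caption model: "zebra" is a novel word it rarely generates.
    table = {
        (): {"a": 0.7, "the": 0.3},
        ("a",): {"dog": 0.6, "zebra": 0.1, "cat": 0.3},
        ("the",): {"dog": 0.5, "zebra": 0.1, "cat": 0.4},
    }
    return math.log(table.get(prefix, {}).get(token, 1e-9))

def lm_logprob(prefix, token):
    # Toy language model built from text data, where "zebra" is common.
    table = {
        (): {"a": 0.5, "the": 0.5},
        ("a",): {"dog": 0.2, "zebra": 0.7, "cat": 0.1},
        ("the",): {"dog": 0.2, "zebra": 0.7, "cat": 0.1},
    }
    return math.log(table.get(prefix, {}).get(token, 1e-9))

def fused_beam_search(vocab, steps, beam_width=2, lm_weight=2.0):
    # Each beam is (token tuple, cumulative fused log-probability).
    beams = [((), 0.0)]
    for _ in range(steps):
        candidates = []
        for prefix, score in beams:
            for tok in vocab:
                # Fused score: caption model plus weighted language model.
                s = (caption_logprob(prefix, tok)
                     + lm_weight * lm_logprob(prefix, tok))
                candidates.append((prefix + (tok,), score + s))
        # Keep only the top-scoring partial captions.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams

if __name__ == "__main__":
    best, _ = fused_beam_search(["a", "the", "dog", "zebra", "cat"], steps=2)[0]
    print(best)
```

With `lm_weight=2.0` the language model's confidence in the novel word outweighs the caption model's preference for a seen word, so the top beam is `("a", "zebra")`; with `lm_weight=0.0` the search reduces to ordinary caption-model beam search and prefers `("a", "dog")`. This weighting trade-off, not any specific value, is the point of the fusion.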