Linguistic Variation and Anomalies in Comparisons of Human and Machine-Generated Image Captions

Abstract

Describing the content of a visual image is a fundamental ability of human vision and language systems. Over the past several years, researchers have published major improvements in image captioning, largely due to the development of deep learning systems trained on large data sets of images and human-written captions. However, these systems have major limitations, and their development has been narrowly focused on improving scores on relatively simple “bag-of-words” metrics. Very little work has examined the complex patterns of the language produced by image-captioning systems and how it compares to captions written by humans. In this paper, we closely examine patterns in machine-generated captions and characterize how conventional metrics are inconsistent at penalizing them for nonhuman-like, erroneous output. We also hypothesize that the complexity of a visual scene should be reflected in the linguistic variety of its captions and, in testing this hypothesis, we find that human-generated captions have a dramatically greater degree of lexical, syntactic, and semantic variation. These results have important implications for the design of performance metrics, for gauging what deep learning captioning systems really understand in images, and for the importance of the image-captioning task to cognitive systems research.