Seeing the Meaning: Vision Meets Semantics in Solving Pictorial Analogy Problems

Abstract

We report a first effort to model the solution of meaningful four-term visual analogies, by combining a machine-vision model (ResNet50-A) that can classify pixel-level images into object categories with a cognitive model (BART) that takes semantic representations of words as input and identifies semantic relations instantiated by a word pair. Each model achieves above-chance performance in selecting the best analogical option from a set of four. However, combining the visual and the semantic models increases analogical performance above the level achieved by either model alone. The contribution of vision to reasoning thus may extend beyond simply generating verbal representations from images. These findings provide a proof of concept that a comprehensive model can solve semantically rich analogies from pixel-level inputs.
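To make the combination concrete, the sketch below illustrates one way scores from a visual model and a semantic model could be fused to choose among four options for an A:B :: C:? analogy. It is a minimal illustration, not the paper's method: the item names, the random eight-dimensional vectors standing in for ResNet50-A features and for the word representations BART consumes, the difference-vector stand-in for a relation, and the additive weighting are all assumptions introduced here for illustration.

```python
import numpy as np

# Hypothetical toy features standing in for the two models' outputs:
# "visual" vectors ~ image features from a vision model such as ResNet50-A,
# "semantic" vectors ~ word representations of the kind BART operates over.
rng = np.random.default_rng(0)
ITEMS = ["hammer", "nail", "screwdriver", "screw", "saw", "apple"]
visual = {w: rng.normal(size=8) for w in ITEMS}    # placeholder image features
semantic = {w: rng.normal(size=8) for w in ITEMS}  # placeholder word vectors

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def relation(space, a, b):
    """Crude stand-in for a relation representation: the difference vector.
    (BART's actual relation representations are learned, not differences.)"""
    return space[b] - space[a]

def solve_analogy(a, b, c, options, w_vis=0.5, w_sem=0.5):
    """Score each candidate D by how well the C:D relation matches the A:B
    relation in each space, combine the two scores additively, pick the max."""
    scores = {}
    for d in options:
        vis = cosine(relation(visual, a, b), relation(visual, c, d))
        sem = cosine(relation(semantic, a, b), relation(semantic, c, d))
        scores[d] = w_vis * vis + w_sem * sem
    return max(scores, key=scores.get), scores

best, scores = solve_analogy("hammer", "nail", "screwdriver",
                             options=["screw", "saw", "apple"])
print(best, scores)
```

The additive fusion mirrors the abstract's central claim: each scorer alone can exceed chance, but a weighted combination of the two can outperform either component.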
