Learning to reason over visual objects

Cohen, Jonathan D.; Mondal, Shanka Subhra; Webb, Taylor

Authors: Jonathan D. Cohen, Shanka Subhra Mondal, Taylor Webb
Publication date: 26 October 2023

Abstract

A core component of human intelligence is the ability to identify abstract patterns inherent in complex, high-dimensional perceptual data, as exemplified by visual reasoning tasks such as Raven's Progressive Matrices (RPM). Motivated by the goal of designing AI systems with this capacity, recent work has focused on evaluating whether neural networks can learn to solve RPM-like problems. Previous work has generally found that strong performance on these problems requires the incorporation of inductive biases that are specific to the RPM problem format, raising the question of whether such models might be more broadly useful. Here, we investigated the extent to which a general-purpose mechanism for processing visual scenes in terms of objects might help promote abstract visual reasoning. We found that a simple model, consisting only of an object-centric encoder and a transformer reasoning module, achieved state-of-the-art results on both of two challenging RPM-like benchmarks (PGM and I-RAVEN), as well as a novel benchmark with greater visual complexity (CLEVR-Matrices). These results suggest that an inductive bias for object-centric processing may be a key component of abstract visual reasoning, obviating the need for problem-specific inductive biases.

Comment: ICLR 202

Available Versions

arXiv.org e-Print Archive

oai:arXiv.org:2303.02260

Last time updated on 22/03/2023