66 research outputs found
Latent Noise Segmentation: How Neural Noise Leads to the Emergence of Segmentation and Grouping
Deep Neural Networks (DNNs) that achieve human-level performance in general
tasks like object segmentation typically require supervised labels. In
contrast, humans are able to perform these tasks effortlessly without
supervision. To accomplish this, the human visual system makes use of
perceptual grouping. Understanding how perceptual grouping arises in an
unsupervised manner is critical for improving both models of the visual system
and computer vision models. In this work, we propose a counterintuitive
approach to unsupervised perceptual grouping and segmentation: that they arise
because of neural noise, rather than in spite of it. We (1) mathematically
demonstrate that under realistic assumptions, neural noise can be used to
separate objects from each other, and (2) show that adding noise in a DNN
enables the network to segment images even though it was never trained on any
segmentation labels. Interestingly, we find that (3) segmenting objects using
noise results in segmentation performance that aligns with the perceptual
grouping phenomena observed in humans. We introduce the Good Gestalt (GG)
datasets -- six datasets designed to specifically test perceptual grouping, and
show that our DNN models reproduce many important phenomena in human
perception, such as illusory contours, closure, continuity, proximity, and
occlusion. Finally, we (4) demonstrate the ecological plausibility of the
method by analyzing the sensitivity of the DNN to different magnitudes of
noise. We find that some model variants consistently succeed with remarkably
low levels of neural noise and, surprisingly, that segmenting
this way requires only a handful of samples. Together, our results suggest
a novel unsupervised segmentation method requiring few assumptions, a new
explanation for the formation of perceptual grouping, and a potential benefit
of neural noise in the visual system.
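
The core idea in points (1) and (2) of the abstract above can be illustrated with a toy sketch. This is not the authors' model: the linear decoder `toy_decode`, its `WEIGHTS`, and the helper `segment_by_noise` are hypothetical, chosen only to show that pixels driven by the same latent factor change together when the latent code is perturbed, so correlating those changes separates two "objects" without any labels.

```python
import random

def pearson(xs, ys):
    """Pearson correlation between two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def segment_by_noise(decode, latent, n_samples=200, sigma=0.01, seed=0):
    """Group pixels whose outputs co-vary under latent noise (toy sketch)."""
    rng = random.Random(seed)
    clean = decode(latent)
    deltas = []
    for _ in range(n_samples):
        # Perturb the latent code with small Gaussian noise.
        noisy = [z + rng.gauss(0.0, sigma) for z in latent]
        out = decode(noisy)
        # Record how each output pixel changed relative to the clean output.
        deltas.append([o - c for o, c in zip(out, clean)])
    columns = list(zip(*deltas))  # one series of changes per pixel
    # Two-way split: pixels strongly correlated with pixel 0 form group 1.
    return [1 if pearson(columns[0], col) > 0.5 else 0 for col in columns]

# Hypothetical decoder: pixels 0-3 depend on latent 0, pixels 4-7 on latent 1.
WEIGHTS = [(1.0, 0.0), (0.8, 0.0), (1.2, 0.0), (0.9, 0.0),
           (0.0, 1.1), (0.0, 0.7), (0.0, 1.0), (0.0, 1.3)]

def toy_decode(z):
    return [w0 * z[0] + w1 * z[1] for w0, w1 in WEIGHTS]

labels = segment_by_noise(toy_decode, latent=[0.5, -0.3])
print(labels)
```

In this toy setting the first four pixels receive one label and the last four the other. A real network is nonlinear and would need a proper clustering step rather than a fixed correlation threshold.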
AI2D-RST: A multimodal corpus of 1000 primary school science diagrams
This article introduces AI2D-RST, a multimodal corpus of 1000 English-language diagrams that represent topics in primary school natural sciences, such as food webs, life cycles, moon phases and human physiology. The corpus is based on the Allen Institute for Artificial Intelligence Diagrams (AI2D) dataset, a collection of diagrams with crowdsourced descriptions, which was originally developed to support research on automatic diagram understanding and visual question answering. Building on the segmentation of diagram layouts in AI2D, the AI2D-RST corpus presents a new multi-layer annotation schema that provides a rich description of their multimodal structure. Annotated by trained experts, the layers describe (1) the grouping of diagram elements into perceptual units, (2) the connections set up by diagrammatic elements such as arrows and lines, and (3) the discourse relations between diagram elements, which are described using Rhetorical Structure Theory (RST). Each annotation layer in AI2D-RST is represented using a graph. The corpus is freely available for research and teaching.
Peer reviewed