Approaching human 3D shape perception with neurally mappable models
Humans effortlessly infer the 3D shape of objects. What computations underlie
this ability? Although various computational models have been proposed, none of
them capture the human ability to match object shape across viewpoints. Here,
we ask whether and how this gap might be closed. We begin with a relatively
novel class of computational models, 3D neural fields, which encapsulate the
basic principles of classic analysis-by-synthesis in a deep neural network
(DNN). First, we find that a 3D Light Field Network (3D-LFN) supports 3D
matching judgments well aligned to humans for within-category comparisons,
adversarially-defined comparisons that accentuate the 3D failure cases of
standard DNN models, and adversarially-defined comparisons for algorithmically
generated shapes with no category structure. We then investigate the source of
the 3D-LFN's ability to achieve human-aligned performance through a series of
computational experiments. Exposure to multiple viewpoints of objects during
training and a multi-view learning objective are the primary factors behind
model-human alignment; even conventional DNN architectures come much closer to
human behavior when trained with multi-view objectives. Finally, we find that
while the models trained with multi-view learning objectives are able to
partially generalize to new object categories, they fall short of human
alignment. This work provides a foundation for understanding human shape
inferences within neurally mappable computational architectures and highlights
important questions for future work.
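The abstract identifies a multi-view learning objective as the primary factor behind model-human alignment. As a minimal sketch of what such an objective looks like (an illustration only, not the authors' 3D-LFN code; the embeddings, object names, and margin value are all hypothetical):

```python
# Toy illustration of a multi-view matching objective: embeddings of the same
# object seen from different viewpoints should be more similar than embeddings
# of different objects. Hand-made vectors stand in for DNN features here.
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def match_across_views(probe, gallery):
    """Return the gallery key whose embedding is most similar to the probe,
    mimicking a cross-viewpoint shape-matching judgment."""
    return max(gallery, key=lambda k: cosine(probe, gallery[k]))

def multi_view_loss(view_a, view_b, negatives, margin=0.2):
    """Margin loss pulling two views of the same object together and pushing
    other objects away -- the kind of multi-view objective the abstract says
    drives alignment, even for conventional DNN architectures."""
    pos = cosine(view_a, view_b)
    loss = 0.0
    for neg in negatives:
        loss += max(0.0, margin + cosine(view_a, neg) - pos)
    return loss

# Two viewpoints of a "mug" embed near each other; a "chair" embeds far away.
mug_v1, mug_v2 = [1.0, 0.1, 0.0], [0.9, 0.2, 0.1]
chair_v1 = [0.0, 1.0, 0.3]
gallery = {"mug": mug_v2, "chair": chair_v1}
print(match_across_views(mug_v1, gallery))          # -> mug
print(multi_view_loss(mug_v1, mug_v2, [chair_v1]))  # -> 0.0 (views already aligned)
```

The point of the sketch is only the shape of the objective: the supervisory signal comes from viewpoint correspondence, not category labels.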
Unsupervised Segmentation in Real-World Images via Spelke Object Inference
Self-supervised, category-agnostic segmentation of real-world images is a
challenging open problem in computer vision. Here, we show how to learn static
grouping priors from motion self-supervision by building on the cognitive
science concept of a Spelke Object: a set of physical stuff that moves
together. We introduce the Excitatory-Inhibitory Segment Extraction Network
(EISEN), which learns to extract pairwise affinity graphs for static scenes
from motion-based training signals. EISEN then produces segments from
affinities using a novel graph propagation and competition network. During
training, objects that undergo correlated motion (such as robot arms and the
objects they move) are decoupled by a bootstrapping process: EISEN explains
away the motion of objects it has already learned to segment. We show that
EISEN achieves a substantial improvement in the state of the art for
self-supervised image segmentation on challenging synthetic and real-world
robotics datasets.
Comment: 25 pages, 10 figures
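The pipeline the abstract describes, pairwise affinities turned into segments, can be sketched very crudely (this is an assumption for illustration, not EISEN's graph propagation and competition network):

```python
# Minimal sketch of turning a pairwise affinity graph into segments: nodes
# ("pixels") connected by affinities above a threshold are propagated into the
# same group via breadth-first search. EISEN's actual propagation/competition
# network is learned; this is only the combinatorial skeleton of the idea.
from collections import deque

def segments_from_affinity(affinity, threshold=0.5):
    """Group node indices into segments by BFS over edges with
    affinity > threshold; returns one segment label per node."""
    n = len(affinity)
    label = [-1] * n
    current = 0
    for seed in range(n):
        if label[seed] != -1:
            continue
        label[seed] = current
        queue = deque([seed])
        while queue:
            i = queue.popleft()
            for j in range(n):
                if label[j] == -1 and affinity[i][j] > threshold:
                    label[j] = current
                    queue.append(j)
        current += 1
    return label

# 4 "pixels": 0-1 move together, 2-3 move together (high pairwise affinity),
# as a Spelke Object grouping from motion would suggest.
aff = [
    [1.0, 0.9, 0.1, 0.0],
    [0.9, 1.0, 0.0, 0.1],
    [0.1, 0.0, 1.0, 0.8],
    [0.0, 0.1, 0.8, 1.0],
]
print(segments_from_affinity(aff))  # -> [0, 0, 1, 1]
```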
Language as a bootstrap for compositional visual reasoning
People think and learn abstractly and compositionally. These two key properties of human cognition are shared with natural language: we use a finite, composable vocabulary of nameable concepts to generate and understand a combinatorially large space of new sentences. In this paper, we present a domain of compositional reasoning tasks and an artificial language learning paradigm designed to probe the role language plays in bootstrapping learning. We discuss results from a language-guided program learning model suggesting that language can play an important role in bootstrapping learning by providing an important signal for search on individual problems, and a cue towards named, reusable abstractions across the domain as a whole. We evaluate adults on the same domain, comparing learning performance between those tasked with jointly learning language and solving reasoning tasks, and those who only approach the domain as a collection of inductive reasoning problems. We find that adults provided with abstract language prompts are better equipped to generalize and compose concepts learned across a domain than adults solving the same problems using reasoning alone.
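One way language can act as "a signal for search on individual problems" can be sketched as follows (a hypothetical illustration; the candidate programs, the word "stripe", and the ranking heuristic are all invented for this example, not the paper's model):

```python
# Hypothetical sketch of language guiding program search: a concept named in
# the instruction re-ranks candidate programs so those using the named
# abstraction are explored first.
def search_order(candidates, prompt_words):
    """Sort candidate programs so those mentioning a word from the language
    prompt come first; Python's sort is stable, so ties keep their order."""
    def score(program):
        # More prompt-word mentions -> more negative -> earlier in the sort.
        return -sum(word in program for word in prompt_words)
    return sorted(candidates, key=score)

candidates = ["repeat(dot, 9)", "repeat(stripe, 3)", "mirror(stripe)"]
print(search_order(candidates, {"stripe"}))
# -> ['repeat(stripe, 3)', 'mirror(stripe)', 'repeat(dot, 9)']
```

Even this trivial re-ranking captures the abstract's two roles for language: it steers search on a single problem, and the prompt word itself flags a reusable abstraction.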
Explaining the Gestalt principle of common fate as amortized inference
Humans perceive the world through a rich, object-centric lens. We are able to infer 3D geometry and features of objects from sparse and noisy data. Gestalt rules describe how perceptual stimuli tend to be grouped based on properties like proximity, closure, and continuity. However, it remains an open question how these mechanisms are implemented algorithmically in the brain, and how (or why) they functionally support 3D object perception. Here, we describe a computational model which accounts for the Gestalt principle of Common Fate: grouping stimuli by shared motion statistics. We argue that this mechanism can be explained as bottom-up neural amortized inference in a top-down generative model for object-based scenes. Our generative model places a low-dimensional prior on the motion and shape of objects, while our inference network learns to group feature clusters using inverse renderings of noisily textured objects moving through time, effectively enabling 3D shape perception.
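The common-fate cue itself, grouping by shared motion statistics, can be illustrated with a toy grouping rule (an assumption for illustration; the paper's model does amortized inference in a generative model, not this threshold test):

```python
# Illustrative sketch of grouping by common fate: points whose velocity
# vectors are nearly identical are assigned to the same perceptual group.
def group_by_common_fate(velocities, tol=0.1):
    """Assign a group id to each point; a point joins an existing group if its
    velocity is within `tol` (per component) of that group's first member."""
    groups = []   # representative velocity for each group
    labels = []
    for v in velocities:
        for gid, rep in enumerate(groups):
            if all(abs(a - b) <= tol for a, b in zip(v, rep)):
                labels.append(gid)
                break
        else:
            groups.append(v)
            labels.append(len(groups) - 1)
    return labels

# Two dots drifting right together and one dot moving up: two groups.
vels = [(1.0, 0.0), (1.05, 0.02), (0.0, 1.0)]
print(group_by_common_fate(vels))  # -> [0, 0, 1]
```

The paper's claim is precisely that such a grouping computation need not be a hand-written rule like this one: it can emerge as fast bottom-up (amortized) inference trained against a top-down generative model.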
Benchmarking mid-level vision with texture-defined 3D objects
We introduce a new benchmark dataset based on classic methods of studying 3D shape perception from texture and motion, inspired by earlier work on Gestalt principles of perceptual organization and the ecological (Gibsonian) approach to perception of structure in moving displays. The dataset consists of parametric 3D shapes (superquadrics) with procedurally generated textures rotating and translating against a similarly textured backdrop. We expect these stimuli to be challenging for current computer vision models, as they depart from the statistics of real-world or realistically rendered stimuli. We test a variety of models’ ability to segment textured stimuli across three training conditions: pre-trained on naturalistic stimuli, pre-trained and fine-tuned on textured stimuli, and trained on textured stimuli. While no models generalize to segment textured stimuli without fine-tuning, performance improves with fine-tuning and training on textured stimuli. We will discuss how this benchmark can guide models of scene perception towards more human-like robustness and generality.
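Segmentation benchmarks like this one are typically scored with a mask-overlap metric; intersection-over-union is the standard choice (an assumption here, since the abstract does not name its metric):

```python
# Intersection-over-union (IoU) between a predicted and a ground-truth binary
# mask, the usual way a segmentation benchmark scores model outputs.
def iou(pred, truth):
    """IoU of two binary masks given as flat 0/1 lists; IoU of two empty
    masks is defined as 1.0 (perfect agreement on 'nothing')."""
    inter = sum(p and t for p, t in zip(pred, truth))
    union = sum(p or t for p, t in zip(pred, truth))
    return inter / union if union else 1.0

truth = [0, 1, 1, 1, 0, 0]   # ground-truth object pixels
pred  = [0, 1, 1, 0, 0, 1]   # model's predicted object pixels
print(iou(pred, truth))  # -> 0.5 (2 shared pixels, 4 in the union)
```

Under such a metric, "no models generalize without fine-tuning" means pre-trained models score poorly on the texture-defined shapes despite good scores on naturalistic imagery.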
Advancing Cognitive Science and AI with Cognitive-AI Benchmarking
What are the current limits of AI models in explaining human cognition and behavior? How might approaches from the cognitive sciences drive the development of more robust and reliable AI systems? The goal of this workshop is to bring together researchers across cognitive science and artificial intelligence (AI) to engage with these questions and identify opportunities to work together to advance progress in both fields. In particular, we propose Cognitive-AI Benchmarking as a particularly promising strategy: that is, the community-coordinated establishment of common benchmarks, tools, and best practices for model-human comparisons across diverse and ecologically relevant domains and tasks. We will host a combination of talks, panel discussions, and breakout activities to: highlight past successes in Cognitive-AI Benchmarking and limitations of current approaches, share tools and best practices, and outline future challenges and goals for the field.
Identifying concept libraries from language about object structure
Our understanding of the visual world goes beyond naming objects, encompassing our ability to parse objects into meaningful parts, attributes, and relations. In this work, we leverage natural language descriptions for a diverse set of 2K procedurally generated objects to identify the parts people use and the principles leading these parts to be favored over others. We formalize our problem as search over a space of program libraries that contain different part concepts, using tools from machine translation to evaluate how well programs expressed in each library align to human language. By combining naturalistic language at scale with structured program representations, we discover a fundamental information-theoretic tradeoff governing the part concepts people name: people favor a lexicon that allows concise descriptions of each object, while also minimizing the size of the lexicon itself.
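The stated tradeoff, concise per-object descriptions versus a small lexicon, has a simple description-length rendering (a toy version invented for illustration; the concept names, costs, and counting scheme are not the paper's formalism):

```python
# Toy rendering of the information-theoretic tradeoff: total cost is the size
# of the lexicon plus the length of each object's description written in that
# lexicon. A richer lexicon costs more up front but can pay for itself by
# shortening every description that reuses its concepts.
def total_cost(lexicon, descriptions):
    """Lexicon size plus summed description lengths, both counted in tokens."""
    return len(lexicon) + sum(len(d) for d in descriptions)

# Fine-grained lexicon: one primitive, but every object takes 4 tokens.
small_lex = {"line"}
small_descrs = [["line"] * 4, ["line"] * 4]

# Richer lexicon naming a reusable part ("window" = 4 lines): each object
# now takes a single token.
rich_lex = {"line", "window"}
rich_descrs = [["window"], ["window"]]

print(total_cost(small_lex, small_descrs))  # -> 9
print(total_cost(rich_lex, rich_descrs))    # -> 4, so the richer lexicon wins
```

With only one object, the extra lexicon entry would not pay off; reuse across many objects is what makes a named part concept worth its cost, which is the tradeoff the abstract describes.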
Comment: Appears in the conference proceedings of CogSci 202
Mechanism of short-term ERK activation by electromagnetic fields at mobile phone frequencies
Exposure to non-thermal microwave electromagnetic fields generated by mobile phones affects the expression of many proteins. This effect on transcription and protein stability can be mediated by the MAPK (mitogen-activated protein kinase) cascades, which serve as central signalling pathways and govern essentially all stimulated cellular processes. Indeed, long-term exposure of cells to mobile phone irradiation results in the activation of p38 as well as the ERK (extracellular-signal-regulated kinase) MAPKs. In the present study, we have studied the immediate effect of irradiation on the MAPK cascades, and found that ERKs, but not stress-related MAPKs, are rapidly activated in response to various frequencies and intensities. Using signalling inhibitors, we delineated the mechanism that is involved in this activation. We found that the first step is mediated in the plasma membrane by NADH oxidase, which rapidly generates ROS (reactive oxygen species). These ROS then directly stimulate MMPs (matrix metalloproteinases) and allow them to cleave and release Hb-EGF [heparin-binding EGF (epidermal growth factor)]. This secreted factor activates the EGF receptor, which in turn further activates the ERK cascade. Thus, this study demonstrates for the first time a detailed molecular mechanism by which electromagnetic irradiation from mobile phones induces the activation of the ERK cascade and thereby induces transcription and other cellular processes.