
    Faces as a "Model Category" for Visual Object Recognition

    Visual recognition is an important ability that is central to many everyday tasks such as reading, navigation and social interaction, and is therefore actively studied in neuroscience, cognitive psychology and artificial intelligence. There exist thousands of object categories, all of which pose similar challenges to biological and artificial visual systems: accurate recognition under varying location, scale, view angle, illumination and clutter. In many areas of science, important discoveries have been made using "model organisms" such as fruit flies, mice and macaques. Among the thousands of object categories, the important and well-studied category of faces could potentially serve as a "model category" upon which efforts are focused, and from which fundamental insights are drawn. However, it has been hotly debated whether faces are processed by the brain in a manner fundamentally different from other categories. Here we show that "neural tuning size" -- a single parameter in a computational model of object processing -- is able to account for important face-specific phenomena. Thus, surprisingly, "face-like" processing is explainable by physiological mechanisms that differ only quantitatively from "object-like" processing. Our computational proof-of-principle provides specific neural tuning properties that correspond to the so-far qualitative and controversial notion of "holistic" face processing. Overall, faces may be a viable model category. Since faces are highly amenable to complementary experimental techniques like functional MRI, electrophysiology, electroencephalography and transcranial magnetic stimulation, this further raises the odds that the algorithms and neural circuits underlying visual recognition may first be solved for faces. With faces serving as a model category, the great scientific challenge of understanding and reverse-engineering general visual recognition can be substantially accelerated.
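
    To make the "single parameter" idea concrete, here is a minimal Python sketch (an illustration under assumed details, not the authors' actual model) of a template-matching unit with Gaussian tuning, where a single tuning-size parameter controls how broadly the unit responds around its preferred stimulus; the abstract's claim is that broadening this kind of tuning yields "face-like" rather than "object-like" processing.

        # Minimal sketch: one model neuron with Gaussian tuning of adjustable width.
        # The tuning_size parameter (sigma) is the single factor of interest.
        import numpy as np

        def unit_response(input_patch, template, tuning_size):
            # Response falls off with distance between the input and the stored
            # template; larger tuning_size -> broader tuning -> the unit responds
            # to a wider range of inputs around its preferred stimulus.
            distance = np.linalg.norm(input_patch - template)
            return np.exp(-distance**2 / (2.0 * tuning_size**2))

        rng = np.random.default_rng(0)
        template = rng.standard_normal(64)                # the unit's preferred stimulus
        probe = template + 0.5 * rng.standard_normal(64)  # a perturbed input
        print(unit_response(probe, template, tuning_size=1.0))   # narrow tuning: low response
        print(unit_response(probe, template, tuning_size=10.0))  # broad tuning: high response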

    Neural tuning size is a key factor underlying holistic face processing

    Faces are a class of visual stimuli with unique significance, for a variety of reasons. They are ubiquitous throughout the course of a person's life, and face recognition is crucial for daily social interaction. Faces are also unlike any other stimulus class in terms of certain physical stimulus characteristics. Furthermore, faces have been empirically found to elicit certain characteristic behavioral phenomena, which are widely held to be evidence of "holistic" processing of faces. However, little is known about the neural mechanisms underlying such holistic face processing. In other words, for the processing of faces by the primate visual system, the input and output characteristics are relatively well known, but the internal neural computations are not. The main aim of this work is to further the fundamental understanding of what causes the visual processing of faces to be different from that of objects. In this computational modeling work, we show that a single factor - "neural tuning size" - is able to account for three key phenomena that are characteristic of face processing, namely the Composite Face Effect (CFE), Face Inversion Effect (FIE) and Whole-Part Effect (WPE). Our computational proof-of-principle provides specific neural tuning properties that correspond to the poorly-understood notion of holistic face processing, and connects these neural properties to psychophysical behavior. Overall, our work provides a unified and parsimonious theoretical account for the disparate empirical data on face-specific processing, deepening the fundamental understanding of face processing.

    Neural Tuning Size in a Model of Primate Visual Processing Accounts for Three Key Markers of Holistic Face Processing

    Faces are an important and unique class of visual stimuli, and have been of interest to neuroscientists for many years. Faces are known to elicit certain characteristic behavioral markers, collectively labeled “holistic processing”, while non-face objects are not processed holistically. However, little is known about the underlying neural mechanisms. The main aim of this computational simulation work is to investigate the neural mechanisms that make face processing holistic. Using a model of primate visual processing, we show that a single key factor, “neural tuning size”, is able to account for three important markers of holistic face processing: the Composite Face Effect (CFE), Face Inversion Effect (FIE) and Whole-Part Effect (WPE). Our proof-of-principle specifies the precise neurophysiological property that corresponds to the poorly-understood notion of holism, and shows that this one neural property controls three classic behavioral markers of holism. Our work is consistent with neurophysiological evidence, and makes further testable predictions. Overall, we provide a parsimonious account of holistic face processing, connecting computation, behavior and neurophysiology.

    National Science Foundation (U.S.) (STC award CCF-1231216)

    Advancing Perception in Artificial Intelligence through Principles of Cognitive Science

    Although artificial intelligence (AI) has achieved many feats at a rapid pace, there still exist open problems and fundamental shortcomings related to performance and resource efficiency. Since AI researchers benchmark a significant proportion of performance standards against human intelligence, cognitive-science-inspired AI is a promising domain of research. Studying cognitive science can provide a fresh perspective on building the fundamental blocks of AI research, which can lead to improved performance and efficiency. In this review paper, we focus on the cognitive function of perception, which is the process of taking in signals from one's surroundings and processing them to understand the environment. In particular, we study and compare its various processes through the lens of both the cognitive sciences and AI. Through this study, we review all current major theories from various sub-disciplines of cognitive science (specifically neuroscience, psychology and linguistics), and draw parallels with theories and techniques from current practices in AI. We hence present a detailed collection of methods in AI for researchers to build AI systems inspired by cognitive science. Further, through the process of reviewing the state of cognitive-inspired AI, we point out many gaps in the current state of AI (with respect to the performance of the human brain), and thus present potential directions for researchers to develop better perception systems in AI.

    Comment: Summary: a detailed review of the current state of perception models through the lens of cognitive AI

    STUPD: A Synthetic Dataset for Spatial and Temporal Relation Reasoning

    Understanding relations between objects is crucial for understanding the semantics of a visual scene. It is also an essential step in bridging visual and language models. However, current state-of-the-art computer vision models still lack the ability to perform spatial reasoning well. Existing datasets mostly cover a relatively small number of spatial relations, all of which are static relations that do not intrinsically involve motion. In this paper, we propose the Spatial and Temporal Understanding of Prepositions Dataset (STUPD) -- a large-scale video dataset for understanding static and dynamic spatial relationships derived from prepositions of the English language. The dataset contains 150K visual depictions (videos and images), covering 30 distinct spatial prepositional senses, in the form of object interaction simulations generated synthetically using Unity3D. In addition to spatial relations, we also propose 50K visual depictions across 10 temporal relations, consisting of videos depicting event/time-point interactions. To our knowledge, no dataset exists that represents temporal relations through visual settings. In this dataset, we also provide 3D information about object interactions, such as frame-wise coordinates and descriptions of the objects used. The goal of this synthetic dataset is to help models perform better in visual relationship detection in real-world settings. We demonstrate that various models perform better on two real-world datasets (ImageNet-VidVRD and Spatial Senses) when pretrained on STUPD, in comparison to other pretraining datasets.

    Comment: Submitted to the NeurIPS Datasets track. 24 pages including citations and appendix
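
    As an illustration of what one sample in such a dataset might look like, here is a hypothetical Python sketch; the record fields (clip_path, preposition, frame_coords, and so on) are assumptions made for illustration, not STUPD's actual schema.

        # Hypothetical record for one STUPD-like sample (field names are assumed).
        from dataclasses import dataclass, field

        @dataclass
        class STUPDSample:
            clip_path: str          # rendered Unity3D video (or image) for this depiction
            preposition: str        # one of the 30 spatial prepositional senses
            subject: str            # object performing or undergoing the relation
            reference_object: str   # object the relation is defined against
            frame_coords: list = field(default_factory=list)  # per-frame 3D coordinates

        sample = STUPDSample(
            clip_path="clips/onto_00001.mp4",
            preposition="onto",
            subject="cube",
            reference_object="table",
            frame_coords=[(0.0, 1.2, 0.3), (0.0, 0.9, 0.3)],
        )
        print(sample.preposition, len(sample.frame_coords))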

    Abductive Action Inference

    Abductive reasoning aims to make the most likely inference from a given set of incomplete observations. In this work, we propose a new task called abductive action inference, in which, given a situation, the model answers the question "what actions were executed by the human in order to arrive at the current state?" Given a state, we investigate three abductive inference problems: action set prediction, action sequence prediction, and abductive action verification. We benchmark several state-of-the-art models, including Transformers, graph neural networks, CLIP, BLIP, end-to-end trained SlowFast, and ResNet50-3D models. Our newly proposed object-relational BiGED model outperforms all other methods on this challenging task on the Action Genome dataset. Code will be made available.

    Comment: 16 pages, 9 figures
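
    The three problem formulations can be summarized as function signatures; the sketch below is a hypothetical Python interface (names and types are illustrative assumptions, not the paper's code).

        # Hypothetical interfaces for the three abductive inference problems.
        from typing import Any, List, Set

        State = Any  # e.g., a scene graph or visual features of the observed end state

        def predict_action_set(state: State) -> Set[str]:
            """Action set prediction: which actions (unordered) led to this state?"""
            ...

        def predict_action_sequence(state: State) -> List[str]:
            """Action sequence prediction: in what order were the actions executed?"""
            ...

        def verify_action(state: State, candidate_action: str) -> bool:
            """Abductive action verification: could this action have produced the state?"""
            ...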

    Towards a unified account of face (and maybe object) processing

    Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Brain and Cognitive Sciences, 2012. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Cataloged from student-submitted PDF version of thesis. Includes bibliographical references (p. 191-197).

    Faces are an important class of visual stimuli, and are thought to be processed differently from objects by the human visual system. Going beyond the false dichotomy of same versus different processing, it is more important to understand how exactly faces are processed similarly or differently from objects. However, even by itself, face processing is poorly understood. Various aspects of face processing, such as holistic, configural, and face-space processing, are investigated in relative isolation, and the relationships between these are unclear. Furthermore, face processing is characteristically affected by various stimulus transformations such as inversion, contrast reversal and spatial frequency filtering, but how or why is unclear. Most importantly, we do not understand even the basic mechanisms of face processing. We hypothesize that what makes face processing distinctive is the existence of large, coarse face templates. We test our hypothesis by modifying an existing model of object processing to utilize such templates, and find that our model can account for many face-related phenomena. Using small, fine face templates as a control, we find that our model displays object-like processing characteristics instead. Overall, we believe that we may have made the first steps towards achieving a unified account of face processing. In addition, results from our control suggest that face and object processing share fundamental computational mechanisms. Coupled with recent advances in brain recording techniques, our results mean that face recognition could form the "tip of the spear" for attacking and solving the problem of visual recognition.

    by Cheston Y.-C. Tan. Ph.D.