A half century of progress towards a unified neural theory of mind and brain with applications to autonomous adaptive agents and mental disorders
Invited article for the book
Artificial Intelligence in the Age of
Neural Networks and Brain Computing
R. Kozma, C. Alippi, Y. Choe, and F. C. Morabito, Eds.
Cambridge, MA: Academic Press.
This article surveys some of the main design principles, mechanisms, circuits, and architectures that have been discovered during a half century of systematic research aimed at developing a unified theory that links mind and brain, and shows how psychological functions arise as emergent properties of brain mechanisms. The article describes a theoretical method that has enabled such a theory to be developed in stages by carrying out a kind of conceptual evolution. It also describes revolutionary computational paradigms like Complementary Computing and Laminar Computing that constrain the kind of unified theory that can describe the autonomous adaptive intelligence that emerges from advanced brains. Adaptive Resonance Theory, or ART, is one of the core models that has been discovered in this way. ART proposes how advanced brains learn to attend, recognize, and predict objects and events in a changing world that is filled with unexpected events. ART is not, however, a “theory of everything” if only because, due to Complementary Computing, different matching and learning laws tend to support perception and cognition on the one hand, and spatial representation and action on the other. The article mentions why a theory of this kind may be useful in the design of autonomous adaptive agents in engineering and technology. It also notes how the theory has led to new mechanistic insights about mental disorders such as autism, medial temporal amnesia, Alzheimer’s disease, and schizophrenia, along with mechanistically informed proposals about how their symptoms may be ameliorated.
Doubly-Attentive Decoder for Multi-modal Neural Machine Translation
We introduce a Multi-modal Neural Machine Translation model in which a
doubly-attentive decoder naturally incorporates spatial visual features
obtained using pre-trained convolutional neural networks, bridging the gap
between image description and translation. Our decoder learns to attend to
source-language words and parts of an image independently by means of two
separate attention mechanisms as it generates words in the target language. We
find that our model can efficiently exploit not just back-translated in-domain
multi-modal data but also large general-domain text-only MT corpora. We also
report state-of-the-art results on the Multi30k data set.
Comment: 8 pages (11 including references), 2 figures.
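As a rough illustration of the two independent attention mechanisms this abstract describes (one over source-language words, one over spatial image features), here is a minimal numpy sketch of a single decoder step; all names, shapes, and the dot-product scoring are illustrative assumptions, not the paper's actual model:

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(query, keys):
    """Dot-product attention: weight each key by its similarity to the
    query and return the weighted sum (context vector) plus the weights."""
    scores = keys @ query            # (n,)
    weights = softmax(scores)        # (n,), sums to 1
    return weights @ keys, weights   # context: (d,)

def doubly_attentive_step(decoder_state, src_states, img_regions):
    """One decoder step: attend independently over source-word states and
    image-region features, then concatenate the two context vectors."""
    txt_ctx, txt_w = attend(decoder_state, src_states)
    img_ctx, img_w = attend(decoder_state, img_regions)
    return np.concatenate([txt_ctx, img_ctx]), (txt_w, img_w)
```

In a real model the concatenated context would feed the RNN cell that emits the next target word; here it only shows how the two attention streams stay separate until fusion.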
A perceptual comparison of empirical and predictive region-of-interest video
When viewing multimedia presentations, a user only attends to a relatively small part of the video display at any one point in time. By shifting allocation of bandwidth from peripheral areas to those locations where a user’s gaze is more likely to rest, attentive displays can be produced. Attentive displays aim to reduce resource requirements while minimizing negative user perception—understood in this paper as not only a user’s ability to assimilate and understand information but also his/her subjective satisfaction with the video content. This paper introduces and discusses a perceptual comparison between two region-of-interest display (RoID) adaptation techniques. A RoID is an attentive display where bandwidth has been preallocated around measured or highly probable areas of user gaze. In this paper, video content was manipulated using two sources of data: empirically measured data (captured using eye-tracking technology) and predictive data (calculated from the physical characteristics of the video data). Results show that display adaptation causes significant variation in users’ understanding of specific multimedia content. Interestingly, RoID adaptation and the type of video being presented both affect user perception of video quality. Moreover, the use of frame rates less than 15 frames per second, for any video adaptation technique, caused a significant reduction in user-perceived quality, suggesting that although users are aware of video quality reduction, it does impact the level of information assimilation and understanding. Results also highlight that user level of enjoyment is significantly affected by the type of video yet is not as affected by the quality or type of video adaptation—an interesting implication in the field of entertainment.
SMAN: Stacked Multi-Modal Attention Network for cross-modal image-text retrieval
This article focuses on tackling the task of cross-modal image-text retrieval, which has been an interdisciplinary topic in both the computer vision and natural language processing communities. Existing global representation alignment-based methods fail to pinpoint the semantically meaningful portions of images and texts, while local representation alignment schemes suffer from the huge computational burden of exhaustively aggregating the similarity of visual fragments and textual words. In this article, we propose a stacked multimodal attention network (SMAN) that makes use of the stacked multimodal attention mechanism to exploit the fine-grained interdependencies between image and text, thereby mapping the aggregation of attentive fragments into a common space for measuring cross-modal similarity. Specifically, we sequentially employ intramodal information and multimodal information as guidance to perform multiple-step attention reasoning so that the fine-grained correlation between image and text can be modeled. As a consequence, we are capable of discovering the semantically meaningful visual regions or words in a sentence, which contributes to measuring the cross-modal similarity in a more precise manner. Moreover, we present a novel bidirectional ranking loss that encourages matched multimodal instances to lie closer together. Doing so allows us to make full use of pairwise supervised information to preserve the manifold structure of heterogeneous pairwise data. Extensive experiments on two benchmark datasets demonstrate that our SMAN consistently yields competitive performance compared to state-of-the-art methods.
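The bidirectional ranking idea can be sketched as a margin-based loss over an image-text similarity matrix whose diagonal holds the matched pairs. The hinge formulation below is a common variant used for cross-modal retrieval and is an assumption, not necessarily the paper's exact loss:

```python
import numpy as np

def bidirectional_ranking_loss(sim, margin=0.2):
    """Hinge ranking loss over a similarity matrix sim[i, j] between
    image i and text j; diagonal entries are the matched (positive) pairs.
    Negatives are penalized in both retrieval directions."""
    n = sim.shape[0]
    pos = np.diag(sim)                                  # matched-pair similarities
    # image -> text: each row's negatives vs. that image's positive
    cost_i2t = np.maximum(0.0, margin + sim - pos[:, None])
    # text -> image: each column's negatives vs. that text's positive
    cost_t2i = np.maximum(0.0, margin + sim - pos[None, :])
    mask = 1.0 - np.eye(n)                              # exclude the positives themselves
    return ((cost_i2t + cost_t2i) * mask).sum() / n
```

When every positive beats every negative by at least the margin, the loss is zero; otherwise the violating pairs contribute in both directions.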
Different effects of adding white noise on cognitive performance of sub-, normal and super-attentive school children
Objectives: Noise often has detrimental effects on performance. However, because of the phenomenon of stochastic resonance (SR), auditory white noise (WN) can alter the "signal-to-noise" ratio and improve performance. The Moderate Brain Arousal (MBA) model postulates different levels of internal "neural noise" in individuals with different attentional capacities. This in turn determines the particular WN level most beneficial in each individual case, with one level of WN facilitating poor attenders but hindering super-attentive children. The objective of the present study is to find out whether added WN affects cognitive performance differently in children who differ in attention ability.
Methods: Participants were teacher-rated super- (N = 25), normal- (N = 29), and sub-attentive (N = 36) children (aged 8 to 10 years). Two non-executive-function (non-EF) tasks (a verbal episodic recall task and a delayed verbal recognition task) and two EF tasks (a visuo-spatial working memory test and a Go-NoGo task) were performed under three WN levels. The non-WN condition was used only to control for potential differences in background noise in the group testing situations.
Results: WN had different effects on performance in the three groups: adding moderate WN worsened the performance of super-attentive children on both task types and improved EF performance in sub-attentive children. The normal-attentive children's performance was unaffected by WN exposure. The shift from moderate to high levels of WN had little further effect on performance in any group.
Significance: The predicted differential effect of WN on performance was confirmed. However, the failure to find evidence for an inverted-U function challenges current theories. Alternative explanations are discussed. We propose that WN therapy should be further investigated as a possible non-pharmacological treatment for inattention.
Language-Based Image Editing with Recurrent Attentive Models
We investigate the problem of Language-Based Image Editing (LBIE). Given a
source image and a natural language description, we want to generate a target
image by editing the source image based on the description. We propose a
generic modeling framework for two sub-tasks of LBIE: language-based image
segmentation and image colorization. The framework uses recurrent attentive
models to fuse image and language features. Instead of using a fixed step size,
we introduce for each region of the image a termination gate to dynamically
determine after each inference step whether to continue extrapolating
additional information from the textual description. The effectiveness of the
framework is validated on three datasets. First, we introduce a synthetic
dataset, called CoSaL, to evaluate the end-to-end performance of our LBIE
system. Second, we show that the framework leads to state-of-the-art
performance on image segmentation on the ReferIt dataset. Third, we present the
first language-based colorization result on the Oxford-102 Flowers dataset.
Comment: Accepted to CVPR 2018 as a Spotlight.
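The per-region termination gate described above can be sketched as a recurrent fusion loop that stops once a learned gate falls below a threshold, rather than running a fixed number of steps. Everything here (the fusion cell, the gate form, all names and shapes) is an illustrative assumption, not the paper's architecture:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def recurrent_fusion(region_feat, text_feat, W_gate, W_fuse,
                     max_steps=5, tau=0.5):
    """Recurrently fuse one image region's features with the text feature.
    After each step a scalar termination gate decides whether to keep
    extracting information from the description or to stop early."""
    h = region_feat.copy()
    for step in range(1, max_steps + 1):
        # Fusion update: mix the current region state with the text feature.
        h = np.tanh(W_fuse @ np.concatenate([h, text_feat]))
        # Termination gate: probability of continuing for another step.
        g = sigmoid(W_gate @ h)
        if g < tau:          # gate says this region has extracted enough
            break
    return h, step
```

Each image region gets its own loop, so regions that need more textual evidence can take more inference steps than regions that are resolved immediately.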
Textbook Question Answering with Multi-modal Context Graph Understanding and Self-supervised Open-set Comprehension
In this work, we introduce a novel algorithm for solving the textbook
question answering (TQA) task which describes more realistic QA problems
compared to other recent tasks. We mainly focus on two related issues with
analysis of the TQA dataset. First, solving the TQA problems requires
comprehending multi-modal contexts in complicated input data. To tackle this issue
of extracting knowledge features from long text lessons and merging them with
visual features, we establish a context graph from texts and images, and
propose a new module f-GCN based on graph convolutional networks (GCN). Second,
scientific terms are not spread evenly over the chapters, and subjects are split in the
TQA dataset. To overcome this so-called "out-of-domain" issue, before learning
QA problems, we introduce a novel self-supervised open-set learning process
without any annotations. The experimental results show that our model
significantly outperforms prior state-of-the-art methods. Moreover, ablation
studies validate that both methods of incorporating f-GCN for extracting
knowledge from multi-modal contexts and our newly proposed self-supervised
learning process are effective for TQA problems.
Comment: ACL 2019 Camera-ready.
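A graph convolutional layer of the kind the proposed f-GCN module builds on can be sketched in a few lines using the standard symmetric normalization; this is a generic GCN layer over a context graph, not the paper's f-GCN itself:

```python
import numpy as np

def gcn_layer(A, H, W):
    """One graph-convolution layer in the common Kipf-Welling form:
    H' = ReLU(D^{-1/2} (A + I) D^{-1/2} H W).
    Each node aggregates its (self-looped, degree-normalized) neighborhood
    before a shared linear projection and a ReLU nonlinearity."""
    A_hat = A + np.eye(A.shape[0])        # add self-loops
    d = A_hat.sum(axis=1)                 # node degrees (>= 1 after self-loops)
    D_inv_sqrt = np.diag(d ** -0.5)
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt
    return np.maximum(0.0, A_norm @ H @ W)
```

Stacking two or three such layers lets information from text-lesson nodes propagate to the image and question nodes of the context graph before answering.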