A half century of progress towards a unified neural theory of mind and brain with applications to autonomous adaptive agents and mental disorders
Invited article for the book
Artificial Intelligence in the Age of
Neural Networks and Brain Computing
R. Kozma, C. Alippi, Y. Choe, and F. C. Morabito, Eds.
Cambridge, MA: Academic Press.
This article surveys some of the main design principles, mechanisms, circuits, and architectures that have been discovered during a half century of systematic research aimed at developing a unified theory that links mind and brain, and shows how psychological functions arise as emergent properties of brain mechanisms. The article describes a theoretical method that has enabled such a theory to be developed in stages by carrying out a kind of conceptual evolution. It also describes revolutionary computational paradigms like Complementary Computing and Laminar Computing that constrain the kind of unified theory that can describe the autonomous adaptive intelligence that emerges from advanced brains. Adaptive Resonance Theory, or ART, is one of the core models that has been discovered in this way. ART proposes how advanced brains learn to attend, recognize, and predict objects and events in a changing world that is filled with unexpected events. ART is not, however, a “theory of everything” if only because, due to Complementary Computing, different matching and learning laws tend to support perception and cognition on the one hand, and spatial representation and action on the other. The article mentions why a theory of this kind may be useful in the design of autonomous adaptive agents in engineering and technology. It also notes how the theory has led to new mechanistic insights about mental disorders such as autism, medial temporal amnesia, Alzheimer’s disease, and schizophrenia, along with mechanistically informed proposals about how their symptoms may be ameliorated.
Doubly-Attentive Decoder for Multi-modal Neural Machine Translation
We introduce a Multi-modal Neural Machine Translation model in which a
doubly-attentive decoder naturally incorporates spatial visual features
obtained using pre-trained convolutional neural networks, bridging the gap
between image description and translation. Our decoder learns to attend to
source-language words and parts of an image independently by means of two
separate attention mechanisms as it generates words in the target language. We
find that our model can efficiently exploit not just back-translated in-domain
multi-modal data but also large general-domain text-only MT corpora. We also
report state-of-the-art results on the Multi30k data set.
Comment: 8 pages (11 including references), 2 figures.
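As a rough illustration of the two independent attention mechanisms this abstract describes (one over source-language words, one over spatial image features), here is a minimal numpy sketch of a single decoder step; all names, shapes, and the dot-product scoring are illustrative assumptions, not the paper's actual model:

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(query, keys):
    """Dot-product attention: weight each key by its similarity to the
    query and return the weighted sum (context vector) plus the weights."""
    scores = keys @ query            # (n,)
    weights = softmax(scores)        # (n,), sums to 1
    return weights @ keys, weights   # context: (d,)

def doubly_attentive_step(decoder_state, src_states, img_regions):
    """One decoder step: attend independently over source-word states and
    image-region features, then concatenate the two context vectors."""
    txt_ctx, txt_w = attend(decoder_state, src_states)
    img_ctx, img_w = attend(decoder_state, img_regions)
    return np.concatenate([txt_ctx, img_ctx]), (txt_w, img_w)
```

In a real model the concatenated context would feed the RNN cell that emits the next target word; here it only shows how the two attention streams stay separate until fusion.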
A perceptual comparison of empirical and predictive region-of-interest video
When viewing multimedia presentations, a user only attends to a relatively small part of the video display at any one point in time. By shifting allocation of bandwidth from peripheral areas to those locations where a user’s gaze is more likely to rest, attentive displays can be produced. Attentive displays aim to reduce resource requirements while minimizing negative user perception—understood in this paper as not only a user’s ability to assimilate and understand information but also his/her subjective satisfaction with the video content. This paper introduces and discusses a perceptual comparison between two region-of-interest display (RoID) adaptation techniques. A RoID is an attentive display where bandwidth has been preallocated around measured or highly probable areas of user gaze. In this paper, video content was manipulated using two sources of data: empirically measured data (captured using eye-tracking technology) and predictive data (calculated from the physical characteristics of the video data). Results show that display adaptation causes significant variation in users’ understanding of specific multimedia content. Interestingly, RoID adaptation and the type of video being presented both affect user perception of video quality. Moreover, the use of frame rates less than 15 frames per second, for any video adaptation technique, caused a significant reduction in user-perceived quality, suggesting that although users are aware of video quality reduction, it does impact the level of information assimilation and understanding. Results also highlight that user level of enjoyment is significantly affected by the type of video yet is not as affected by the quality or type of video adaptation—an interesting implication in the field of entertainment.
SMAN: Stacked Multi-Modal Attention Network for cross-modal image-text retrieval
This article focuses on tackling the task of cross-modal image-text retrieval, which has been an interdisciplinary topic in both the computer vision and natural language processing communities. Existing global representation alignment-based methods fail to pinpoint the semantically meaningful portions of images and texts, while local representation alignment schemes suffer from the huge computational burden of exhaustively aggregating the similarity of visual fragments and textual words. In this article, we propose a stacked multimodal attention network (SMAN) that makes use of the stacked multimodal attention mechanism to exploit the fine-grained interdependencies between image and text, thereby mapping the aggregation of attentive fragments into a common space for measuring cross-modal similarity. Specifically, we sequentially employ intramodal information and multimodal information as guidance to perform multiple-step attention reasoning so that the fine-grained correlation between image and text can be modeled. As a consequence, we are capable of discovering the semantically meaningful visual regions or words in a sentence, which contributes to measuring the cross-modal similarity in a more precise manner. Moreover, we present a novel bidirectional ranking loss that encourages matched multimodal instances to lie closer together. Doing so allows us to make full use of pairwise supervised information to preserve the manifold structure of heterogeneous pairwise data. Extensive experiments on two benchmark datasets demonstrate that our SMAN consistently yields competitive performance compared to state-of-the-art methods.
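The bidirectional ranking idea can be sketched as a margin-based loss over an image-text similarity matrix whose diagonal holds the matched pairs. The hinge formulation below is a common variant used for cross-modal retrieval and is an assumption, not necessarily the paper's exact loss:

```python
import numpy as np

def bidirectional_ranking_loss(sim, margin=0.2):
    """Hinge ranking loss over a similarity matrix sim[i, j] between
    image i and text j; diagonal entries are the matched (positive) pairs.
    Negatives are penalized in both retrieval directions."""
    n = sim.shape[0]
    pos = np.diag(sim)                                  # matched-pair similarities
    # image -> text: each row's negatives vs. that image's positive
    cost_i2t = np.maximum(0.0, margin + sim - pos[:, None])
    # text -> image: each column's negatives vs. that text's positive
    cost_t2i = np.maximum(0.0, margin + sim - pos[None, :])
    mask = 1.0 - np.eye(n)                              # exclude the positives themselves
    return ((cost_i2t + cost_t2i) * mask).sum() / n
```

When every positive beats every negative by at least the margin, the loss is zero; otherwise the violating pairs contribute in both directions.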
Different effects of adding white noise on cognitive performance of sub-, normal and super-attentive school children
Objectives: Noise often has detrimental effects on performance. However, because of the phenomenon of stochastic resonance (SR), auditory white noise (WN) can alter the "signal-to-noise" ratio and improve performance. The Moderate Brain Arousal (MBA) model postulates different levels of internal "neural noise" in individuals with different attentional capacities. This in turn determines the particular WN level most beneficial in each individual case, with one level of WN facilitating poor attenders but hindering super-attentive children. The objective of the present study is to find out whether added WN affects cognitive performance differently in children who differ in attention ability.
Methods: Participants were teacher-rated super- (N = 25), normal- (N = 29), and sub-attentive (N = 36) children (aged 8 to 10 years). Two non-executive-function (non-EF) tasks (a verbal episodic recall task and a delayed verbal recognition task) and two EF tasks (a visuo-spatial working memory test and a Go-NoGo task) were performed under three WN levels. The non-WN condition was used only to control for potential differences in background noise in the group testing situations.
Results: WN had different effects on performance in the three groups: adding moderate WN worsened the performance of super-attentive children on both task types and improved EF performance in sub-attentive children. The normal-attentive children's performance was unaffected by WN exposure. The shift from moderate to high levels of WN had little further effect on performance in any group.
Significance: The predicted differential effect of WN on performance was confirmed. However, the failure to find evidence for an inverted-U function challenges current theories. Alternative explanations are discussed. We propose that WN therapy should be further investigated as a possible non-pharmacological treatment for inattention.
Language-Based Image Editing with Recurrent Attentive Models
We investigate the problem of Language-Based Image Editing (LBIE). Given a
source image and a natural language description, we want to generate a target
image by editing the source image based on the description. We propose a
generic modeling framework for two sub-tasks of LBIE: language-based image
segmentation and image colorization. The framework uses recurrent attentive
models to fuse image and language features. Instead of using a fixed step size,
we introduce for each region of the image a termination gate to dynamically
determine after each inference step whether to continue extrapolating
additional information from the textual description. The effectiveness of the
framework is validated on three datasets. First, we introduce a synthetic
dataset, called CoSaL, to evaluate the end-to-end performance of our LBIE
system. Second, we show that the framework leads to state-of-the-art
performance on image segmentation on the ReferIt dataset. Third, we present the
first language-based colorization result on the Oxford-102 Flowers dataset.
Comment: Accepted to CVPR 2018 as a Spotlight.
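The per-region termination gate described above can be sketched as a recurrent fusion loop that stops once a learned gate falls below a threshold, rather than running a fixed number of steps. Everything here (the fusion cell, the gate form, all names and shapes) is an illustrative assumption, not the paper's architecture:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def recurrent_fusion(region_feat, text_feat, W_gate, W_fuse,
                     max_steps=5, tau=0.5):
    """Recurrently fuse one image region's features with the text feature.
    After each step a scalar termination gate decides whether to keep
    extracting information from the description or to stop early."""
    h = region_feat.copy()
    for step in range(1, max_steps + 1):
        # Fusion update: mix the current region state with the text feature.
        h = np.tanh(W_fuse @ np.concatenate([h, text_feat]))
        # Termination gate: probability of continuing for another step.
        g = sigmoid(W_gate @ h)
        if g < tau:          # gate says this region has extracted enough
            break
    return h, step
```

Each image region gets its own loop, so regions that need more textual evidence can take more inference steps than regions that are resolved immediately.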
Textbook Question Answering with Multi-modal Context Graph Understanding and Self-supervised Open-set Comprehension
In this work, we introduce a novel algorithm for solving the textbook
question answering (TQA) task which describes more realistic QA problems
compared to other recent tasks. We mainly focus on two related issues with
analysis of the TQA dataset. First, solving the TQA problems requires
comprehending multi-modal contexts in complicated input data. To tackle this issue
of extracting knowledge features from long text lessons and merging them with
visual features, we establish a context graph from texts and images, and
propose a new module f-GCN based on graph convolutional networks (GCN). Second,
scientific terms are not spread evenly over the chapters, and subjects are split in the
TQA dataset. To overcome this so-called "out-of-domain" issue, before learning
QA problems, we introduce a novel self-supervised open-set learning process
without any annotations. The experimental results show that our model
significantly outperforms prior state-of-the-art methods. Moreover, ablation
studies validate that both methods of incorporating f-GCN for extracting
knowledge from multi-modal contexts and our newly proposed self-supervised
learning process are effective for TQA problems.
Comment: ACL 2019 Camera-ready.
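A graph convolutional layer of the kind the proposed f-GCN module builds on can be sketched in a few lines using the standard symmetric normalization; this is a generic GCN layer over a context graph, not the paper's f-GCN itself:

```python
import numpy as np

def gcn_layer(A, H, W):
    """One graph-convolution layer in the common Kipf-Welling form:
    H' = ReLU(D^{-1/2} (A + I) D^{-1/2} H W).
    Each node aggregates its (self-looped, degree-normalized) neighborhood
    before a shared linear projection and a ReLU nonlinearity."""
    A_hat = A + np.eye(A.shape[0])        # add self-loops
    d = A_hat.sum(axis=1)                 # node degrees (>= 1 after self-loops)
    D_inv_sqrt = np.diag(d ** -0.5)
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt
    return np.maximum(0.0, A_norm @ H @ W)
```

Stacking two or three such layers lets information from text-lesson nodes propagate to the image and question nodes of the context graph before answering.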