Multimodal Grounding for Language Processing
This survey discusses how recent developments in multimodal processing
facilitate conceptual grounding of language. We categorize the information flow
in multimodal processing with respect to cognitive models of human information
processing and analyze different methods for combining multimodal
representations. Based on this methodological inventory, we discuss the benefit
of multimodal grounding for a variety of language processing tasks and the
challenges that arise. We particularly focus on multimodal grounding of verbs,
which play a crucial role for the compositional power of language.
Comment: The paper has been published in the Proceedings of the 27th Conference
of Computational Linguistics. Please refer to this version for citations:
https://www.aclweb.org/anthology/papers/C/C18/C18-1197
Effect of dilution in asymmetric recurrent neural networks
We study with numerical simulation the possible limit behaviors of
synchronous discrete-time deterministic recurrent neural networks composed of N
binary neurons as a function of a network's level of dilution and asymmetry.
The network dilution measures the fraction of neuron couples that are
connected, and the network asymmetry measures to what extent the underlying
connectivity matrix is asymmetric. For each given neural network, we study the
dynamical evolution of all the different initial conditions, thus
characterizing the full dynamical landscape without imposing any learning rule.
Because of the deterministic dynamics, each trajectory converges to an
attractor, which can be either a fixed point or a limit cycle. These attractors
form the set of all the possible limit behaviors of the neural network. For
each network, we then determine the convergence times, the limit cycles'
lengths, the number of attractors, and the sizes of the attractors' basins. We
show that there are two network structures that maximize the number of possible
limit behaviors. The first optimal network structure is fully connected and
symmetric. In contrast, the second optimal network structure is highly
sparse and asymmetric. The latter optimum is similar to what is observed in
different biological neuronal circuits. These observations lead us to
hypothesize that, independently of any given learning model, an efficient and
effective biological network that stores a number of limit behaviors close to its
maximum capacity tends to develop a connectivity structure similar to one of
the optimal networks we found.
Comment: 31 pages, 5 figures
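The exhaustive procedure described in this abstract (run every initial condition forward until a state repeats, then record the attractor and its basin) can be sketched for very small N. This is an illustrative sketch only, not the authors' code: the ±1 state encoding, the zero threshold with ties mapped to +1, and the names `step`, `find_attractor` and `landscape` are assumptions made here.

```python
from itertools import product


def step(state, J):
    """Synchronous update: every neuron takes the sign of its input field.

    Assumption: states are +1/-1 and a zero field maps to +1.
    """
    n = len(state)
    return tuple(
        1 if sum(J[i][j] * state[j] for j in range(n)) >= 0 else -1
        for i in range(n)
    )


def find_attractor(state, J):
    """Iterate until a state repeats; return (canonical cycle, transient length)."""
    seen = {}          # state -> time step at which it was first visited
    trajectory = []
    while state not in seen:
        seen[state] = len(trajectory)
        trajectory.append(state)
        state = step(state, J)
    start = seen[state]            # first time step that lies on the cycle
    cycle = trajectory[start:]     # length 1 means a fixed point
    # Rotate so the smallest state comes first, so that the same cycle
    # reached from different initial conditions gets the same key.
    k = cycle.index(min(cycle))
    return tuple(cycle[k:] + cycle[:k]), start


def landscape(J):
    """Map each attractor to the size of its basin over all 2**N initial states."""
    basins = {}
    for init in product((-1, 1), repeat=len(J)):
        cycle, _ = find_attractor(init, J)
        basins[cycle] = basins.get(cycle, 0) + 1
    return basins


if __name__ == "__main__":
    # Hypothetical 2-neuron, fully connected, symmetric coupling matrix.
    J = [[0, 1], [1, 0]]
    for cycle, size in landscape(J).items():
        print("cycle length", len(cycle), "basin size", size)
```

Because the loop visits all 2^N states, this approach is only practical for small networks; the transient length returned by `find_attractor` corresponds to the convergence time mentioned in the abstract.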
Biological constraints on neural network models of cognitive function
Neural network models are potential tools for improving our understanding of complex brain functions. To serve this goal, these models need to be neurobiologically realistic. However, although neural networks have advanced dramatically in recent years and even achieve human-like performance on complex perceptual and cognitive tasks, their similarity to aspects of brain anatomy and physiology is imperfect. Here, we discuss different types of neural models, including localist, auto-associative and hetero-associative, deep and whole-brain networks, and identify aspects in which their biological plausibility can be improved. These aspects range from the choice of model neurons and of mechanisms of synaptic plasticity and learning, to implementation of inhibition and control, along with neuroanatomical properties including area structure and local and long-range connectivity. We highlight recent advances in developing biologically grounded cognitive theories and in mechanistically explaining, based on these brain-constrained neural models, hitherto unaddressed issues regarding the nature, localization and ontogenetic and phylogenetic development of higher brain functions. In closing, we point to possible future clinical applications of brain-constrained modelling.
Interactive natural language acquisition in a multi-modal recurrent neural architecture
The human brain enables us to communicate in natural language, and we have gathered a good understanding of the principles underlying language acquisition and processing, knowledge about sociocultural conditions, and insights into activity patterns in the brain. However, we do not yet understand the behavioural and mechanistic characteristics of natural language or how mechanisms in the brain allow us to acquire and process language. By bridging insights from behavioural psychology and neuroscience, this paper aims to contribute a computational understanding of the characteristics that favour language acquisition. Accordingly, we provide concepts and refinements in cognitive modelling regarding principles and mechanisms in the brain and propose a neurocognitively plausible model for embodied language acquisition from real-world interaction of a humanoid robot with its environment. In particular, the architecture consists of a continuous-time recurrent neural network in which parts have different leakage characteristics, and thus operate on multiple timescales, for every modality, together with the association of the higher-level nodes of all modalities into cell assemblies. The model is capable of learning language production grounded in both temporally dynamic somatosensation and vision, and features hierarchical concept abstraction, concept decomposition, multi-modal integration, and self-organisation of latent representations.
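The idea of units with different leakage characteristics operating on multiple timescales can be illustrated with a common discrete-time leaky-integrator update. This is a generic sketch, not the architecture from the paper: the tanh activation, the Euler-style discretisation, and the names `ctrnn_step`, `W`, `b` and `tau` are assumptions made here.

```python
import math


def ctrnn_step(u, W, b, tau):
    """One discrete update of a leaky-integrator recurrent network.

    u   : list of internal unit states
    W   : recurrent weight matrix, W[i][j] is the weight from unit j to unit i
    b   : list of biases
    tau : list of per-unit time constants (>= 1); a larger tau means
          slower leakage, so the unit changes on a slower timescale
    """
    y = [math.tanh(x) for x in u]          # unit activations
    new_u = []
    for i in range(len(u)):
        drive = sum(W[i][j] * y[j] for j in range(len(u))) + b[i]
        # Blend the previous state with the new input; 1/tau sets the blend rate.
        new_u.append((1.0 - 1.0 / tau[i]) * u[i] + drive / tau[i])
    return new_u


if __name__ == "__main__":
    # Two uncoupled units receiving the same constant input:
    # the fast unit (tau=1) jumps at once, the slow unit (tau=10) drifts.
    u = [0.0, 0.0]
    W = [[0.0, 0.0], [0.0, 0.0]]
    b = [1.0, 1.0]
    tau = [1.0, 10.0]
    for t in range(3):
        u = ctrnn_step(u, W, b, tau)
        print(t, u)
```

Giving groups of units different `tau` values is one simple way to obtain fast and slow context layers within a single recurrent network.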
Deep Vision in Optical Imagery: From Perception to Reasoning
Deep learning has achieved extraordinary success in a wide range of computer vision tasks over the past years. Remote sensing data present different properties from natural images and videos because of their unique imaging techniques, shooting angles, etc. For instance, hyperspectral images usually have hundreds of spectral bands, offering additional information, and objects (e.g., vehicles) in remote sensing images are quite small, which brings challenges for detection and segmentation tasks.
This thesis focuses on two kinds of remote sensing data, namely hyper/multi-spectral and high-resolution images, and explores several methods to answer the following questions:
- In comparison with natural images or videos in computer vision, the unique asset of hyper/multi-spectral data is their rich spectral information. But what does this “additional” information bring to training a network? And how do we take full advantage of these spectral bands?
- Remote sensing images at high resolution have quite different characteristics, bringing challenges for several tasks, for example, small object segmentation. Can we devise tailored networks for such tasks?
- Deep networks have produced stunning results in a variety of perception tasks, e.g., image classification, object detection, and semantic segmentation, while the capacity to reason about relations over space is vital for intelligent species. Can a network or module with the capacity for reasoning benefit the parsing of remote sensing data?
To this end, several networks are devised to figure out what a network learns from hyperspectral images and how to use spectral bands efficiently. In addition, a multi-task learning network is investigated for the instance segmentation of vehicles from aerial images and videos. Finally, relational reasoning modules are designed to improve semantic segmentation of aerial images.