Learning semantic sentence representations from visually grounded language without lexical knowledge
Current approaches to learning semantic representations of sentences often rely on prior word-level knowledge. The present study instead leverages visual information to capture sentence-level semantics without the need for word embeddings. We use a multimodal sentence encoder trained on a corpus of images with matching text captions to produce visually grounded sentence embeddings. Deep neural networks are trained to map the two modalities into a common embedding space such that, for a given image, the corresponding caption can be retrieved and vice versa. We show that our model achieves results comparable to the current state of the art on two popular image-caption retrieval benchmark datasets: MSCOCO and Flickr8k. We evaluate the semantic content of the resulting sentence embeddings using data from the Semantic Textual Similarity benchmark task and show that the multimodal embeddings correlate well with human semantic similarity judgements. The system achieves state-of-the-art results on several of these benchmarks, which shows that a system trained solely on multimodal data, without assuming any word representations, is able to capture sentence-level semantics. Importantly, this result shows that prior knowledge of lexical-level semantics is not needed to model sentence-level semantics. These findings demonstrate the importance of visual information in semantics.
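A common way to realise this kind of bidirectional image-caption mapping is a margin-based triplet ranking loss over a shared embedding space. The sketch below is a minimal PyTorch illustration of that generic objective, not the paper's exact implementation; the margin value and the use of in-batch negatives are assumptions.

```python
import torch

def contrastive_loss(img_emb, cap_emb, margin=0.2):
    """Hinge-based triplet ranking loss with in-batch negatives.

    img_emb, cap_emb: (batch, dim) L2-normalised embeddings where row i
    of each matrix belongs to the same matching image-caption pair.
    """
    # Cosine similarity matrix; the diagonal holds the matching pairs.
    scores = img_emb @ cap_emb.t()
    diagonal = scores.diag().view(-1, 1)

    # Caption retrieval: each non-matching caption should score at least
    # `margin` below the true caption for the same image.
    cost_cap = (margin + scores - diagonal).clamp(min=0)
    # Image retrieval: the symmetric constraint for non-matching images.
    cost_img = (margin + scores - diagonal.t()).clamp(min=0)

    # The positive pairs on the diagonal incur no cost.
    mask = torch.eye(scores.size(0), dtype=torch.bool, device=scores.device)
    cost_cap = cost_cap.masked_fill(mask, 0)
    cost_img = cost_img.masked_fill(mask, 0)
    return cost_cap.sum() + cost_img.sum()
```

Summing the two directional hinge terms trains a single embedding space that supports both caption retrieval from an image and image retrieval from a caption, as the abstract describes.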
Shared mechanisms support controlled retrieval from semantic and episodic memory: Evidence from semantic aphasia
Semantic cognition is supported by at least two interactive components: semantic representations and control mechanisms that shape retrieval to suit the circumstances. Semantic and episodic memory draw on largely distinguishable stores, yet it is unclear whether controlled retrieval from these representational systems is supported by shared mechanisms. Patients with semantic aphasia (SA) show heteromodal semantic control deficits following stroke to left inferior frontal gyrus (LIFG), an area implicated in semantic processing plus the control of memory and language. However, episodic memory has not been examined in these patients, and although the role of LIFG in semantics is well-established, neuroimaging cannot ascertain whether this area is directly implicated in episodic control or whether its activation reflects semantic processing elicited by the stimuli. Neuropsychology can address this question, revealing whether this area is necessary for both domains. We found that: (i) SA patients showed difficulty discarding dominant yet irrelevant semantic links during semantic and episodic decisions. Similarly, recently encoded events promoted interference during retrieval from both domains. (ii) Deficits were multimodal (i.e. equivalent using words and pictures) in both domains and, in the episodic domain, memory was compromised even when semantic processing required by the stimuli was minimal. (iii) In both domains, deficits were ameliorated when cues reduced the need to internally constrain retrieval. These cues could involve semantic information, self-reference or spatial location, representations all thought to be unaffected by IFG lesions. (iv) Training focussed on promoting flexible retrieval of conceptual knowledge showed generalization to untrained semantic and episodic tasks in some individuals; in others, repetition of specific associations gave rise to inflexible retrieval and overgeneralization of trained associations during episodic tasks. Although the neuroanatomical specificity of neuropsychology is limited, this thesis provides evidence that shared mechanisms support the controlled retrieval of episodic and semantic memory.
Detection-by-Localization: Maintenance-Free Change Object Detector
Recent research demonstrates that self-localization performance is a very useful measure of likelihood-of-change (LoC) for change detection. In this paper, this "detection-by-localization" scheme is studied in a novel, generalized task of object-level change detection. In our framework, a given query image is segmented into object-level subimages (termed "scene parts"), which are then converted to subimage-level pixel-wise LoC maps via the detection-by-localization scheme. Our approach models a self-localization system as a ranking function that outputs a ranked list of reference images without requiring relevance scores. Thanks to this new setting, we can generalize our approach to a broad class of self-localization systems. Our ranking-based self-localization model makes it possible to fuse self-localization results from different modalities via an unsupervised rank fusion drawn from the field of multi-modal information retrieval (MMR).
Interventional programmes to improve cognition during healthy and pathological ageing: Cortical modulations and evidence for brain plasticity
A growing body of evidence suggests that healthy elderly individuals and patients with Alzheimer’s disease retain an important potential for neuroplasticity. This review summarizes studies investigating the modulation of neural activity and structural brain integrity in response to interventions involving cognitive training, physical exercise and non-invasive brain stimulation in healthy elderly and cognitively impaired subjects (including patients with mild cognitive impairment (MCI) and Alzheimer’s disease). Moreover, given the clinical relevance of neuroplasticity, we discuss how evidence for neuroplasticity can be inferred from the functional and structural brain changes observed after implementing these interventions. We emphasize that multimodal programmes, which combine several types of interventions, improve cognitive function to a greater extent than programmes that use a single interventional approach. We suggest specific methods for weighting the relative importance of cognitive training, physical exercise and non-invasive brain stimulation according to the functional and structural state of the brain of the targeted subject to maximize the cognitive improvements induced by multimodal programmes. This study was funded by the European Commission Marie Skłodowska-Curie Actions (Individual Fellowship 655423-NIBSAD), the Italian Ministry of Health (GR-2011-02349998), and the Galician government (Postdoctoral Grants Plan I2C 2011-2015).
Language learning using speech to image retrieval
Humans learn language by interacting with their environment and listening to other humans. It should also be possible for computational models to learn language directly from speech, but so far most approaches require text. We improve on existing neural network approaches to create visually grounded embeddings for spoken utterances. Using a combination of a multi-layer GRU, importance sampling, cyclic learning rates, ensembling and vectorial self-attention, our results show a remarkable increase in image-caption retrieval performance over previous work. Furthermore, we investigate which layers in the model learn to recognise words in the input. We find that deeper network layers are better at encoding word presence, although the final layer has slightly lower performance. This shows that our visually grounded sentence encoder learns to recognise words from the input even though it is not explicitly trained for word recognition.
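For illustration, a minimal PyTorch sketch of the kind of encoder described (a multi-layer bidirectional GRU over acoustic features, pooled with vectorial self-attention into a unit-length utterance embedding) follows. The feature dimensionality, layer sizes and attention parameterisation are assumptions for the sketch, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class SpeechEncoder(nn.Module):
    """Sketch of a grounded speech encoder: a multi-layer bidirectional GRU
    over acoustic feature frames, pooled by vectorial self-attention (one
    weight per time step and per channel) into a single utterance embedding."""

    def __init__(self, feat_dim=39, hidden=512, layers=4, emb_dim=1024):
        super().__init__()
        self.gru = nn.GRU(feat_dim, hidden, num_layers=layers,
                          batch_first=True, bidirectional=True)
        # Vectorial attention: scores every channel at every time step.
        self.att = nn.Sequential(
            nn.Linear(2 * hidden, 128), nn.Tanh(),
            nn.Linear(128, 2 * hidden))

        self.proj = nn.Linear(2 * hidden, emb_dim)

    def forward(self, x):                              # x: (batch, time, feat_dim)
        states, _ = self.gru(x)                        # (batch, time, 2*hidden)
        alpha = torch.softmax(self.att(states), dim=1) # per-channel time weights
        pooled = (alpha * states).sum(dim=1)           # (batch, 2*hidden)
        emb = self.proj(pooled)
        return nn.functional.normalize(emb, dim=-1)    # unit-length embedding
```

The utterance embeddings this produces can then be matched against image embeddings in a shared space, as in the image-caption retrieval setup the abstract describes.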