
    Learning semantic sentence representations from visually grounded language without lexical knowledge

    Current approaches to learning semantic representations of sentences often rely on prior word-level knowledge. The present study leverages visual information to capture sentence-level semantics without the need for word embeddings. We use a multimodal sentence encoder trained on a corpus of images with matching text captions to produce visually grounded sentence embeddings. Deep neural networks are trained to map the two modalities to a common embedding space such that, for an image, the corresponding caption can be retrieved, and vice versa. We show that our model achieves results comparable to the current state of the art on two popular image-caption retrieval benchmark data sets: MSCOCO and Flickr8k. We evaluate the semantic content of the resulting sentence embeddings using data from the Semantic Textual Similarity benchmark task and show that the multimodal embeddings correlate well with human semantic similarity judgements. The system achieves state-of-the-art results on several of these benchmarks, which shows that a system trained solely on multimodal data, without assuming any word representations, is able to capture sentence-level semantics. Importantly, this result shows that prior knowledge of lexical-level semantics is not needed in order to model sentence-level semantics. These findings demonstrate the importance of visual information in semantics.
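    A dual-encoder model of this kind is typically trained with a bidirectional ranking objective: matching image-caption pairs are pulled together in the shared space while mismatched pairs within a batch are pushed apart. The sketch below is an illustrative assumption (the function name, margin value and normalisation are not taken from the paper), not the authors' exact training code.

```python
# Minimal sketch of a bidirectional hinge (triplet) ranking loss for a joint
# image-caption embedding space. Margin and naming are illustrative assumptions.
import torch
import torch.nn.functional as F

def bidirectional_ranking_loss(img_emb, cap_emb, margin=0.2):
    """img_emb, cap_emb: (batch, dim) embeddings of matching image-caption pairs."""
    img_emb = F.normalize(img_emb, dim=1)
    cap_emb = F.normalize(cap_emb, dim=1)
    scores = img_emb @ cap_emb.t()         # cosine similarity matrix, (batch, batch)
    diagonal = scores.diag().unsqueeze(1)  # similarity of the true pairs
    # caption retrieval: wrong captions should score below the true caption
    cost_cap = (margin + scores - diagonal).clamp(min=0)
    # image retrieval: wrong images should score below the true image
    cost_img = (margin + scores - diagonal.t()).clamp(min=0)
    mask = torch.eye(scores.size(0), dtype=torch.bool)  # exclude the true pairs
    return cost_cap.masked_fill(mask, 0).sum() + cost_img.masked_fill(mask, 0).sum()
```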

    Shared mechanisms support controlled retrieval from semantic and episodic memory: Evidence from semantic aphasia

    Semantic cognition is supported by at least two interactive components: semantic representations and control mechanisms that shape retrieval to suit the circumstances. Semantic and episodic memory draw on largely distinguishable stores, yet it is unclear whether controlled retrieval from these representational systems is supported by shared mechanisms. Patients with semantic aphasia (SA) show heteromodal semantic control deficits following stroke to the left inferior frontal gyrus (LIFG), an area implicated in semantic processing as well as the control of memory and language. However, episodic memory has not been examined in these patients, and although the role of LIFG in semantics is well-established, neuroimaging cannot ascertain whether this area is directly implicated in episodic control or whether its activation reflects semantic processing elicited by the stimuli. Neuropsychology can address this question, revealing whether this area is necessary for both domains. We found that: (i) SA patients showed difficulty discarding dominant yet irrelevant semantic links during semantic and episodic decisions. Similarly, recently encoded events promoted interference during retrieval from both domains. (ii) Deficits were multimodal (i.e. equivalent using words and pictures) in both domains and, in the episodic domain, memory was compromised even when the semantic processing required by the stimuli was minimal. (iii) In both domains, deficits were ameliorated when cues reduced the need to internally constrain retrieval. These cues could involve semantic information, self-reference or spatial location, representations all thought to be unaffected by LIFG lesions. (iv) Training focussed on promoting flexible retrieval of conceptual knowledge showed generalization to untrained semantic and episodic tasks in some individuals; in others, repetition of specific associations gave rise to inflexible retrieval and overgeneralization of trained associations during episodic tasks. Although the neuroanatomical specificity of neuropsychology is limited, this thesis provides evidence that shared mechanisms support the controlled retrieval of episodic and semantic memory.

    Detection-by-Localization: Maintenance-Free Change Object Detector

    Recent research demonstrates that self-localization performance is a very useful measure of likelihood-of-change (LoC) for change detection. In this paper, this "detection-by-localization" scheme is studied in a novel, generalized task of object-level change detection. In our framework, a given query image is segmented into object-level subimages (termed "scene parts"), which are then converted to subimage-level pixel-wise LoC maps via the detection-by-localization scheme. Our approach models a self-localization system as a ranking function that outputs a ranked list of reference images without requiring relevance scores. Thanks to this new setting, we can generalize our approach to a broad class of self-localization systems. Our ranking-based self-localization model makes it possible to fuse self-localization results from different modalities via an unsupervised rank fusion technique derived from the field of multi-modal information retrieval (MMR).
    Comment: 7 pages, 3 figures, technical report
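    The abstract does not spell out the fusion rule, but a standard score-free option from information retrieval is reciprocal rank fusion (RRF), which merges ranked lists using only rank positions. The sketch below shows RRF as an illustrative stand-in, not the paper's specific method.

```python
# Illustrative sketch: reciprocal rank fusion (RRF), an unsupervised way to merge
# ranked lists that come without relevance scores. Not the paper's exact rule.
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists, k=60):
    """ranked_lists: rankings of reference-image IDs, best first, one per modality."""
    fused = defaultdict(float)
    for ranking in ranked_lists:
        for rank, ref_id in enumerate(ranking, start=1):
            fused[ref_id] += 1.0 / (k + rank)  # k damps the influence of top ranks
    return sorted(fused, key=fused.get, reverse=True)

# e.g. fuse the ranked lists produced by two self-localization modalities
fused = reciprocal_rank_fusion([["ref3", "ref1", "ref7"], ["ref1", "ref3", "ref2"]])
```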

    Interventional programmes to improve cognition during healthy and pathological ageing: Cortical modulations and evidence for brain plasticity

    A growing body of evidence suggests that healthy elderly individuals and patients with Alzheimer's disease retain an important potential for neuroplasticity. This review summarizes studies investigating the modulation of neural activity and structural brain integrity in response to interventions involving cognitive training, physical exercise and non-invasive brain stimulation in healthy elderly and cognitively impaired subjects (including patients with mild cognitive impairment (MCI) and Alzheimer's disease). Moreover, given the clinical relevance of neuroplasticity, we discuss how evidence for neuroplasticity can be inferred from the functional and structural brain changes observed after implementing these interventions. We emphasize that multimodal programmes, which combine several types of interventions, improve cognitive function to a greater extent than programmes that use a single interventional approach. We suggest specific methods for weighting the relative importance of cognitive training, physical exercise and non-invasive brain stimulation according to the functional and structural state of the brain of the targeted subject, in order to maximize the cognitive improvements induced by multimodal programmes.
    This study was funded by the European Commission Marie Skłodowska-Curie Actions, Individual Fellowship 655423-NIBSAD; the Italian Ministry of Health, GR-2011-02349998; and the Galician government (Postdoctoral Grants Plan I2C 2011-2015).

    Learning by doing? : Gesture-based word-learning and its neural correlates in healthy volunteers and patients with residual aphasia

    No full text

    Language learning using speech to image retrieval

    Humans learn language by interacting with their environment and listening to other humans. It should also be possible for computational models to learn language directly from speech, but so far most approaches require text. We improve on existing neural network approaches to create visually grounded embeddings for spoken utterances. Using a combination of a multi-layer GRU, importance sampling, cyclic learning rates, ensembling and vectorial self-attention, our results show a remarkable increase in image-caption retrieval performance over previous work. Furthermore, we investigate which layers in the model learn to recognise words in the input. We find that deeper network layers are better at encoding word presence, although the final layer has slightly lower performance. This shows that our visually grounded sentence encoder learns to recognise words from the input even though it is not explicitly trained for word recognition.
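    An encoder of the kind described, a multi-layer GRU over acoustic features followed by vectorial self-attention pooling into a fixed-size utterance embedding, might look roughly like the sketch below; layer sizes, feature dimensions and names are assumptions for illustration, not the authors' published configuration.

```python
# Hedged sketch of a GRU speech encoder with vectorial self-attention pooling:
# a separate attention distribution over time is computed for every embedding
# dimension. All hyperparameters here are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpeechEncoder(nn.Module):
    def __init__(self, n_mel=40, hidden=1024, layers=4, emb_dim=2048):
        super().__init__()
        self.gru = nn.GRU(n_mel, hidden, num_layers=layers,
                          batch_first=True, bidirectional=True)
        self.att = nn.Linear(2 * hidden, 2 * hidden)  # one score per step and dim
        self.proj = nn.Linear(2 * hidden, emb_dim)

    def forward(self, mel):                    # mel: (batch, time, n_mel)
        h, _ = self.gru(mel)                   # (batch, time, 2*hidden)
        a = torch.softmax(self.att(h), dim=1)  # normalise over time, per dimension
        pooled = (a * h).sum(dim=1)            # attention-weighted sum over time
        return F.normalize(self.proj(pooled), dim=1)

emb = SpeechEncoder()(torch.randn(2, 500, 40))  # two utterances -> (2, 2048)
```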