Multimodal Grounding for Language Processing
This survey discusses how recent developments in multimodal processing
facilitate conceptual grounding of language. We categorize the information flow
in multimodal processing with respect to cognitive models of human information
processing and analyze different methods for combining multimodal
representations. Based on this methodological inventory, we discuss the benefit
of multimodal grounding for a variety of language processing tasks and the
challenges that arise. We particularly focus on multimodal grounding of verbs
which play a crucial role for the compositional power of language.
Comment: The paper has been published in the Proceedings of the 27th International
Conference on Computational Linguistics. Please refer to this version for citations:
https://www.aclweb.org/anthology/papers/C/C18/C18-1197
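As a rough illustration of the kind of fusion strategies such a survey compares, the sketch below combines a textual embedding and a visual feature vector by early fusion (concatenation) and by projection into a shared space. The dimensionalities and the random projection matrices are purely illustrative stand-ins for components that would be learned; none of this is taken from the paper itself.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 300-d text embedding and 2048-d visual feature vector for one concept.
text_vec = rng.standard_normal(300)
image_vec = rng.standard_normal(2048)

# Early fusion: concatenate the modality-specific vectors.
concatenated = np.concatenate([text_vec, image_vec])

# Middle fusion: project both modalities into a shared space, then average.
# The random matrices below stand in for projections that would be learned.
W_text = rng.standard_normal((128, 300)) * 0.01
W_image = rng.standard_normal((128, 2048)) * 0.01
fused = 0.5 * (W_text @ text_vec) + 0.5 * (W_image @ image_vec)

print(concatenated.shape, fused.shape)  # (2348,) (128,)
```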
Predicting Perceived Age: Both Language Ability and Appearance are Important
When interacting with robots in a situated spoken dialogue setting, human dialogue partners tend to assign anthropomorphic and social characteristics to those robots. In this paper, we explore the age and educational level that human dialogue partners assign to three different robotic systems, including an un-embodied spoken dialogue system. We found that how a robot speaks is as important to human perceptions as the way the robot looks. Using the data from our experiment, we derived prosodic, emotional, and linguistic features from the participants to train and evaluate a classifier that predicts perceived intelligence, age, and education level.
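A minimal sketch of the kind of classification setup described above, assuming hypothetical per-participant prosodic, emotional, and linguistic features and a random-forest classifier; the paper does not specify this particular model, feature set, or label binning.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)

# Hypothetical per-participant features: prosodic (pitch, speaking rate),
# emotional (valence, arousal) and linguistic (lexical diversity, ...) measures.
X = rng.standard_normal((120, 10))   # 120 participants x 10 features (illustrative)
y = rng.integers(0, 3, size=120)     # perceived-age bins: 0=young, 1=middle, 2=older

clf = RandomForestClassifier(n_estimators=200, random_state=0)
scores = cross_val_score(clf, X, y, cv=5)
print(f"mean cross-validated accuracy: {scores.mean():.2f}")
```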
Clinical Text Prediction with Numerically Grounded Conditional Language Models
Assisted text input techniques can save time and effort and improve text quality. In this paper, we investigate how grounded and conditional extensions to standard neural language models can bring improvements in the tasks of word prediction and completion. These extensions incorporate a structured knowledge base and numerical values from the text into the context used to predict the next word. Our automated evaluation on a clinical dataset shows that extended models significantly outperform standard models. Our best system uses both conditioning and grounding, because of their orthogonal benefits. For word prediction with a list of 5 suggestions, it improves recall from 25.03% to 71.28%, and for word completion it improves keystroke savings from 34.35% to 44.81%, where the theoretical bound for this dataset is 58.78%. We also perform a qualitative investigation of how models with lower perplexity occasionally fare better at the tasks. We found that, at test time, numbers have more influence at the document level than on individual word probabilities.
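The two evaluation measures quoted above can be illustrated with a small sketch. The helper names and the toy clinical vocabulary are assumptions for illustration, not the authors' code or data.

```python
def recall_at_k(targets, suggestion_lists, k=5):
    """Fraction of target words that appear among the top-k suggestions."""
    hits = sum(t in s[:k] for t, s in zip(targets, suggestion_lists))
    return hits / len(targets)

def keystroke_savings(words, typed_prefix_lengths):
    """Proportion of characters the user did not have to type because a
    completion was accepted after typing the given number of keystrokes."""
    total = sum(len(w) for w in words)
    typed = sum(typed_prefix_lengths)
    return (total - typed) / total

# Toy example with a hypothetical clinical vocabulary.
targets = ["insulin", "glucose"]
suggestions = [["insulin", "infusion", "intake", "injury", "index"],
               ["dose", "glucose", "gauge", "graft", "gain"]]
print(recall_at_k(targets, suggestions))                  # 1.0
print(keystroke_savings(["insulin", "glucose"], [3, 4]))  # 0.5
```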
Imagining Grounded Conceptual Representations from Perceptual Information in Situated Guessing Games
In visual guessing games, a Guesser has to identify a target object in a
scene by asking questions to an Oracle. An effective strategy for the players
is to learn conceptual representations of objects that are both discriminative
and expressive enough to ask questions and guess correctly. However, as shown
by Suglia et al. (2020), existing models fail to learn truly multi-modal
representations, relying instead on gold category labels for objects in the
scene both at training and inference time. This provides an unnatural
performance advantage when categories at inference time match those at training
time, and it causes models to fail in more realistic "zero-shot" scenarios
where out-of-domain object categories are involved. To overcome this issue, we
introduce a novel "imagination" module based on Regularized Auto-Encoders, that
learns context-aware and category-aware latent embeddings without relying on
category labels at inference time. Our imagination module outperforms
state-of-the-art competitors by 8.26% gameplay accuracy in the CompGuessWhat?!
zero-shot scenario (Suglia et al., 2020), and it improves the Oracle and
Guesser accuracy by 2.08% and 12.86% in the GuessWhat?! benchmark, when no gold
categories are available at inference time. The imagination module also boosts
reasoning about object properties and attributes.
Comment: Accepted to the International Conference on Computational Linguistics
(COLING) 2020
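A minimal PyTorch sketch of a regularized auto-encoder in the spirit of the described imagination module, assuming visual object features as input and an auxiliary category head as the training-time regularizer; the paper's actual architecture, losses, and hyperparameters may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ImaginationAE(nn.Module):
    """Encode visual object features into a latent code; an auxiliary category
    head regularizes the latent space at training time, so no gold category
    labels are needed when only the encoder is used at inference time."""
    def __init__(self, feat_dim=2048, latent_dim=256, n_categories=80):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(feat_dim, 512), nn.ReLU(),
                                     nn.Linear(512, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 512), nn.ReLU(),
                                     nn.Linear(512, feat_dim))
        self.category_head = nn.Linear(latent_dim, n_categories)

    def forward(self, feats):
        z = self.encoder(feats)
        return z, self.decoder(z), self.category_head(z)

def training_loss(model, feats, category_labels, reg_weight=0.1):
    """Reconstruction loss plus a category-prediction regularizer (weight illustrative)."""
    z, recon, logits = model(feats)
    return F.mse_loss(recon, feats) + reg_weight * F.cross_entropy(logits, category_labels)
```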
Vision and Feature Norms: Improving automatic feature norm learning through cross-modal maps
Property norms have the potential to aid a wide range of semantic tasks, provided that they can be obtained for large numbers of concepts. Recent work has focused on text as the main source of information for automatic property extraction. In this paper we examine property norm prediction from visual, rather than textual, data, using cross-modal maps learnt between property norm and visual spaces. We also investigate the importance of having a complete feature norm dataset, for both training and testing. Finally, we evaluate how these datasets and cross-modal maps can be used in an image retrieval task.
LB is supported by an EPSRC Doctoral Training Grant. DK is supported by EPSRC grant EP/I037512/1. SC is supported by ERC Starting Grant DisCoTex (306920) and EPSRC grant EP/I037512/1.
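A hedged sketch of a linear cross-modal map of the kind described, assuming ridge regression from hypothetical visual feature vectors to property-norm vectors; the paper's actual mapping method, datasets, and dimensionalities may differ.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# Hypothetical training data: visual feature vectors (e.g. CNN activations) and
# property-norm vectors (one weight per property such as "has_wheels") for known concepts.
V_train = rng.standard_normal((400, 2048))   # 400 concepts x 2048 visual dims
P_train = rng.random((400, 500))             # 400 concepts x 500 properties

# Learn a linear cross-modal map from the visual space to the property-norm space.
cross_modal_map = Ridge(alpha=1.0).fit(V_train, P_train)

# Predict property norms for an unseen concept from its visual features alone.
V_new = rng.standard_normal((1, 2048))
predicted_properties = cross_modal_map.predict(V_new)   # shape (1, 500)
```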
How direct is the link between words and images?
Current word embedding models, despite their success, still suffer from a
lack of grounding in the real world. In this line of research, Gunther et al.
2022 proposed a behavioral experiment to investigate the relationship between
words and images. In their setup, participants were presented with a target
noun and a pair of images, one chosen by their model and another chosen
randomly. Participants were asked to select the image that best matched the
target noun. In most cases, participants preferred the image selected by the
model. Gunther et al. therefore concluded that there may be a direct link
between words and embodied experience. We took their experiment as a point of
departure and addressed the following questions. 1. Apart from utilizing
visually embodied simulation of given images, what other strategies might
subjects have used to solve this task? To what extent does this setup rely on
visual information from images? Can it be solved using purely textual
representations? 2. Do current visually grounded embeddings explain subjects'
selection behavior better than textual embeddings? 3. Does visual grounding
improve the semantic representations of both concrete and abstract words? To
address these questions, we designed novel experiments by using pre-trained
textual and visually grounded word embeddings. Our experiments reveal that
subjects' selection behavior is explained to a large extent based on purely
text-based embeddings and word-based similarities, suggesting a minor
involvement of active embodied experiences. Visually grounded embeddings
offered modest advantages over textual embeddings only in certain cases. These
findings indicate that the experiment by Gunther et al. may not be well suited
for tapping into the perceptual experience of participants, and therefore the
extent to which it measures visually grounded knowledge is unclear.
Comment: Accepted in the Mental Lexicon Journal: https://benjamins.com/catalog/m
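A minimal sketch of how such a two-alternative choice can be modelled with embedding similarities, assuming pre-trained word vectors for the target noun and for the two candidate images' labels; the cosine-based decision rule is an illustration, not the authors' exact procedure.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def predict_choice(target_vec, model_image_vec, random_image_vec):
    """Return 'model' if the model-chosen image is at least as similar to the
    target noun as the randomly chosen image, under a given embedding space."""
    sim_model = cosine(target_vec, model_image_vec)
    sim_random = cosine(target_vec, random_image_vec)
    return "model" if sim_model >= sim_random else "random"

# Illustrative vectors; in practice these would come from pre-trained textual
# or visually grounded embeddings of the noun and the two images' labels.
rng = np.random.default_rng(1)
target, img_model, img_random = rng.standard_normal((3, 300))
print(predict_choice(target, img_model, img_random))
```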