2,315 research outputs found
Multimodal Grounding for Language Processing
This survey discusses how recent developments in multimodal processing
facilitate conceptual grounding of language. We categorize the information flow
in multimodal processing with respect to cognitive models of human information
processing and analyze different methods for combining multimodal
representations. Based on this methodological inventory, we discuss the benefit
of multimodal grounding for a variety of language processing tasks and the
challenges that arise. We particularly focus on multimodal grounding of verbs
which play a crucial role for the compositional power of language.Comment: The paper has been published in the Proceedings of the 27 Conference
of Computational Linguistics. Please refer to this version for citations:
https://www.aclweb.org/anthology/papers/C/C18/C18-1197
A discriminative approach to grounded spoken language understanding in interactive robotics
Spoken Language Understanding in Interactive Robotics provides computational models of human-machine communication based on the vocal input. However, robots operate in specific environments and the correct interpretation of the spoken sentences depends on the physical, cognitive and linguistic aspects triggered by the operational environment. Grounded language processing should exploit both the physical constraints of the context as well as knowledge assumptions of the robot. These include the subjective perception of the environment that explicitly affects linguistic reasoning. In this work, a standard linguistic pipeline for semantic parsing is extended toward a form of perceptually informed natural language processing that combines discriminative learning and distributional semantics. Empirical results achieve up to a 40% of relative error reduction
Are distributional representations ready for the real world? Evaluating word vectors for grounded perceptual meaning
Distributional word representation methods exploit word co-occurrences to
build compact vector encodings of words. While these representations enjoy
widespread use in modern natural language processing, it is unclear whether
they accurately encode all necessary facets of conceptual meaning. In this
paper, we evaluate how well these representations can predict perceptual and
conceptual features of concrete concepts, drawing on two semantic norm datasets
sourced from human participants. We find that several standard word
representations fail to encode many salient perceptual features of concepts,
and show that these deficits correlate with word-word similarity prediction
errors. Our analyses provide motivation for grounded and embodied language
learning approaches, which may help to remedy these deficits.Comment: Accepted at RoboNLP 201
Combining Language and Vision with a Multimodal Skip-gram Model
We extend the SKIP-GRAM model of Mikolov et al. (2013a) by taking visual
information into account. Like SKIP-GRAM, our multimodal models (MMSKIP-GRAM)
build vector-based word representations by learning to predict linguistic
contexts in text corpora. However, for a restricted set of words, the models
are also exposed to visual representations of the objects they denote
(extracted from natural images), and must predict linguistic and visual
features jointly. The MMSKIP-GRAM models achieve good performance on a variety
of semantic benchmarks. Moreover, since they propagate visual information to
all words, we use them to improve image labeling and retrieval in the zero-shot
setup, where the test concepts are never seen during model training. Finally,
the MMSKIP-GRAM models discover intriguing visual properties of abstract words,
paving the way to realistic implementations of embodied theories of meaning.Comment: accepted at NAACL 2015, camera ready version, 11 page
Symbol Emergence in Robotics: A Survey
Humans can learn the use of language through physical interaction with their
environment and semiotic communication with other people. It is very important
to obtain a computational understanding of how humans can form a symbol system
and obtain semiotic skills through their autonomous mental development.
Recently, many studies have been conducted on the construction of robotic
systems and machine-learning methods that can learn the use of language through
embodied multimodal interaction with their environment and other systems.
Understanding human social interactions and developing a robot that can
smoothly communicate with human users in the long term, requires an
understanding of the dynamics of symbol systems and is crucially important. The
embodied cognition and social interaction of participants gradually change a
symbol system in a constructive manner. In this paper, we introduce a field of
research called symbol emergence in robotics (SER). SER is a constructive
approach towards an emergent symbol system. The emergent symbol system is
socially self-organized through both semiotic communications and physical
interactions with autonomous cognitive developmental agents, i.e., humans and
developmental robots. Specifically, we describe some state-of-art research
topics concerning SER, e.g., multimodal categorization, word discovery, and a
double articulation analysis, that enable a robot to obtain words and their
embodied meanings from raw sensory--motor information, including visual
information, haptic information, auditory information, and acoustic speech
signals, in a totally unsupervised manner. Finally, we suggest future
directions of research in SER.Comment: submitted to Advanced Robotic
Classification systems offer a microcosm of issues in conceptual processing: A commentary on Kemmerer (2016)
This is a commentary on Kemmerer (2016), Categories of Object Concepts Across Languages and Brains: The Relevance of Nominal Classification Systems to Cognitive Neuroscience, DOI: 10.1080/23273798.2016.1198819
- …