Multimodal Grounding for Language Processing
This survey discusses how recent developments in multimodal processing
facilitate conceptual grounding of language. We categorize the information flow
in multimodal processing with respect to cognitive models of human information
processing and analyze different methods for combining multimodal
representations. Based on this methodological inventory, we discuss the benefit
of multimodal grounding for a variety of language processing tasks and the
challenges that arise. We particularly focus on multimodal grounding of verbs,
which play a crucial role in the compositional power of language.
Comment: The paper has been published in the Proceedings of the 27th
International Conference on Computational Linguistics. Please refer to that
version for citations:
https://www.aclweb.org/anthology/papers/C/C18/C18-1197
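As a rough illustration of what "combining multimodal representations" can mean in practice, the sketch below contrasts early fusion (concatenation) with a projected, element-wise fusion of a text vector and an image vector. All names, dimensions, and fusion functions here are illustrative assumptions, not methods taken from the survey.

```python
# Minimal NumPy sketch of two common ways to combine modality-specific
# representations. Vectors, dimensions, and projections are illustrative
# assumptions only.
import numpy as np

rng = np.random.default_rng(0)
text_vec = rng.standard_normal(300)   # e.g., a sentence embedding
image_vec = rng.standard_normal(512)  # e.g., a CNN image feature

# Early fusion: concatenate raw representations and let a downstream
# model learn cross-modal interactions.
early_fused = np.concatenate([text_vec, image_vec])   # shape (812,)

# Projected fusion: map both modalities into a shared space and combine
# them there, e.g., by element-wise product.
W_text = rng.standard_normal((128, 300)) * 0.01
W_image = rng.standard_normal((128, 512)) * 0.01
shared_text = np.tanh(W_text @ text_vec)
shared_image = np.tanh(W_image @ image_vec)
mid_fused = shared_text * shared_image                 # shape (128,)

print(early_fused.shape, mid_fused.shape)
```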
UR-FUNNY: A Multimodal Language Dataset for Understanding Humor
Humor is a unique and creative communicative behavior displayed during social
interactions. It is produced in a multimodal manner, through the usage of words
(text), gestures (vision) and prosodic cues (acoustic). Understanding humor
from these three modalities falls within the boundaries of multimodal language, a
recent research trend in natural language processing that models natural
language as it happens in face-to-face communication. Although humor detection
is an established research area in NLP, in a multimodal context it is an
understudied area. This paper presents a diverse multimodal dataset, called
UR-FUNNY, to open the door to understanding multimodal language used in
expressing humor. The dataset and accompanying studies present a framework for
multimodal humor detection for the natural language processing community.
UR-FUNNY is publicly available for research.
The Multimodal Experience of Art
The aim of this paper is to argue that our experience of artworks is normally multimodal. It is the
result of perceptual processing in more than one sense modality. In other words, multimodal experience
of art is not the exception; it is the rule. I use the example of music in order to demonstrate the various
ways in which the visual sense modality influences the auditory processing of music and conclude that
this should make us look more closely at our practices of engaging with artworks.
Multi-modal Image Processing based on Coupled Dictionary Learning
In real-world scenarios, many data processing problems often involve
heterogeneous images associated with different imaging modalities. Since these
multimodal images originate from the same phenomenon, it is realistic to assume
that they share common attributes or characteristics. In this paper, we propose
a multi-modal image processing framework based on coupled dictionary learning
to capture similarities and disparities between different image modalities. In
particular, our framework can capture favorable structure similarities across
different image modalities such as edges, corners, and other elementary
primitives in a learned sparse transform domain, instead of the original pixel
domain, which can be used to improve a number of image processing tasks such as
denoising, inpainting, or super-resolution. Practical experiments demonstrate
that incorporating multimodal information using our framework brings notable
benefits.
Comment: SPAWC 2018, 19th IEEE International Workshop on Signal Processing
Advances in Wireless Communications
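As a rough sketch of the coupled-dictionary idea, the example below stacks corresponding patches from two modalities and learns a joint dictionary so that both halves share one sparse code. This is a much-simplified stand-in (using scikit-learn's DictionaryLearning) for the paper's framework, which additionally separates common and modality-specific structure; all data, dimensions, and parameters are illustrative assumptions.

```python
# Simplified coupled dictionary learning sketch: corresponding patches from
# two modalities are stacked so that a single sparse code must explain both.
# Illustrative assumptions only; not the paper's exact model.
import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.default_rng(0)
n_pairs, patch_dim = 200, 64                        # e.g., 8x8 patches
X_mod1 = rng.standard_normal((n_pairs, patch_dim))  # modality-1 patches
X_mod2 = rng.standard_normal((n_pairs, patch_dim))  # modality-2 patches

# Stack paired patches so one sparse code represents both modalities.
X_joint = np.hstack([X_mod1, X_mod2])               # shape (200, 128)

dl = DictionaryLearning(n_components=32, alpha=1.0, max_iter=20,
                        transform_algorithm="lasso_lars", random_state=0)
codes = dl.fit_transform(X_joint)                   # shared sparse codes
D_joint = dl.components_                            # shape (32, 128)

# Split each joint atom into its per-modality halves. Shared codes can then
# be decoded with either modality's atoms, which is the mechanism behind
# cross-modal tasks such as super-resolution or inpainting.
D_mod1, D_mod2 = D_joint[:, :patch_dim], D_joint[:, patch_dim:]
X_mod2_hat = codes @ D_mod2                         # reconstruct modality 2
print(codes.shape, D_mod1.shape, X_mod2_hat.shape)
```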
Conceptual Frameworks for Multimodal Social Signal Processing
This special issue is about a research area which is developing rapidly. Pentland gave it a name which has become widely used, “Social Signal Processing” (SSP for short), and his phrase provides the title of a European project, SSPnet, which has a brief to consolidate the area. The challenge that Pentland highlighted was understanding the nonlinguistic signals that serve as the basis for “subconscious discussions between humans about relationships, resources, risks, and rewards”. He identified it as an area where computational research had made interesting progress, and could usefully make more.
Eyes and ears together: new task for multimodal spoken content analysis
Human speech processing is often a multimodal process combining
audio and visual processing. Eyes and Ears Together proposes two
benchmark multimodal speech processing tasks: (1) multimodal automatic speech
recognition (ASR) and (2) multimodal co-reference resolution on spoken
multimedia. These tasks are motivated by
our desire to address the difficulties of ASR for multimedia spoken
content. We review prior work on the integration of multimodal
signals into speech processing for multimedia data, introduce a
multimedia dataset for our proposed tasks, and outline these tasks
…