125,503 research outputs found
ZRIGF: An Innovative Multimodal Framework for Zero-Resource Image-Grounded Dialogue Generation
Image-grounded dialogue systems benefit greatly from integrating visual
information, resulting in high-quality response generation. However, current
models struggle to effectively utilize such information in zero-resource
scenarios, mainly due to the disparity between image and text modalities. To
overcome this challenge, we propose an innovative multimodal framework, called
ZRIGF, which assimilates image-grounded information for dialogue generation in
zero-resource situations. ZRIGF implements a two-stage learning strategy,
comprising contrastive pre-training and generative pre-training. Contrastive
pre-training includes a text-image matching module that maps images and texts
into a unified encoded vector space, along with a text-assisted masked image
modeling module that preserves pre-training visual features and fosters further
multimodal feature alignment. Generative pre-training employs a multimodal
fusion module and an information transfer module to produce insightful
responses based on harmonized multimodal representations. Comprehensive
experiments conducted on both text-based and image-grounded dialogue datasets
demonstrate ZRIGF's efficacy in generating contextually pertinent and
informative responses. Furthermore, we adopt a fully zero-resource scenario in
the image-grounded dialogue dataset to demonstrate our framework's robust
generalization capabilities in novel domains. The code is available at
https://github.com/zhangbo-nlp/ZRIGF. Comment: Accepted at ACM Multimedia 2023.
A Knowledge-Grounded Multimodal Search-Based Conversational Agent
Multimodal search-based dialogue is a challenging new task: It extends
visually grounded question answering systems into multi-turn conversations with
access to an external database. We address this new challenge by learning a
neural response generation system from the recently released Multimodal
Dialogue (MMD) dataset (Saha et al., 2017). We introduce a knowledge-grounded
multimodal conversational model where an encoded knowledge base (KB)
representation is appended to the decoder input. Our model substantially
outperforms strong baselines in terms of text-based similarity measures (over 9
BLEU points, 3 of which are solely due to the use of additional information
from the KB).
The relationship between IR and multimedia databases
Modern extensible database systems support multimedia data through ADTs. However, because of the problems with multimedia query formulation, this support is not sufficient.
Multimedia querying requires an iterative search process involving many different representations of the objects in the database. The support that is needed is very similar to the processes in information retrieval.
Based on this observation, we develop the miRRor architecture for multimedia query processing. We design a layered framework based on information retrieval techniques, to provide a usable query interface to the multimedia database.
First, we introduce a concept layer to enable reasoning over low-level concepts in the database.
Second, we add an evidential reasoning layer as an intermediate between the user and the concept layer.
Third, we add the functionality to process the users' relevance feedback.
We then adapt the inference network model from text retrieval to an evidential reasoning model for multimedia query processing.
We conclude with an outline for implementation of miRRor on top of the Monet extensible database system.
Future scenarios to inspire innovation
In recent years, accelerated by the economic and financial crisis, complex global issues have moved to the forefront of policy making. These grand challenges require policy makers to address a variety of interrelated issues, which are built upon as yet uncoordinated and dispersed bodies of knowledge. Due to the social dynamics of innovation, new socio-technical subsystems are emerging; however, there is a lack of exploitation of innovative solutions. In this paper we argue that issues of how knowledge is represented can have a part in this lack of exploitation. For example, when drivers of change are not only multiple but also mutable, it is not sensible to extrapolate the future from data and relationships of the past. This paper investigates ways in which futures thinking can be used as a tool for inspiring actions and structures that address the grand challenges. By analysing several scenario cases, elements of good practice and principles on how to strengthen innovation systems through future scenarios are identified. This is needed because innovation itself needs to be oriented along more sustainable pathways enabling transformations of socio-technical systems.
Multimodal agent interfaces and system architectures for health and fitness companions
Multimodal conversational spoken dialogues using physical and virtual agents provide a potential interface to motivate and support users in the domain of health and fitness. In this paper we present how such multimodal conversational Companions can be implemented to support their owners in various pervasive and mobile settings. In particular, we focus on different forms of multimodality and system architectures for such interfaces.
Media literacy at all levels: making the humanities more inclusive
The decline of the humanities, combined with the arrival of students focused
on science, technology, engineering, and mathematics (STEM), represents
an opportunity for the development of innovative approaches to teaching
languages and literatures. Expanding the instructional focus from traditional
humanities students, who are naturally more text-focused, to address the needs
of more application-oriented STEM learners ensures that language instructors
prepare all students to become analytical and critical consumers and producers
of digital media. Training students to question motives both in their own and
authentic media messages and to justify their own interpretations results in more
sophisticated second language (L2) communication. Even where institutional
structures impede comprehensive curriculum reform, individual instructors can
integrate media literacy training into their own classes. This article demonstrates
ways of reaching and retaining larger numbers of students at all levels—if necessary,
one course at a time.
Museum Experience Design: A Modern Storytelling Methodology
In this paper we propose a new direction for design, in the context of the theme “Next Digital Technologies in Arts and Culture”, by employing modern methods based on Interaction Design, Interactive Storytelling and Artificial Intelligence. Focusing on Cultural Heritage, we propose a new paradigm for Museum Experience Design, facilitating on the one hand traditional visual and multimedia communication and, on the other, a new type of interaction with artefacts, in the form of a Storytelling Experience. Museums are increasingly being transformed into hybrid spaces, where virtual (digital) information coexists with tangible artefacts. In this context, “Next Digital Technologies” play a new role, providing methods to increase cultural accessibility and enhance experience. Not only is the goal to convey stories hidden inside artefacts, as well as items or objects connected to them, but it is also to pave the way for the creation of new ones through an interactive museum experience that continues after the museum visit ends. Social sharing, in particular, can greatly increase the value of dissemination.
Conceptual spatial representations for indoor mobile robots
We present an approach for creating conceptual representations of human-made indoor environments using mobile
robots. The concepts refer to spatial and functional properties of typical indoor environments. Following findings
in cognitive psychology, our model is composed of layers representing maps at different levels of abstraction. The
complete system is integrated in a mobile robot endowed with laser and vision sensors for place and object recognition.
The system also incorporates a linguistic framework that actively supports the map acquisition process, and which
is used for situated dialogue. Finally, we discuss the capabilities of the integrated system.