    ZRIGF: An Innovative Multimodal Framework for Zero-Resource Image-Grounded Dialogue Generation

    Image-grounded dialogue systems benefit greatly from integrating visual information, resulting in high-quality response generation. However, current models struggle to effectively utilize such information in zero-resource scenarios, mainly due to the disparity between image and text modalities. To overcome this challenge, we propose an innovative multimodal framework, called ZRIGF, which assimilates image-grounded information for dialogue generation in zero-resource situations. ZRIGF implements a two-stage learning strategy, comprising contrastive pre-training and generative pre-training. Contrastive pre-training includes a text-image matching module that maps images and texts into a unified encoded vector space, along with a text-assisted masked image modeling module that preserves pre-training visual features and fosters further multimodal feature alignment. Generative pre-training employs a multimodal fusion module and an information transfer module to produce insightful responses based on harmonized multimodal representations. Comprehensive experiments conducted on both text-based and image-grounded dialogue datasets demonstrate ZRIGF's efficacy in generating contextually pertinent and informative responses. Furthermore, we adopt a fully zero-resource scenario in the image-grounded dialogue dataset to demonstrate our framework's robust generalization capabilities in novel domains. The code is available at https://github.com/zhangbo-nlp/ZRIGF. Comment: Accepted at ACM Multimedia 2023.

    A Knowledge-Grounded Multimodal Search-Based Conversational Agent

    Multimodal search-based dialogue is a challenging new task: it extends visually grounded question answering systems into multi-turn conversations with access to an external database. We address this new challenge by learning a neural response generation system from the recently released Multimodal Dialogue (MMD) dataset (Saha et al., 2017). We introduce a knowledge-grounded multimodal conversational model where an encoded knowledge base (KB) representation is appended to the decoder input. Our model substantially outperforms strong baselines in terms of text-based similarity measures (over 9 BLEU points, 3 of which are solely due to the use of additional information from the KB).

    The relationship between IR and multimedia databases

    Modern extensible database systems support multimedia data through ADTs. However, because of the problems with multimedia query formulation, this support is not sufficient. Multimedia querying requires an iterative search process involving many different representations of the objects in the database. The support that is needed is very similar to the processes in information retrieval. Based on this observation, we develop the miRRor architecture for multimedia query processing. We design a layered framework based on information retrieval techniques, to provide a usable query interface to the multimedia database. First, we introduce a concept layer to enable reasoning over low-level concepts in the database. Second, we add an evidential reasoning layer as an intermediate between the user and the concept layer. Third, we add the functionality to process the users' relevance feedback. We then adapt the inference network model from text retrieval to an evidential reasoning model for multimedia query processing. We conclude with an outline for implementation of miRRor on top of the Monet extensible database system.

    Future scenarios to inspire innovation

    In recent years, accelerated by the economic and financial crisis, complex global issues have moved to the forefront of policy making. These grand challenges require policy makers to address a variety of interrelated issues, which are built upon as yet uncoordinated and dispersed bodies of knowledge. Due to the social dynamics of innovation, new socio-technical subsystems are emerging; however, there is a lack of exploitation of innovative solutions. In this paper we argue that issues of how knowledge is represented can play a part in this lack of exploitation. For example, when drivers of change are not only multiple but also mutable, it is not sensible to extrapolate the future from data and relationships of the past. This paper investigates ways in which futures thinking can be used as a tool for inspiring actions and structures that address the grand challenges. By analysing several scenario cases, elements of good practice and principles on how to strengthen innovation systems through future scenarios are identified. This is needed because innovation itself needs to be oriented along more sustainable pathways enabling transformations of socio-technical systems.

    Multimodal agent interfaces and system architectures for health and fitness companions

    Multimodal conversational spoken dialogues using physical and virtual agents provide a potential interface to motivate and support users in the domain of health and fitness. In this paper we present how such multimodal conversational Companions can be implemented to support their owners in various pervasive and mobile settings. In particular, we focus on different forms of multimodality and system architectures for such interfaces.

    Media literacy at all levels: making the humanities more inclusive

    The decline of the humanities, combined with the arrival of students focused on science, technology, engineering, and mathematics (STEM), represents an opportunity for the development of innovative approaches to teaching languages and literatures. Expanding the instructional focus from traditional humanities students, who are naturally more text-focused, to address the needs of more application-oriented STEM learners ensures that language instructors prepare all students to become analytical and critical consumers and producers of digital media. Training students to question motives both in their own and authentic media messages and to justify their own interpretations results in more sophisticated second language (L2) communication. Even where institutional structures impede comprehensive curriculum reform, individual instructors can integrate media literacy training into their own classes. This article demonstrates ways of reaching and retaining larger numbers of students at all levels, if necessary one course at a time.

    Museum Experience Design: A Modern Storytelling Methodology

    In this paper we propose a new direction for design, in the context of the theme “Next Digital Technologies in Arts and Culture”, by employing modern methods based on Interaction Design, Interactive Storytelling and Artificial Intelligence. Focusing on Cultural Heritage, we propose a new paradigm for Museum Experience Design, facilitating on the one hand traditional visual and multimedia communication and, on the other, a new type of interaction with artefacts, in the form of a Storytelling Experience. Museums are increasingly being transformed into hybrid spaces, where virtual (digital) information coexists with tangible artefacts. In this context, “Next Digital Technologies” play a new role, providing methods to increase cultural accessibility and enhance experience. Not only is the goal to convey stories hidden inside artefacts, as well as items or objects connected to them, but it is also to pave the way for the creation of new ones through an interactive museum experience that continues after the museum visit ends. Social sharing, in particular, can greatly increase the value of dissemination.

    Conceptual spatial representations for indoor mobile robots

    We present an approach for creating conceptual representations of human-made indoor environments using mobile robots. The concepts refer to spatial and functional properties of typical indoor environments. Following findings in cognitive psychology, our model is composed of layers representing maps at different levels of abstraction. The complete system is integrated in a mobile robot endowed with laser and vision sensors for place and object recognition. The system also incorporates a linguistic framework that actively supports the map acquisition process, and which is used for situated dialogue. Finally, we discuss the capabilities of the integrated system.