3,985 research outputs found

    An information assistant system for the prevention of tunnel vision in crisis management

    In crisis management, tunnel vision is a set of biases in decision makers’ cognitive processes that often leads to an incorrect understanding of the real crisis situation, biased perception of information, and improper decisions. The tunnel vision phenomenon is a consequence of both the challenges of the task and the natural limitations of human cognition. An information assistant system is proposed with the purpose of preventing tunnel vision. The system serves as a platform for monitoring the ongoing crisis event. All information passes through the system before it arrives at the user. The system enhances data quality, reduces data quantity, and presents the crisis information in a manner that prevents or repairs the user’s cognitive overload. While working with such a system, the users (crisis managers) are expected to be more likely to stay aware of the actual situation, stay open-minded to possibilities, and make proper decisions.

    Designing multimodal interaction for the visually impaired

    Although multimodal computer input is believed to have advantages over unimodal input, little has been done to understand how to design a multimodal input mechanism to facilitate visually impaired users' information access. This research investigates sighted and visually impaired users' multimodal interaction choices when given an interaction grammar that supports speech and touch input modalities. It investigates whether task type, working memory load, or prevalence of errors in a given modality impacts a user's choice. Theories of human memory and attention are used to explain the users' speech and touch input coordination. Among the abundant findings from this research, the following are the most important in guiding system design: (1) Multimodal input is likely to be used when it is available. (2) Users select input modalities based on the type of task undertaken. Users prefer touch input for navigation operations, but speech input for non-navigation operations. (3) When errors occur, users prefer to stay in the failing modality, instead of switching to another modality for error correction. (4) Despite the common multimodal usage patterns, there is still a high degree of individual difference in modality choices. Additional findings include: (1) Modality switching becomes more prevalent when lower working memory and attentional resources are required for the performance of other concurrent tasks. (2) Higher error rates increase modality switching, but only under duress. (3) Training order affects modality usage. Teaching a modality first versus second increases the use of this modality in users' task performance.
In addition to discovering the multimodal interaction patterns above, this research contributes to the field of human-computer interaction design by (1) presenting a design of an eyes-free multimodal information browser, and (2) presenting a Wizard of Oz method for working with visually impaired users in order to observe their multimodal interaction. The overall contribution of this work is that it is one of the early investigations into how speech and touch might be combined into a non-visual multimodal system that can effectively be used for eyes-free tasks.

    Computational and Robotic Models of Early Language Development: A Review

    We review computational and robotic models of early language learning and development. We first explain why and how these models are used to better understand how children learn language. We argue that they provide concrete theories of language learning as a complex dynamic system, complementing traditional methods in psychology and linguistics. We review different modeling formalisms, grounded in techniques from machine learning and artificial intelligence such as Bayesian and neural network approaches. We then discuss their role in understanding several key mechanisms of language development: cross-situational statistical learning, embodiment, situated social interaction, intrinsically motivated learning, and cultural evolution. We conclude by discussing future challenges for research, including modeling of large-scale empirical data about language acquisition in real-world environments. Keywords: Early language learning, Computational and robotic models, machine learning, development, embodiment, social interaction, intrinsic motivation, self-organization, dynamical systems, complexity. Comment: to appear in International Handbook on Language Development, ed. J. Horst and J. von Koss Torkildsen, Routledge.

    Applying semantic web technologies to knowledge sharing in aerospace engineering

    This paper details an integrated methodology to optimise knowledge reuse and sharing, illustrated with a use case in the aeronautics domain. It uses ontologies as a central modelling strategy for the capture of knowledge, either from legacy documents via automated means or directly in systems interfacing with knowledge workers via user-defined, web-based forms. The domain ontologies used for knowledge capture also guide the retrieval of the knowledge extracted from the data, using a semantic search system that provides support for multiple modalities during search. This approach has been applied and evaluated successfully within the aerospace domain, and is currently being extended for use in other domains on an increasingly large scale.

    A Unified Framework for Slot based Response Generation in a Multimodal Dialogue System

    Natural Language Understanding (NLU) and Natural Language Generation (NLG) are the two critical components of every conversational system: they handle the tasks of understanding the user by capturing the necessary information in the form of slots, and of generating an appropriate response in accordance with the extracted information. Recently, dialogue systems integrated with complementary information such as images, audio, or video have gained immense popularity. In this work, we propose an end-to-end framework with the capability to extract necessary slot values from the utterance and generate a coherent response, thereby assisting the user to achieve their desired goals in a multimodal dialogue system having both textual and visual information. The task of extracting the necessary information depends not only on the text but also on the visual cues present in the dialogue. Similarly, for generation, the previous dialogue context comprising multimodal information is significant for providing coherent and informative responses. We employ a multimodal hierarchical encoder using pre-trained DialoGPT and also exploit the knowledge base (KB) to provide a stronger context for both tasks. We then design a slot attention mechanism to focus on the necessary information in a given utterance. Lastly, a decoder generates the corresponding response for the given dialogue context and the extracted slot values. Experimental results on the Multimodal Dialogue Dataset (MMD) show that the proposed framework outperforms the baseline approaches in both tasks. The code is available at https://github.com/avinashsai/slot-gpt. Comment: Published in the journal Multimedia Tools and Applications.

    DiffBlender: Scalable and Composable Multimodal Text-to-Image Diffusion Models

    Recent progress in diffusion-based text-to-image generation models has significantly expanded generative capabilities by conditioning on text descriptions. However, since relying solely on text prompts is still restrictive for fine-grained customization, we aim to extend the boundaries of conditional generation to incorporate diverse types of modalities, e.g., sketch, box, and style embedding, simultaneously. We thus design a multimodal text-to-image diffusion model, coined DiffBlender, that achieves this goal in a single model by training only a few small hypernetworks. DiffBlender facilitates convenient scaling of input modalities without altering the parameters of an existing large-scale generative model, so that its well-established knowledge is retained. Furthermore, our study sets new standards for multimodal generation by conducting quantitative and qualitative comparisons with existing approaches. By diversifying the channels of conditioning modalities, DiffBlender faithfully reflects the provided information or, in its absence, creates imaginative generation. Comment: 18 pages, 16 figures, and 3 tables.

    Multimodal access to social media services

    Integrated master's thesis. Informatics and Computing Engineering. Faculdade de Engenharia, Universidade do Porto, Microsoft Language Development Center. 201