321,173 research outputs found

    MIRIAM: A Multimodal Chat-Based Interface for Autonomous Systems

    Full text link
    We present MIRIAM (Multimodal Intelligent inteRactIon for Autonomous systeMs), a multimodal interface to support situation awareness of autonomous vehicles through chat-based interaction. The user is able to chat about the vehicle's plan, objectives, previous activities and mission progress. The system is mixed initiative in that it pro-actively sends messages about key events, such as fault warnings. We will demonstrate MIRIAM using SeeByte's SeeTrack command and control interface and Neptune autonomy simulator.Comment: 2 pages, ICMI'17, 19th ACM International Conference on Multimodal Interaction, November 13-17 2017, Glasgow, U

    A Knowledge-Grounded Multimodal Search-Based Conversational Agent

    Full text link
    Multimodal search-based dialogue is a challenging new task: It extends visually grounded question answering systems into multi-turn conversations with access to an external database. We address this new challenge by learning a neural response generation system from the recently released Multimodal Dialogue (MMD) dataset (Saha et al., 2017). We introduce a knowledge-grounded multimodal conversational model where an encoded knowledge base (KB) representation is appended to the decoder input. Our model substantially outperforms strong baselines in terms of text-based similarity measures (over 9 BLEU points, 3 of which are solely due to the use of additional information from the KB

    OSU Multimodal Machine Translation System Report

    Full text link
    This paper describes Oregon State University's submissions to the shared WMT'17 task "multimodal translation task I". In this task, all the sentence pairs are image captions in different languages. The key difference between this task and conventional machine translation is that we have corresponding images as additional information for each sentence pair. In this paper, we introduce a simple but effective system which takes an image shared between different languages, feeding it into the both encoding and decoding side. We report our system's performance for English-French and English-German with Flickr30K (in-domain) and MSCOCO (out-of-domain) datasets. Our system achieves the best performance in TER for English-German for MSCOCO dataset.Comment: 5, WMT 201

    Multimodal Sparse Coding for Event Detection

    Full text link
    Unsupervised feature learning methods have proven effective for classification tasks based on a single modality. We present multimodal sparse coding for learning feature representations shared across multiple modalities. The shared representations are applied to multimedia event detection (MED) and evaluated in comparison to unimodal counterparts, as well as other feature learning methods such as GMM supervectors and sparse RBM. We report the cross-validated classification accuracy and mean average precision of the MED system trained on features learned from our unimodal and multimodal settings for a subset of the TRECVID MED 2014 dataset.Comment: Multimodal Machine Learning Workshop at NIPS 201

    DOLPHIN: the design and initial evaluation of multimodal focus and context

    Get PDF
    In this paper we describe a new focus and context visualisation technique called multimodal focus and context. This technique uses a hybrid visual and spatialised audio display space to overcome the limited visual displays of mobile devices. We demonstrate this technique by applying it to maps of theme parks. We present the results of an experiment comparing multimodal focus and context to a purely visual display technique. The results showed that neither system was significantly better than the other. We believe that this is due to issues involving the perception of multiple structured audio sources

    Bimodal Feedback for In-car Mid-air Gesture Interaction

    Get PDF
    This demonstration showcases novel multimodal feedback designs for in-car mid-air gesture interaction. It explores the potential of multimodal feedback types for mid-air gestures in cars and how these can reduce eyes-off-the-road time thus make driving safer. We will show four different bimodal feedback combinations to provide effective information about interaction with systems in a car. These feedback techniques are visual-auditory, auditory-ambient (peripheral vision), ambient-tactile, and tactile-auditory. Users can interact with the system after a short introduction, creating an exciting opportunity to deploy these displays in cars in the future

    Affective games:a multimodal classification system

    Get PDF
    Affective gaming is a relatively new field of research that exploits human emotions to influence gameplay for an enhanced player experience. Changes in player’s psychology reflect on their behaviour and physiology, hence recognition of such variation is a core element in affective games. Complementary sources of affect offer more reliable recognition, especially in contexts where one modality is partial or unavailable. As a multimodal recognition system, affect-aware games are subject to the practical difficulties met by traditional trained classifiers. In addition, inherited game-related challenges in terms of data collection and performance arise while attempting to sustain an acceptable level of immersion. Most existing scenarios employ sensors that offer limited freedom of movement resulting in less realistic experiences. Recent advances now offer technology that allows players to communicate more freely and naturally with the game, and furthermore, control it without the use of input devices. However, the affective game industry is still in its infancy and definitely needs to catch up with the current life-like level of adaptation provided by graphics and animation
    corecore