MIRIAM: A Multimodal Chat-Based Interface for Autonomous Systems
We present MIRIAM (Multimodal Intelligent inteRactIon for Autonomous
systeMs), a multimodal interface to support situation awareness of autonomous
vehicles through chat-based interaction. The user is able to chat about the
vehicle's plan, objectives, previous activities and mission progress. The
system is mixed-initiative in that it proactively sends messages about key
events, such as fault warnings. We will demonstrate MIRIAM using SeeByte's
SeeTrack command and control interface and Neptune autonomy simulator.Comment: 2 pages, ICMI'17, 19th ACM International Conference on Multimodal
Interaction, November 13-17 2017, Glasgow, U
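The mixed-initiative behaviour described above (proactive event messages interleaved with user-initiated chat) can be illustrated with a minimal sketch. This is not MIRIAM's implementation; the queue-based loop, the event source, and the message texts are all assumptions for illustration.

```python
# Minimal sketch (illustrative, not MIRIAM) of a mixed-initiative chat loop:
# user queries and proactive vehicle events share one message queue, so the
# interface can push fault warnings without waiting for a user turn.
import asyncio

async def vehicle_events(queue: asyncio.Queue) -> None:
    # Stand-in for an autonomy/telemetry feed (e.g. a simulator connection).
    await asyncio.sleep(1.0)
    await queue.put(("event", "Fault warning: thruster over-temperature"))

async def user_queries(queue: asyncio.Queue) -> None:
    # Stand-in for chat input; a real interface would read from the UI.
    await queue.put(("user", "What is the mission progress?"))
    await asyncio.sleep(2.0)
    await queue.put(("user", "quit"))

async def chat_loop() -> None:
    queue: asyncio.Queue = asyncio.Queue()
    producers = [asyncio.create_task(vehicle_events(queue)),
                 asyncio.create_task(user_queries(queue))]
    while True:
        source, text = await queue.get()
        if source == "user" and text == "quit":
            break
        # Proactive events and user turns are rendered in the same chat stream.
        print(f"[{source}] {text}")
    for task in producers:
        task.cancel()

asyncio.run(chat_loop())
```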
A Knowledge-Grounded Multimodal Search-Based Conversational Agent
Multimodal search-based dialogue is a challenging new task: It extends
visually grounded question answering systems into multi-turn conversations with
access to an external database. We address this new challenge by learning a
neural response generation system from the recently released Multimodal
Dialogue (MMD) dataset (Saha et al., 2017). We introduce a knowledge-grounded
multimodal conversational model where an encoded knowledge base (KB)
representation is appended to the decoder input. Our model substantially
outperforms strong baselines in terms of text-based similarity measures (over 9
BLEU points, 3 of which are solely due to the use of additional information
from the KB).
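A minimal sketch of the stated idea, appending an encoded KB vector to the decoder input at every step, is given below. It assumes a GRU seq2seq decoder in PyTorch; all module names and dimensions are illustrative rather than the authors' code.

```python
# Sketch: condition a seq2seq decoder on an encoded knowledge-base (KB) vector
# by appending it to each decoder input embedding. Sizes are illustrative.
import torch
import torch.nn as nn

class KBGroundedDecoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=256, kb_dim=128, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # The GRU consumes the token embedding concatenated with the KB encoding.
        self.gru = nn.GRU(emb_dim + kb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, prev_tokens, kb_encoding, hidden):
        # prev_tokens: (batch, tgt_len); kb_encoding: (batch, kb_dim)
        emb = self.embed(prev_tokens)                              # (batch, tgt_len, emb_dim)
        kb = kb_encoding.unsqueeze(1).expand(-1, emb.size(1), -1)  # broadcast over time steps
        dec_in = torch.cat([emb, kb], dim=-1)                      # append KB to decoder input
        output, hidden = self.gru(dec_in, hidden)
        return self.out(output), hidden

# Example usage with random inputs (batch=2, tgt_len=5):
dec = KBGroundedDecoder(vocab_size=1000)
logits, _ = dec(torch.randint(0, 1000, (2, 5)), torch.randn(2, 128), torch.zeros(1, 2, 512))
```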
OSU Multimodal Machine Translation System Report
This paper describes Oregon State University's submissions to the WMT'17
shared task "multimodal translation task I". In this task, all the sentence
pairs are image captions in different languages. The key difference between
this task and conventional machine translation is that we have corresponding
images as additional information for each sentence pair. In this paper, we
introduce a simple but effective system that takes an image shared between the
different languages and feeds it into both the encoder and the decoder. We
report our system's performance for English-French and English-German with
Flickr30K (in-domain) and MSCOCO (out-of-domain) datasets. Our system achieves
the best performance in TER for English-German on the MSCOCO dataset.
Comment: 5 pages, WMT 2017
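One simple way to realise "feed the shared image into both the encoder and the decoder" is to project a pooled CNN image vector and use it to initialise both recurrent states. The sketch below is an illustration under that assumption, not the submitted OSU system; vocabulary sizes and feature dimensions are made up.

```python
# Illustrative sketch: inject a shared image feature into both sides of a
# GRU-based NMT model via the projected image vector as initial hidden state.
import torch
import torch.nn as nn

class ImageGroundedNMT(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, img_dim=2048, emb_dim=256, hidden_dim=512):
        super().__init__()
        self.src_embed = nn.Embedding(src_vocab, emb_dim)
        self.tgt_embed = nn.Embedding(tgt_vocab, emb_dim)
        self.img_proj = nn.Linear(img_dim, hidden_dim)   # shared image projection
        self.encoder = nn.GRU(emb_dim, hidden_dim, batch_first=True)
        self.decoder = nn.GRU(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, tgt_vocab)

    def forward(self, src_tokens, tgt_tokens, img_feats):
        # img_feats: (batch, img_dim), e.g. pooled CNN features for the shared image
        h_img = torch.tanh(self.img_proj(img_feats)).unsqueeze(0)   # (1, batch, hidden)
        _, h_enc = self.encoder(self.src_embed(src_tokens), h_img)  # image on the encoder side
        dec_init = h_enc + h_img                                    # image on the decoder side too
        dec_out, _ = self.decoder(self.tgt_embed(tgt_tokens), dec_init)
        return self.out(dec_out)

# Example usage with random inputs:
model = ImageGroundedNMT(src_vocab=8000, tgt_vocab=8000)
logits = model(torch.randint(0, 8000, (2, 7)), torch.randint(0, 8000, (2, 6)), torch.randn(2, 2048))
```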
Multimodal Sparse Coding for Event Detection
Unsupervised feature learning methods have proven effective for
classification tasks based on a single modality. We present multimodal sparse
coding for learning feature representations shared across multiple modalities.
The shared representations are applied to multimedia event detection (MED) and
evaluated in comparison to unimodal counterparts, as well as other feature
learning methods such as GMM supervectors and sparse RBM. We report the
cross-validated classification accuracy and mean average precision of the MED
system trained on features learned from our unimodal and multimodal settings
for a subset of the TRECVID MED 2014 dataset.
Comment: Multimodal Machine Learning Workshop at NIPS 201
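The core idea, learning a sparse code shared across modalities by coding the modalities jointly, can be sketched as follows. The feature dimensions, random stand-in data and scikit-learn dictionary learner are illustrative assumptions, not the paper's pipeline.

```python
# Sketch of joint multimodal sparse coding: concatenate per-sample feature
# vectors from two modalities and learn a single dictionary, so each sparse
# code is shared across modalities.
import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.default_rng(0)
n_samples = 200
audio = rng.standard_normal((n_samples, 64))    # stand-in audio features
visual = rng.standard_normal((n_samples, 128))  # stand-in visual features

X = np.hstack([audio, visual])                  # joint feature space: (n_samples, 192)

coder = DictionaryLearning(n_components=32, alpha=1.0,
                           transform_algorithm="lasso_lars",
                           max_iter=20, random_state=0)
shared_codes = coder.fit_transform(X)           # (n_samples, 32) shared sparse representation
# shared_codes can then be fed to a classifier (e.g. a linear SVM) for event detection.
```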
A multimodal restaurant finder for semantic web
Multimodal dialogue systems provide multiple modalities in the form of speech, mouse clicking, drawing or touch that can enhance human-computer interaction. However, existing multimodal systems are highly domain-specific and do not allow information to be shared across different providers. In this paper, we propose a semantic multimodal system for the Semantic Web, called Semantic Restaurant Finder, in which restaurant information for different cities, countries and languages is constructed as ontologies so that it can be shared. Using the Semantic Restaurant Finder, users can draw on semantic restaurant knowledge distributed across different locations on the Internet to find the desired restaurants.
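As an illustration of the sharing idea, restaurant facts from any provider can be published against a common vocabulary and queried uniformly. The namespace, class and property names below are hypothetical, not the paper's ontology.

```python
# Hypothetical sketch: expose restaurant data as RDF and query it with SPARQL,
# so providers in different locations can publish against the same vocabulary.
from rdflib import Graph, Literal, Namespace, RDF

REST = Namespace("http://example.org/restaurant#")  # assumed namespace, for illustration

g = Graph()
g.add((REST.LaTrattoria, RDF.type, REST.Restaurant))
g.add((REST.LaTrattoria, REST.cuisine, Literal("Italian")))
g.add((REST.LaTrattoria, REST.city, Literal("Glasgow")))

results = g.query("""
    PREFIX rest: <http://example.org/restaurant#>
    SELECT ?r WHERE {
        ?r a rest:Restaurant ;
           rest:cuisine "Italian" ;
           rest:city "Glasgow" .
    }
""")
for row in results:
    print(row.r)  # URIs of matching restaurants
```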
DOLPHIN: the design and initial evaluation of multimodal focus and context
In this paper, we describe a new focus and context visualisation technique called multimodal focus and context. This technique uses a hybrid visual and spatialised audio display space to overcome the limited visual displays of mobile devices. We demonstrate this technique by applying it to maps of theme parks. We present the results of an experiment comparing multimodal focus and context to a purely visual display technique. The results showed that neither system was significantly better than the other. We believe that this is due to issues involving the perception of multiple structured audio sources.
Bimodal Feedback for In-car Mid-air Gesture Interaction
This demonstration showcases novel multimodal feedback designs for in-car mid-air gesture interaction. It explores the potential of multimodal feedback types for mid-air gestures in cars and how these can reduce eyes-off-the-road time, thus making driving safer. We will show four different bimodal feedback combinations to provide effective information about interaction with systems in a car. These feedback techniques are visual-auditory, auditory-ambient (peripheral vision), ambient-tactile, and tactile-auditory. Users can interact with the system after a short introduction, creating an exciting opportunity to deploy these displays in cars in the future.
Affective games: a multimodal classification system
Affective gaming is a relatively new field of research that exploits human emotions to influence gameplay for an enhanced player experience. Changes in a player's psychological state are reflected in their behaviour and physiology, so recognising such variation is a core element of affective games. Complementary sources of affect offer more reliable recognition, especially in contexts where one modality is partial or unavailable. As multimodal recognition systems, affect-aware games are subject to the practical difficulties met by traditional trained classifiers. In addition, game-specific challenges in data collection and performance arise while attempting to sustain an acceptable level of immersion. Most existing scenarios employ sensors that offer limited freedom of movement, resulting in less realistic experiences. Recent advances now offer technology that allows players to communicate more freely and naturally with the game and, furthermore, to control it without the use of input devices. However, the affective game industry is still in its infancy and needs to catch up with the life-like level of adaptation already provided by graphics and animation.
