39,676 research outputs found
Reference resolution in multi-modal interaction: Preliminary observations
In this paper we present our research on multimodal interaction in and with virtual environments. The aim of this presentation is to emphasize the necessity to spend more research on reference resolution in multimodal contexts. In multi-modal interaction the human conversational partner can apply more than one modality in conveying his or her message to the environment in which a computer detects and interprets signals from different modalities. We show some naturally arising problems but do not give general solutions. Rather we decide to perform more detailed research on reference resolution in uni-modal contexts to obtain methods generalizable to multi-modal contexts. Since we try to build applications for a Dutch audience and since hardly any research has been done on reference resolution for Dutch, we give results on the resolution of anaphoric and deictic references in Dutch texts. We hope to be able to extend these results to our multimodal contexts later
Reference Resolution in Multi-modal Interaction: Position paper
In this position paper we present our research on multimodal interaction in and with virtual environments. The aim of this presentation is to emphasize the necessity to spend more research on reference resolution in multimodal contexts. In multi-modal interaction the human conversational partner can apply more than one modality in conveying his or her message to the environment in which a computer detects and interprets signals from different modalities. We show some naturally arising problems and how they are treated for different contexts. No generally applicable solutions are given
Recommended from our members
A multimodal restaurant finder for semantic web
Multimodal dialogue systems provide multiple modalities in the form of speech, mouse clicking, drawing or touch that can enhance human-computer interaction. However, one of the drawbacks of the existing multimodal systems is that they are highly domain-speciļ¬c and they do not allow information to be shared across different providers. In this paper, we propose a semantic multimodal system, called Semantic Restaurant Finder, for the Semantic Web in which the restaurant information in different city/country/language are constructed as ontologies to allow the information to be sharable. From the Semantic Restaurant Finder, users can make use of the semantic restaurant knowledge distributed from different locations on the Internet to ļ¬nd the desired restaurants
Crossmodal Attentive Skill Learner
This paper presents the Crossmodal Attentive Skill Learner (CASL), integrated
with the recently-introduced Asynchronous Advantage Option-Critic (A2OC)
architecture [Harb et al., 2017] to enable hierarchical reinforcement learning
across multiple sensory inputs. We provide concrete examples where the approach
not only improves performance in a single task, but accelerates transfer to new
tasks. We demonstrate the attention mechanism anticipates and identifies useful
latent features, while filtering irrelevant sensor modalities during execution.
We modify the Arcade Learning Environment [Bellemare et al., 2013] to support
audio queries, and conduct evaluations of crossmodal learning in the Atari 2600
game Amidar. Finally, building on the recent work of Babaeizadeh et al. [2017],
we open-source a fast hybrid CPU-GPU implementation of CASL.Comment: International Conference on Autonomous Agents and Multiagent Systems
(AAMAS) 2018, NIPS 2017 Deep Reinforcement Learning Symposiu
Force-imitated particle swarm optimization using the near-neighbor effect for locating multiple optima
Copyright @ Elsevier Inc. All rights reserved.Multimodal optimization problems pose a great challenge of locating multiple optima simultaneously in the search space to the particle swarm optimization (PSO) community. In this paper, the motion principle of particles in PSO is extended by using the near-neighbor effect in mechanical theory, which is a universal phenomenon in nature and society. In the proposed near-neighbor effect based force-imitated PSO (NN-FPSO) algorithm, each particle explores the promising regions where it resides under the composite forces produced by the ānear-neighbor attractorā and ānear-neighbor repellerā, which are selected from the set of memorized personal best positions and the current swarm based on the principles of āsuperior-and-nearerā and āinferior-and-nearerā, respectively. These two forces pull and push a particle to search for the nearby optimum. Hence, particles can simultaneously locate multiple optima quickly and precisely. Experiments are carried out to investigate the performance of NN-FPSO in comparison with a number of state-of-the-art PSO algorithms for locating multiple optima over a series of multimodal benchmark test functions. The experimental results indicate that the proposed NN-FPSO algorithm can efficiently locate multiple optima in multimodal fitness landscapes.This work was supported in part by the Key Program of National Natural Science Foundation (NNSF) of China under Grant 70931001, Grant 70771021, and Grant 70721001, the National Natural Science Foundation (NNSF) of China for Youth under Grant 61004121, Grant 70771021, the Science Fund for Creative Research Group of NNSF of China under Grant 60821063, the PhD Programs Foundation of Ministry of Education of China under Grant 200801450008, and in part by the Engineering and Physical Sciences Research Council (EPSRC) of UK under Grant EP/E060722/1 and Grant EP/E060722/2
Early Turn-taking Prediction with Spiking Neural Networks for Human Robot Collaboration
Turn-taking is essential to the structure of human teamwork. Humans are
typically aware of team members' intention to keep or relinquish their turn
before a turn switch, where the responsibility of working on a shared task is
shifted. Future co-robots are also expected to provide such competence. To that
end, this paper proposes the Cognitive Turn-taking Model (CTTM), which
leverages cognitive models (i.e., Spiking Neural Network) to achieve early
turn-taking prediction. The CTTM framework can process multimodal human
communication cues (both implicit and explicit) and predict human turn-taking
intentions in an early stage. The proposed framework is tested on a simulated
surgical procedure, where a robotic scrub nurse predicts the surgeon's
turn-taking intention. It was found that the proposed CTTM framework
outperforms the state-of-the-art turn-taking prediction algorithms by a large
margin. It also outperforms humans when presented with partial observations of
communication cues (i.e., less than 40% of full actions). This early prediction
capability enables robots to initiate turn-taking actions at an early stage,
which facilitates collaboration and increases overall efficiency.Comment: Submitted to IEEE International Conference on Robotics and Automation
(ICRA) 201
- ā¦