GEMINI: A Generic Multi-Modal Natural Interface Framework for Videogames

Author: G. Saon
H. Sakoe
J. Lockman
L.A. Schwarz
M. Arantes
P.Y. Shih
T. Yamada
Publication date: 01/01/2013

In recent years videogame companies have recognized the role of player engagement as a major factor in user experience and enjoyment. This encouraged a greater investment in new types of game controllers such as the WiiMote, Rock Band instruments and the Kinect. However, the native software of these controllers was not originally designed to be used in other game applications. This work addresses this issue by building a middleware framework, which maps body poses or voice commands to actions in any game. This not only warrants a more natural and customized user-experience but it also defines an interoperable virtual controller. In this version of the framework, body poses and voice commands are respectively recognized through the Kinect's built-in cameras and microphones. The acquired data is then translated into the native interaction scheme in real time using a lightweight method based on spatial restrictions. The system is also prepared to use Nintendo's Wiimote as an auxiliary and unobtrusive gamepad for physically or verbally impractical commands. System validation was performed by analyzing the performance of certain tasks and examining user reports. Both confirmed this approach as a practical and alluring alternative to the game's native interaction scheme. In sum, this framework provides a game-controlling tool that is totally customizable and very flexible, thus expanding the market of game consumers. Comment: WorldCIST'13 Internacional Conferenc

arXiv.org e-Print Archive

Crossref

Conceptual spatial representations for indoor mobile robots

Author: Asher
Brown
Cohn
Ekvall
G.-J.M. Kruijff
H. Zender
Haasch
Hirtle
Ishiguro
Krieg-Brückner
Kuipers
Latombe
Lowe
McNamara
Moravec
O. Martínez Mozos
P. Jensfelt
Rosch
Siegwart
Stevens
Traum
W. Burgard
Zender
Publication venue: 'Elsevier BV'
Publication date: 01/01/2008

We present an approach for creating conceptual representations of human-made indoor environments using mobile robots. The concepts refer to spatial and functional properties of typical indoor environments. Following findings in cognitive psychology, our model is composed of layers representing maps at different levels of abstraction. The complete system is integrated in a mobile robot endowed with laser and vision sensors for place and object recognition. The system also incorporates a linguistic framework that actively supports the map acquisition process, and which is used for situated dialogue. Finally, we discuss the capabilities of the integrated system.

University of Lincoln Institutional Repository

CiteSeerX

Crossref

Modelling Users, Intentions, and Structure in Spoken Dialog

Author: Goerz Guenther
Ludwig Bernd
Niemann Heinrich
Publication date: 01/01/1998

We outline how utterances in dialogs can be interpreted using a partial first order logic. We exploit the capability of this logic to talk about the truth status of formulae to define a notion of coherence between utterances and explain how this coherence relation can serve for the construction of AND/OR trees that represent the segmentation of the dialog. In a BDI model we formalize basic assumptions about dialog and cooperative behaviour of participants. These assumptions provide a basis for inferring speech acts from coherence relations between utterances and attitudes of dialog participants. Speech acts prove to be useful for determining dialog segments defined on the notion of completing expectations of dialog participants. Finally, we sketch how explicit segmentation signalled by cue phrases and performatives is covered by our dialog model. Comment: 17 page

arXiv.org e-Print Archive

CiteSeerX

University of Regensburg Publication Server

Training an adaptive dialogue policy for interactive learning of visually grounded word meanings

Author: Eshghi Arash
Lemon Oliver
Yu Yanchao
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 15/09/2016

We present a multi-modal dialogue system for interactive learning of perceptually grounded word meanings from a human tutor. The system integrates an incremental, semantic parsing/generation framework - Dynamic Syntax and Type Theory with Records (DS-TTR) - with a set of visual classifiers that are learned throughout the interaction and which ground the meaning representations that it produces. We use this system in interaction with a simulated human tutor to study the effects of different dialogue policies and capabilities on the accuracy of learned meanings, learning rates, and efforts/costs to the tutor. We show that the overall performance of the learning agent is affected by (1) who takes initiative in the dialogues; (2) the ability to express/use their confidence level about visual attributes; and (3) the ability to process elliptical and incrementally constructed dialogue turns. Ultimately, we train an adaptive dialogue policy which optimises the trade-off between classifier accuracy and tutoring costs. Comment: 11 pages, SIGDIAL 2016 Conferenc

arXiv.org e-Print Archive

Heriot Watt Pure

Symbol Emergence in Robotics: A Survey

Author: Asoh Hideki
Iwahashi Naoto
Nagai Takayuki
Nakamura Tomoaki
Ogata Tetsuya
Taniguchi Tadahiro
Publication date: 29/09/2015

Humans can learn the use of language through physical interaction with their environment and semiotic communication with other people. It is very important to obtain a computational understanding of how humans can form a symbol system and obtain semiotic skills through their autonomous mental development. Recently, many studies have been conducted on the construction of robotic systems and machine-learning methods that can learn the use of language through embodied multimodal interaction with their environment and other systems. Understanding human social interactions and developing a robot that can smoothly communicate with human users in the long term, requires an understanding of the dynamics of symbol systems and is crucially important. The embodied cognition and social interaction of participants gradually change a symbol system in a constructive manner. In this paper, we introduce a field of research called symbol emergence in robotics (SER). SER is a constructive approach towards an emergent symbol system. The emergent symbol system is socially self-organized through both semiotic communications and physical interactions with autonomous cognitive developmental agents, i.e., humans and developmental robots. Specifically, we describe some state-of-art research topics concerning SER, e.g., multimodal categorization, word discovery, and a double articulation analysis, that enable a robot to obtain words and their embodied meanings from raw sensory-motor information, including visual information, haptic information, auditory information, and acoustic speech signals, in a totally unsupervised manner. Finally, we suggest future directions of research in SER. Comment: submitted to Advanced Robotic

arXiv.org e-Print Archive

Do You See What I Mean? Visual Resolution of Linguistic Ambiguities

Author: Barbu Andrei
Berzak Yevgeni
Harari Daniel
Katz Boris
Ullman Shimon
Publication date: 04/12/2015

Understanding language goes hand in hand with the ability to integrate complex contextual information obtained via perception. In this work, we present a novel task for grounded language understanding: disambiguating a sentence given a visual scene which depicts one of the possible interpretations of that sentence. To this end, we introduce a new multimodal corpus containing ambiguous sentences, representing a wide range of syntactic, semantic and discourse ambiguities, coupled with videos that visualize the different interpretations for each sentence. We address this task by extending a vision model which determines if a sentence is depicted by a video. We demonstrate how such a model can be adjusted to recognize different interpretations of the same underlying sentence, allowing to disambiguate sentences in a unified fashion across the different ambiguity types. Comment: EMNLP 201

arXiv.org e-Print Archive

CiteSeerX

DSpace@MIT

Multimodal agent interfaces and system architectures for health and fitness companions

Author: Cavazza Marc
Charlton Daniel
Gambäck Björn
Hakulinen Jaakko
Hansen Preben
Rodríguez Gancedo Mari C.
Santos de la Cámara Raul
Smith Cameron
Ståhl Olov
Turunen Markku
Publication date: 01/01/2008

Multimodal conversational spoken dialogues using physical and virtual agents provide a potential interface to motivate and support users in the domain of health and fitness. In this paper we present how such multimodal conversational Companions can be implemented to support their owners in various pervasive and mobile settings. In particular, we focus on different forms of multimodality and system architectures for such interfaces.

RISE – Research Institutes of Sweden

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Swedish Institute of Computer Science Publications Database

Software institutes' Online Digital Archive