
    Multimodal fusion : gesture and speech input in augmented reality environment

    Augmented Reality (AR) can support interaction with virtual and physical objects simultaneously because it combines the real world with the virtual world seamlessly. However, most AR interfaces apply conventional Virtual Reality (VR) interaction techniques without modification. In this paper we explore multimodal fusion for AR with speech and hand gesture input. Multimodal fusion enables users to interact with computers through various input modalities such as speech, gesture, and eye gaze. As a first step toward proposing a multimodal interaction, the input modalities must be selected before they are integrated into an interface. The paper reviews related work to trace how multimodal approaches have become one of the recent research trends in AR, and surveys existing multimodal work in both VR and AR. In AR, multimodality is considered a solution for improving interaction between virtual and physical entities; it is an ideal interaction technique for AR applications since AR supports real-time interaction in both the real and virtual worlds. This paper describes recent AR developments that employ gesture and speech inputs, examines multimodal fusion and its development, and closes with a conclusion. The paper offers a guideline on how to integrate gesture and speech inputs in an AR environment through multimodal fusion.
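
    As a concrete illustration of the kind of gesture-speech integration such a guideline targets, the sketch below fuses a recognized spoken command with a pointing gesture by time proximity, so the gesture supplies the referent for a deictic command like "delete that". The event classes, command vocabulary, and fusion window are illustrative assumptions, not the paper's actual implementation.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SpeechEvent:
    command: str       # recognized spoken command, e.g. "move", "delete"
    timestamp: float   # seconds since session start

@dataclass
class GestureEvent:
    target_object_id: str  # virtual object the user is pointing at
    timestamp: float

FUSION_WINDOW_S = 1.5  # assumed maximum gap between the two modalities

def fuse(speech: SpeechEvent, gesture: Optional[GestureEvent]) -> Optional[dict]:
    """Late fusion: the pointing gesture supplies the referent ("that")
    for the spoken command when both arrive close together in time."""
    if gesture is None:
        return None  # a deictic command needs a gesture referent
    if abs(speech.timestamp - gesture.timestamp) > FUSION_WINDOW_S:
        return None  # too far apart in time to belong to one interaction
    return {"action": speech.command, "object": gesture.target_object_id}

# Usage: the user says "delete" while pointing at a virtual chair.
print(fuse(SpeechEvent("delete", 10.2), GestureEvent("chair_01", 10.6)))
```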

    The multimodal edge of human aerobotic interaction

    This paper presents the idea of multimodal human-aerobotic interaction. An overview of the aerobotic system and its applications is given. The joystick-based controller interface and its limitations are discussed. Two techniques are suggested as emerging alternatives to the joystick-based controller interface used in human-aerobotic interaction. The first is a multimodal combination of speech, gaze, gesture, and other non-verbal cues already used in ordinary human-human interaction. The second is telepathic interaction via brain-computer interfaces. The potential limitations of these alternatives are highlighted, and considerations for further work are presented.

    Multimodal interfaces: Challenges and perspectives

    Abstract. The development of interfaces has been a technology-driven process. However, newly developed multimodal interfaces use recognition-based technologies that must interpret human speech, gesture, gaze, movement patterns, and other behavioral cues. As a result, interface design requires a human-centered approach. In this paper we review the major approaches to multimodal Human-Computer Interaction, giving an overview of user and task modeling and of multimodal fusion. We highlight the challenges, open issues, and future trends in multimodal interfaces research.

    Multimodal Polynomial Fusion for Detecting Driver Distraction

    Distracted driving is deadly, claiming 3,477 lives in the U.S. in 2015 alone. Although there has been a considerable amount of research on modeling the distracted behavior of drivers under various conditions, accurate automatic detection using multiple modalities, and especially the contribution of the speech modality to improving accuracy, has received little attention. This paper introduces a new multimodal dataset for distracted driving behavior and discusses automatic distraction detection using features from three modalities: facial expression, speech, and car signals. Detailed multimodal feature analysis shows that adding more modalities monotonically increases the predictive accuracy of the model. Finally, a simple and effective multimodal fusion technique using a polynomial fusion layer shows superior distraction detection results compared to the baseline SVM and neural network models. Comment: INTERSPEECH 201
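
    To make the polynomial fusion idea concrete, below is a minimal second-order sketch: unimodal face, speech, and car-signal feature vectors are concatenated with their pairwise elementwise products so a classifier can exploit cross-modal interactions. The NumPy formulation, feature dimensions, and linear scoring head are assumptions for illustration and may differ from the paper's exact layer.

```python
import numpy as np

rng = np.random.default_rng(0)

def polynomial_fusion(face, speech, car):
    """Concatenate unimodal features with all pairwise elementwise
    products (second-order terms); assumes equal feature dimensions."""
    pairs = [face * speech, face * car, speech * car]
    return np.concatenate([face, speech, car, *pairs])

# Toy features: each modality projected to a shared 8-d space beforehand.
face, speech, car = (rng.standard_normal(8) for _ in range(3))
fused = polynomial_fusion(face, speech, car)          # shape: (48,)

# Simple linear "distracted vs. attentive" score on the fused vector.
w, b = rng.standard_normal(fused.shape[0]), 0.0
score = 1.0 / (1.0 + np.exp(-(w @ fused + b)))        # sigmoid probability
print(f"fused dim = {fused.size}, distraction probability = {score:.2f}")
```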

    The AISB’08 Symposium on Multimodal Output Generation (MOG 2008)

    Welcome to Aberdeen and to the Symposium on Multimodal Output Generation (MOG 2008)! This volume collects the papers presented at the MOG 2008 international symposium.