
    Multimodal fusion : gesture and speech input in augmented reality environment

    Augmented Reality (AR) can support interaction with virtual and physical objects simultaneously because it combines the real world with the virtual world seamlessly. However, most AR interfaces apply conventional Virtual Reality (VR) interaction techniques without modification. In this paper we explore multimodal fusion for AR with speech and hand gesture input. Multimodal fusion enables users to interact with computers through various input modalities such as speech, gesture, and eye gaze. As a first step toward proposing a multimodal interaction, the input modalities must be selected before they are integrated into an interface. The paper reviews related work to trace how multimodal approaches have become one of the recent research trends in AR, and surveys existing multimodal work in both VR and AR. In AR, multimodality is considered a solution for improving interaction between virtual and physical entities; it is an ideal interaction technique for AR applications since AR supports real-time interaction in both the real and virtual worlds. This paper describes recent AR developments that employ gesture and speech inputs, examines multimodal fusion and its development, and closes with a conclusion. The paper offers a guideline on how to integrate gesture and speech inputs in an AR environment through multimodal fusion.
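
    As a concrete illustration of the kind of gesture-speech integration such a guideline targets, the sketch below fuses a recognized spoken command with a pointing gesture by time proximity, so the gesture supplies the referent for a deictic command like "delete that". The event classes, command vocabulary, and fusion window are illustrative assumptions, not the paper's actual implementation.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SpeechEvent:
    command: str       # recognized spoken command, e.g. "move", "delete"
    timestamp: float   # seconds since session start

@dataclass
class GestureEvent:
    target_object_id: str  # virtual object the user is pointing at
    timestamp: float

FUSION_WINDOW_S = 1.5  # assumed maximum gap between the two modalities

def fuse(speech: SpeechEvent, gesture: Optional[GestureEvent]) -> Optional[dict]:
    """Late fusion: the pointing gesture supplies the referent ("that")
    for the spoken command when both arrive close together in time."""
    if gesture is None:
        return None  # a deictic command needs a gesture referent
    if abs(speech.timestamp - gesture.timestamp) > FUSION_WINDOW_S:
        return None  # too far apart in time to belong to one interaction
    return {"action": speech.command, "object": gesture.target_object_id}

# Usage: the user says "delete" while pointing at a virtual chair.
print(fuse(SpeechEvent("delete", 10.2), GestureEvent("chair_01", 10.6)))
```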

    The multimodal edge of human aerobotic interaction

    This paper presents the idea of multimodal human-aerobotic interaction. An overview of the aerobotic system and its applications is given. The joystick-based controller interface and its limitations are discussed. Two techniques are suggested as emerging alternatives to the joystick-based controller interface used in human-aerobotic interaction. The first is a multimodal combination of speech, gaze, gesture, and other non-verbal cues already used in ordinary human-human interaction. The second is telepathic interaction via brain-computer interfaces. The potential limitations of these alternatives are highlighted, and considerations for further work are presented.

    Multimodal interfaces: Challenges and perspectives

    Abstract. The development of interfaces has been a technology-driven process. However, newly developed multimodal interfaces use recognition-based technologies that must interpret human speech, gesture, gaze, movement patterns, and other behavioral cues. As a result, interface design requires a human-centered approach. In this paper we review the major approaches to multimodal Human-Computer Interaction, giving an overview of user and task modeling and of multimodal fusion. We highlight the challenges, open issues, and future trends in multimodal interfaces research.

    Multimodal Polynomial Fusion for Detecting Driver Distraction

    Distracted driving is deadly, claiming 3,477 lives in the U.S. in 2015 alone. Although there has been a considerable amount of research on modeling the distracted behavior of drivers under various conditions, accurate automatic detection using multiple modalities, and especially the contribution of the speech modality to improving accuracy, has received little attention. This paper introduces a new multimodal dataset for distracted driving behavior and discusses automatic distraction detection using features from three modalities: facial expression, speech, and car signals. Detailed multimodal feature analysis shows that adding more modalities monotonically increases the predictive accuracy of the model. Finally, a simple and effective multimodal fusion technique using a polynomial fusion layer shows superior distraction detection results compared to the baseline SVM and neural network models. Comment: INTERSPEECH 201
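
    To make the polynomial fusion idea concrete, below is a minimal second-order sketch: unimodal face, speech, and car-signal feature vectors are concatenated with their pairwise elementwise products so a classifier can exploit cross-modal interactions. The NumPy formulation, feature dimensions, and linear scoring head are assumptions for illustration and may differ from the paper's exact layer.

```python
import numpy as np

rng = np.random.default_rng(0)

def polynomial_fusion(face, speech, car):
    """Concatenate unimodal features with all pairwise elementwise
    products (second-order terms); assumes equal feature dimensions."""
    pairs = [face * speech, face * car, speech * car]
    return np.concatenate([face, speech, car, *pairs])

# Toy features: each modality projected to a shared 8-d space beforehand.
face, speech, car = (rng.standard_normal(8) for _ in range(3))
fused = polynomial_fusion(face, speech, car)          # shape: (48,)

# Simple linear "distracted vs. attentive" score on the fused vector.
w, b = rng.standard_normal(fused.shape[0]), 0.0
score = 1.0 / (1.0 + np.exp(-(w @ fused + b)))        # sigmoid probability
print(f"fused dim = {fused.size}, distraction probability = {score:.2f}")
```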

    The AISB’08 Symposium on Multimodal Output Generation (MOG 2008)

    Welcome to Aberdeen and to the Symposium on Multimodal Output Generation (MOG 2008)! This volume collects the papers presented at the MOG 2008 international symposium.