
    Exploring the utility of giving robots auditory perspective-taking abilities

    Presented at the 12th International Conference on Auditory Display (ICAD), London, UK, June 20-23, 2006. This paper reports on work in progress to develop a computational auditory perspective-taking system for a robot. Auditory perspective taking is construed as the ability to reason about inferred or posited factors that affect an addressee's perspective as a listener, for the purpose of presenting auditory information in an appropriate and effective manner. High-level aspects of this aural interaction skill are discussed, and a prototype adaptive auditory display, implemented in the context of a robotic information kiosk, is described and critiqued. Additionally, a sketch of the design and goals of a user study planned for later this year is given. A demonstration of the prototype system will accompany the presentation of this research in the poster session.
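
    A minimal sketch (Python) of what an adaptive auditory display of this kind might do: raise the speech output level as the inferred listener distance or ambient noise grows. The parameter values, names, and gain rule below are illustrative assumptions, not the authors' published design.

    import math

    # Illustrative parameters; the paper does not publish these values.
    REFERENCE_DB = 60.0   # assumed comfortable speech level at 1 m
    TARGET_SNR_DB = 15.0  # assumed signal-to-noise ratio to aim for at the listener

    def output_level_db(listener_distance_m: float, ambient_noise_db: float) -> float:
        """Choose a speech output level for the inferred listener perspective.

        Compensates for spherical spreading loss (about 6 dB per doubling of
        distance) and boosts the level until the assumed target SNR is met.
        """
        spreading_loss = 20.0 * math.log10(max(listener_distance_m, 0.1))
        level = REFERENCE_DB + spreading_loss
        # If ambient noise would mask the speech, raise toward the target SNR.
        return max(level, ambient_noise_db + TARGET_SNR_DB)

    # Example: a listener 2 m away in a 55 dB environment -> 70.0 dB output.
    print(f"{output_level_db(2.0, 55.0):.1f} dB")

    A real perspective-taking system would infer distance and noise from the robot's sensors; this sketch only shows the adaptation step given those estimates.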

    Improvement of Robot Audition by Interfacing Sound Source Separation and Automatic Speech Recognition with Missing Feature Theory

    We have developed a robot audition system using the active direction-pass filter (ADPF) based on scattering theory, and demonstrated that the humanoid SIG could separate and recognize three simultaneous speech streams originating from different directions. This is the first demonstration of a robot listening to several sound sources at once. However, the approach had three open problems: its general applicability to other robots was not confirmed; because automatic speech recognition (ASR) required direction- and speaker-dependent acoustic models, the system was difficult to adapt to new environments; and running ASR with many acoustic models made processing slow. In this paper, these three problems are resolved. First, we confirm the generality of the ADPF by applying it to two humanoids, SIG2 and Replie, under different environments. Next, we present a new interface between the ADPF and ASR based on missing feature theory, which masks the broken features of the separated sound so that they are not used by the ASR. This new interface improved the recognition performance on three simultaneous speech streams to about 90%. Finally, since the ASR uses only a single direction- and speaker-independent acoustic model trained in a clean environment, the whole system is light and fast.
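
    A minimal sketch (Python/NumPy) of the masking idea: features of the separated signal judged unreliable are marked as missing, so a missing-feature ASR decoder can skip them. The SNR-based reliability heuristic and all names here are assumptions for illustration, not the paper's exact mask estimator.

    import numpy as np

    def missing_feature_mask(separated: np.ndarray, residual: np.ndarray,
                             snr_threshold_db: float = 0.0) -> np.ndarray:
        """Flag spectro-temporal features distorted by separation as missing.

        separated: power spectrogram of the ADPF-separated source (freq x time)
        residual:  estimated power of leaked interference in the same channel
        Returns a binary mask (1 = reliable, 0 = missing) that a
        missing-feature ASR decoder can use to skip the broken features.
        """
        eps = 1e-10  # avoid log of or division by zero
        local_snr_db = 10.0 * np.log10((separated + eps) / (residual + eps))
        return (local_snr_db > snr_threshold_db).astype(np.float32)

    # Example with random spectrograms; a real system would estimate the
    # residual from the separation stage itself.
    rng = np.random.default_rng(0)
    s, n = rng.random((64, 100)), rng.random((64, 100))
    mask = missing_feature_mask(s, n)
    print(mask.mean())  # fraction of features judged reliable

    The key design point the abstract describes is that masking happens at the interface: the acoustic model stays direction- and speaker-independent, and only the decoder's use of individual features changes.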