Robot Phonotaxis with Dynamic Sound-source Localization
Abstract: We address two key goals pertaining to autonomous mobile robots: first, to develop fast, accurate sensory capabilities (at present, the localization of sound sources) and, second, to integrate such sensory modules with other robot functions, especially motor control and navigation. A primary motivation for this work was to devise effective means of guiding robotic navigation in environments with acoustic sources. We recently designed and built a biomimetic sound-source localization apparatus. In contrast to the popular use of time-of-arrival differences in free-field microphone arrays, our system is based on principles observed in nature, where directional acoustic sensing evolved to rely on diffraction about the head with only two ears. In this paper we present an integrated robot phonotaxis system that uses the robot's movement to resolve the front-back localization ambiguity. Our system achieves high angular localization acuity (±2°) and was successfully tested in localizing a single broadband source and moving towards it within a cluttered laboratory environment.
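The front-back ambiguity mentioned in the abstract arises because a two-ear system measures only a lateral cue, which is identical for a source at bearing b and its mirror image at pi - b. A minimal sketch of motion-based disambiguation, under the illustrative assumption (not the paper's actual model) that the cue behaves like sin(bearing):

```python
import math

def resolve_front_back(cue_before, cue_after, turn_angle):
    """Pick the source bearing consistent with two binaural cues.

    The lateral cue of a two-ear system is modelled here (an
    illustrative assumption, not the paper's model) as sin(bearing),
    so a bearing b and its mirror pi - b are indistinguishable from a
    single measurement.  Turning the robot by turn_angle (radians,
    counter-clockwise) shifts the true bearing by -turn_angle; only
    the correct hypothesis predicts the second cue.
    Returns the estimated source bearing *after* the turn.
    """
    b = math.asin(max(-1.0, min(1.0, cue_before)))   # frontal solution
    for hypothesis in (b, math.pi - b):              # front / back candidates
        predicted = math.sin(hypothesis - turn_angle)
        if abs(predicted - cue_after) < 1e-6:
            return hypothesis - turn_angle
    # With noisy cues, fall back to the hypothesis with the smaller error.
    return min((b, math.pi - b),
               key=lambda h: abs(math.sin(h - turn_angle) - cue_after)) - turn_angle
```

For a source behind the robot, only the rear hypothesis stays consistent after the turn, which is exactly how the robot's own movement breaks the tie.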
ROBOTIC SOUND SOURCE LOCALIZATION AND TRACKING USING BIO-INSPIRED MINIATURE ACOUSTIC SENSORS
Sound source localization and tracking using auditory systems has been widely investigated for robotics applications because of its inherent advantages over other approaches, such as vision-based systems. Most existing robotic sound localization and tracking systems use conventional microphone arrays in various arrangements, which are inherently limited by a size constraint and are thus difficult to implement on miniature robots. To overcome this size constraint, sensors that mimic the mechanically coupled ears of the fly Ormia have previously been developed. However, there has not been any attempt to study robotic sound source localization and tracking with these sensors.
In this dissertation, robotic sound source localization and tracking using miniature fly-ear-inspired sensors are studied for the first time. First, through investigation of the Cramér-Rao lower bound (CRLB) and the variance of the sound incident-angle estimate, an enhanced understanding of the influence of the mechanical coupling on the performance of the fly-ear-inspired sensor for sound localization is achieved. It is found that, owing to the mechanical coupling between the membranes, at its working frequency the fly-ear-inspired sensor can achieve an incident-angle estimate that is 100 times better than that of a conventional microphone pair with the same signal-to-noise ratio in detecting membrane deflection. Second, sound localization algorithms that can be used for robotic sound source localization and tracking with the fly-ear-inspired sensors are developed. Two methods are developed to estimate the sound incident angle from the sensor output: one based on a model-free gradient-descent method and the other on fuzzy logic. In the first approach, different localization schemes and objective functions are investigated through numerical simulations, in which two-dimensional sound source localization is achieved without ambiguity. To address the slow convergence caused by the iterative nature of the first approach, a novel fuzzy-logic model of the fly-ear sensor is developed in the second approach for sound incident-angle estimation. This model is studied in both simulations and experiments, with good performance in localizing a stationary source and tracking a moving source in one dimension. Third, nonlinear and linear-quadratic controllers are developed to control the kinematics of a robot for sound source localization and tracking; these are later implemented on a mobile platform equipped with a microphone pair. Both homing onto a stationary source and tracking a moving source along pre-defined paths are successfully demonstrated.
Through this dissertation work, new knowledge on robotic sound source localization and tracking using fly-ear-inspired sensors is created, which can serve as a basis for future studies of sound source localization and tracking with miniature robots.
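The model-free gradient-descent approach described above can be sketched generically: a steering objective measured from the sensor output is minimized by finite-difference gradient steps, with no analytical model of the coupled membranes. The objective function below is a hypothetical stand-in, not the dissertation's actual cost:

```python
def estimate_incident_angle(cue, theta0=0.0, step=0.2, eps=1e-3, iters=200):
    """Model-free gradient descent toward the sound source.

    cue(theta) returns a scalar steering objective measured with the
    sensor pointed at angle theta; it is assumed (for illustration
    only) to be minimal when the sensor faces the source.  The
    gradient is estimated by central finite differences, so no model
    of the mechanically coupled membranes is needed.
    """
    theta = theta0
    for _ in range(iters):
        # numeric gradient from two nearby cue measurements
        grad = (cue(theta + eps) - cue(theta - eps)) / (2 * eps)
        theta -= step * grad
        if abs(grad) < 1e-6:        # converged: objective locally flat
            break
    return theta
```

The iterative measure-and-step loop is also what makes the method slow to converge, which motivates the fuzzy-logic model developed as the second approach.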
Human-robot interaction system based on multimodal and adaptive dialogs
Mención Internacional en el título de doctor (International Mention in the doctoral degree).
In recent years, in the Human-Robot Interaction (HRI) area, there has been growing
interest in situations where users are not technologically skilled with robotic systems.
For these users, it is necessary to use interaction techniques that do not require prior specific knowledge. No technological skill can be assumed of them; the only interactive skill that can be presumed is the one they use to communicate with other humans. The techniques presented in this work pursue two goals: that the robot or system expresses itself in a way these users can understand, without any extra effort compared with interacting with another person, and that it interprets what users express just as another person would. In short, the goal is to imitate the way humans interact with each other.
In this thesis, a natural interaction system called the Robotics Dialog System (RDS) has been developed and tested. It allows the robot and the user to communicate over the different channels available. The system comprises several modules that work together, in a coordinated and complementary way, to reach the desired level of natural interaction. RDS lives inside a robotic control architecture and communicates with the other systems that compose it: decision making, sequencing, communication, games, sensory perception, expression, etc. This thesis contributes to the state of the art at two levels. At the higher level, it presents a human-robot interaction system (RDS) based on multimodal dialogs. At the lower level, each chapter describes the components developed specifically for RDS, each contributing to the state of the art in its own field. Prior to each contribution, it was necessary to integrate or implement the state-of-the-art advances in that field to date. Most of these contributions are backed by publications in scientific journals.
The first field addressed, and one that evolved throughout the research, was Natural Language Processing. The most important automatic speech recognition (ASR) systems were analyzed and tested in real situations; some of them were then integrated into RDS through a subsystem that runs several ASR engines concurrently, with the dual goal of improving speech recognition accuracy and providing several complementary input methods. The research then continued by adapting the interaction to different types of microphones and acoustic environments. Finally, the system was extended to recognize speech in multiple languages and to identify the user by voice.
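Running several ASR engines concurrently and combining their outputs can be sketched as follows. The per-word majority vote is a deliberately naive stand-in, since the abstract does not detail the combination scheme RDS actually uses, and the engine callables are hypothetical placeholders for real ASR back ends:

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def recognize_concurrently(audio, engines):
    """Run several ASR engines on the same audio and vote per word.

    engines is a list of callables engine(audio) -> str (stand-ins
    for real ASR back ends).  Transcripts are combined by a simple
    per-position majority vote over words.
    """
    with ThreadPoolExecutor() as pool:
        transcripts = list(pool.map(lambda e: e(audio), engines))
    word_lists = [t.split() for t in transcripts]
    length = max(len(w) for w in word_lists)
    voted = []
    for i in range(length):
        # count each engine's candidate word at this position
        votes = Counter(w[i] for w in word_lists if i < len(w))
        voted.append(votes.most_common(1)[0][0])
    return " ".join(voted)
```

A real system would align transcripts of different lengths (e.g. ROVER-style alignment) and weight engines by confidence, but the concurrency-plus-voting structure is the same.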
The next field addressed was natural language generation. The objective was a speech synthesis system with a certain degree of naturalness and intelligibility, supporting multiple languages and several voice timbres, and able to express emotions. A modular system capable of integrating several speech synthesis engines was built. To give the system some naturalness and expressive variability, a template mechanism was incorporated that synthesizes speech with a degree of lexical variability, generating non-predefined sentences in a conversation.
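The template mechanism for lexical variability can be sketched minimally: each slot in a template lists interchangeable wordings, so the same communicative act is voiced differently across turns. The template and lexicon contents below are hypothetical examples, not taken from RDS:

```python
import random

# Hypothetical response templates: each slot name maps to a list of
# interchangeable wordings, so one communicative act yields varied
# but equivalent utterances for the synthesizer.
TEMPLATES = {
    "greet_user": ["{hello}, {name}! {offer_help}"],
}
LEXICON = {
    "hello": ["Hello", "Hi", "Good to see you"],
    "offer_help": ["How can I help you?", "What can I do for you?"],
}

def render(act, name, rng=random):
    """Pick a template for the communicative act and fill each slot
    with a randomly chosen variant from the lexicon."""
    template = rng.choice(TEMPLATES[act])
    slots = {k: rng.choice(v) for k, v in LEXICON.items()}
    return template.format(name=name, **slots)
```

Seeding the random generator makes the choice reproducible for testing, while in conversation the default generator keeps repeated greetings from sounding identical.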
Dialog management was the next challenge. The existing paradigms were analyzed, and a manager based on information slots (slot filling) was chosen. This manager was extended and modified to adapt to the user (through user profiles) and to hold some world knowledge. In parallel, the multimodal fusion module was developed. Its goal is to abstract multimodality away from the dialog manager; in other words, the dialog manager uses the message regardless of the input channel through which it arrived. This module results from adapting the theory of communicative acts in human interaction to our interaction system. Its function is to package the sensory information emitted by the RDS sensory modules (following a communicative-act detection algorithm developed for this work) and deliver it to the dialog manager at each dialog turn. New input modes were
integrated into the system to strengthen this multimodality. A user localization system identifies and locates the users around the robot by analyzing several input streams, including sound. The robot's and the user's emotions were also added as inputs: the robot's emotion is generated by an external decision-making module, while the user's emotion is perceived by analyzing the acoustic features of the voice and the expressions of the face. Other input modes added are a radio-frequency tag reader and a written-text reader. Finally, new multimodal expressive (output) components that make the interaction more natural were developed: the real-time generation of non-verbal sounds, the ability to sing, and certain engagement gestures such as looking at the user, nodding, and shaking the head.
Programa Oficial de Doctorado en Ingeniería Eléctrica, Electrónica y Automática. Presidente: Carlos Balaguer Bernaldo de Quirós. Vocal: Antonio Barrientos Cru
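The slot-filling (information-slot) dialog management described in this abstract can be sketched minimally as follows; the slot names and prompts are illustrative, and the real RDS manager additionally handles user profiles and world knowledge:

```python
class SlotFillingDialog:
    """Minimal information-slot dialog manager: ask for each missing
    slot in turn, and stop prompting once every slot is filled."""

    def __init__(self, slots):
        self.values = {s: None for s in slots}          # slot -> filled value
        self.prompts = {s: f"Which {s}?" for s in slots}

    def next_prompt(self):
        # first unfilled slot drives the next system question
        for slot, value in self.values.items():
            if value is None:
                return self.prompts[slot]
        return None    # all slots filled: the dialog goal is reached

    def fill(self, slot, value):
        self.values[slot] = value
```

A multimodal fusion layer such as the one described above would call fill() with values extracted from any input channel (speech, RFID tag, written text), leaving the manager unaware of where each value came from.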