Robot Phonotaxis with Dynamic Sound-source Localization
Abstract: We address two key goals pertaining to autonomous mobile robots: first, to develop fast, accurate sensory capabilities (at present, the localization of sound sources) and, second, to integrate such sensory modules with other robot functions, especially motor control and navigation. A primary motivation for this work was to devise effective means of guiding robotic navigation in environments with acoustic sources. We recently designed and built a biomimetic sound-source localization apparatus. In contrast to the popular use of time-of-arrival differences in free-field microphone arrays, our system is based on principles observed in nature, where directional acoustic sensing evolved to rely on diffraction about the head with only two ears. In this paper we present an integrated robot phonotaxis system that uses the robot's movement to resolve the front-back localization ambiguity. Our system achieves high angular localization acuity (±2°) and was successfully tested in localizing a single broadband source and moving towards it within a cluttered laboratory environment.
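The front-back ambiguity mentioned in the abstract arises because a two-ear system measures only a lateral cue, which is identical for a source at bearing b and its mirror image at pi - b. A minimal sketch of motion-based disambiguation, under the illustrative assumption (not the paper's actual model) that the cue behaves like sin(bearing):

```python
import math

def resolve_front_back(cue_before, cue_after, turn_angle):
    """Pick the source bearing consistent with two binaural cues.

    The lateral cue of a two-ear system is modelled here (an
    illustrative assumption, not the paper's model) as sin(bearing),
    so a bearing b and its mirror pi - b are indistinguishable from a
    single measurement.  Turning the robot by turn_angle (radians,
    counter-clockwise) shifts the true bearing by -turn_angle; only
    the correct hypothesis predicts the second cue.
    Returns the estimated source bearing *after* the turn.
    """
    b = math.asin(max(-1.0, min(1.0, cue_before)))   # frontal solution
    for hypothesis in (b, math.pi - b):              # front / back candidates
        predicted = math.sin(hypothesis - turn_angle)
        if abs(predicted - cue_after) < 1e-6:
            return hypothesis - turn_angle
    # With noisy cues, fall back to the hypothesis with the smaller error.
    return min((b, math.pi - b),
               key=lambda h: abs(math.sin(h - turn_angle) - cue_after)) - turn_angle
```

For a source behind the robot, only the rear hypothesis stays consistent after the turn, which is exactly how the robot's own movement breaks the tie.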
ROBOTIC SOUND SOURCE LOCALIZATION AND TRACKING USING BIO-INSPIRED MINIATURE ACOUSTIC SENSORS
Sound source localization and tracking using auditory systems has been widely investigated for robotics applications because of its inherent advantages over other approaches, such as vision-based systems. Most existing robotic sound localization and tracking systems use conventional microphone arrays in various arrangements, which are inherently limited by a size constraint and are thus difficult to implement on miniature robots. To overcome this size constraint, sensors that mimic the mechanically coupled ears of the fly Ormia have previously been developed. However, there has not been any attempt to study robotic sound source localization and tracking with these sensors.
In this dissertation, robotic sound source localization and tracking using miniature fly-ear-inspired sensors are studied for the first time. First, through investigation of the Cramér-Rao lower bound (CRLB) and the variance of the sound incident-angle estimate, an enhanced understanding of the influence of the mechanical coupling on the performance of the fly-ear-inspired sensor for sound localization is achieved. It is found that, owing to the mechanical coupling between the membranes, at its working frequency the fly-ear-inspired sensor can achieve an incident-angle estimate that is 100 times better than that of a conventional microphone pair with the same signal-to-noise ratio in detecting membrane deflection. Second, sound localization algorithms that can be used for robotic sound source localization and tracking with the fly-ear-inspired sensors are developed. Two methods are developed to estimate the sound incident angle from the sensor output: one based on a model-free gradient-descent method and the other on fuzzy logic. In the first approach, different localization schemes and objective functions are investigated through numerical simulations, in which two-dimensional sound source localization is achieved without ambiguity. To address the slow convergence caused by the iterative nature of the first approach, a novel fuzzy-logic model of the fly-ear sensor is developed in the second approach for sound incident-angle estimation. This model is studied in both simulations and experiments, with good performance in localizing a stationary source and tracking a moving source in one dimension. Third, nonlinear and linear-quadratic controllers are developed to control the kinematics of a robot for sound source localization and tracking; these are later implemented on a mobile platform equipped with a microphone pair. Both homing onto a stationary source and tracking a moving source along pre-defined paths are successfully demonstrated.
Through this dissertation work, new knowledge on robotic sound source localization and tracking using fly-ear-inspired sensors is created, which can serve as a basis for future studies of sound source localization and tracking with miniature robots.
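The model-free gradient-descent approach described above can be sketched generically: a steering objective measured from the sensor output is minimized by finite-difference gradient steps, with no analytical model of the coupled membranes. The objective function below is a hypothetical stand-in, not the dissertation's actual cost:

```python
def estimate_incident_angle(cue, theta0=0.0, step=0.2, eps=1e-3, iters=200):
    """Model-free gradient descent toward the sound source.

    cue(theta) returns a scalar steering objective measured with the
    sensor pointed at angle theta; it is assumed (for illustration
    only) to be minimal when the sensor faces the source.  The
    gradient is estimated by central finite differences, so no model
    of the mechanically coupled membranes is needed.
    """
    theta = theta0
    for _ in range(iters):
        # numeric gradient from two nearby cue measurements
        grad = (cue(theta + eps) - cue(theta - eps)) / (2 * eps)
        theta -= step * grad
        if abs(grad) < 1e-6:        # converged: objective locally flat
            break
    return theta
```

The iterative measure-and-step loop is also what makes the method slow to converge, which motivates the fuzzy-logic model developed as the second approach.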
Human-robot interaction system based on multimodal and adaptive dialogs
Mención Internacional en el título de doctor (International Mention in the doctoral degree).
In recent years, in the Human-Robot Interaction (HRI) area, there has been growing
interest in situations where users are not technologically skilled with robotic systems.
For these users, it is necessary to use interaction techniques that do not require prior specific knowledge. No technological skill can be assumed of them; the only interactive skill that can be presumed is the one they use to communicate with other humans. The techniques presented in this work pursue two goals: that the robot or system expresses itself in a way these users can understand, without any extra effort compared with interacting with another person, and that it interprets what users express just as another person would. In short, the goal is to imitate the way humans interact with each other.
In this thesis, a natural interaction system called the Robotics Dialog System (RDS) has been developed and tested. It allows the robot and the user to communicate over the different channels available. The system comprises several modules that work together, in a coordinated and complementary way, to reach the desired level of natural interaction. RDS lives inside a robotic control architecture and communicates with the other systems that compose it: decision making, sequencing, communication, games, sensory perception, expression, etc. This thesis contributes to the state of the art at two levels. At the higher level, it presents a human-robot interaction system (RDS) based on multimodal dialogs. At the lower level, each chapter describes the components developed specifically for RDS, each contributing to the state of the art in its own field. Prior to each contribution, it was necessary to integrate or implement the state-of-the-art advances in that field to date. Most of these contributions are backed by publications in scientific journals.
The first field addressed, and one that evolved throughout the research, was Natural Language Processing. The most important automatic speech recognition (ASR) systems were analyzed and tested in real situations; some of them were then integrated into RDS through a subsystem that runs several ASR engines concurrently, with the dual goal of improving speech recognition accuracy and providing several complementary input methods. The research then continued by adapting the interaction to different types of microphones and acoustic environments. Finally, the system was extended to recognize speech in multiple languages and to identify the user by voice.
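Running several ASR engines concurrently and combining their outputs can be sketched as follows. The per-word majority vote is a deliberately naive stand-in, since the abstract does not detail the combination scheme RDS actually uses, and the engine callables are hypothetical placeholders for real ASR back ends:

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def recognize_concurrently(audio, engines):
    """Run several ASR engines on the same audio and vote per word.

    engines is a list of callables engine(audio) -> str (stand-ins
    for real ASR back ends).  Transcripts are combined by a simple
    per-position majority vote over words.
    """
    with ThreadPoolExecutor() as pool:
        transcripts = list(pool.map(lambda e: e(audio), engines))
    word_lists = [t.split() for t in transcripts]
    length = max(len(w) for w in word_lists)
    voted = []
    for i in range(length):
        # count each engine's candidate word at this position
        votes = Counter(w[i] for w in word_lists if i < len(w))
        voted.append(votes.most_common(1)[0][0])
    return " ".join(voted)
```

A real system would align transcripts of different lengths (e.g. ROVER-style alignment) and weight engines by confidence, but the concurrency-plus-voting structure is the same.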
The next field addressed was natural language generation. The objective was a speech synthesis system with a certain degree of naturalness and intelligibility, supporting multiple languages and several voice timbres, and able to express emotions. A modular system capable of integrating several speech synthesis engines was built. To give the system some naturalness and expressive variability, a template mechanism was incorporated that synthesizes speech with a degree of lexical variability, generating non-predefined sentences in a conversation.
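The template mechanism for lexical variability can be sketched minimally: each slot in a template lists interchangeable wordings, so the same communicative act is voiced differently across turns. The template and lexicon contents below are hypothetical examples, not taken from RDS:

```python
import random

# Hypothetical response templates: each slot name maps to a list of
# interchangeable wordings, so one communicative act yields varied
# but equivalent utterances for the synthesizer.
TEMPLATES = {
    "greet_user": ["{hello}, {name}! {offer_help}"],
}
LEXICON = {
    "hello": ["Hello", "Hi", "Good to see you"],
    "offer_help": ["How can I help you?", "What can I do for you?"],
}

def render(act, name, rng=random):
    """Pick a template for the communicative act and fill each slot
    with a randomly chosen variant from the lexicon."""
    template = rng.choice(TEMPLATES[act])
    slots = {k: rng.choice(v) for k, v in LEXICON.items()}
    return template.format(name=name, **slots)
```

Seeding the random generator makes the choice reproducible for testing, while in conversation the default generator keeps repeated greetings from sounding identical.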
Dialog management was the next challenge. The existing paradigms were analyzed, and a manager based on information slots (slot filling) was chosen. This manager was extended and modified to adapt to the user (through user profiles) and to hold some world knowledge. In parallel, the multimodal fusion module was developed. Its goal is to abstract multimodality away from the dialog manager; in other words, the dialog manager uses the message regardless of the input channel through which it arrived. This module results from adapting the theory of communicative acts in human interaction to our interaction system. Its function is to package the sensory information emitted by the RDS sensory modules (following a communicative-act detection algorithm developed for this work) and deliver it to the dialog manager at each dialog turn. New input modes were
integrated into the system to strengthen this multimodality. A user localization system identifies and locates the users around the robot by analyzing several input streams, including sound. The robot's and the user's emotions were also added as inputs: the robot's emotion is generated by an external decision-making module, while the user's emotion is perceived by analyzing the acoustic features of the voice and the expressions of the face. Other input modes added are a radio-frequency tag reader and a written-text reader. Finally, new multimodal expressive (output) components that make the interaction more natural were developed: the real-time generation of non-verbal sounds, the ability to sing, and certain engagement gestures such as looking at the user, nodding, and shaking the head.
Programa Oficial de Doctorado en Ingeniería Eléctrica, Electrónica y Automática. Presidente: Carlos Balaguer Bernaldo de Quirós. Vocal: Antonio Barrientos Cru
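The slot-filling (information-slot) dialog management described in this abstract can be sketched minimally as follows; the slot names and prompts are illustrative, and the real RDS manager additionally handles user profiles and world knowledge:

```python
class SlotFillingDialog:
    """Minimal information-slot dialog manager: ask for each missing
    slot in turn, and stop prompting once every slot is filled."""

    def __init__(self, slots):
        self.values = {s: None for s in slots}          # slot -> filled value
        self.prompts = {s: f"Which {s}?" for s in slots}

    def next_prompt(self):
        # first unfilled slot drives the next system question
        for slot, value in self.values.items():
            if value is None:
                return self.prompts[slot]
        return None    # all slots filled: the dialog goal is reached

    def fill(self, slot, value):
        self.values[slot] = value
```

A multimodal fusion layer such as the one described above would call fill() with values extracted from any input channel (speech, RFID tag, written text), leaving the manager unaware of where each value came from.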