97 research outputs found

    Listening for Sirens: Locating and Classifying Acoustic Alarms in City Scenes

    This paper addresses the detection of alerting acoustic events and sound source localisation in urban scenarios. Specifically, we are interested in spotting the presence of horns and sirens of emergency vehicles. To obtain a reliable system that operates robustly despite traffic noise, which can be copious, unstructured and unpredictable, we propose to treat the spectrograms of incoming stereo signals as images and apply semantic segmentation, based on a U-Net architecture, to extract the target sound from the background noise. In a multi-task learning scheme, together with signal denoising, we perform acoustic event classification to identify the nature of the alerting sound. Lastly, we use the denoised signals to localise the acoustic source on the horizon plane by regressing the direction of arrival of the sound through a CNN architecture. Our experimental evaluation shows an average classification rate of 94%, and a median absolute error on the localisation of 7.5° when operating on audio frames of 0.5 s, and of 2.5° when operating on frames of 2.5 s. The system offers excellent performance in particularly challenging scenarios, where the noise level is remarkably high. Comment: 6 pages, 9 figures

    Model-Based Environmental Visual Perception for Humanoid Robots

    The visual perception of a robot should answer two fundamental questions: What? and Where? In order to properly and efficiently reply to these questions, it is essential to establish a bidirectional coupling between the external stimuli and the internal representations. This coupling links the physical world with the inner abstraction models by sensor transformation, recognition, matching and optimization algorithms. The objective of this PhD is to establish this sensor-model coupling

    Audio source separation into the wild

    This review chapter is dedicated to multichannel audio source separation in real-life environments. We explore some of the major achievements in the field and discuss some of the remaining challenges. We explore several important practical scenarios, e.g. moving sources and/or microphones, a varying number of sources and sensors, high reverberation levels, spatially diffuse sources, and synchronization problems. Several applications, such as smart assistants, cellular phones, hearing aids and robots, are discussed. Our perspectives on the future of the field are given as the concluding remarks of this chapter.

    Biologically-inspired auditory artificial intelligence for speech recognition in multi-talker environments

    Understanding speech in the presence of distracting talkers is a difficult computational problem known as the cocktail party problem. Motivated by auditory processing in the human brain, this thesis developed a neural network to isolate the speech of a single talker given binaural input containing a target talker and multiple distractors. In this research the network is called a Binaural Speaker Isolation FFTNet, or BSINet for short. To compare the performance of BSINet with that of human participants in recognizing the target talker's speech with a varying number of distractors, a "cocktail party" dataset was designed and made available online. Using the Word-Error-Rate metric for evaluation, this research finds that BSINet performs comparably to the human participants. Thus BSINet provides a significant advancement towards solving the challenging cocktail party problem. The research was funded by an NSERC Canada Discovery Grant, a Government of Alberta Centre for Autonomous Systems in Strengthening Future Communities grant, a MITACS Globalink Award, an NSERC CGS-M Award, and an AITF Graduate Student Scholarship.

    Machine Learning in Robotic Navigation: Deep Visual Localization and Adaptive Control

    The work conducted in this thesis contributes to the robotic navigation field by focusing on different machine learning solutions: supervised learning with (deep) neural networks, unsupervised learning, and reinforcement learning. First, we propose a semi-supervised machine learning approach that can dynamically update the robot controller's parameters using situational analysis through feature extraction and unsupervised clustering. The results show that the robot can adapt to changes in its surroundings, resulting in a thirty percent improvement in navigation speed and stability. Then, we train multiple deep neural networks to estimate the robot's position in the environment using ground truth information provided by a classical localization and mapping approach. We prepare two image-based localization datasets in 3D simulation and compare the results of a traditional multilayer perceptron, a stacked denoising autoencoder, and a convolutional neural network (CNN). The experimental results show that our proposed Inception-based CNNs without pooling layers perform very well in all the environments. Finally, we propose a two-stage learning framework for visual navigation in which the experience of the agent during exploration of one goal is shared to learn to navigate to other goals. The multi-goal Q-function learns to traverse the environment by using the provided discretized map. Transfer learning is applied to the multi-goal Q-function from a maze structure to a 2D simulator, and the result is finally deployed in a 3D simulator where the robot uses the locations estimated by the position-estimator deep CNNs. The results show a significant improvement when multi-goal reinforcement learning is used.

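    Système d'audition artificielle embarqué optimisé pour robot mobile muni d'une matrice de microphones [Optimized embedded artificial audition system for a mobile robot equipped with a microphone array]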

    In an uncontrolled environment, a robot must be able to interact with people autonomously. This autonomy must also include interaction through the human voice. When the interaction takes place at a distance of a few metres, phenomena such as reverberation and ambient noise must be taken into account to perform tasks such as speech or speaker recognition effectively. To that end, the robot must be able to localise, track and separate the sound sources present in its environment. The recent increase in processor computing power and the decrease in their energy consumption now make it possible to integrate these artificial audition systems into real-time embedded systems. Robot audition is a relatively young field with two main artificial audition libraries: ManyEars and HARK. Until now, the number of microphones has generally been limited to eight, because the computational load grows rapidly as microphones are added. Moreover, these libraries can be difficult to use with robots of varied geometries, since they must be calibrated manually. This thesis presents the ODAS library, which provides solutions to these difficulties. To make localisation and separation more robust with closed microphone arrays, ODAS introduces a directivity model for each microphone. A hierarchical search in space also reduces the amount of computation required. In addition, a measure of the uncertainty of the sound's time delay of arrival is introduced to adjust several parameters automatically and thus avoid manual calibration of the system. ODAS also provides a new sound source tracking module that uses Kalman filters rather than particle filters. The results show that the proposed methods reduce the number of false detections during localisation, improve tracking robustness for multiple sound sources, and increase separation quality by 2.7 dB in the case of a minimum variance beamformer. The amount of computation required decreases by a factor of up to 4 for localisation and up to 30 for tracking compared with the ManyEars library. The sound source separation module exploits the geometry of the microphone array more effectively, without the need to measure and calibrate the system manually. With the observed performance, the ODAS library also opens the door to applications in drone detection by noise, localisation of outdoor sounds for more efficient navigation of autonomous vehicles, hands-free home assistants, and integration into hearing aids.

    Computational intelligence approaches to robotics, automation, and control [Volume guest editors]

    No abstract available

    RAPP: A Robotic-Oriented Ecosystem for Delivering Smart User Empowering Applications for Older People

    It is generally true that increasing age is associated with a degree of mental and physical decline; unfortunately, this decline is often accompanied by social exclusion, leading to marginalization and eventually to further acceleration of the aging process. A new approach to alleviating the social exclusion of older people involves the use of assistive robots. As robots rapidly invade everyday life, the need for new software paradigms that address the user's unique needs becomes critical. In this paper we present a novel architectural design, the RAPP [a software platform to deliver smart, user empowering robotic applications (RApps)] framework, that attempts to address this issue. The proposed framework has been designed using a cloud-based approach, integrating robotic devices and their respective applications. We aim to facilitate seamless development of RApps compatible with a wide range of supported robots and available to the public through a unified online store.