
    Deep learning assisted sound source localization from a flying drone


    Deep learning assisted time-frequency processing for speech enhancement on drones

    This article fills the gap between the growing interest in signal processing based on Deep Neural Networks (DNNs) and the new application of enhancing speech captured by microphones on a drone. In this context, the quality of the target sound is degraded significantly by the strong ego-noise from the rotating motors and propellers. We present the first work that integrates single-channel and multi-channel DNN-based approaches for speech enhancement on drones. We employ a DNN to estimate the ideal ratio masks at individual time-frequency bins, which are subsequently used to design three potential speech enhancement systems, namely single-channel ego-noise reduction (DNN-S), multi-channel beamforming (DNN-BF), and multi-channel time-frequency spatial filtering (DNN-TF). The main novelty lies in the proposed DNN-TF algorithm, which infers the noise-dominance probabilities at individual time-frequency bins from the DNN-estimated soft masks, and then incorporates them into a time-frequency spatial filtering framework for ego-noise reduction. By jointly exploiting the direction of arrival of the target sound, the time-frequency sparsity of the acoustic signals (speech and ego-noise), and the time-frequency noise-dominance probability, DNN-TF can suppress the ego-noise effectively in scenarios with very low signal-to-noise ratios (e.g., SNR lower than -15 dB), especially when the direction of the target sound is close to that of a source of the ego-noise. Experiments with real and simulated data show the advantage of DNN-TF over competing methods, including DNN-S, DNN-BF, and the state-of-the-art time-frequency spatial filtering.
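    The pipeline above hinges on per-bin soft masks: single-channel enhancement scales each time-frequency bin of the noisy spectrogram by the estimated ideal ratio mask, and DNN-TF turns the same mask into a noise-dominance probability. A minimal numpy sketch of those two steps, with a random stand-in for the DNN output rather than the authors' implementation:

    ```python
    import numpy as np

    def apply_ratio_mask(noisy_stft, speech_mask):
        """Single-channel enhancement (DNN-S style): scale each
        time-frequency bin of the noisy STFT by the soft mask."""
        return speech_mask * noisy_stft

    def noise_dominance(speech_mask):
        """Per-bin noise-dominance probability, taken here as the
        complement of the soft speech mask (an assumption; the paper
        infers it from the DNN-estimated masks)."""
        return 1.0 - np.clip(speech_mask, 0.0, 1.0)

    # Hypothetical shapes: (freq_bins, frames)
    noisy = np.random.randn(257, 100) + 1j * np.random.randn(257, 100)
    mask = np.random.rand(257, 100)   # stand-in for the DNN output
    enhanced = apply_ratio_mask(noisy, mask)
    p_noise = noise_dominance(mask)
    ```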

    A Blind Source Separation Framework for Ego-Noise Reduction on Multi-Rotor Drones


    Ego-Noise Reduction for Robots

    In robotics, it is desirable to equip robots with a sense of hearing so they can better interact with users and the environment. However, the noise caused by the robots' actuators, called ego-noise, considerably degrades the quality of audio segments. Consequently, the performance of speech recognition and sound event detection techniques is limited by the amount of noise the robot produces while moving. The noise generated by robots varies considerably with the environment, the motors, the materials used, and even the integrity of the various mechanical components. The goal of this project is to design a robust ego-noise reduction model that uses several microphones and can be calibrated quickly on a mobile robot. This thesis presents an ego-noise reduction method that combines the learning of noise covariance matrix templates with a minimum variance distortionless response (MVDR) beamforming algorithm. The approach used to learn the covariance matrices captures the spatial characteristics of the ego-noise in under two minutes for each new environment. The beamforming algorithm, in turn, reduces the ego-noise in the noisy signal without adding nonlinear distortion to the resulting signal. The method is implemented under Robot Operating System for simple and fast use on different robots. This new method was evaluated on a real robot in three different environments: a small room, a large room, and an office corridor. The signal-to-noise ratio improvement is about 10 dB and is consistent across the three rooms. The reduction in the word error rate of speech recognition lies between 30% and 55%. The model was also tested for sound event detection. An increase of 7% to 20% in average precision was measured for music detection, but no significant increase for speech, shouting, closing doors, or alarms. The proposed method makes speech recognition more accessible on noisy robots. In addition, an analysis of the main parameters validated their impact on system performance. Performance is better when the system is calibrated with more robot noise and when longer segments are used. The size of the Short-Time Fourier Transform can be reduced to lower the system's processing time; however, this size also affects the resolution of the features of the resulting signal. A trade-off must be made between low processing time and the quality of the system's output signal.
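    The combination at the heart of the thesis, noise covariance templates learned from short calibration recordings and plugged into an MVDR beamformer, can be sketched per frequency bin as follows. The array size, steering vector and data are stand-ins, not the thesis implementation:

    ```python
    import numpy as np

    def noise_covariance(noise_stft):
        """Spatial covariance of the ego-noise for one frequency bin,
        estimated from calibration frames; noise_stft: (mics, frames)."""
        return noise_stft @ noise_stft.conj().T / noise_stft.shape[1]

    def mvdr_weights(R_noise, steering, diag_load=1e-6):
        """MVDR beamformer: w = R^{-1} d / (d^H R^{-1} d).
        Diagonal loading keeps the inversion well conditioned."""
        R = R_noise + diag_load * np.eye(R_noise.shape[0])
        Rinv_d = np.linalg.solve(R, steering)
        return Rinv_d / (steering.conj() @ Rinv_d)

    # Hypothetical 4-mic array, one frequency bin
    calib = np.random.randn(4, 200) + 1j * np.random.randn(4, 200)
    R_n = noise_covariance(calib)
    d = np.ones(4, dtype=complex)      # stand-in steering vector
    w = mvdr_weights(R_n, d)
    frame = np.random.randn(4) + 1j * np.random.randn(4)
    enhanced_bin = w.conj() @ frame    # beamformer output for this bin
    ```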

    Audio-based Relative Positioning System for Multiple Micro Air Vehicle Systems

    Employing a group of independently controlled flying micro air vehicles (MAVs) for aerial coverage missions, instead of a single flying robot, increases the robustness and efficiency of the missions. Designing a group of MAVs requires addressing new challenges, such as inter-robot collision avoidance and formation control, where each individual's knowledge of the relative locations of its local group members is essential. A relative positioning system for a MAV needs to satisfy severe constraints in terms of size, weight, processing power, power consumption, three-dimensional coverage and price. In this paper, we present an on-board audio-based system that is capable of providing individuals with relative positioning information about their neighbouring sound-emitting MAVs. We propose a method based on coherence testing among signals of a small onboard microphone array to obtain relative bearing measurements, and a particle filter estimator to fuse these measurements with information about the motion of the robots over time to obtain the desired relative location estimates. A method based on the fractional Fourier transform (FrFT) is used to identify and extract the sounds of simultaneously chirping robots in the neighbourhood. Furthermore, we evaluate our proposed method in a real-world experiment with three simultaneously flying micro air vehicles.
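    The bearing measurements at the core of this system come from comparing microphone signals pairwise. As a rough stand-in for the paper's coherence-testing approach, the sketch below uses GCC-PHAT to turn the time difference of arrival at one microphone pair into a far-field bearing; the sampling rate and spacing are hypothetical:

    ```python
    import numpy as np

    def gcc_phat_delay(sig_a, sig_b, fs):
        """Time difference of arrival between two microphones via
        GCC-PHAT (a common substitute for coherence testing)."""
        n = 2 * max(len(sig_a), len(sig_b))
        A = np.fft.rfft(sig_a, n)
        B = np.fft.rfft(sig_b, n)
        cross = A * np.conj(B)
        cross /= np.abs(cross) + 1e-12        # PHAT weighting
        cc = np.fft.irfft(cross, n)
        lags = np.concatenate((cc[-n // 2:], cc[:n // 2]))
        return (np.argmax(lags) - n // 2) / fs

    def bearing_from_tdoa(tdoa, mic_dist, c=343.0):
        """Relative bearing for one mic pair, assuming a far-field source."""
        return np.degrees(np.arcsin(np.clip(c * tdoa / mic_dist, -1.0, 1.0)))

    # Hypothetical usage: 48 kHz recordings, 10 cm microphone spacing
    fs, spacing = 48000, 0.10
    a = np.random.randn(4800)
    b = np.roll(a, 5)                         # ~0.1 ms inter-mic delay
    print(bearing_from_tdoa(gcc_phat_delay(a, b, fs), spacing))
    ```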

    Electrophysiologic assessment of (central) auditory processing disorder in children with non-syndromic cleft lip and/or palate

    Session 5aPP - Psychological and Physiological Acoustics: Auditory Function, Mechanisms, and Models (Poster Session)

    Cleft of the lip and/or palate is a common congenital craniofacial malformation worldwide, particularly non-syndromic cleft lip and/or palate (NSCL/P). Though middle ear deficits in this population have been universally noted in numerous studies, other auditory problems, including inner ear deficits or cortical dysfunction, are rarely reported. A higher prevalence of educational problems has been noted in children with NSCL/P compared to craniofacially normal children. These high-level cognitive difficulties cannot be entirely attributed to peripheral hearing loss. Recently it has been suggested that children with NSCL/P may be more prone to abnormalities in the auditory cortex. The aim of the present study was to investigate whether school-age children with NSCL/P have a higher prevalence of indications of (central) auditory processing disorder [(C)APD] compared to normal age-matched controls when assessed using auditory event-related potential (ERP) techniques. School children (6 to 15 years) with NSCL/P and normal controls matched for age and gender were recruited. Auditory ERP recordings included the auditory brainstem response and late event-related potentials, including the P1-N1-P2 complex and P300 waveforms. Initial findings from the present study are presented, and their implications for further research in this area and for clinical intervention are outlined. © 2012 Acoustical Society of America

    DREGON: Dataset and Methods for UAV-Embedded Sound Source Localization

    This paper introduces DREGON, a novel publicly-available dataset that aims at pushing research in sound source localization using a microphone array embedded in an unmanned aerial vehicle (UAV). The dataset contains both clean and noisy in-flight audio recordings continuously annotated with the 3D position of the target sound source using an accurate motion capture system. In addition, various signals of interest are available, such as the rotational speed of individual rotors and inertial measurements at all times. Besides introducing the dataset, this paper sheds light on the specific properties, challenges and opportunities brought by the emerging task of UAV-embedded sound source localization. Several baseline methods are evaluated and compared on the dataset, with real-time applicability in mind. Very promising results are obtained for the localization of a broad-band source in loud noise conditions, while speech localization remains a challenge under extreme noise levels.
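    Because the dataset pairs every recording with motion-capture ground truth, benchmarking a localization method on it reduces to comparing estimated and annotated source directions. A minimal sketch of the angular-error metric commonly used for such comparisons; the data layout is hypothetical and this is not DREGON's official tooling:

    ```python
    import numpy as np

    def angular_error_deg(est_dir, ref_dir):
        """Great-circle angle between an estimated source direction
        and the motion-capture ground truth (3D unit vectors)."""
        est = est_dir / np.linalg.norm(est_dir)
        ref = ref_dir / np.linalg.norm(ref_dir)
        return np.degrees(np.arccos(np.clip(est @ ref, -1.0, 1.0)))

    # Hypothetical usage: one estimate vs. one annotation
    est = np.array([0.7, 0.1, 0.7])
    gt = np.array([0.6, 0.0, 0.8])
    print(f"error: {angular_error_deg(est, gt):.1f} deg")
    ```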