Partially Adaptive Multichannel Joint Reduction of Ego-noise and Environmental Noise
Human-robot interaction relies on a noise-robust audio processing module
capable of estimating target speech from audio recordings impacted by
environmental noise, as well as self-induced noise, so-called ego-noise. While
external ambient noise sources vary from environment to environment, ego-noise
is mainly caused by the internal motors and joints of a robot. Ego-noise and
environmental noise reduction are often decoupled, i.e., ego-noise reduction is
performed without considering environmental noise. Recently, a variational
autoencoder (VAE)-based speech model has been combined with a fully adaptive
non-negative matrix factorization (NMF) noise model to recover clean speech
under different environmental noise disturbances. However, its enhancement
performance is limited in adverse acoustic scenarios involving, e.g., ego-noise.
In this paper, we propose a multichannel partially adaptive scheme to jointly
model ego-noise and environmental noise utilizing the VAE-NMF framework, where
we take advantage of spatially and spectrally structured characteristics of
ego-noise by pre-training the ego-noise model, while retaining the ability to
adapt to unknown environmental noise. Experimental results show that our
proposed approach outperforms the methods based on a completely fixed scheme
and a fully adaptive scheme when ego-noise and environmental noise are present
simultaneously.
Comment: Accepted to the 2023 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2023).
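The partially adaptive idea in the abstract above can be sketched in code. The following is a generic illustration, not the paper's exact VAE-NMF algorithm: it uses plain Euclidean-cost NMF with multiplicative updates, and the function name `partially_adaptive_nmf` is invented for this sketch. The key point it demonstrates is that the pre-trained ego-noise bases stay fixed while the environmental noise bases and all activations adapt to the observed spectrogram.

```python
import numpy as np

rng = np.random.default_rng(0)

def partially_adaptive_nmf(V, W_ego, n_env=8, n_iter=100, eps=1e-12):
    """Factor a noise power spectrogram V (freq x time) as
    V ~ [W_ego, W_env] @ H.  The pre-trained ego-noise bases W_ego stay
    FIXED; only the environmental bases W_env and the activations H are
    adapted (multiplicative updates for the Euclidean cost)."""
    F, T = V.shape
    K_ego = W_ego.shape[1]
    W_env = rng.random((F, n_env)) + eps
    H = rng.random((K_ego + n_env, T)) + eps
    for _ in range(n_iter):
        W = np.hstack([W_ego, W_env])
        # adapt all activations (for both ego-noise and environmental parts)
        H *= (W.T @ V) / (W.T @ (W @ H) + eps)
        # adapt ONLY the environmental bases; the ego-noise bases are kept
        V_hat = np.hstack([W_ego, W_env]) @ H
        H_env = H[K_ego:]
        W_env *= (V @ H_env.T) / (V_hat @ H_env.T + eps)
    return W_env, H
```

Because only the ego-noise part is pre-trained, the factorization retains the ability to explain previously unseen environmental noise through `W_env`.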
Ego-Noise Reduction for Robots
In robotics, it is desirable to equip robots with a sense of hearing so that they can better interact with users and their environment. However, the noise caused by robot actuators, called ego-noise, considerably degrades the quality of audio recordings. Consequently, the performance of speech recognition and sound event detection techniques is limited by the amount of noise the robot produces while moving. The noise generated by robots differs considerably depending on the environment, the motors, the materials used, and even the condition of the various mechanical components. The goal of this project is to design a robust ego-noise reduction model that uses multiple microphones and can be calibrated quickly on a mobile robot.
This thesis presents an ego-noise reduction method that combines the learning of noise covariance matrix templates with a minimum variance distortionless response (MVDR) beamforming algorithm. The approach used to learn the covariance matrices captures the spatial characteristics of the ego-noise in under two minutes for each new environment. The beamforming algorithm, in turn, reduces the ego-noise in the noisy signal without adding nonlinear distortion to the resulting signal. The method is implemented under Robot Operating System for quick and easy use on different robots.
The new method was evaluated on a real robot in three different environments: a small room, a large room, and an office corridor. The signal-to-noise ratio improves by about 10 dB, consistently across the three rooms. The word error rate of speech recognition is reduced by 30% to 55%. The model was also tested for sound event detection: an increase of 7% to 20% in average precision was measured for music detection, but no significant improvement for speech, shouting, closing doors, or alarms. The proposed method makes speech recognition more accessible on noisy robots.
In addition, an analysis of the main parameters validated their impact on system performance. Performance is better when the system is calibrated with more robot noise and when longer segments are used. The Short-Time Fourier Transform (STFT) size can be reduced to shorten the system's processing time; however, this size also determines the spectral resolution of the resulting signal. A trade-off must be made between low processing time and the quality of the system's output signal.
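The two ingredients of the method described above, covariance template learning and MVDR beamforming, can be sketched per frequency bin as follows. This is a minimal illustration of the standard MVDR formulation, not code from the thesis; the function names and the diagonal-loading detail are assumptions made for this sketch.

```python
import numpy as np

def ego_noise_covariance(N_stft):
    """Template learning sketch: average the outer products of multichannel
    ego-noise STFT frames (channels x frames) into one spatial covariance
    matrix for a given frequency bin."""
    return (N_stft @ N_stft.conj().T) / N_stft.shape[1]

def mvdr_weights(R_noise, d, diag_load=1e-6):
    """MVDR weights for one frequency bin: w = R^{-1} d / (d^H R^{-1} d),
    i.e. minimize output noise power subject to a distortionless response
    toward the steering vector d.  Diagonal loading keeps R invertible."""
    M = R_noise.shape[0]
    R = R_noise + diag_load * (np.trace(R_noise).real / M) * np.eye(M)
    Rinv_d = np.linalg.solve(R, d)
    return Rinv_d / (np.conj(d) @ Rinv_d)
```

By construction the weights satisfy w^H d = 1, which is why the beamformer suppresses the ego-noise captured in the covariance template without adding nonlinear distortion to the target signal.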
Audio-Motor Integration for Robot Audition
In the context of robotics, audio signal processing in the wild amounts to dealing with sounds recorded by a system that moves and whose actuators produce noise. This creates additional challenges in sound source localization, signal enhancement and recognition. But the specificity of such platforms also brings interesting opportunities: can information about the robot actuators' states be meaningfully integrated in the audio processing pipeline to improve performance and efficiency? While robot audition grew to become an established field, methods that explicitly use motor-state information as a complementary modality to audio are scarcer. This chapter proposes a unified view of this endeavour, referred to as audio-motor integration. A literature review and two learning-based methods for audio-motor integration in robot audition are presented, with application to single-microphone sound source localization and ego-noise reduction on real data.
ODAS: Open embeddeD Audition System
Artificial audition aims at providing hearing capabilities to machines,
computers and robots. Existing frameworks in robot audition offer interesting
sound source localization, tracking and separation performance, but involve a
significant amount of computations that limit their use on robots with embedded
computing capabilities. This paper presents ODAS, the Open embeddeD Audition
System framework, which includes strategies to reduce the computational load
and perform robot audition tasks on low-cost embedded computing systems. It
presents key features of ODAS, along with cases illustrating its use in
different robots and artificial audition applications.
3D reconstruction and motion estimation using forward looking sonar
Autonomous Underwater Vehicles (AUVs) are increasingly used in different domains,
including archaeology, the oil and gas industry, coral reef monitoring, harbour
security, and mine countermeasure missions. As electromagnetic signals do not
penetrate the underwater environment, GPS signals cannot be used for AUV
navigation, and optical cameras have a very short range underwater, which limits
their use in most underwater environments.
Motion estimation for AUVs is a critical requirement for successful vehicle recovery
and meaningful data collection. Classical inertial sensors, usually used for AUV motion
estimation, suffer from large drift error, while accurate inertial sensors are very
expensive, which limits their deployment to costly AUVs. Furthermore, acoustic
positioning systems (APS) used for AUV navigation require costly installation and
calibration, and offer poor resolution.
Underwater 3D imaging is another challenge in the AUV industry, as 3D information is
increasingly demanded to accomplish different AUV missions. Different systems have
been proposed for underwater 3D imaging, such as planar-array sonar and T-configured
3D sonar. While the former generally features good resolution, it is very expensive and
requires huge computational power; the latter is cheaper to implement but requires a
long time for a full 3D scan, even at short ranges.
In this thesis, we aim to tackle AUV motion estimation and underwater 3D imaging by
proposing relatively affordable methodologies and studying the different parameters
affecting their performance. We introduce a new motion estimation framework for AUVs
that relies on successive acoustic images to infer AUV ego-motion. We also propose an
Acoustic Stereo Imaging (ASI) system for underwater 3D reconstruction based on
forward-looking sonars; the proposed system is cheaper to implement than planar-array
sonars and solves the delay problem of T-configured 3D sonars.
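Inferring ego-motion from successive acoustic images is commonly built on image registration. The sketch below shows generic phase correlation for recovering an integer translation between two consecutive frames; it is an illustrative building block under that assumption, not the actual framework proposed in the thesis.

```python
import numpy as np

def phase_correlation_shift(img_a, img_b):
    """Estimate the integer (row, col) translation taking img_a to img_b by
    phase correlation: normalize the cross-power spectrum so only the phase
    difference (i.e. the shift) remains, then locate the correlation peak."""
    Fa = np.fft.fft2(img_a)
    Fb = np.fft.fft2(img_b)
    cross = np.conj(Fa) * Fb
    cross /= np.abs(cross) + 1e-12
    corr = np.fft.ifft2(cross).real
    idx = np.unravel_index(np.argmax(corr), corr.shape)
    # wrap peak indices into signed shifts (circular FFT convention)
    return tuple(s if s <= n // 2 else s - n for s, n in zip(idx, corr.shape))
```

Chaining such frame-to-frame displacements (with range/bearing scaling for the sonar geometry) yields a drift-prone but inexpensive ego-motion estimate, which is the kind of image-based cue the framework above builds on.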