Localization of a Sound Source with a Microphone Array (original title: Localisation d'une source sonore par un réseau de microphones)
Home assistance for the elderly, and in particular knowing their geographical position at all times, has become one of the most pressing issues today. Exploiting the audio information captured by a network of sensor nodes equipped with microphones is a promising research direction that could contribute to better localization in smart homes. In this article we introduce our first work on audio localization by presenting a sound localization system based on a pair of microphones and on the estimation of the time difference of arrival (TDOA). The results show that a pair of microphones can localize a sound source within a radius of 3 m and with an accuracy of better than 3 degrees.
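For readers unfamiliar with TDOA-based localization, the following is a minimal sketch (not taken from the article) of how a two-microphone delay estimate can be turned into a bearing angle, here using a generic GCC-PHAT cross-correlation in Python; the 20 cm microphone spacing, the sampling rate and the synthetic test signal are illustrative assumptions.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, room-temperature approximation

def gcc_phat_tdoa(sig, ref, fs):
    """Estimate the time difference of arrival (TDOA) between two
    microphone signals using GCC-PHAT cross-correlation."""
    n = len(sig) + len(ref)
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    cross = SIG * np.conj(REF)
    cross /= np.abs(cross) + 1e-12          # PHAT weighting
    cc = np.fft.irfft(cross, n=n)
    max_shift = n // 2
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = np.argmax(np.abs(cc)) - max_shift
    return shift / float(fs)                # delay in seconds

def doa_from_tdoa(tdoa, mic_distance):
    """Convert a TDOA into a bearing angle for a two-microphone pair
    (far-field assumption, angle measured from the array broadside)."""
    arg = np.clip(SPEED_OF_SOUND * tdoa / mic_distance, -1.0, 1.0)
    return np.degrees(np.arcsin(arg))

# Illustrative usage with synthetic broadband noise
fs = 16000
mic_distance = 0.2                     # 20 cm spacing (assumed)
rng = np.random.default_rng(0)
source = rng.standard_normal(fs)
delay_samples = 3                      # simulate a small arrival delay
mic1 = source
mic2 = np.roll(source, delay_samples)  # delayed copy at the second mic
tdoa = gcc_phat_tdoa(mic2, mic1, fs)
print("estimated DoA: %.1f degrees" % doa_from_tdoa(tdoa, mic_distance))
```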
Joint Direction and Proximity Classification of Overlapping Sound Events from Binaural Audio
Sound source proximity and distance estimation are of great interest in many practical applications, since they provide significant information for acoustic scene analysis. As both tasks share complementary qualities, ensuring efficient interaction between these two is crucial for a complete picture of an aural environment. In this paper, we aim to investigate several ways of performing joint proximity and direction estimation from binaural recordings, both defined as coarse classification problems based on Deep Neural Networks (DNNs). Considering the limitations of binaural audio, we propose two methods of splitting the sphere into angular areas in order to obtain a set of directional classes. For each method we study different model types to acquire information about the direction-of-arrival (DoA). Finally, we propose various ways of combining the proximity and direction estimation problems into a joint task providing temporal information about the onsets and offsets of the appearing sources. Experiments are performed for a synthetic reverberant binaural dataset consisting of up to two overlapping sound events.
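The exact class layouts proposed in the paper are not detailed above; as a hedged illustration of what "splitting the sphere into angular areas" can look like, here is a minimal sketch that quantises azimuth into fixed-width sectors and elevation into bands. The sector and band counts are assumptions, not the paper's.

```python
import numpy as np

def direction_class(azimuth_deg, elevation_deg,
                    n_azimuth_sectors=8, n_elevation_bands=3):
    """Map an (azimuth, elevation) pair to a coarse directional class.

    One plausible way of splitting the sphere into angular areas;
    the sector counts are illustrative, not taken from the paper.
    """
    az = np.mod(azimuth_deg, 360.0)
    az_idx = int(az // (360.0 / n_azimuth_sectors))
    # Elevation assumed in [-90, 90] degrees, split into equal bands.
    el = np.clip(elevation_deg, -90.0, 90.0 - 1e-9)
    el_idx = int((el + 90.0) // (180.0 / n_elevation_bands))
    return el_idx * n_azimuth_sectors + az_idx

print(direction_class(10.0, 0.0))    # frontal source, middle band
print(direction_class(190.0, 45.0))  # rear source, upper band
```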
A comparison of sound localisation techniques using cross-correlation and spiking neural networks for mobile robotics
This paper outlines the development of a cross-correlation algorithm and a spiking neural network (SNN) for sound localisation based on real sound recorded in a noisy and dynamic environment by a mobile robot. The SNN architecture aims to simulate the sound localisation ability of the mammalian auditory pathways by exploiting the binaural cue of interaural time difference (ITD). The medial superior olive was the inspiration for the SNN architecture, which required the integration of an encoding layer producing biologically realistic spike trains, a model of the bushy cells found in the cochlear nucleus, and a supervised learning algorithm. The experimental results demonstrate that biologically inspired sound localisation achieved using an SNN can compare favourably to the more classical technique of cross-correlation.
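As a hedged illustration of the encoding layer mentioned above (converting a sound waveform into spike trains), here is a minimal leaky integrate-and-fire encoder in Python; the membrane time constant, gain and threshold are illustrative values, not the parameters used in the paper.

```python
import numpy as np

def lif_encode(signal, fs, tau=0.005, threshold=0.2, gain=500.0):
    """Encode a waveform into a spike train with one leaky
    integrate-and-fire (LIF) neuron.

    The half-wave rectified input charges a leaky membrane; a spike is
    emitted and the membrane reset whenever the threshold is crossed.
    Parameters are illustrative, not the paper's.
    """
    dt = 1.0 / fs
    decay = np.exp(-dt / tau)
    v = 0.0
    spikes = np.zeros(len(signal), dtype=bool)
    drive = gain * np.maximum(signal, 0.0)   # half-wave rectification
    for n, x in enumerate(drive):
        v = v * decay + x * dt
        if v >= threshold:
            spikes[n] = True
            v = 0.0                          # reset after spiking
    return spikes

# Illustrative usage: encode 50 ms of a 500 Hz tone sampled at 16 kHz
fs = 16000
t = np.arange(int(0.05 * fs)) / fs
tone = np.sin(2 * np.pi * 500 * t)
spike_train = lif_encode(tone, fs)
print("spikes emitted:", int(spike_train.sum()))
```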
Acoustic Space Learning for Sound Source Separation and Localization on Binaural Manifolds
In this paper we address the problems of modeling the acoustic space generated by a full-spectrum sound source and of using the learned model for the localization and separation of multiple sources that simultaneously emit sparse-spectrum sounds. We lay theoretical and methodological grounds in order to introduce the binaural manifold paradigm. We perform an in-depth study of the latent low-dimensional structure of the high-dimensional interaural spectral data, based on a corpus recorded with a human-like audiomotor robot head. A non-linear dimensionality reduction technique is used to show that these data lie on a two-dimensional (2D) smooth manifold parameterized by the motor states of the listener, or equivalently, the sound source directions. We propose a probabilistic piecewise affine mapping model (PPAM) specifically designed to deal with high-dimensional data exhibiting an intrinsic piecewise linear structure. We derive a closed-form expectation-maximization (EM) procedure for estimating the model parameters, followed by Bayes inversion for obtaining the full posterior density function of a sound source direction. We extend this solution to deal with missing data and redundancy in real-world spectrograms, and hence for 2D localization of natural sound sources such as speech. We further generalize the model to the challenging case of multiple sound sources and we propose a variational EM framework. The associated algorithm, referred to as variational EM for source separation and localization (VESSL), yields a Bayesian estimation of the 2D locations and time-frequency masks of all the sources. Comparisons of the proposed approach with several existing methods reveal that the combination of acoustic-space learning with Bayesian inference enables our method to outperform state-of-the-art methods.
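The PPAM model and its (variational) EM estimation are too involved to reproduce here; the following is a deliberately simplified, deterministic sketch of piecewise-affine regression from high-dimensional interaural-style features to 2D directions, using synthetic data, hard cluster assignments instead of Bayes inversion, and scikit-learn's KMeans. It conveys the flavour of acoustic-space learning, not the PPAM/VESSL algorithm itself.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Synthetic stand-in data: 2D source directions (azimuth, elevation)
# mapped to a higher-dimensional "interaural feature" vector through a
# smooth non-linear function, plus noise. Purely illustrative.
n_train, feat_dim, n_regions = 2000, 32, 16
directions = rng.uniform(-60, 60, size=(n_train, 2))
mixing = rng.normal(size=(2, feat_dim))
features = np.tanh(directions / 30.0) @ mixing \
    + 0.05 * rng.normal(size=(n_train, feat_dim))

# Piecewise-affine regression: cluster the feature space, then fit one
# affine map (features -> direction) per region by least squares.
kmeans = KMeans(n_clusters=n_regions, n_init=10, random_state=0).fit(features)
maps = []
for k in range(n_regions):
    idx = kmeans.labels_ == k
    X = np.hstack([features[idx], np.ones((idx.sum(), 1))])  # affine term
    W, *_ = np.linalg.lstsq(X, directions[idx], rcond=None)
    maps.append(W)

def localize(feature_vec):
    """Predict a 2D direction from one feature vector using the
    region-specific affine map (hard assignment, not Bayes inversion)."""
    k = kmeans.predict(feature_vec[None, :])[0]
    return np.append(feature_vec, 1.0) @ maps[k]

test_dir = np.array([20.0, -10.0])
test_feat = np.tanh(test_dir / 30.0) @ mixing
print("true:", test_dir, "estimated:", np.round(localize(test_feat), 1))
```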
Co-Localization of Audio Sources in Images Using Binaural Features and Locally-Linear Regression
This paper addresses the problem of localizing audio sources using binaural measurements. We propose a supervised formulation that simultaneously localizes multiple sources at different locations. The approach is intrinsically efficient because, contrary to prior work, it relies neither on source separation nor on monaural segregation. The method starts with a training stage that establishes a locally-linear Gaussian regression model between the directional coordinates of all the sources and the auditory features extracted from binaural measurements. While fixed-length wide-spectrum sounds (white noise) are used for training to reliably estimate the model parameters, we show that the testing (localization) can be extended to variable-length sparse-spectrum sounds (such as speech), thus enabling a wide range of realistic applications. Indeed, we demonstrate that the method can be used for audio-visual fusion, namely to map speech signals onto images and hence to spatially align the audio and visual modalities, thus enabling discrimination between speaking and non-speaking faces. We release a novel corpus of real-room recordings that allows quantitative evaluation of the co-localization method in the presence of one or two sound sources. Experiments demonstrate increased accuracy and speed relative to several state-of-the-art methods.
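One step of the audio-visual fusion described above is mapping an estimated sound direction onto image coordinates. The following is a minimal sketch of that projection under a simple pinhole camera model, assuming the camera and the binaural head share the same origin and orientation; the image size and focal length are illustrative, not the paper's calibration.

```python
import numpy as np

def direction_to_pixel(azimuth_deg, elevation_deg,
                       image_width=640, image_height=480,
                       focal_px=500.0):
    """Project a sound direction onto image coordinates with a pinhole
    camera model (camera and binaural head assumed co-located and
    aligned). All parameters are illustrative."""
    az = np.radians(azimuth_deg)
    el = np.radians(elevation_deg)
    # Unit ray in camera coordinates: x right, y down, z forward.
    x = np.cos(el) * np.sin(az)
    y = -np.sin(el)
    z = np.cos(el) * np.cos(az)
    u = image_width / 2.0 + focal_px * x / z
    v = image_height / 2.0 + focal_px * y / z
    return u, v

print(direction_to_pixel(0.0, 0.0))    # image centre
print(direction_to_pixel(15.0, 5.0))   # slightly right of and above centre
```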
Audio-Motor Integration for Robot Audition
In the context of robotics, audio signal processing in the wild amounts to dealing with sounds recorded by a system that moves and whose actuators produce noise. This creates additional challenges in sound source localization, signal enhancement and recognition. But the specificity of such platforms also brings interesting opportunities: can information about the robot actuators' states be meaningfully integrated in the audio processing pipeline to improve performance and efficiency? While robot audition has grown into an established field, methods that explicitly use motor-state information as a complementary modality to audio are scarcer. This chapter proposes a unified view of this endeavour, referred to as audio-motor integration. A literature review and two learning-based methods for audio-motor integration in robot audition are presented, with application to single-microphone sound source localization and ego-noise reduction on real data.
Integrating sensors for robots operating on offshore oil and gas platforms
This thesis presents a solution for integrating sensors and instruments on a robot intended to replace operators on unmanned oil and gas offshore platforms. Operators carry out a variety of tasks on these platforms, from inspection to maintenance. Because of the high cost of keeping operators offshore, there has long been an ambition to design fully unmanned, automated platforms that reduce costs and increase human safety in the oil and gas industry. Robotics is now mature enough to be deployed in many industries, and several manufacturers produce robots capable of performing activities in industrial environments. However, robots have not previously been adopted as a firm solution on offshore platforms, where the risks are higher and the platforms are not accessible under all conditions (such as bad weather). In this thesis, I collect the operator tasks that robots could perform, derive the main requirements for using robots on oil and gas offshore platforms, and identify the sensors and instruments suitable for mounting on the robot to measure, collect and analyze the required data. Finally, the data processing and analysis were implemented in MATLAB Simulink to present the results of the measurements and data collection.
The topic of this thesis was inspired by the oil and gas offshore industry; robots are planned for use in one of the largest oil and gas offshore projects in the North Sea (Yggdrasil), which is scheduled to start operating in 2027. This EPC (Engineering, Procurement, Construction) project began in 2021 and is currently in detail engineering. The information regarding the operators' tasks and the required specifications for sensors and instruments was compiled from this project's requirements. The report of this thesis can be used in the future for selecting the sensors and integrating them on robots. It was not possible to test or prototype on existing robots within the master thesis schedule, because the thesis and the oil and gas project follow different schedules; only a simulation was carried out to show the results of this thesis.
The Head Turning Modulation System: An Active Multimodal Paradigm for Intrinsically Motivated Exploration of Unknown Environments
Over the last 20 years, a significant part of the research in exploratory robotics has partially shifted from looking for the most efficient way of exploring an unknown environment to finding what could motivate a robot to explore it autonomously. Moreover, a growing literature focuses not only on the topological description of a space (dimensions, obstacles, usable paths, etc.) but rather on more semantic components, such as the multimodal objects present in it. In designing robots that behave autonomously by embedding life-long learning abilities, the inclusion of attention mechanisms is important. Indeed, be it endogenous or exogenous, attention constitutes a form of intrinsic motivation, for it can trigger motor commands toward specific stimuli, thus leading to an exploration of the space. The Head Turning Modulation model presented in this paper is composed of two modules providing a robot with two different forms of intrinsic motivation that trigger head movements toward audiovisual sources appearing in unknown environments. First, the Dynamic Weighting module implements a motivation based on the concept of Congruence, defined as an adaptive form of semantic saliency specific to each explored environment. Then, the Multimodal Fusion and Inference module implements a motivation based on the reduction of Uncertainty, through a self-supervised online learning algorithm that can autonomously determine local consistencies. One novelty of the proposed model is that it relies solely on semantic inputs (namely the audio and visual labels the sources belong to), as opposed to the traditional analysis of the low-level characteristics of the perceived data. Another contribution lies in the way the exploration is exploited to actively learn the relationship between the visual and auditory modalities. Importantly, the robot, endowed with binocular vision, binaural audition and a rotating head, does not have access to prior information about the different environments it will explore. Consequently, it has to learn in real time which audiovisual objects are of "importance" in order to rotate its head toward them. The results presented in this paper were obtained in simulated environments as well as with a real robot in realistic experimental conditions.
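As a loose, hedged illustration of congruence-driven attention (not the paper's Dynamic Weighting formulation), here is a toy monitor that scores each (audio label, visual label) pair by how rarely it has been observed in the current environment and triggers a head turn for sufficiently rare pairs; the scoring rule and threshold are assumptions.

```python
from collections import Counter

class CongruenceMonitor:
    """Toy congruence-driven attention: objects whose (audio, visual)
    label pair is rare in the current environment receive a high
    saliency score and trigger a head turn. The scoring rule and the
    threshold are illustrative assumptions, not the paper's model."""

    def __init__(self, turn_threshold=0.7):
        self.counts = Counter()
        self.total = 0
        self.turn_threshold = turn_threshold

    def observe(self, audio_label, visual_label):
        pair = (audio_label, visual_label)
        # Saliency = 1 - relative frequency observed so far.
        freq = self.counts[pair] / self.total if self.total else 0.0
        saliency = 1.0 - freq
        self.counts[pair] += 1
        self.total += 1
        return saliency >= self.turn_threshold, saliency

monitor = CongruenceMonitor()
stream = [("speech", "person")] * 8 + [("bark", "dog")]
for audio, visual in stream:
    turn, s = monitor.observe(audio, visual)
    print(f"{audio:>6}/{visual:<6} saliency={s:.2f} turn_head={turn}")
```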