
    Split conditional independent mapping for sound source localisation with inverse-depth parametrisation

    © 2016 IEEE. In this paper, we propose a framework to map stationary sound sources while simultaneously localising a moving robot. Conventional methods for localisation and sound source mapping rely on a microphone array together with either 1) a proprioceptive sensor only (such as wheel odometry) or 2) an additional exteroceptive sensor (such as cameras or lasers) to obtain accurate robot locations. Since odometry drifts over time and sound observations are bearing-only, sparse and extremely noisy, the former can only handle relatively short trajectories before the whole map drifts. The latter, by contrast, can achieve more accurate trajectory estimation over long distances and, as a result, a better estimate of the sound source map. However, in most of the literature, trajectory estimation and sound source mapping are treated as uncorrelated, which means an update on the robot trajectory does not propagate properly to the sound source map. In this paper, we propose an efficient method to correlate the robot trajectory with the sound source map by exploiting the conditional independence property between two maps estimated by two different Simultaneous Localisation and Mapping (SLAM) algorithms running in parallel. In our approach, the first map can be built with any SLAM algorithm (filtering- or optimisation-based) to estimate robot poses from an exteroceptive sensor. The second map is built with a filtering-based SLAM algorithm that locates all stationary sound sources, parametrised with Inverse Depth Parametrisation (IDP). The robot locations used during IDP initialisation are the common features shared between the two SLAM maps, which allows information to be propagated accordingly. Comprehensive simulation and experimental results show the effectiveness of the proposed method.
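
    The abstract above parametrises each sound source with Inverse Depth Parametrisation (IDP), anchored at the robot location where the source is first observed. The following is a minimal sketch of how such a parametrisation maps back to a 3D point; it assumes a common IDP axis convention and hypothetical function and variable names, not the paper's exact formulation.

```python
import numpy as np

def idp_to_point(anchor, azimuth, elevation, inv_depth):
    """Convert an inverse-depth parametrised landmark to a 3D point.

    anchor    : (3,) robot position at initialisation (the feature shared
                between the two SLAM maps in the proposed framework)
    azimuth,
    elevation : bearing of the first sound observation [rad]
    inv_depth : inverse of the distance to the sound source [1/m]
    """
    # Unit ray in the world frame (one common axis convention; the
    # paper's exact convention may differ).
    m = np.array([np.cos(elevation) * np.cos(azimuth),
                  np.cos(elevation) * np.sin(azimuth),
                  np.sin(elevation)])
    return np.asarray(anchor, dtype=float) + m / inv_depth

# Example: a source first heard from the origin, 30 degrees to the left,
# initialised with a prior depth of 5 m (inverse depth 0.2).
print(idp_to_point([0.0, 0.0, 0.0], np.deg2rad(30), 0.0, 0.2))
```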

    Towards real-time 3D sound sources mapping with linear microphone arrays

    © 2017 IEEE. In this paper, we present a method for real-time 3D sound source mapping using an off-the-shelf robotic perception sensor equipped with a linear microphone array. Conventional approaches to mapping sound sources in 3D use dedicated 3D microphone arrays, as this type of array provides two degrees of freedom (DOF) observations. Our method addresses the problem of 3D sound source mapping with a linear microphone array, which provides only one-DOF observations, making the estimation of the sound source locations more challenging. In the proposed method, multi-hypothesis tracking is combined with a new sound source parametrisation to provide a good initial guess for an online optimisation strategy. A joint optimisation is carried out to estimate the 6 DOF sensor poses and the 3 DOF landmarks corresponding to the sound source locations. Additionally, a dedicated sensor model is proposed to accurately model the noise of the Direction of Arrival (DOA) observations when using a linear microphone array. Comprehensive simulation and experimental results show the effectiveness of the proposed method. In addition, a real-time implementation of our method has been made available as open source software for the benefit of the community.
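
    As the abstract notes, a linear microphone array only constrains one degree of freedom: the angle between the array axis and the direction to the source (a cone of ambiguity). The sketch below shows what such a one-DOF observation model could look like as a residual for a least-squares back-end; the function names and the body-frame array axis are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def predicted_doa(sensor_pos, sensor_R, source_pos,
                  axis_body=np.array([1.0, 0.0, 0.0])):
    """Predicted one-DOF DOA for a linear microphone array.

    sensor_pos : (3,) array position in the world frame
    sensor_R   : (3,3) rotation from body frame to world frame
    source_pos : (3,) hypothesised sound source position
    axis_body  : array axis in the body frame (assumed to be the x-axis)
    """
    axis_world = sensor_R @ axis_body
    ray = source_pos - sensor_pos
    ray = ray / np.linalg.norm(ray)
    cos_angle = np.clip(axis_world @ ray, -1.0, 1.0)
    return np.arccos(cos_angle)          # cone angle in [0, pi]

def doa_residual(observed_angle, sensor_pos, sensor_R, source_pos):
    """Residual one could feed to an online optimiser (illustrative only)."""
    return observed_angle - predicted_doa(sensor_pos, sensor_R, source_pos)
```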

    Acoustic Echo Estimation using the model-based approach with Application to Spatial Map Construction in Robotics


    Vision-Guided Robot Hearing

    Natural human-robot interaction (HRI) in complex and unpredictable environments is important and has many potential applications. While vision-based HRI has been thoroughly investigated, robot hearing and audio-based HRI are emerging research topics in robotics. In typical real-world scenarios, humans are at some distance from the robot, and hence the sensory (microphone) data are strongly impaired by background noise, reverberation and competing auditory sources. In this context, the detection and localization of speakers plays a key role, enabling several tasks such as improving the signal-to-noise ratio for speech recognition, speaker recognition, speaker tracking, etc. In this paper we address the problem of how to detect and localize people that are both seen and heard. We introduce a hybrid deterministic/probabilistic model. The deterministic component allows us to map 3D visual data onto a 1D auditory space. The probabilistic component of the model enables the visual features to guide the grouping of the auditory features in order to form audiovisual (AV) objects. The proposed model and the associated algorithms are implemented in real time (17 FPS) using a stereoscopic camera pair and two microphones embedded into the head of the humanoid robot NAO. We perform experiments with (i) synthetic data, (ii) publicly available data gathered with an audiovisual robotic head, and (iii) data acquired using the NAO robot. The results validate the approach and are an encouragement to investigate how vision and hearing could be further combined for robust HRI.
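
    The deterministic component described above maps 3D visual data onto a 1D auditory space. A natural instance of such a mapping is the interaural time difference (ITD) predicted for a point in front of a two-microphone head; the sketch below uses a simple free-field model with an assumed microphone baseline, which may differ from the mapping actually used in the paper.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s at roughly 20 degrees C

def itd_from_position(p, mic_left, mic_right, c=SPEED_OF_SOUND):
    """Map a 3D point onto a 1D auditory coordinate (the ITD).

    Sound from a point arrives earlier at the closer microphone; the
    difference of the two path lengths divided by the speed of sound
    gives the interaural time difference.  Positions are in metres.
    """
    d_left = np.linalg.norm(np.asarray(p) - np.asarray(mic_left))
    d_right = np.linalg.norm(np.asarray(p) - np.asarray(mic_right))
    return (d_left - d_right) / c

# Example with an assumed 0.12 m microphone baseline on the robot head.
mic_l, mic_r = np.array([-0.06, 0.0, 0.0]), np.array([0.06, 0.0, 0.0])
print(itd_from_position([1.0, 0.5, 0.0], mic_l, mic_r))
```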

    Reverberant Sound Localization with a Robot Head Based on Direct-Path Relative Transfer Function

    This paper addresses the problem of sound-source localization (SSL) with a robot head, which remains a challenge in real-world environments. In particular, we are interested in locating speech sources, as they are of high interest for human-robot interaction. The microphone-pair response corresponding to the direct-path sound propagation is a function of the source direction. In practice, this response is contaminated by noise and reverberation. The direct-path relative transfer function (DP-RTF) is defined as the ratio between the direct-path acoustic transfer functions (ATFs) of the two microphones, and it is an important feature for SSL. We propose a method to estimate the DP-RTF from noisy and reverberant signals in the short-time Fourier transform (STFT) domain. First, the convolutive transfer function (CTF) approximation is adopted to accurately represent the impulse response of the microphone array, and the first coefficient of the CTF is mainly composed of the direct-path ATF. At each frequency, the frame-wise speech auto- and cross-power spectral densities (PSDs) are obtained by spectral subtraction. Then a set of linear equations is constructed from the speech auto- and cross-PSDs of multiple frames, in which the DP-RTF is the unknown variable, and it is estimated by solving the equations. Finally, the estimated DP-RTFs are concatenated across frequencies and used as a feature vector for SSL. Experiments with a robot placed in various reverberant environments show that the proposed method outperforms two state-of-the-art methods.
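
    The abstract describes solving, at each frequency, a set of linear equations built from frame-wise auto- and cross-PSDs, with the DP-RTF as the unknown. The sketch below shows the general shape of such a per-frequency least-squares solve under the simplifying assumption that each frame contributes one equation of the form cross-PSD ≈ RTF × auto-PSD; the CTF construction and spectral-subtraction steps of the actual method are omitted, and the function names are hypothetical.

```python
import numpy as np

def estimate_rtf(auto_psd, cross_psd):
    """Least-squares estimate of a relative transfer function at one frequency.

    auto_psd  : (T,) frame-wise auto-PSD of the reference microphone
    cross_psd : (T,) frame-wise cross-PSD between the two microphones
    Stacking one linear equation per frame, cross_psd[t] ~= rtf * auto_psd[t],
    the complex least-squares solution is returned.
    """
    a = np.asarray(auto_psd, dtype=complex)
    c = np.asarray(cross_psd, dtype=complex)
    return (a.conj() @ c) / (a.conj() @ a)

def rtf_feature(auto_psd_per_freq, cross_psd_per_freq):
    """Concatenate per-frequency estimates into an SSL feature vector."""
    return np.array([estimate_rtf(a, c)
                     for a, c in zip(auto_psd_per_freq, cross_psd_per_freq)])
```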