Binaural sound source localization using machine learning with spiking neural networks features extraction

Abstract

Human and animal binaural hearing systems are able take advantage of a variety of cues to localise sound-sources in a 3D space using only two sensors. This work presents a bionic system that utilises aspects of binaural hearing in an automated source localisation task. A head and torso emulator (KEMAR) are used to acquire binaural signals and a spiking neural network is used to compare signals from the two sensors. The firing rates of coincidence-neurons in the spiking neural network model provide information as to the location of a sound source. Previous methods have used a winner-takesall approach, where the location of the coincidence-neuron with the maximum firing rate is used to indicate the likely azimuth and elevation. This was shown to be accurate for single sources, but when multiple sources are present the accuracy significantly reduces. To improve the robustness of the methodology, an alternative approach is developed where the spiking neural network is used as a feature pre-processor. The firing rates of all coincidence-neurons are then used as inputs to a Machine Learning model which is trained to predict source location for both single and multiple sources. A novel approach that applied spiking neural networks as a binaural feature extraction method was presented. These features were processed using deep neural networks to localise multisource sound signals that were emitted from different locations. Results show that the proposed bionic binaural emulator can accurately localise sources including multiple and complex sources to 99% correctly predicted angles from single-source localization model and 91% from multi-source localization model. The impact of background noise on localisation performance has also been investigated and shows significant degradation of performance. The multisource localization model was trained with multi-condition background noise at SNRs of 10dB, 0dB, and -10dB and tested at controlled SNRs. The findings demonstrate an enhancement in the model performance in compared with noise free training data

    Similar works