517 research outputs found

    Spatial Diffuseness Features for DNN-Based Speech Recognition in Noisy and Reverberant Environments

    We propose a spatial diffuseness feature for deep neural network (DNN)-based automatic speech recognition to improve recognition accuracy in reverberant and noisy environments. The feature is computed in real time from multiple microphone signals without requiring knowledge or estimation of the direction of arrival, and represents the relative amount of diffuse noise in each time and frequency bin. It is shown that using the diffuseness feature as an additional input to a DNN-based acoustic model leads to a reduced word error rate for the REVERB challenge corpus, compared both to logmelspec features extracted from noisy signals and to features enhanced by spectral subtraction. Comment: accepted for ICASSP201

    Indoor wireless communications and applications

    Chapter 3 addresses challenges in radio link and system design in indoor scenarios. Given that most human activities take place in indoor environments, the need to support ubiquitous indoor data connectivity and location/tracking services has become even more important than in previous decades. The specific technical challenges addressed are (i) modelling complex indoor radio channels for effective antenna deployment, (ii) the potential of millimeter-wave (mm-wave) radios for supporting higher data rates, and (iii) feasible indoor localisation and tracking techniques. These are summarised in three dedicated sections of this chapter.

    Least squares DOA estimation with an informed phase unwrapping and full bandwidth robustness

    The weighted least-squares (WLS) direction-of-arrival estimator that minimizes an error based on interchannel phase differences is both computationally simple and flexible. However, the approach has several limitations, including an inability to cope with spatial aliasing and a sensitivity to phase wrapping. The recently proposed phase wrapping robust (PWR)-WLS estimator addresses the latter of these issues, but requires solving a nonconvex optimization problem. In this contribution, we focus on both of the described shortcomings. First, a conceptually simpler alternative to PWR is presented that performs comparably given a good initial estimate. This newly proposed method relies on unwrapping the vector of phase differences. Second, it is demonstrated that all microphone pairs can be utilized at all frequencies with both estimators. When information from other frequency bins is incorporated, this permits localization above the spatial aliasing frequency of the array. Experimental results show that a considerable performance improvement is possible, particularly for arrays with a large microphone spacing.
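    The basic idea of a least-squares fit to interchannel phase differences can be sketched as follows. This is a minimal illustration, not the estimator from the paper: it assumes a far-field plane wave, phase differences that are already unwrapped (no spatial aliasing), and the sign convention phi = (2*pi*f/c) * d^T u, where d is the vector between two microphones and u is the unit vector pointing toward the source. The function name `wls_doa` and all parameter names are hypothetical.

```python
import numpy as np

def wls_doa(phase_diffs, pair_vectors, freq_hz, weights=None, c=343.0):
    """Weighted least-squares DOA from unwrapped interchannel phase differences.

    Assumed model (illustrative): phi_p = (2*pi*f/c) * d_p . u
    phase_diffs  : (P,) phase differences in radians, one per microphone pair
    pair_vectors : (P, dim) microphone displacement vectors d_p in metres
    Returns a unit vector estimate of the source direction u.
    """
    D = np.asarray(pair_vectors, dtype=float)
    phi = np.asarray(phase_diffs, dtype=float)
    w = np.ones(len(phi)) if weights is None else np.asarray(weights, dtype=float)

    # Linear model A u = phi, with rows scaled by sqrt(w) to apply the weights.
    A = (2.0 * np.pi * freq_hz / c) * D
    sw = np.sqrt(w)
    u, *_ = np.linalg.lstsq(A * sw[:, None], phi * sw, rcond=None)

    # The fit is unconstrained, so project the result back onto the unit sphere.
    return u / np.linalg.norm(u)
```

    Normalizing after an unconstrained fit is a common simplification; it ignores phase wrapping entirely, which is precisely the limitation the abstract above discusses.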

    A Joint Audio-Visual Approach to Audio Localization

    Multiple source localization using spherical microphone arrays

    Direction-of-Arrival (DOA) estimation is a fundamental task in acoustic signal processing and is used in source separation, localization, tracking, environment mapping, speech enhancement and dereverberation. In applications such as hearing aids, robot audition, teleconferencing and meeting diarization, multiple simultaneously active sources are often present. DOA estimation that is robust to Multi-Source (MS) scenarios is therefore of particular importance. In the past decade, interest in Spherical Microphone Arrays (SMAs) has grown rapidly owing to their ability to analyse the sound field with equal resolution in all directions. This symmetry makes SMAs suitable for applications in robot audition, where talkers are expected at a variety of heights and positions. Acoustic signal processing for SMAs is often formulated in the Spherical Harmonic Domain (SHD), which describes the sound field in a form that is independent of the geometry of the SMA. DOA estimation methods for real-world scenarios address one or more performance-degrading factors, such as noise, reverberation and multi-source activity, or tackle related problems such as source counting and reducing computational complexity. This thesis addresses various problems in MS DOA estimation for speech sources, each of which concerns one or more performance-degrading factors. Firstly, a narrowband DOA estimator is proposed that utilizes high-order spatial information in two computationally efficient ways. Secondly, an autonomous source counting technique is proposed which uses density-based clustering in an evolutionary framework. Thirdly, a confidence metric for the validity of the Single Source (SS) assumption in a Time-Frequency (TF) bin is proposed. It is based on an MS assumption in a short time interval where the number and the TF bins of active sources are adaptively estimated. Finally, two analytical narrowband MS DOA estimators are proposed based on an MS assumption in a TF bin.
The proposed methods are evaluated using simulations and real recordings. Each proposed technique outperforms comparative baseline methods and performs at least as accurately as the state of the art.

    EXPERIMENTAL EVALUATION OF MODIFIED PHASE TRANSFORM FOR SOUND SOURCE DETECTION

    The detection of sound sources with microphone arrays can be enhanced by processing individual microphone signals prior to the delay-and-sum operation. One method in particular, the Phase Transform (PHAT), has demonstrated improvement in sound source location images, especially in reverberant and noisy environments. Recent work proposed a modification to the PHAT that allows varying degrees of spectral whitening through a single parameter, β, and that has shown improved target detection in simulation results. This work focuses on experimental evaluation of the modified SRP-PHAT algorithm. Performance results are computed from an experimental setup of an 8-element perimeter array, with a receiver operating characteristic (ROC) analysis for detecting sound sources. The results verified the simulation results of PHAT-β in improving target detection probabilities. The ROC analysis demonstrated the relationships between various target types (narrowband and broadband), room reverberation levels (high and low) and noise levels (different SNRs) with respect to the optimal β. The experimental results strongly agree with those of simulations on the effect of PHAT in significantly improving detection performance for narrowband and broadband signals, especially at low SNR and in the presence of high levels of reverberation.
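    The partial whitening controlled by a single parameter can be illustrated for one microphone pair as follows. This is a minimal sketch under stated assumptions, not the thesis's SRP implementation: it applies the weighting to the cross-spectrum of two signals, with the exponent (called `beta` here) interpolating between plain cross-correlation (`beta=0`) and the full PHAT (`beta=1`). The function name `gcc_phat_beta` and its parameters are hypothetical.

```python
import numpy as np

def gcc_phat_beta(x1, x2, fs, beta=1.0, max_tau=None):
    """Estimate the delay of x2 relative to x1 (in seconds) with a
    partially whitened generalized cross-correlation.

    beta = 0 gives plain cross-correlation, beta = 1 gives the full PHAT.
    """
    n = len(x1) + len(x2)              # zero-pad to avoid circular wrap-around
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)
    cross = X2 * np.conj(X1)           # cross-power spectrum

    # Divide by |cross|**beta; the floor avoids division by zero.
    weight = np.maximum(np.abs(cross) ** beta, 1e-12)
    cc = np.fft.irfft(cross / weight, n=n)

    max_shift = n // 2
    if max_tau is not None:
        max_shift = min(int(fs * max_tau), max_shift)

    # Reorder so that index 0 corresponds to lag -max_shift.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    lag = np.argmax(np.abs(cc)) - max_shift
    return lag / fs
```

    A positive return value means `x2` lags `x1`. Intermediate values of `beta` keep some magnitude information, which is the trade-off the abstract above evaluates.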

    Informed Sound Source Localization for Hearing Aid Applications

    Time difference of arrival estimation of sound source using cross correlation and modified maximum likelihood weighting function

    The Generalized Cross Correlation (GCC) framework is one of the most widely used methods for Time Difference Of Arrival (TDOA) estimation and Sound Source Localization (SSL). TDOA estimation using cross correlation without any pre-filtering of the received signals produces a large number of errors in real environments. Thus, several filters (weighting functions) have been proposed in the literature to improve the performance of TDOA estimation. These functions aim to mitigate TDOA estimation error in noisy and reverberant environments. Most of these methods account for either noise or reverberation, and the TDOA estimation error grows as either one increases. In this paper, we propose a new weighting function. This function is a combined and modified version of the Maximum Likelihood (ML) and PHAT-rho gamma functions. We call the proposed function Modified Maximum Likelihood with Coherence (MMLC). It has the merits of both the ML and PHAT-rho gamma functions and can work properly in both noisy and reverberant environments. We evaluate the proposed weighting function using real and synthesized datasets. Simulation results show that the proposed filter performs better in terms of TDOA estimation error and anomalous estimations. (c) 2017 Sharif University of Technology. All rights reserved.

    Acoustic DOA estimation using space alternating sparse Bayesian learning
