
    Distributed Multichannel Speech Enhancement with Minimum Mean-square Error Short-time Spectral Amplitude, Log-spectral Amplitude, and Spectral Phase Estimation

    In this paper, the authors present optimal multichannel frequency-domain estimators for minimum mean-square error (MMSE) short-time spectral amplitude (STSA), log-spectral amplitude (LSA), and spectral phase estimation in a widely distributed microphone configuration. The estimators use Rayleigh and Gaussian statistical models for the speech prior and noise likelihood, with a diffuse noise field for the surrounding environment. Evaluated with the signal-to-noise ratio (SNR), segmental SNR (SSNR), log-likelihood ratio (LLR), and Perceptual Evaluation of Speech Quality (PESQ) objective metrics, the multichannel LSA estimator decreases background noise and speech distortion and increases speech quality compared to the baseline single-channel STSA and LSA estimators. The optimal multichannel spectral phase estimator contributes significantly to these improvements and is robust thanks to time alignment and attenuation-factor estimation. Overall, the optimal distributed-microphone spectral estimators perform strongly in noisy environments, with applications to many consumer, industrial, and military products.
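    A quick sketch of the two classic single-channel baselines named above may help: the Ephraim-Malah MMSE-STSA (1984) and MMSE-LSA (1985) gain functions. This is a minimal illustration, assuming the a priori SNR `xi` and a posteriori SNR `gamma` are already tracked (e.g., with the decision-directed rule); it is not the paper's multichannel estimator.

```python
import numpy as np
from scipy.special import exp1, i0e, i1e

def stsa_gain(xi, gamma):
    """Ephraim-Malah MMSE short-time spectral amplitude gain."""
    v = xi * gamma / (1.0 + xi)
    # i0e/i1e are exponentially scaled Bessel functions, so the
    # exp(-v/2) factor of the textbook formula is already absorbed.
    return (np.sqrt(np.pi * v) / (2.0 * gamma)) * (
        (1.0 + v) * i0e(v / 2.0) + v * i1e(v / 2.0))

def lsa_gain(xi, gamma):
    """MMSE log-spectral amplitude gain."""
    v = np.maximum(xi * gamma / (1.0 + xi), 1e-10)  # exp1(0) is infinite
    return (xi / (1.0 + xi)) * np.exp(0.5 * exp1(v))

# Applied per time-frequency bin: enhanced = gain * |noisy|, reusing the
# noisy phase -- exactly the quantity the multichannel phase estimator
# above is meant to improve.
```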

    Distributed Multichannel Speech Enhancement Based on Perceptually-Motivated Bayesian Estimators of the Spectral Amplitude

    In this study, the authors propose multichannel weighted Euclidean (WE) and weighted cosh (WCOSH) cost-function estimators for speech enhancement in the distributed microphone scenario. The goal of the work is to illustrate the advantages of utilising additional microphones and modified cost functions to improve the signal-to-noise ratio (SNR) and segmental SNR (SSNR) along with the log-likelihood ratio (LLR) and perceptual evaluation of speech quality (PESQ) objective metrics over the corresponding single-channel baseline estimators. As with their single-channel counterparts, the perceptually motivated multichannel WE and WCOSH estimators are functions of a weighting-law parameter, which controls the attenuation of the noisy spectral amplitude through a spectral gain function, emphasises spectral peak (formant) information, and accounts for auditory masking effects. Based on the simulation results, the multichannel WE and WCOSH cost-function estimators produced gains in SSNR improvement, LLR and PESQ scores over the single-channel baselines and the unweighted cost functions, with the best improvements occurring for negative values of the weighting-law parameter across all input SNR levels and noise types.
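    For orientation, the single-channel WE and WCOSH gains in the style of Loizou's perceptually motivated Bayesian estimators can be sketched with confluent hypergeometric functions. Treat this as a hedged sketch of the baselines, not of the paper's multichannel estimators; `xi` (a priori SNR), `gamma_post` (a posteriori SNR) and the weighting-law parameter `p` are assumed given.

```python
import numpy as np
from scipy.special import gamma as gamma_fn, hyp1f1

def we_gain(xi, gamma_post, p=-1.0):
    """Weighted-Euclidean gain for the cost A**p * (A - Ahat)**2;
    needs p > -2, and p = 0 falls back to the plain MMSE-STSA gain."""
    v = xi * gamma_post / (1.0 + xi)
    num = gamma_fn((p + 3.0) / 2.0) * hyp1f1(-(p + 1.0) / 2.0, 1.0, -v)
    den = gamma_fn((p + 2.0) / 2.0) * hyp1f1(-p / 2.0, 1.0, -v)
    return (np.sqrt(v) / gamma_post) * num / den

def wcosh_gain(xi, gamma_post, p=-0.5):
    """Weighted-cosh gain, Ahat = sqrt(E[A**(p+1)|Y] / E[A**(p-1)|Y]);
    needs p > -1 for the posterior moments to exist."""
    v = xi * gamma_post / (1.0 + xi)
    ratio = (gamma_fn((p + 3.0) / 2.0) * hyp1f1(-(p + 1.0) / 2.0, 1.0, -v)) \
          / (gamma_fn((p + 1.0) / 2.0) * hyp1f1(-(p - 1.0) / 2.0, 1.0, -v))
    return (np.sqrt(v) / gamma_post) * np.sqrt(ratio)
```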

    Optimal Distributed Microphone Phase Estimation

    This paper presents a minimum mean-square error spectral phase estimator for speech enhancement in the distributed multiple-microphone scenario. The estimator uses Gaussian models for both the speech and noise priors under the assumption of a diffuse, incoherent noise field representing ambient noise in a widely dispersed microphone configuration. Experiments demonstrate significant benefits of the optimal multichannel phase estimator compared with the noisy phase of a reference channel.
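    The closed form of the optimal phase estimator is not given in the abstract, so the following is only an illustrative stand-in: time-align each channel, weight it by its estimated attenuation relative to its noise power, sum, and take the angle of the combination. All inputs (delays, attenuation factors, per-channel noise powers) are assumed to come from upstream estimators.

```python
import numpy as np

def combined_phase(stfts, freqs, delays, attenuations, noise_powers):
    """stfts: (M, F, T) complex channel STFTs; freqs: (F,) bin
    frequencies in Hz; delays/attenuations/noise_powers: length-M
    per-channel estimates. Returns an (F, T) phase estimate."""
    acc = np.zeros(stfts.shape[1:], dtype=complex)
    for m in range(stfts.shape[0]):
        # undo the propagation delay of channel m at every frequency
        align = np.exp(2j * np.pi * freqs * delays[m])[:, None]
        # SNR-style weights let strong, clean channels dominate
        acc += (attenuations[m] / noise_powers[m]) * stfts[m] * align
    return np.angle(acc)
```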

    Local Relative Transfer Function for Sound Source Localization

    The relative transfer function (RTF), i.e. the ratio of acoustic transfer functions between two sensors, can be used for sound source localization and beamforming with a microphone array. The RTF is usually defined with respect to a unique reference sensor, but choosing that reference can be difficult, especially in dynamic acoustic environments and setups. In this paper we propose a locally normalized RTF, in short local-RTF, as an acoustic feature to characterize the source direction. The local-RTF takes a neighbor sensor as the reference channel for a given sensor. The estimated local-RTF vector thus avoids the adverse effects of a noisy unique reference and has a smaller estimation error than conventional RTF estimators. We propose two estimators for the local-RTF and concatenate the values across sensors and frequencies to form a high-dimensional vector which is utilized for source localization. Experiments with real-world signals demonstrate the effectiveness of this approach.
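    A minimal sketch of the feature itself, assuming ordinary Welch/cross-spectral averaging in place of the paper's two dedicated estimators: each sensor's RTF is taken with respect to its neighbor rather than one global reference, and the values are stacked into one localization feature vector.

```python
import numpy as np
from scipy.signal import csd, welch

def rtf(x, x_ref, fs, nperseg=1024):
    """RTF of channel x w.r.t. reference x_ref: S_x,ref / S_ref,ref."""
    f, s_cross = csd(x, x_ref, fs=fs, nperseg=nperseg)
    _, s_ref = welch(x_ref, fs=fs, nperseg=nperseg)
    return f, s_cross / s_ref

def local_rtf_vector(channels, fs):
    """Local-RTF feature: every sensor uses its neighbor as reference;
    values are concatenated across sensors and frequencies."""
    feats = [rtf(channels[m], channels[m - 1], fs)[1]
             for m in range(1, len(channels))]
    return np.concatenate(feats)
```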

    Speech Separation Using Partially Asynchronous Microphone Arrays Without Resampling

    We consider the problem of separating speech sources captured by multiple spatially separated devices, each of which has multiple microphones and samples its signals at a slightly different rate. Most asynchronous array processing methods rely on sample rate offset estimation and resampling, but these offsets can be difficult to estimate if the sources or microphones are moving. We propose a source separation method that does not require offset estimation or signal resampling. Instead, we divide the distributed array into several synchronous subarrays. All arrays are used jointly to estimate the time-varying signal statistics, and those statistics are used to design separate time-varying spatial filters in each array. We demonstrate the method for speech mixtures recorded on both stationary and moving microphone arrays. (To appear at the International Workshop on Acoustic Signal Enhancement, IWAENC 2018.)
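    A high-level sketch of the subarray idea, under assumptions the abstract does not spell out (mask-driven statistics and a multichannel Wiener-style filter): each synchronous subarray is filtered on its own, so no signal ever needs to be resampled across devices, while the per-frame source-activity masks that drive the statistics can be estimated jointly from all arrays.

```python
import numpy as np

def separate(subarray_stfts, masks, eps=1e-6):
    """subarray_stfts: list of (M_i, F, T) complex STFTs, one entry per
    synchronous subarray; masks: (N_src, F, T) jointly estimated source
    activities. Returns per-subarray (N_src, F, T) source estimates."""
    outputs = []
    for X in subarray_stfts:              # each subarray filtered alone
        M, F, T = X.shape
        est = np.zeros((masks.shape[0], F, T), dtype=complex)
        for n in range(masks.shape[0]):   # each source
            for f in range(F):
                Xf = X[:, f, :]                            # (M, T) bin
                Rs = (masks[n, f] * Xf) @ Xf.conj().T / T  # source cov.
                Rx = Xf @ Xf.conj().T / T + eps * np.eye(M)
                # multichannel Wiener-style filter, reference channel 0;
                # a real system would track Rs/Rx over short sliding
                # blocks to get the time-varying filters of the paper
                w = np.linalg.solve(Rx, Rs[:, 0])
                est[n, f, :] = w.conj() @ Xf
        outputs.append(est)
    return outputs
```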

    Joint Spatio-Temporal Filtering Methods for DOA and Fundamental Frequency Estimation


    A robust sequential hypothesis testing method for brake squeal localisation

    This contribution deals with the in situ detection and localisation of brake squeal in an automobile. As brake squeal is emitted from regions known a priori, i.e., near the wheels, the localisation is treated as a hypothesis testing problem. Distributed microphone arrays, situated under the automobile, are used to capture the directional properties of the sound field generated by a squealing brake. The spatial characteristics of the sampled sound field are then used to formulate the hypothesis tests. However, in contrast to standard hypothesis testing approaches of this kind, the propagation environment is complex and time-varying. Coupled with inaccuracies in the knowledge of the sensor and source positions as well as sensor gain mismatches, modelling the sound field is difficult and standard approaches fail in this case. A previously proposed approach implicitly tried to account for such incomplete system knowledge and was based on ad hoc likelihood formulations. The current paper builds upon this approach and proposes a second approach, based on firmer theoretical foundations, that can systematically account for the model uncertainties. Results from tests in a real setting show that the proposed approach is more consistent than the prior state of the art. In both approaches, the tasks of detection and localisation are decoupled for complexity reasons: the localisation (hypothesis testing) is conditioned on a prior detection of brake squeal and identification of the squeal frequencies. The approaches used for the detection and identification of squeal frequencies are also presented. The paper further briefly addresses some practical issues related to array design and placement.
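    The abstract does not reproduce the likelihood formulations, but the sequential-testing skeleton itself is standard. A Wald-style sequential probability ratio test per candidate wheel region might look like the following, with the per-frame log-likelihood ratios supplied by the paper's models.

```python
import numpy as np

def sprt(frame_llrs, alpha=0.01, beta=0.01):
    """Wald SPRT for one candidate squeal region. frame_llrs: per-frame
    log-likelihood ratios of H1 (squeal here) vs. H0 (no squeal);
    alpha/beta: tolerated false-alarm and miss probabilities."""
    upper = np.log((1.0 - beta) / alpha)   # crossing -> accept H1
    lower = np.log(beta / (1.0 - alpha))   # crossing -> accept H0
    score = 0.0
    for llr in frame_llrs:
        score += llr
        if score >= upper:
            return "H1"
        if score <= lower:
            return "H0"
    return "undecided"   # keep accumulating evidence
```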

    A Study into Speech Enhancement Techniques in Adverse Environment

    This dissertation developed speech enhancement techniques that improve speech quality in applications such as mobile communications, teleconferencing and smart loudspeakers. For these applications it is necessary to suppress both noise and reverberation. The contribution of this dissertation is therefore twofold: a single-channel speech enhancement system that exploits the temporal and spectral diversity of the received microphone signal for noise suppression, and a multi-channel speech enhancement method that employs spatial diversity to reduce reverberation.
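    As one textbook illustration of the spatial diversity mentioned above (not the dissertation's actual method), a delay-and-sum beamformer aligns and averages the channels toward the source so that reverberant and noisy energy arriving from other directions is attenuated; the alignment delays are assumed known, e.g., from a TDOA estimator.

```python
import numpy as np

def delay_and_sum(channels, delays_samples):
    """channels: (M, N) real time signals; delays_samples: length-M
    integer advances that time-align each channel to the source."""
    out = np.zeros(channels.shape[1])
    for ch, d in zip(channels, delays_samples):
        out += np.roll(ch, -int(d))   # advance channel by its delay
    return out / len(channels)
```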