137 research outputs found

    A Framework for Speech Enhancement with Ad Hoc Microphone Arrays

    Deep Neural Mel-Subband Beamformer for In-car Speech Separation

    While current deep learning (DL)-based beamforming techniques have proven effective in speech separation, they are often designed to process narrow-band (NB) frequencies independently, which results in higher computational cost and inference time and makes them unsuitable for real-world use. In this paper, we propose a DL-based mel-subband spatio-temporal beamformer to perform speech separation in a car environment with reduced computational cost and inference time. As opposed to conventional subband (SB) approaches, our framework uses a mel-scale-based subband selection strategy, which ensures fine-grained processing for lower frequencies, where most speech formant structure is present, and coarse-grained processing for higher frequencies. Robust frame-level beamforming weights are determined recursively for each speaker location/zone in the car from the estimated subband speech and noise covariance matrices. Furthermore, the proposed framework estimates and suppresses any echoes from the loudspeaker(s) by using the echo reference signals. We compare the performance of our proposed framework to several NB, SB, and full-band (FB) processing techniques in terms of speech quality and recognition metrics. Based on experimental evaluations on simulated and real-world recordings, we find that our proposed framework achieves better separation performance than all SB and FB approaches and achieves performance close to that of NB processing techniques while requiring lower computing cost.
    Comment: Submitted to ICASSP 202
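The mel-scale subband selection described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the mel formula used here is the common `2595 * log10(1 + f / 700)` variant, and the function names, the bin count, the sample rate, and the number of subbands are all assumptions chosen for illustration. The sketch only shows how equal spacing on the mel scale yields narrow subbands at low frequencies and wide subbands at high frequencies.

```python
import math


def hz_to_mel(f_hz):
    """Map Hz to mel using a common formula (assumed; the paper may use another variant)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_subband_edges(n_bins, sample_rate, n_subbands):
    """Return n_subbands + 1 FFT-bin edges that are equally spaced on the mel scale.

    Subband k covers the half-open bin range [edges[k], edges[k + 1]).
    Intended for n_subbands much smaller than n_bins.
    """
    if not 0 < n_subbands <= n_bins:
        raise ValueError("n_subbands must be in 1..n_bins")
    nyquist = sample_rate / 2.0
    max_mel = hz_to_mel(nyquist)
    edges = [0]
    for k in range(1, n_subbands + 1):
        f_hz = mel_to_hz(max_mel * k / n_subbands)
        bin_index = round(f_hz / nyquist * (n_bins - 1))
        # Keep at least one bin per subband at the low-frequency end.
        edges.append(max(bin_index, edges[-1] + 1))
    edges[-1] = n_bins  # make the last subband include the Nyquist bin
    return edges


if __name__ == "__main__":
    # Hypothetical setup: 512-point FFT (257 bins) at 16 kHz, 16 subbands.
    edges = mel_subband_edges(n_bins=257, sample_rate=16000, n_subbands=16)
    widths = [hi - lo for lo, hi in zip(edges, edges[1:])]
    print("edges :", edges)
    print("widths:", widths)
```

In a subband beamformer, each of these bin groups would share one set of processing, so the number of groups rather than the number of FFT bins drives the cost.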

    Speech enhancement practical for use in diverse environments with varying noise characteristics

    University of Tsukuba (筑波大学) 201

    A Partitioned Approach to Signal Separation with Microphone Ad Hoc Arrays

    Robust adaptive filtering algorithms for system identification and array signal processing in non-Gaussian environment

    This dissertation proposes four new algorithms based on fractional lower-order statistics for adaptive filtering in non-Gaussian interference environments. The first is the affine projection sign algorithm (APSA), based on L₁-norm minimization, which combines the ability to decorrelate colored input with the ability to suppress divergence when an outlier occurs. The second is the variable-step-size normalized sign algorithm (VSS-NSA), which adjusts its step size automatically by matching the L₁ norm of the a posteriori error to that of the noise. The third adopts the same variable-step-size scheme but extends L₁ minimization to Lp minimization, which generalizes it to the variable-step-size normalized fractionally lower-order moment (VSS-NFLOM) algorithms. The fourth varies the order rather than the step size, another way to support adaptive algorithms when no a priori statistics are available; this leads to the variable-order least mean pth norm (VO-LMP) algorithm. These algorithms are applied to system identification for impulsive interference suppression, echo cancelation, and noise reduction. They are also applied to a phased array radar system with space-time adaptive processing (beamforming) to combat heavy-tailed non-Gaussian clutter. The proposed algorithms are tested by extensive computer simulations. The results demonstrate significant performance improvements in terms of convergence rate, steady-state error, computational simplicity, and robustness against impulsive noise and interference. --Abstract, page iv
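To make the idea of sign-based adaptation concrete, the following is a minimal sketch of a fixed-step normalized sign algorithm used for system identification under impulsive noise. It is a simplified relative of the methods named in the abstract, not the dissertation's VSS-NSA or APSA: the step size is constant, and the filter length, the true system `h`, the noise levels, and all function names are assumptions made for the example. The point it illustrates is that the update uses only the sign of the error, so a single large outlier cannot produce a large weight change.

```python
import math
import random


def sign(v):
    """Return -1, 0, or 1."""
    return (v > 0) - (v < 0)


def normalized_sign_filter(x, d, n_taps, mu=0.005, eps=1e-8):
    """Estimate FIR taps from input x and observed output d.

    Update: w <- w + mu * sign(e) * u / ||u||, where u holds the most
    recent n_taps inputs and e is the a priori error.
    """
    w = [0.0] * n_taps
    u = [0.0] * n_taps  # newest sample first
    for x_n, d_n in zip(x, d):
        u = [x_n] + u[:-1]
        e = d_n - sum(w_i * u_i for w_i, u_i in zip(w, u))
        norm = math.sqrt(sum(u_i * u_i for u_i in u) + eps)
        gain = mu * sign(e) / norm  # magnitude is independent of |e|
        w = [w_i + gain * u_i for w_i, u_i in zip(w, u)]
    return w


def run_demo(seed=0, n_samples=20000):
    """Identify a hypothetical 3-tap system observed in impulsive noise."""
    rng = random.Random(seed)
    h = [0.5, -0.3, 0.1]  # assumed "unknown" system
    x = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    d = []
    u = [0.0] * len(h)
    for x_n in x:
        u = [x_n] + u[:-1]
        noise = rng.gauss(0.0, 0.05)
        if rng.random() < 0.01:  # occasional large outlier
            noise += rng.gauss(0.0, 50.0)
        d.append(sum(h_i * u_i for h_i, u_i in zip(h, u)) + noise)
    w = normalized_sign_filter(x, d, n_taps=len(h))
    return h, w


if __name__ == "__main__":
    h, w = run_demo()
    print("true taps     :", h)
    print("estimated taps:", [round(w_i, 3) for w_i in w])
```

Because every update has norm at most `mu`, the step size trades convergence speed against steady-state error, which is the trade-off a variable-step-size scheme aims to manage automatically.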

    Maximum Likelihood PSD Estimation for Speech Enhancement in Reverberation and Noise

    Effective Binaural Multi-Channel Processing Algorithm for Improved Environmental Presence

    Binaural noise-reduction algorithms based on the multi-channel Wiener filter (MWF) are promising techniques for binaural assistive listening devices. Real-time implementation of existing binaural MWF methods, however, faces two challenges: increasing the amount of noise reduction without introducing speech distortion, and at the same time preserving the binaural cues of both the speech and noise components. Although significant efforts have been made in the literature, most methods developed so far have focused on only one of these two problems. This paper proposes an alternative binaural MWF algorithm that incorporates the non-stationarity of the signal components into the framework. The main objective is to design an algorithm that is able to select the sources that are present in the environment. To achieve this, a modified speech presence probability (SPP) and a single-channel speech enhancement algorithm are utilized in the formulation. The resulting optimal filter also avoids the poor estimation of the second-order clean speech statistics, which is normally done by simple subtraction. Theoretical analysis and performance evaluation using realistic recorded data show the advantage of the proposed method over the reference MWF solution in terms of binaural cue preservation, as well as noise reduction and speech distortion.

    Non-intrusive speech quality prediction using modulation energies and LSTM-network

    Many signal processing algorithms have been proposed to improve the quality of speech recorded in the presence of noise and reverberation. Perceptual measures, i.e., listening tests, are usually considered the most reliable way to evaluate the quality of speech processed by such algorithms but are costly and time-consuming. Consequently, speech enhancement algorithms are often evaluated using signal-based measures, which can be either intrusive or non-intrusive. As the computation of intrusive measures requires a reference signal, only non-intrusive measures can be used in applications for which the clean speech signal is not available. However, many existing non-intrusive measures correlate poorly with the perceived speech quality, particularly when applied over a wide range of algorithms or acoustic conditions. In this paper, we propose a novel non-intrusive measure of the quality of processed speech that combines modulation energy features and a recurrent neural network using long short-term memory cells. We collected a dataset of perceptually evaluated signals representing several acoustic conditions and algorithms and used this dataset to train and evaluate the proposed measure. Results show that the proposed measure correlates more strongly with perceptual speech quality than benchmark intrusive and non-intrusive measures do when considering various categories of algorithms. Although the proposed measure is sensitive to mismatch between training and testing, results show that it is a useful approach for evaluating specific algorithms over a wide range of acoustic conditions and may thus become particularly useful for real-time selection of speech enhancement algorithm settings.