Deep Neural Mel-Subband Beamformer for In-car Speech Separation
While current deep learning (DL)-based beamforming techniques have
proven effective in speech separation, they are often designed to process
narrow-band (NB) frequencies independently, which results in higher
computational costs and inference times, making them unsuitable for real-world
use. In this paper, we propose a DL-based mel-subband spatio-temporal beamformer
to perform speech separation in a car environment with reduced computation cost
and inference time. As opposed to conventional subband (SB) approaches, our
framework uses a mel-scale based subband selection strategy which ensures a
fine-grained processing for lower frequencies where most speech formant
structure is present, and coarse-grained processing for higher frequencies. In
a recursive way, robust frame-level beamforming weights are determined for each
speaker location/zone in a car from the estimated subband speech and noise
covariance matrices. Furthermore, the proposed framework also estimates and
suppresses any echoes from the loudspeaker(s) by using the echo reference
signals. We compare the performance of our proposed framework to several NB,
SB, and full-band (FB) processing techniques in terms of speech quality and
recognition metrics. Based on experimental evaluations on simulated and
real-world recordings, we find that our proposed framework achieves better
separation performance than all SB and FB approaches and performance
close to that of NB processing techniques while requiring lower computing cost.
Comment: Submitted to ICASSP 202
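The mel-scale subband selection described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the 2595/700 mel formula variant, and the choice of 257 STFT bins at 16 kHz split into 16 subbands are all assumptions made for illustration. The sketch only shows how equal steps on the mel axis give narrow subbands at low frequencies and wide subbands at high frequencies.

```python
import math


def hz_to_mel(f_hz):
    # One common mel-scale formula (an assumption; the paper may use another).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_subband_edges(n_bins, sample_rate, n_subbands):
    """Split STFT bins [0, n_bins) into subbands of equal width on the mel axis.

    Returns n_subbands + 1 bin indices; subband k covers bins
    edges[k] .. edges[k + 1] - 1. Assumes n_subbands is much smaller than n_bins.
    """
    nyquist = sample_rate / 2.0
    mel_max = hz_to_mel(nyquist)
    edges = []
    for k in range(n_subbands + 1):
        f_hz = mel_to_hz(mel_max * k / n_subbands)
        edges.append(round(f_hz / nyquist * (n_bins - 1)))
    edges[-1] = n_bins  # make the last subband include the Nyquist bin
    # Guard against empty subbands after rounding.
    for k in range(1, len(edges)):
        if edges[k] <= edges[k - 1]:
            edges[k] = edges[k - 1] + 1
    return edges


# Hypothetical configuration: 512-point FFT at 16 kHz, 16 subbands.
edges = mel_subband_edges(n_bins=257, sample_rate=16000, n_subbands=16)
widths = [hi - lo for lo, hi in zip(edges[:-1], edges[1:])]
print(edges)
print(widths)
```

In this configuration the lowest subband spans only a few bins and the highest spans several dozen, which matches the fine-to-coarse processing the abstract describes.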
Robust adaptive filtering algorithms for system identification and array signal processing in non-Gaussian environment
This dissertation proposes four new algorithms based on fractionally lower order statistics for adaptive filtering in a non-Gaussian interference environment. The first is the affine projection sign algorithm (APSA), based on L₁ norm minimization, which combines the ability to decorrelate colored input with the ability to suppress divergence when an outlier occurs. The second is the variable-step-size normalized sign algorithm (VSS-NSA), which adjusts its step size automatically by matching the L₁ norm of the a posteriori error to that of the noise. The third adopts the same variable-step-size scheme but extends L₁ minimization to Lp minimization, which yields the generalized variable-step-size normalized fractionally lower-order moment (VSS-NFLOM) algorithms. The fourth varies the order instead of the step size, another way to aid adaptation when no a priori statistics are available; this leads to the variable-order least mean pth norm (VO-LMP) algorithm. These algorithms are applied to system identification for impulsive interference suppression, echo cancelation, and noise reduction. They are also applied to a phased array radar system with space-time adaptive processing (beamforming) to combat heavy-tailed non-Gaussian clutter. The proposed algorithms are tested by extensive computer simulations. The results demonstrate significant performance improvements in terms of convergence rate, steady-state error, computational simplicity, and robustness against impulsive noise and interference. (Abstract, page iv)
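To make the sign-algorithm family in this abstract concrete, here is a minimal sketch of a fixed-step normalized sign algorithm used for system identification. It is not the dissertation's VSS-NSA, APSA, or VO-LMP; it omits the variable step size and the affine projection. The function name `normalized_sign_lms`, the step size, and the 3-tap test system are hypothetical. The key property it shows is that the update uses only the sign of the error, so a large impulsive outlier moves the weights no further than any other sample.

```python
import math
import random


def sign(v):
    return (v > 0) - (v < 0)


def normalized_sign_lms(x, d, n_taps, mu=0.005, eps=1e-8):
    """Fixed-step normalized sign algorithm (illustrative sketch).

    x: input samples, d: desired (observed) samples, n_taps: filter length.
    Update: w <- w + mu * sign(e) * u / ||u||_2, with u the input tap vector.
    """
    w = [0.0] * n_taps
    u = [0.0] * n_taps  # most recent input samples, newest first
    for xn, dn in zip(x, d):
        u = [xn] + u[:-1]
        y = sum(wi * ui for wi, ui in zip(w, u))
        e = dn - y
        norm = math.sqrt(sum(ui * ui for ui in u) + eps)
        s = sign(e)  # the error magnitude is discarded, which bounds the step
        w = [wi + mu * s * ui / norm for wi, ui in zip(w, u)]
    return w


# Hypothetical test: identify a 3-tap system under impulsive interference.
rng = random.Random(0)
h = [0.5, -0.3, 0.1]
x = [rng.gauss(0.0, 1.0) for _ in range(5000)]
d = []
for n in range(len(x)):
    clean = sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
    noise = rng.gauss(0.0, 0.01)
    if rng.random() < 0.01:  # rare large outliers
        noise += rng.gauss(0.0, 50.0)
    d.append(clean + noise)

w = normalized_sign_lms(x, d, n_taps=3)
print(w)
```

Each update has Euclidean length of at most `mu`, so the fixed step size trades convergence speed against steady-state error. That trade-off is the motivation for the variable-step-size variants the abstract describes.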
Effective Binaural Multi-Channel Processing Algorithm for Improved Environmental Presence
Binaural noise-reduction algorithms based on the multi-channel Wiener filter (MWF) are promising techniques for binaural assistive listening devices. Real-time implementation of the existing binaural MWF methods, however, faces the challenge of increasing the amount of noise reduction without introducing speech distortion while preserving the binaural cues of both the speech and noise components. Although significant efforts have been made in the literature, most methods developed so far have focused on only one of these two problems. This paper proposes an alternative binaural MWF algorithm that incorporates the non-stationarity of the signal components into the framework. The main objective is to design an algorithm that can select the sources that are present in the environment. To achieve this, a modified speech presence probability (SPP) and a single-channel speech enhancement algorithm are utilized in the formulation. The resulting optimal filter also avoids the poor estimation of the second-order clean speech statistics, which is normally done by simple subtraction. Theoretical analysis and performance evaluation using realistic recorded data show the advantage of the proposed method over the reference MWF solution in terms of binaural cue preservation, as well as noise reduction and speech distortion.
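For context, the reference filter and the "simple subtraction" estimate mentioned in this abstract can be written in the standard speech-distortion-weighted form. This is the textbook formulation, not the modified filter the paper proposes. The symbols are assumptions introduced here: \mathbf{R}_y, \mathbf{R}_x, and \mathbf{R}_v are the noisy-signal, clean-speech, and noise covariance matrices, \mathbf{e}_{\mathrm{ref}} selects the reference microphone, and \mu \ge 0 trades noise reduction against speech distortion.

```latex
\mathbf{w}_{\mathrm{MWF}}
  = \left( \mathbf{R}_x + \mu \, \mathbf{R}_v \right)^{-1} \mathbf{R}_x \, \mathbf{e}_{\mathrm{ref}},
\qquad
\hat{\mathbf{R}}_x = \hat{\mathbf{R}}_y - \hat{\mathbf{R}}_v .
```

Setting \mu = 1 gives the standard MWF. The subtraction estimate can lose positive semidefiniteness when the two covariance estimates are inaccurate, which is one reason it is considered a poor estimate of the clean speech statistics.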
Non-intrusive speech quality prediction using modulation energies and LSTM-network
Many signal processing algorithms have been proposed to improve the quality of speech recorded in the presence of noise and reverberation. Perceptual measures, i.e., listening tests, are usually considered the most reliable way to evaluate the quality of speech processed by such algorithms but are costly and time-consuming. Consequently, speech enhancement algorithms are often evaluated using signal-based measures, which can be either intrusive or non-intrusive. As the computation of intrusive measures requires a reference signal, only non-intrusive measures can be used in applications for which the clean speech signal is not available. However, many existing non-intrusive measures correlate poorly with the perceived speech quality, particularly when applied over a wide range of algorithms or acoustic conditions. In this paper, we propose a novel non-intrusive measure of the quality of processed speech that combines modulation energy features and a recurrent neural network using long short-term memory cells. We collected a dataset of perceptually evaluated signals representing several acoustic conditions and algorithms and used this dataset to train and evaluate the proposed measure. Results show that the proposed measure yields higher correlation with perceptual speech quality than that of benchmark intrusive and non-intrusive measures when considering various categories of algorithms. Although the proposed measure is sensitive to mismatch between training and testing, results show that it is a useful approach to evaluate specific algorithms over a wide range of acoustic conditions and may, thus, become particularly useful for real-time selection of speech enhancement algorithm settings.
- …