Kepstrum approach to real-time speech-enhancement methods using two microphones
The objective of this paper is to provide improved real-time noise canceling performance by using
kepstrum analysis. The method is applied to typical existing two-microphone approaches that use
modified adaptive noise canceling and speech beamforming methods. It will be shown that the kepstrum
approach gives an improved effect for optimally enhancing a speech signal in the primary input when it
is applied to the front-end of a beamformer or speech directivity system. As a result, enhanced
performance in the form of an improved noise reduction ratio with highly reduced adaptive filter size can
be achieved. Experiments with a 20 cm broadside microphone configuration are run in
real time in a real environment: a typical indoor office with moderate reverberation.
A Novel Robust Mel-Energy Based Voice Activity Detector for Nonstationary Noise and Its Application for Speech Waveform Compression
Voice activity detection (VAD) is crucial in all kinds of speech applications. However, almost all existing VAD algorithms suffer from the nonstationarity of both speech and noise. To combat this difficulty, we propose a new voice activity detector based on Mel-energy features and an adaptive threshold tied to signal-to-noise ratio (SNR) estimates. In this thesis, we first show that a Bayes classifier using Mel-energy features is more robust than one using Fourier spectral features in various noise environments. Then, we design an algorithm that combines a dynamic Mel-energy estimator with an adaptive threshold that depends on the SNR estimates. In addition, a realignment scheme is incorporated to correct sparse and spurious noise estimates. Numerous simulations are carried out to evaluate the performance of the proposed VAD method, and comparisons are made with a couple of existing representative schemes, namely the VAD using the likelihood ratio test with Fourier spectral energy features and the VAD based on enhanced time-frequency parameters. Three types of noise, namely white noise (stationary), babble noise (nonstationary), and vehicular noise (nonstationary), were artificially added for the experiments. The proposed VAD algorithm significantly outperforms the other existing methods, as illustrated by the corresponding receiver operating characteristic (ROC) curves. Finally, we demonstrate one of the major applications of the new robust VAD scheme, namely speech waveform compression, and quantify its effectiveness in terms of compression efficiency.
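The idea of a decision threshold that follows the SNR estimate can be illustrated with a minimal sketch. This is not the thesis's algorithm: plain frame energy stands in for the Mel-energy features, there is no realignment scheme, and the names `frame_energies` and `adaptive_vad`, the smoothing factor, the margin limits, and the assumption that the recording starts with noise-only frames are all hypothetical choices made for illustration.

```python
import math

def frame_energies(signal, frame_len):
    """Mean-square energy of consecutive non-overlapping frames.
    (Simplification: plain energy is used here instead of Mel-energy.)"""
    n_frames = len(signal) // frame_len
    return [
        sum(x * x for x in signal[i * frame_len:(i + 1) * frame_len]) / frame_len
        for i in range(n_frames)
    ]

def adaptive_vad(energies, init_noise_frames=5, alpha=0.9,
                 min_margin_db=3.0, max_margin_db=9.0, init_snr_db=15.0):
    """Label each frame True (speech) or False (noise).

    Assumption: the first `init_noise_frames` frames contain noise only.
    The decision margin above the noise floor is half the current SNR
    estimate, clamped to [min_margin_db, max_margin_db]; all constants
    are illustrative, not values from the thesis.
    """
    eps = 1e-12  # guards against log(0) and division by zero
    noise = sum(energies[:init_noise_frames]) / init_noise_frames
    speech = None
    snr_db = init_snr_db
    labels = []
    for e in energies:
        # Threshold adapts to the running SNR estimate.
        margin_db = min(max_margin_db, max(min_margin_db, 0.5 * snr_db))
        level_db = 10.0 * math.log10((e + eps) / (noise + eps))
        is_speech = level_db > margin_db
        if is_speech:
            # Track the speech level and refresh the SNR estimate.
            speech = e if speech is None else alpha * speech + (1 - alpha) * e
            snr_db = 10.0 * math.log10((speech + eps) / (noise + eps))
        else:
            # Track the noise floor only in frames judged to be noise.
            noise = alpha * noise + (1 - alpha) * e
        labels.append(is_speech)
    return labels
```

A caller would first compute `frame_energies(samples, frame_len)` and pass the result to `adaptive_vad`. Updating the noise floor only in frames labeled as noise is what lets the threshold follow slowly varying noise, but it also means an early misclassification can bias later decisions.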
Block-Online Multi-Channel Speech Enhancement Using DNN-Supported Relative Transfer Function Estimates
This work addresses the problem of block-online processing for multi-channel
speech enhancement. Such processing is vital in scenarios with moving speakers
and/or when very short utterances are processed, e.g., in voice assistant
scenarios. We consider several variants of a system that performs beamforming
supported by DNN-based voice activity detection (VAD) followed by
post-filtering. The speaker is targeted through estimating relative transfer
functions between microphones. Each block of the input signals is processed
independently in order to make the method applicable in highly dynamic
environments. Owing to the short length of the processed block, the statistics
required by the beamformer are estimated less precisely. The influence of this
inaccuracy is studied and compared to the processing regime when recordings are
treated as one block (batch processing). The experimental evaluation of the
proposed method is performed on the large CHiME-4 datasets and on another
dataset featuring a moving target speaker. The experiments are evaluated in terms
of objective and perceptual criteria (such as signal-to-interference ratio
(SIR) or perceptual evaluation of speech quality (PESQ), respectively).
Moreover, word error rate (WER) achieved by a baseline automatic speech
recognition system is evaluated, for which the enhancement method serves as a
front-end solution. The results indicate that the proposed method is robust
with respect to the short length of the processed block. Significant improvements
in terms of these criteria and WER are observed even for a block length of 250
ms.
Comment: 10 pages, 8 figures, 4 tables. Modified version of the article
accepted for publication in IET Signal Processing journal. Original results
unchanged, additional experiments presented, refined discussion and
conclusion
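The contrast between block-online and batch processing described in the abstract above can be illustrated with a small sketch for a single frequency bin and two microphones. It is not the paper's DNN-supported estimator: it uses a plain least-squares relative transfer function (RTF) estimate that assumes negligible noise, and the function names `rtf_estimate` and `blockwise_rtf` and the block length are hypothetical.

```python
def rtf_estimate(ref_frames, other_frames):
    """Least-squares RTF estimate for one frequency bin.

    ref_frames, other_frames: complex STFT coefficients of the reference
    microphone and a second microphone over the same time frames.
    Assumption: noise is negligible, so other ~= rtf * ref.
    """
    num = sum(x2 * x1.conjugate() for x1, x2 in zip(ref_frames, other_frames))
    den = sum(abs(x1) ** 2 for x1 in ref_frames)
    return num / den if den > 0 else 0j

def blockwise_rtf(ref_frames, other_frames, block_len):
    """Estimate the RTF independently in each block of `block_len` frames."""
    return [
        rtf_estimate(ref_frames[s:s + block_len], other_frames[s:s + block_len])
        for s in range(0, len(ref_frames), block_len)
    ]

# Toy data: the speaker moves halfway through, so the true RTF changes.
ref = [complex(1 + k, 0.5 * k * (-1) ** k) for k in range(16)]
h_first, h_second = 0.5 + 0.5j, -0.25j
other = [h_first * x for x in ref[:8]] + [h_second * x for x in ref[8:]]

per_block = blockwise_rtf(ref, other, block_len=8)  # one estimate per block
batch = rtf_estimate(ref, other)                    # one estimate overall
```

In this toy case each block recovers the RTF that was valid during that block, while the batch estimate blends the two speaker positions. With real, noisy data the shorter blocks give noisier statistics, which is the trade-off the abstract studies.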