A Study on Suppressing Musical Residual Noise in Speech Enhancement Systems
- Publication date
- 2010
- Publisher
Abstract
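The goal of this research project is to suppress musical residual noise in a speech enhancement system. First, we will propose a new speech enhancement system that uses both the masking effect of the human auditory system within a frame and the variation in signal-to-noise ratio (SNR) between adjacent frames as its adaptation mechanism, in order to improve speech signals corrupted by colored noise. In a speech enhancement system, adapting the system with an accurate estimate of a frame's SNR can reduce part of the effect of musical residual noise. We will therefore use an averaging factor to improve the SNR estimate and then feed that SNR value into a speech enhancement system adapted to the characteristics of human hearing, improving the suppression of musical residual noise. We will also try, in the wavelet domain, to use a homogeneous-frame-merging rule to estimate the signal energy and to feed this estimate into a minimum-mean-square-error estimator so that the effect of musical residual noise is reduced. In a noisy environment, the estimate of the signal variance plays a critical role in noise suppression. The signal variance can be estimated by locally adaptive window-based maximum likelihood or by maximum a posteriori estimation. When estimating the signal variance in a speech processing system, the size of the local window is also an important factor. In this project we will therefore try an adaptive window-based maximum likelihood estimator, adapted so that it can be applied to speech enhancement systems.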
This project attempts to suppress musical residual noise in a speech enhancement system. First, we will propose a speech enhancement algorithm that considers both the intra-frame masking properties of the human auditory system and the inter-frame signal-to-noise ratio (SNR) variation to enhance a speech signal corrupted by colored noise. The effect of musical residual noise can be partially reduced by accurately estimating the a priori SNR. Therefore, an averaging factor that varies over time and frequency will be exploited to improve the estimate of the a priori SNR for a perception-based speech enhancement algorithm. In addition, we will try to analyze the bin motion of a speech signal in the frequency domain. To improve the intelligibility of an enhanced speech signal, its unvoiced/consonant portions should be adequately amplified. We therefore intend to analyze the motion vector of each frequency bin and to find an appropriate algorithm for amplifying the unvoiced/consonant portions.
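As an illustration of how an averaging factor enters the a priori SNR estimate, the following minimal sketch uses the well-known decision-directed form, in which the estimate is a weighted average of the previous frame's clean-speech estimate and the current a posteriori SNR. The abstract does not give the project's actual adaptation rule, so `averaging_factor` and its parameters `alpha_min` and `alpha_max` are hypothetical: they simply lower the factor when the a posteriori SNR changes quickly between adjacent frames. The Wiener gain is also only a stand-in for the perception-based gain.

```python
def averaging_factor(gamma_now, gamma_prev, alpha_min=0.5, alpha_max=0.98):
    """Hypothetical adaptation rule (an assumption, not the project's method).

    Returns a factor close to alpha_max when the a posteriori SNR is stable
    across adjacent frames, and close to alpha_min when it changes sharply.
    """
    # Normalised inter-frame change, in [0, 1] for non-negative inputs.
    change = abs(gamma_now - gamma_prev) / (gamma_now + gamma_prev + 1e-12)
    return alpha_max - (alpha_max - alpha_min) * change


def a_priori_snr(prev_clean_power, noise_power, noisy_power, alpha):
    """Decision-directed a priori SNR estimate for one time-frequency bin."""
    gamma = noisy_power / noise_power      # a posteriori SNR
    ml_term = max(gamma - 1.0, 0.0)        # instantaneous estimate, floored at 0
    return alpha * prev_clean_power / noise_power + (1.0 - alpha) * ml_term


def wiener_gain(xi):
    """Wiener gain used here as a simple stand-in for the enhancement gain."""
    return xi / (1.0 + xi)


if __name__ == "__main__":
    noise_power = 1.0
    gamma_prev, gamma_now = 4.0, 4.5
    alpha = averaging_factor(gamma_now, gamma_prev)
    xi = a_priori_snr(prev_clean_power=2.0, noise_power=noise_power,
                      noisy_power=gamma_now * noise_power, alpha=alpha)
    print(alpha, xi, wiener_gain(xi))
```

A factor near 1 smooths the estimate across frames, which is what suppresses the isolated spectral peaks heard as musical noise. A smaller factor lets the estimate follow speech onsets more quickly.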
In this project, we will also try to use a minimum-mean-square-error (MMSE) estimator with a homogeneous-frame-merged approach to reduce the effect of musical residual noise. Estimating the signal variance in a noisy environment is a critical issue in noise reduction. The signal variance can be obtained effectively by locally adaptive window-based maximum likelihood or maximum a posteriori estimation. The size of the locally adaptive window is also an important factor in estimating the signal variance for speech signal processing. Thus, we will modify the adaptive window-based maximum likelihood estimator so that it can be applied to speech enhancement. Initially, a fixed-length frame, defined as the reference frame, will be used to calculate the energy of the noisy wavelet coefficients. Homogeneous frames in a segment will then be merged to estimate the energy of the noisy wavelet coefficients. A vowel is quasi-periodic, so the number of homogeneous frames increases. The gain factor is then decided by a larger merged frame, which lets the gain factors vary smoothly over the homogeneous frames. Accordingly, the effect of musical residual noise will be reduced. The merging yields an arbitrary-shape frame for estimating the energy of the speech wavelet coefficients. Because the fixed frame is very short (3 wavelet coefficients), the frame energy can respond to sudden changes in the speech signal, such as a consonant or the transition between consonant and vowel regions. Because the gain factor of a wavelet coefficient is decided only by the merged frame, its value is not affected by frames that lack the same property as the reference frame. Therefore, the speech energy estimated with the homogeneous-frame-merged approach tracks that of the real speech signal well.
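The merging idea described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's algorithm. The frame length of 3 coefficients comes from the abstract. The homogeneity test (an energy ratio threshold `ratio`), the restriction to contiguous neighbours of the reference frame, and the linear MMSE (Wiener-type) gain are all assumptions introduced here, and every function name is hypothetical. The signal variance uses the usual maximum likelihood form: noisy energy minus noise variance, floored at zero.

```python
FRAME_LEN = 3  # fixed frame of 3 wavelet coefficients, as in the abstract


def frame_energies(coeffs, frame_len=FRAME_LEN):
    """Mean energy of each non-overlapping fixed-length frame."""
    return [sum(c * c for c in coeffs[i:i + frame_len]) / frame_len
            for i in range(0, len(coeffs) - frame_len + 1, frame_len)]


def is_homogeneous(e_ref, e_other, ratio=2.0):
    """Assumed criterion: energies differ by at most a factor of `ratio`."""
    lo, hi = min(e_ref, e_other), max(e_ref, e_other)
    return hi <= ratio * lo + 1e-12


def merged_energy(energies, ref, ratio=2.0):
    """Merge contiguous neighbours that are homogeneous with frame `ref`.

    Returns the mean energy of the merged frame and its (left, right) span.
    """
    left = ref
    while left > 0 and is_homogeneous(energies[ref], energies[left - 1], ratio):
        left -= 1
    right = ref
    while (right < len(energies) - 1
           and is_homogeneous(energies[ref], energies[right + 1], ratio)):
        right += 1
    merged = energies[left:right + 1]
    return sum(merged) / len(merged), (left, right)


def gain(noisy_energy, noise_var):
    """Linear MMSE gain from the ML signal-variance estimate (noise_var > 0)."""
    signal_var = max(noisy_energy - noise_var, 0.0)
    return signal_var / (signal_var + noise_var)


if __name__ == "__main__":
    # Two low-energy frames followed by one high-energy frame.
    coeffs = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
    energies = frame_energies(coeffs)
    for ref in range(len(energies)):
        energy, span = merged_energy(energies, ref)
        print(ref, span, energy, gain(energy, noise_var=0.5))
```

In this toy input the two low-energy frames merge with each other, while the high-energy frame stays on its own. Its gain is therefore not pulled down by neighbours with a different character, which mirrors the behaviour the abstract describes for consonants and transients.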