4 research outputs found
Model kompanzasyonlu birinci derece istatistikleri ile i-vektörlerin gürbüzlüğünün artırılması
Speaker recognition systems achieved significant improvements over the last decade, especially due to
the performance of the i-vectors. Despite the achievements, mismatch between training and test data
affects the recognition performance considerably. In this paper, a solution is offered to increase
robustness against additive noises by inserting model compensation techniques within the i-vector
extraction scheme. For stationary noises, the model compensation techniques produce highly robust
systems. Parallel Model Compensation and Vector Taylor Series are considered as state-of-the-art
model compensation techniques. Applying these methods to the first order statistics, a noisy total
variability space training is aimed, which will reduce the mismatch resulted by additive noises. All other
parts of the conventional i-vector scheme remain unchanged, such as total variability matrix training,
reducing the i-vector dimensionality, scoring the i-vectors. The proposed method was tested with four
different noise types with several signal to noise ratios (SNR) from -6 dB to 18 dB with 6 dB steps. High
reductions in equal error rates were achieved with both methods, even at the lowest SNR levels. On
average, the proposed approach produced more than 50% relative reduction in equal error rate.Konuşmacı tanıma sistemleri özellikle i-vektörlerin performansı sebebiyle son on yılda önemli
gelişmeler elde etmiştir. Bu gelişmelere rağmen eğitim ve test verileri arasındaki uyumsuzluk tanıma
performansını önemli ölçüde etkilemektedir. Bu çalışmada, model kompanzasyon yöntemleri i-vektör
çıkarımı şemasına eklenerek toplanabilir gürültülere karşı gürbüzlüğü artıracak bir çözüm
sunulmaktadır. Durağan gürültüler için model kompanzasyon teknikleri oldukça gürbüz sistemler üretir.
Paralel Model Kompanzasyonu ve Vektör Taylor Serileri en gelişmiş model kompanzasyon
tekniklerinden kabul edilmektedir. Bu metotlar birinci dereceden istatistiklere uygulanarak toplanabilir
gürültülerden kaynaklanan uyumsuzluğu azaltacak gürültülü tüm değişkenlik uzayı eğitimi
amaçlanmıştır. Tüm değişkenlik matrisin eğitimi, i-vektör boyutunun azaltılması, i-vektörlerin
puanlanması gibi geleneksel i-vektör şemasının diğer tüm parçaları değişmeden kalmaktadır. Önerilen
yöntem, 6 dB’lik adımlarla -6 dB’den 18 dB’ye kadar çeşitli sinyal-gürültü oranlarına (SNR) sahip dört
farklı gürültü tipi ile test edilmiştir. Her iki yöntemle de en düşük SNR seviyelerinde bile eşit hata
oranlarında yüksek azalmalar elde edilmiştir. Önerilen yaklaşım eşik hata oranında ortalama olarak
%50’den fazla göreceli azalma sağlamıştır
A robust polynomial regression-based voice activity detector for speaker verification
Abstract Robustness against background noise is a major research area for speech-related applications such as speech recognition and speaker recognition. One of the many solutions for this problem is to detect speech-dominant regions by using a voice activity detector (VAD). In this paper, a second-order polynomial regression-based algorithm is proposed with a similar function as a VAD for text-independent speaker verification systems. The proposed method aims to separate steady noise/silence regions, steady speech regions, and speech onset/offset regions. The regression is applied independently to each filter band of a mel spectrum, which makes the algorithm fit seamlessly to the conventional extraction process of the mel-frequency cepstral coefficients (MFCCs). The k-means algorithm is also applied to estimate average noise energy in each band for spectral subtraction. A pseudo SNR-dependent linear thresholding for the final VAD output decision is introduced based on the k-means energy centers. This thresholding considers the speech presence in each band. Conventional VADs usually neglect the deteriorative effects of the additive noise in the speech regions. Contrary to this, the proposed method decides not only for the speech presence, but also if the frame is dominated by the speech, or the noise. Performance of the proposed algorithm is compared with a continuous noise tracking method, and another VAD method in speaker verification experiments, where five different noise types at five different SNR levels were considered. The proposed algorithm showed superior verification performance both with the conventional GMM-UBM method, and the state-of-the-art i-vector method
A robust polynomial regression-based voice activity detector for speaker verification
Robustness against background noise is a major research area for speech-related applications such as speech recognition and speaker recognition. One of the many solutions for this problem is to detect speech-dominant regions by using a voice activity detector (VAD). In this paper, a second-order polynomial regression-based algorithm is proposed with a similar function as a VAD for text-independent speaker verification systems. The proposed method aims to separate steady noise/silence regions, steady speech regions, and speech onset/offset regions. The regression is applied independently to each filter band of a mel spectrum, which makes the algorithm fit seamlessly to the conventional extraction process of the mel-frequency cepstral coefficients (MFCCs). The k-means algorithm is also applied to estimate average noise energy in each band for spectral subtraction. A pseudo SNR-dependent linear thresholding for the final VAD output decision is introduced based on the k-means energy centers. This thresholding considers the speech presence in each band. Conventional VADs usually neglect the deteriorative effects of the additive noise in the speech regions. Contrary to this, the proposed method decides not only for the speech presence, but also if the frame is dominated by the speech, or the noise. Performance of the proposed algorithm is compared with a continuous noise tracking method, and another VAD method in speaker verification experiments, where five different noise types at five different SNR levels were considered. The proposed algorithm showed superior verification performance both with the conventional GMM-UBM method, and the state-of-the-art i-vector method. © 2017, The Author(s)