
    Visual Speech Enhancement

    When video is shot in a noisy environment, the voice of a speaker seen in the video can be enhanced using the visible mouth movements, reducing background noise. While most existing methods use audio-only inputs, our visual speech enhancement, based on an audio-visual neural network, obtains improved performance. We include in the training data videos to which we added the voice of the target speaker as background noise. Since the audio input alone is not sufficient to separate a speaker's voice from their own voice, the trained model better exploits the visual input and generalizes well to different noise types. The proposed model outperforms prior audio-visual methods on two public lipreading datasets. It is also the first to be demonstrated on a dataset not designed for lipreading, such as the weekly addresses of Barack Obama. Comment: Accepted to Interspeech 2018. Supplementary video: https://www.youtube.com/watch?v=nyYarDGpcY
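
    The training-data idea above (adding the target speaker's own voice as background noise) can be sketched as additive mixing at a chosen signal-to-noise ratio. This is a minimal illustration under stated assumptions, not the authors' pipeline: signals are plain lists of samples of equal length, the SNR value is a free parameter the abstract does not specify, and `make_self_mixture` is a hypothetical helper that picks two different utterances of one speaker.

```python
import math
import random


def power(signal):
    """Mean power of a signal given as a list of samples."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(target, interferer, snr_db):
    """Add `interferer` to `target`, scaled so the mixture has SNR `snr_db`."""
    if len(target) != len(interferer):
        raise ValueError("target and interferer must have the same length")
    p_target = power(target)
    p_interf = power(interferer)
    # Solve 10*log10(p_target / (g**2 * p_interf)) = snr_db for the gain g.
    gain = math.sqrt(p_target / (p_interf * 10 ** (snr_db / 10)))
    return [t + gain * n for t, n in zip(target, interferer)]


def make_self_mixture(utterances_by_speaker, speaker, snr_db, rng=random):
    """Hypothetical helper: mix one utterance of `speaker` with a different
    utterance of the same speaker. Returns (noisy_audio, clean_target)."""
    target, interferer = rng.sample(utterances_by_speaker[speaker], 2)
    return mix_at_snr(target, interferer, snr_db), target
```

    In such a setup the noisy audio would be paired with the video frames of the target utterance, and the clean target would serve as the training label; because the interferer has the same voice, the audio stream alone gives the model little to separate them.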

    Adversarial Network Bottleneck Features for Noise Robust Speaker Verification

    In this paper, we propose a noise-robust bottleneck feature representation generated by an adversarial network (AN). The AN consists of two cascade-connected networks: an encoding network (EN) and a discriminative network (DN). Mel-frequency cepstral coefficients (MFCCs) of clean and noisy speech are used as input to the EN, and the output of the EN is used as the noise-robust feature. The EN and DN are trained in turn. When training the DN, the noise types are used as the training labels; when training the EN, all labels are set to the same value, the clean speech label. This aims to make the AN features invariant to noise and thus noise robust. We evaluate the proposed feature on a Gaussian Mixture Model-Universal Background Model (GMM-UBM) based speaker verification system and compare it with MFCC features of speech enhanced by short-time spectral amplitude minimum mean square error (STSA-MMSE) and deep neural network-based speech enhancement (DNN-SE) methods. Experimental results on the RSR2015 database show that the proposed AN bottleneck feature (AN-BN) dramatically outperforms the STSA-MMSE and DNN-SE based MFCCs across different noise types and signal-to-noise ratios. Furthermore, the AN-BN feature also improves speaker verification performance under the clean condition.
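
    The alternating label scheme described above can be sketched as follows. This only shows which targets each phase uses and the cross-entropy loss computed from them; the networks and gradient updates are omitted. It assumes the DN outputs one logit per class with "clean speech" as class index 0 (the `CLEAN` constant is an assumption, not taken from the paper).

```python
import math

CLEAN = 0  # assumed index of the "clean speech" class in the DN output


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def cross_entropy(logits, label):
    """Negative log-probability of `label` under softmax(logits)."""
    return -math.log(softmax(logits)[label])


def dn_targets(noise_labels):
    """DN phase: each utterance keeps its true noise-type label."""
    return list(noise_labels)


def en_targets(noise_labels):
    """EN phase: every utterance is labelled 'clean', pushing the EN towards
    features from which the DN cannot recover the noise type."""
    return [CLEAN] * len(noise_labels)


def phase_loss(dn_logits, targets):
    """Mean cross-entropy of the DN outputs against one phase's targets."""
    return sum(cross_entropy(z, y) for z, y in zip(dn_logits, targets)) / len(targets)
```

    In a full training loop, one step would minimize `phase_loss(logits, dn_targets(labels))` with respect to the DN parameters (EN frozen), and the next would minimize `phase_loss(logits, en_targets(labels))` with respect to the EN parameters (DN frozen).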

    Assessing the effect of noise-reduction to the intelligibility of low-pass filtered speech

    Given that most hearing-impaired listeners have low-frequency residual hearing, the present work assessed whether commonly used single-channel noise-reduction (NR) algorithms improve the intelligibility of low-pass filtered speech, which simulates how hearing-impaired patients understand speech with only low-frequency residual hearing. In addition, this study used Mandarin speech, in which the information carried by (low-frequency dominated) vowels contributes significantly to speech intelligibility. Mandarin sentences were corrupted by steady-state speech-shaped noise and processed by four types of single-channel NR algorithms (subspace, statistical-modeling, spectral-subtractive, and Wiener-filtering). The processed sentences were played to normal-hearing listeners for recognition. Experimental results showed that existing single-channel NR algorithms were unable to improve the intelligibility of low-pass filtered Mandarin sentences. Among the four types examined, Wiener filtering had the least negative influence on the intelligibility of low-pass filtered speech.
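
    To make the processing chain concrete, here is a minimal sketch of one frame passing through a Wiener-type noise-reduction gain and then an ideal low-pass mask that stands in for low-frequency residual hearing. Several details are assumptions not given in the abstract: the a priori SNR is estimated crudely by power subtraction with a small floor, the noise power spectrum is known, NR is applied before low-pass filtering, and the cutoff frequency is a free parameter.

```python
def wiener_gain(noisy_power, noise_power, floor=1e-3):
    """Per-bin Wiener gain xi / (1 + xi); xi is a crude a priori SNR estimate
    obtained by power subtraction (an assumption for this sketch)."""
    gains = []
    for y, n in zip(noisy_power, noise_power):
        xi = max(y / n - 1.0, floor)
        gains.append(xi / (1.0 + xi))
    return gains


def lowpass_mask(num_bins, sample_rate, cutoff_hz):
    """Ideal low-pass mask for a one-sided spectrum with `num_bins` bins,
    where bin k sits at k * sample_rate / (2 * (num_bins - 1)) Hz."""
    hz_per_bin = sample_rate / (2 * (num_bins - 1))
    return [1.0 if k * hz_per_bin <= cutoff_hz else 0.0 for k in range(num_bins)]


def enhance_then_lowpass(noisy_mag, noise_power, sample_rate, cutoff_hz):
    """Apply the Wiener gain to one frame's magnitude spectrum, then zero
    every bin above the cutoff to simulate low-frequency residual hearing."""
    noisy_power = [m * m for m in noisy_mag]
    gains = wiener_gain(noisy_power, noise_power)
    mask = lowpass_mask(len(noisy_mag), sample_rate, cutoff_hz)
    return [m * g * k for m, g, k in zip(noisy_mag, gains, mask)]
```

    A real system would apply this per STFT frame, reuse the noisy phase, and resynthesize by overlap-add; those steps are left out to keep the focus on the gain and the low-pass simulation.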

    Spectral Restoration Based Speech Enhancement for Robust Speaker Identification

    Spectral restoration based speech enhancement algorithms are used to enhance the quality of noise-masked speech for robust speaker identification. In the presence of background noise, the performance of speaker identification systems can deteriorate severely. The present study employed and evaluated Minimum Mean-Square-Error Short-Time Spectral Amplitude (MMSE-STSA) estimators with a modified a priori SNR estimate, applied before speaker identification, to improve the performance of speaker identification systems in the presence of background noise. For speaker identification, Mel-frequency cepstral coefficients (MFCCs) are used to extract the speech features and vector quantization (VQ) is used to model the extracted features. The experimental results showed a significant improvement in speaker identification rates when spectral restoration based speech enhancement algorithms were used as a pre-processing step.
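
    For reference, the classic Ephraim-Malah MMSE-STSA gain, driven by the standard decision-directed a priori SNR estimate, can be sketched as below. This is not the study's method: the paper uses a *modified* a priori SNR estimate that the abstract does not describe, so the standard estimate is swapped in here. The smoothing factor 0.98, the -25 dB floor, and a known noise power spectrum are assumptions. The Bessel functions are computed by power series, which is adequate for moderate SNRs; for large arguments, exponentially scaled versions such as SciPy's `scipy.special.i0e` and `i1e` avoid overflow.

```python
import math


def bessel_i(n, x, tol=1e-15, max_terms=1000):
    """Modified Bessel function of the first kind I_n(x), by power series."""
    term = (x / 2) ** n / math.factorial(n)
    total = term
    for k in range(1, max_terms):
        term *= (x / 2) ** 2 / (k * (k + n))
        total += term
        if term <= tol * total:
            break
    return total


def mmse_stsa_gain(xi, gamma):
    """Ephraim-Malah MMSE-STSA gain for a priori SNR xi and a posteriori SNR gamma."""
    v = xi / (1.0 + xi) * gamma
    bracket = (1.0 + v) * bessel_i(0, v / 2) + v * bessel_i(1, v / 2)
    return (math.sqrt(math.pi * v) / (2.0 * gamma)) * math.exp(-v / 2) * bracket


def decision_directed_xi(prev_clean_mag, noise_power, gamma,
                         alpha=0.98, xi_min=10 ** (-25 / 10)):
    """Standard decision-directed a priori SNR estimate (assumed parameters)."""
    xi = alpha * prev_clean_mag ** 2 / noise_power + (1 - alpha) * max(gamma - 1.0, 0.0)
    return max(xi, xi_min)


def enhance_frame(noisy_mag, noise_power, prev_clean_mag):
    """Estimate the clean magnitude spectrum of one frame, bin by bin.
    `prev_clean_mag` holds the previous frame's enhanced magnitudes."""
    out = []
    for y, lam, a_prev in zip(noisy_mag, noise_power, prev_clean_mag):
        gamma = max(y * y / lam, 1e-10)
        xi = decision_directed_xi(a_prev, lam, gamma)
        out.append(mmse_stsa_gain(xi, gamma) * y)
    return out
```

    The enhanced frames would then be resynthesized and passed to the MFCC front end; at high SNR the gain approaches the Wiener gain xi / (1 + xi), while low-SNR bins are strongly attenuated.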

    Noise Reduction in the Automatic Detection of Hypernasality in Children

    In this paper, a methodology to reduce the background noise in a hypernasality detection system using spectral subtraction is presented; classical measures of quality and intelligibility are used to evaluate the speech enhancement algorithms used in the system. A linear classifier is used for hypernasality detection, and the results obtained with different spectral subtraction algorithms are compared. The results show that spectral subtraction techniques can be used to improve the classifier's performance in detecting hypernasality when the signals are contaminated with additive noise.
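
    The abstract does not say which spectral subtraction variants were compared, so the following is only a sketch of one common variant: power spectral subtraction with an over-subtraction factor and a spectral floor. It assumes the noise power spectrum is estimated by averaging the first few frames, which are taken to be speech-free; `alpha` and `beta` are hypothetical settings.

```python
def estimate_noise(frames_power, num_noise_frames):
    """Average the power spectra of the first frames (assumed speech-free)."""
    first = frames_power[:num_noise_frames]
    return [sum(bins) / len(first) for bins in zip(*first)]


def spectral_subtract(frame_power, noise_power, alpha=2.0, beta=0.01):
    """Power spectral subtraction: remove alpha times the noise estimate and
    clamp each bin to at least beta times the noise (the spectral floor)."""
    return [max(p - alpha * n, beta * n) for p, n in zip(frame_power, noise_power)]


def enhance(frames_power, num_noise_frames=5, alpha=2.0, beta=0.01):
    """Apply spectral subtraction to every frame of a power spectrogram."""
    noise = estimate_noise(frames_power, num_noise_frames)
    return [spectral_subtract(f, noise, alpha, beta) for f in frames_power]
```

    The floor keeps bins from going negative and limits "musical noise"; the enhanced spectra would then be converted back to a waveform (or directly to features) before the linear classifier.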