Search CORE

84,802 research outputs found

Robust language recognition via adaptive language factor extraction

Author: Demuynck Kris
Desplanques Brecht
Martens Jean-Pierre
Publication venue
Publication date: 01/01/2014
Field of study

This paper presents a technique to adapt an acoustically based language classifier to the background conditions and speaker accents. This adaptation improves language classification on a broad spectrum of TV broadcasts. The core of the system consists of an iVector-based setup in which language and channel variabilities are modeled separately. The subsequent language classifier (the backend) operates on the language factors, i.e. those features in the extracted iVectors that explain the observed language variability. The proposed technique adapts the language variability model to the background conditions and to the speaker accents present in the audio. The effect of the adaptation is evaluated on a 28 hours corpus composed of documentaries and monolingual as well as multilingual broadcast news shows. Consistent improvements in the automatic identification of Flemish (Belgian Dutch), English and French are demonstrated for all broadcast types

Ghent University Academic Bibliography

Recognizing Voice Over IP: A Robust Front-End for Speech Recognition on the World Wide Web

Author: Díaz de María Fernando
Gallardo Antolín Ascensión
Peláez Moreno Carmen
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2001
Field of study

The Internet Protocol (IP) environment poses two relevant sources of distortion to the speech recognition problem: lossy speech coding and packet loss. In this paper, we propose a new front-end for speech recognition over IP networks. Specifically, we suggest extracting the recognition feature vectors directly from the encoded speech (i.e., the bit stream) instead of decoding it and subsequently extracting the feature vectors. This approach offers two significant benefits. First, the recognition system is only affected by the quantization distortion of the spectral envelope. Thus, we are avoiding the influence of other sources of distortion due to the encoding-decoding process. Second, when packet loss occurs, our front-end becomes more effective since it is not constrained to the error handling mechanism of the codec. We have considered the ITU G.723.1 standard codec, which is one of the most preponderant coding algorithms in voice over IP (VoIP) and compared the proposed front-end with the conventional approach in two automatic speech recognition (ASR) tasks, namely, speaker-independent isolated digit recognition and speaker-independent continuous speech recognition. In general, our approach outperforms the conventional procedure, for a variety of simulated packet loss rates. Furthermore, the improvement is higher as network conditions worsen.Publicad

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Universidad Carlos III de Madrid e-Archivo