2 research outputs found

Veröffentlichungen und Vorträge 2009 der Mitglieder der Fakultät für Informatik

Author: Karlsruher Institut für Technologie
Publication venue: Karlsruher Institut für Technologie
Publication date: 01/01/2012

DETECTING BANDLIMITED AUDIO IN BROADCAST TELEVISION SHOWS

Authors: Mark C. Fuhs, Qin Jin, Tanja Schultz
Publication date: 26/12/2009

For TV and radio shows containing narrowband speech, speech-to-text (STT) accuracy on the narrowband audio can be improved by using an acoustic model trained on acoustically matched data. To selectively apply it, one must first be able to accurately detect which audio segments are narrowband. The present paper explores two different bandwidth classification approaches: a traditional Gaussian mixture model (GMM) approach and a spline-based classifier that categorizes audio segments based on their power spectra. We focus on shows found in the DARPA GALE Mandarin training and test sets, where the ratio of wideband to narrowband shows is very large. In this setting, the spline-based classifier reduces the number of misclassified wideband segments by up to 95% relative to the GMM-based classifier for the same number of misclassified narrowband segments. Index Terms: Speech processing, speech recognition, pattern classification, telephon
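To make the idea of classifying segments from their power spectra concrete, here is a minimal Python sketch. It is not the paper's spline-based or GMM-based classifier; it swaps in a much simpler heuristic that measures the fraction of spectral power at or above a cutoff frequency. The 4 kHz cutoff, the 1% threshold, and all function names are assumptions chosen for illustration only.

```python
import cmath
import math


def power_spectrum(samples):
    """One-sided power spectrum of a real signal via a naive O(n^2) DFT."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = 0j
        for t, x in enumerate(samples):
            acc += x * cmath.exp(-2j * math.pi * k * t / n)
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def high_band_ratio(samples, sample_rate, cutoff_hz=4000.0):
    """Fraction of total spectral power at or above cutoff_hz (assumed cutoff)."""
    spec = power_spectrum(samples)
    n = len(samples)
    total = sum(spec)
    if total == 0.0:
        # A silent segment carries no evidence of high-band content.
        return 0.0
    high = sum(p for k, p in enumerate(spec) if k * sample_rate / n >= cutoff_hz)
    return high / total


def is_narrowband(samples, sample_rate, cutoff_hz=4000.0, threshold=0.01):
    """Label a segment narrowband when almost no power lies above the cutoff.

    The threshold is a hypothetical value, not one taken from the paper.
    """
    return high_band_ratio(samples, sample_rate, cutoff_hz) < threshold


def tone(freq_hz, sample_rate, n):
    """Synthetic sine tone used only to exercise the heuristic."""
    return [math.sin(2 * math.pi * freq_hz * t / sample_rate) for t in range(n)]


if __name__ == "__main__":
    sr, n = 16000, 256
    narrow = tone(1000, sr, n)  # all energy below 4 kHz
    wide = [a + b for a, b in zip(tone(1000, sr, n), tone(6000, sr, n))]
    print(is_narrowband(narrow, sr), is_narrowband(wide, sr))
```

A fixed energy-ratio threshold like this is only a baseline; a real system would need to be tuned and evaluated on labeled segments, which is the problem the two classifiers in the abstract address.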
