DETECTING BANDLIMITED AUDIO IN BROADCAST TELEVISION SHOWS
For TV and radio shows containing narrowband speech, Speech-to-text (STT) accuracy on the narrowband audio can be improved by using an acoustic model trained on acoustically matched data. To selectively apply it, one must first be able to accurately detect which audio segments are narrowband. The present paper explores two different bandwidth classification approaches: a traditional Gaussian mixture model (GMM) approach and a spline-based classifier that categorizes audio segments based on their power spectra. We focus on shows found in the DARPA GALE Mandarin training and test sets, where the ratio of wideband to narrowband shows is very large. In this setting, the spline-based classifier reduces the number of misclassified wideband segments by up to 95% relative to the GMM-based classifier for the same number of misclassified narrowband segments. Index Terms: Speech processing, speech recognition, pattern classification, telephon