    Veröffentlichungen und Vorträge 2009 der Mitglieder der Fakultät für Informatik (Publications and Talks 2009 by Members of the Faculty of Computer Science)

    DETECTING BANDLIMITED AUDIO IN BROADCAST TELEVISION SHOWS

    For TV and radio shows containing narrowband speech, speech-to-text (STT) accuracy on the narrowband audio can be improved by using an acoustic model trained on acoustically matched data. To apply such a model selectively, one must first be able to accurately detect which audio segments are narrowband. The present paper explores two different bandwidth classification approaches: a traditional Gaussian mixture model (GMM) approach and a spline-based classifier that categorizes audio segments based on their power spectra. We focus on shows found in the DARPA GALE Mandarin training and test sets, where the ratio of wideband to narrowband shows is very large. In this setting, the spline-based classifier reduces the number of misclassified wideband segments by up to 95% relative to the GMM-based classifier for the same number of misclassified narrowband segments. Index Terms: Speech processing, speech recognition, pattern classification, telephon