    Entropy based classifier combination for sentence segmentation

    We describe recent extensions to our previous work, where we explored the use of individual classifiers, namely boosting and maximum entropy models, for sentence segmentation. In this paper we extend the set of classification methods with a support vector machine (SVM). We propose a new dynamic entropy-based classifier combination approach to combine these classifiers, and compare it with traditional classifier combination techniques, namely voting, linear regression, and logistic regression. Furthermore, we investigate combining hidden event language models with the output of the proposed classifier combination and with the output of the individual classifiers. Experimental studies conducted on the Mandarin TDT4 broadcast news database show that the SVM classifier, as an individual classifier, improves over our previous best system. However, the proposed entropy-based classifier combination approach shows the best improvement in F-measure, 1% absolute, and the voting approach shows the best reduction in NIST error rate, 2.7% absolute, when compared to the previous best system. Index Terms: sentence segmentation, classifier combination, entropy, lexical and prosodic features, hidden event language model

    The ICSI+ Multilingual Sentence Segmentation System (INTERSPEECH 2006 - ICSLP)

    The ICSI+ multilingual sentence segmentation system, with results for English and Mandarin broadcast news automatic speech recognizer transcriptions, represents a joint effort involving ICSI, SRI, and UT Dallas. Our approach uses hidden event language models to exploit lexical information, and maximum entropy and boosting classifiers to exploit lexical as well as prosodic, speaker change, and syntactic information. We demonstrate that the proposed methodology, including pitch- and energy-related prosodic features, performs significantly better than a baseline system that uses words and simple pause features only. Furthermore, the obtained improvements are consistent across both languages, and no language-specific adaptation of the methodology is necessary. The best results were achieved by combining hidden event language models with a boosting-based classifier that, to our knowledge, has not previously been applied to this task. Index Terms: maximum entropy, boosting, hidden event language models, prosod