6 research outputs found
WiSeBE: Window-based Sentence Boundary Evaluation
Sentence Boundary Detection (SBD) has been a major research topic since
Automatic Speech Recognition transcripts have been used for further Natural
Language Processing tasks like Part of Speech Tagging, Question Answering or
Automatic Summarization. But what about evaluation? Do standard evaluation
metrics like precision, recall, F-score or classification error; and more
important, evaluating an automatic system against a unique reference is enough
to conclude how well a SBD system is performing given the final application of
the transcript? In this paper we propose Window-based Sentence Boundary
Evaluation (WiSeBE), a semi-supervised metric for evaluating Sentence Boundary
Detection systems based on multi-reference (dis)agreement. We evaluate and
compare the performance of different SBD systems over a set of Youtube
transcripts using WiSeBE and standard metrics. This double evaluation gives an
understanding of how WiSeBE is a more reliable metric for the SBD task.Comment: In proceedings of the 17th Mexican International Conference on
Artificial Intelligence (MICAI), 201
Comparative Study on Sentence Boundary Prediction for German and English Broadcast News
We present a comparative study on sentence boundary prediction for German and English broadcast news that explores generalization across different languages. In the feature extraction stage, word pause duration is firstly extracted from word aligned speech, and forward and backward language models are utilized to extract textual features. Then a gradient boosted machine is optimized by grid search to map these features to punctuation marks. Experimental results confirm that word pause duration is a simple yet effective feature to predict whether there is a sentence boundary after that word. We found that Bayes risk derived from pause duration distributions of sentence boundary words and non-boundary words is an effective measure to assess the inherent difficulty of sentence boundary prediction. The proposed method achieved F-measures of over 90% on reference text and around 90% on ASR transcript for both German broadcast news corpus and English multi-genre broadcast news corpus. This demonstrates the state of the art performance of the proposed method