Formatting Time-Aligned ASR Transcripts for Readability

Maria Shugrina

Authors: Maria Shugrina
Publication date: 1 January 2010

Abstract

We address the problem of formatting the output of an automatic speech recognition (ASR) system for readability while preserving the word-level timing information of the transcript. Our system enriches the ASR transcript with punctuation, capitalization, and properly written dates, times, and other numeric entities, and our approach can be applied to other formatting tasks. The method we describe combines hand-crafted grammars with a class-based language model trained on written text and relies on Weighted Finite State Transducers (WFSTs) to preserve the start and end time of each word.

Available Versions

CiteSeerX

oai:CiteSeerX.psu:10.1.1.1040....