Revising the Annotation of a Broadcast News Corpus: a Linguistic Approach

Batista, Fernando; Cabarrão, Vera; Mamede, Nuno; Mata, Ana Isabel; Matos, David; Meinedo, Hugo; Moniz, Helena; Ribeiro, Ricardo; Trancoso, Isabel

Publication date: 1 January 2014
Publisher: European Language Resources Association (ELRA)

Abstract

This paper presents a linguistic revision process of a speech corpus of Portuguese broadcast news, focusing on metadata annotation for rich transcription, and reports on the impact of the new data on the performance of several modules. The revision process consisted mainly of annotating and revising structural metadata events, such as disfluencies and punctuation marks. The revised data is now being used extensively and was of extreme importance for improving the performance of several modules, especially the punctuation and capitalization modules, but also the speech recognition system and all subsequent modules. The data has also recently been used in disfluency studies across domains.
