2 research outputs found

Evaluation of automatic transcription systems for the judicial domain

Authors: Falavigna Daniele; Giuliani Diego; Gretter Roberto; Loof Jonas; Ney Hermann; Schlüter Ralf
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2010

This paper describes two different automatic transcription systems developed for judicial application domains for the Polish and Italian languages. The judicial domain requires coping with several factors known to be critical for automatic speech recognition, such as background noise, reverberation, spontaneous and accented speech, overlapped speech, and cross-channel effects. The two automatic speech recognition (ASR) systems were developed independently, starting from out-of-domain data, and were then adapted to the judicial domain using a certain amount of in-domain audio and text data. ASR performance was measured on audio data acquired in the courtrooms of Naples and Wroclaw. The resulting word error rates are around 40% for Italian and between 30% and 50% for Polish. This performance, similar to that reported for other comparable ASR tasks (e.g. meeting transcription with distant microphones), suggests that possible applications can address tasks such as indexing and/or information retrieval in multimedia documents recorded during judicial debates.
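
For readers unfamiliar with the metric, word error rate is conventionally the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the recognizer output, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the scoring tool used in the paper; the function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative only: tokenization is plain whitespace splitting, with no
    text normalization (case, punctuation) of the kind real scoring applies.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # aligning i reference words to nothing costs i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

Under this definition, a five-word reference recognized with one substituted word and one dropped word scores 2/5, i.e. a 40% word error rate.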

Crossref

Archivio della ricerca - Fondazione Bruno Kessler

Publikationsserver der RWTH Aachen University

Phone-to-word decoding through statistical machine translation and complementary system combination

Authors: Falavigna Giuseppe Daniele; Gerosa Matteo; Giuliani Diego; Gretter Roberto
Publication venue: country:USA
Publication date: 01/01/2009

In this paper, phone-to-word transduction is first investigated by coupling a speech recognizer, which generates a phone sequence or a phone confusion network for each speech segment, with the efficient confusion network decoder adopted by MOSES, a popular statistical machine translation toolkit. Then, system combination is investigated by combining the outputs of several conventional ASR systems with the output of a system embedding phone-to-word decoding through statistical machine translation. Experiments are carried out in the context of a large vocabulary speech recognition task consisting of the transcription of speeches delivered in English during the European Parliament Plenary Sessions (EPPS). While only a marginal performance improvement is achieved in system combination experiments when the output of the phone-to-word transducer is included in the combination, partial results show great potential for improvement.

Archivio della ricerca - Fondazione Bruno Kessler