Evaluation of automatic transcription systems for the judicial domain
This paper describes two different automatic transcription systems
developed for judicial application domains for the Polish and Italian
languages. The judicial domain requires coping with several factors
which are known to be critical for automatic speech recognition, such
as: background noise, reverberation, spontaneous and accented speech,
overlapped speech, cross channel effects, etc.
The two automatic speech recognition (ASR) systems were developed
independently, starting from out-of-domain data, and were then
adapted to the judicial domain using a certain amount of
in-domain audio and text data.
The ASR performance has been measured on audio data acquired in the
courtrooms of Naples and Wroclaw. The resulting word error rates are
around 40% for Italian and between 30% and 50% for Polish.
This performance, similar to that reported for other comparable ASR
tasks (e.g. meeting transcriptions with distant microphone), suggests
that possible applications can address tasks such as indexing and/or
information retrieval in multimedia documents recorded during judicial
debates.
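
The word error rates quoted in the abstract above follow the usual definition: the number of word substitutions, deletions and insertions needed to turn the recognizer output into the reference transcript, divided by the number of reference words. The following is a minimal sketch of that computation, not the scoring tool used by the authors; the function name `word_error_rate` and the example sentences are assumptions made for illustration only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative helper (hypothetical name); whitespace tokenization is
    an assumption, real scoring tools also normalize text first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Made-up sentences: 2 errors over 6 reference words.
    wer = word_error_rate("the court is now in session",
                          "the court is in the session")
    print(f"WER = {wer:.1%}")
```

Because insertions are counted, the ratio can exceed 100% when the hypothesis is much longer than the reference.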
Phone-to-word decoding through statistical machine translation and complementary system combination
In this paper, phone-to-word transduction is first
investigated by coupling a speech recognizer, generating for each
speech segment a phone sequence or a phone confusion network,
with the efficient decoder of confusion networks adopted by
MOSES, a popular statistical machine translation toolkit. Then,
system combination is investigated by combining the outputs of
several conventional ASR systems with the output of a system
embedding phone-to-word decoding through statistical machine
translation.
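
To make the notion of a phone confusion network concrete, the sketch below represents one as a list of slots, each mapping alternative phones to posterior probabilities. It is only an illustration of the data structure under stated assumptions: the phone symbols, the probabilities, the `<eps>` (empty arc) label and the helper `best_phone_path` are all hypothetical, and the greedy per-slot choice shown here is not what a confusion network decoder such as the one in MOSES does, since that decoder weighs all alternatives jointly with its translation and language models.

```python
from typing import Dict, List

EPSILON = "<eps>"  # assumed label for an empty arc (no phone emitted)

# A confusion network as an ordered list of slots; each slot maps
# competing phones to posterior probabilities that sum to 1.
ConfusionNetwork = List[Dict[str, float]]


def best_phone_path(network: ConfusionNetwork) -> List[str]:
    """Pick the most probable phone in every slot, skipping empty arcs."""
    path = []
    for slot in network:
        phone = max(slot, key=slot.get)
        if phone != EPSILON:
            path.append(phone)
    return path


if __name__ == "__main__":
    # Made-up network with four slots.
    network = [
        {"k": 0.7, "g": 0.3},
        {"ao": 0.6, "aa": 0.4},
        {EPSILON: 0.55, "r": 0.45},
        {"t": 0.9, "d": 0.1},
    ]
    print(best_phone_path(network))
```

Passing the whole network to the phone-to-word stage, rather than only this single best path, keeps lower-ranked phones such as `r` in the third slot available for the later decoding step.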
Experiments are carried out in the context of a large vocabulary
speech recognition task consisting of transcription of
speeches delivered in English during the European Parliament
Plenary Sessions (EPPS). While only a marginal performance
improvement is achieved in system combination experiments
when the output of the phone-to-word transducer is included
in the combination, partial results show great potential for
improvement.