Evaluation of automatic transcription systems for the judicial domain

Author: Falavigna Daniele
Giuliani Diego
Gretter Roberto
Loof Jonas
Ney Hermann
Schlüter Ralf
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2010

This paper describes two automatic transcription systems developed for judicial application domains for the Polish and Italian languages. The judicial domain requires coping with several factors known to be critical for automatic speech recognition, such as background noise, reverberation, spontaneous and accented speech, overlapped speech, and cross-channel effects. The two automatic speech recognition (ASR) systems were developed independently, starting from out-of-domain data, and were then adapted to the judicial domain using a certain amount of in-domain audio and text data. ASR performance was measured on audio data acquired in the courtrooms of Naples and Wroclaw. The resulting word error rates are around 40% for Italian and between 30% and 50% for Polish. This performance, similar to that reported for other comparable ASR tasks (e.g. meeting transcription with distant microphones), suggests that possible applications can address tasks such as indexing and/or information retrieval in multimedia documents recorded during judicial debates.

Crossref

Archivio della ricerca - Fondazione Bruno Kessler

Publikationsserver der RWTH Aachen University

Recent Progress in Development of Language Model for Slovak Large Vocabulary Continuous Speech Recognition

Author: Daniel Hládek
Jozef Juhár
Ján Staš
Publication venue: 'IntechOpen'
Publication date: 30/03/2012

IntechOpen

The Production of Speech Corpora

Author: Baumann Angela
Draxler Christoph
Ellbogen Tania
Schiel Florian
Steffen Alexander
Publication date: 21/03/2012

Open Access LMU

Spoken Corpora Good Practice Guide 2006

Author: Baude Olivier
Blanche-Benveniste Claire
Calas Marie-France
Cappeau Paul
Cordereix Pascal
de Lamberterie Isabelle
Goury Laurence
Jacobson Michel
Marchello-Nizia Christiane
Mondada Lorenza
Publication venue: HAL CCSD
Publication date: 01/01/2010

There is currently a vast amount of fundamental and applied research based on the exploitation of oral corpora (organized recorded collections of oral and multimodal language productions). Created as a result of linguists becoming aware of the importance of ensuring the durability of sources and diversified access to the oral documents they produce, this Guide to good practice mainly deals with “oral corpora” created for and used by linguists. But the questions raised by the creation and documentary exploitation of these corpora can be found in numerous disciplines: ethnology, anthropology, sociology, psychology, demography, and oral history notably use oral surveys, testimonies, interviews, and life stories. Based on a linguistic approach, this Guide also touches on the concerns of other researchers who use oral corpora (for example in the field of speech synthesis and recognition), even if their specific needs aren’t consistently dealt with in the present document.

The Indo-US Summit Partnership in Building India’s Infrastructure—A Summary of Events

Author: Deepak Kumar
P Nair

While India’s policymakers have increasingly given importance to private-sector participation and investment in infrastructure, not many American companies have availed themselves of this opportunity. There have been many issues regarding the regulatory framework, bureaucratic delays, etc. All this is set to change. Recent surveys have shown a persistent trend of increasing satisfaction among overseas investors with the changes taking place in India. The situation in the infrastructure sector is no different. The summarized proceedings of the above-mentioned conference are useful in this context and serve to update the reader on the developments taking place in this important sector. The broad conclusions, as understood by the authors, are reproduced in this article.

Keywords: Infrastructure, India-US

Research Papers in Economics

Supervised semantic relation mining from linguistically noisy text documents

Author: Basili R
Giannone C
Moschitti A
Naggar P
Publication venue: Springer Verlag
Publication date: 01/01/2011

ART

Strategic Selection of Training Data for Domain-Specific Speech Recognition

Author: Girerd Daniel
Publication venue: DigitalCommons@CalPoly
Publication date: 01/06/2018

Speech recognition is now a key topic in computer science with the proliferation of voice-activated assistants and voice-enabled devices. Many companies offer a speech recognition service for developers to use to enable smart devices and services. These speech-to-text systems, however, have significant room for improvement, especially in domain-specific speech. IBM's Watson speech-to-text service attempts to support domain-specific uses by allowing users to upload their own training data for making custom models that augment Watson's general model. This requires deciding on a strategy for picking the training data. This thesis experiments with different training choices for custom language models that augment Watson's speech-to-text service. The results show that using recent utterances is the best choice of training data in our use case of Digital Democracy. We are able to improve speech recognition accuracy by 2.3% over the control with no custom model. However, choosing training utterances most specific to the use case is better when large enough volumes of such training data are available.

DigitalCommons@CalPoly