8 research outputs found

    Using MT-ComparEval

    Get PDF
    The paper showcases the MT-ComparEval tool for qualitative evaluation of machine translation (MT). MT-ComparEval is an open-source tool designed to help MT developers by providing a graphical user interface for comparing and evaluating different MT engines, experiments, and settings.

    On the Learning Dynamics of Semi-Supervised Training for ASR

    Get PDF

    Tool for comparison and evaluation of machine translation

    Get PDF
    This bachelor thesis describes the development of MT-ComparEval, a tool for comparison and evaluation of machine translation. The tool makes it possible to compare translations according to several criteria, such as automatic metrics of machine translation quality computed on whole documents or on single sentences, quality comparison of single-sentence translations with highlighting of confirmed, improving, and worsening n-grams, and summaries of the most improving and worsening n-grams for the whole document. When comparing two translations, MT-ComparEval also plots a chart with the absolute differences of metrics computed on single sentences and a chart with values obtained from paired bootstrap resampling, sketched below.
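    For readers unfamiliar with the technique, the following is a minimal sketch of paired bootstrap resampling over per-sentence metric scores. It is not MT-ComparEval's implementation; averaging sentence-level scores here stands in for a true corpus-level metric, and the toy scores are made up.

```python
import random

def paired_bootstrap(scores_a, scores_b, n_samples=1000, seed=0):
    """Paired bootstrap resampling over per-sentence metric scores.

    scores_a, scores_b: per-sentence scores (e.g., sentence-level BLEU)
    for the same test set under systems A and B. Returns the fraction
    of resamples in which each system wins on the mean score.
    """
    assert len(scores_a) == len(scores_b)
    rng = random.Random(seed)
    n = len(scores_a)
    wins_a = wins_b = 0
    for _ in range(n_samples):
        # Draw sentence indices with replacement (a paired resample:
        # both systems are scored on the same sampled sentences).
        idx = [rng.randrange(n) for _ in range(n)]
        mean_a = sum(scores_a[i] for i in idx) / n
        mean_b = sum(scores_b[i] for i in idx) / n
        if mean_a > mean_b:
            wins_a += 1
        elif mean_b > mean_a:
            wins_b += 1
    return wins_a / n_samples, wins_b / n_samples

# Toy per-sentence scores: system B looks slightly better overall.
a = [0.31, 0.42, 0.28, 0.55, 0.47]
b = [0.33, 0.40, 0.35, 0.57, 0.49]
print(paired_bootstrap(a, b))
```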

    Development of a cloud platform for automatic speech recognition

    Get PDF
    This thesis presents CloudASR, a cloud platform for automatic speech recognition built on top of the Kaldi speech recognition toolkit. The platform supports both batch and online speech recognition modes and provides an annotation interface for transcribing submitted recordings. The key features of the platform are scalability, customizability, and easy deployment. Benchmarks show that the platform achieves latency comparable to Google Speech API and can achieve better accuracy on limited domains. Furthermore, the benchmarks show that the platform can handle more than 1000 parallel requests given enough computational resources.
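    As a rough illustration of the kind of parallel-request benchmark the abstract mentions, here is a minimal client-side load-test sketch. The endpoint URL, request shape, and audio file are hypothetical placeholders, not CloudASR's actual API.

```python
import time
from concurrent.futures import ThreadPoolExecutor

import requests

URL = "http://localhost:8000/recognize"  # hypothetical recognition endpoint

def recognize(path):
    """Submit one recording and return (status_code, latency_seconds)."""
    with open(path, "rb") as f:
        start = time.monotonic()
        resp = requests.post(URL, files={"audio": f}, timeout=30)
        return resp.status_code, time.monotonic() - start

# Reuse one recording to simulate many concurrent clients.
paths = ["sample.wav"] * 100
with ThreadPoolExecutor(max_workers=100) as pool:
    results = list(pool.map(recognize, paths))

latencies = [lat for status, lat in results if status == 200]
mean_latency = sum(latencies) / len(latencies) if latencies else float("nan")
print(f"ok={len(latencies)}/{len(results)} mean latency={mean_latency:.3f}s")
```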

    MT-ComparEval

    No full text
    MT-ComparEval is a tool for machine translation developers that makes it possible to compare and evaluate different MT systems (and their versions). MT-ComparEval includes several automatic MT evaluation metrics.
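    To make the comparison concrete, here is a minimal sketch of scoring two MT systems against the same references with the sacrebleu library. MT-ComparEval wraps metrics like this behind its GUI; the sentences below are invented examples.

```python
from sacrebleu.metrics import BLEU

# One reference stream covering two test sentences.
refs = [["The cat sat on the mat.", "It is raining today."]]
sys_a = ["The cat sat on a mat.", "It rains today."]
sys_b = ["A cat is sitting on the mat.", "It is raining today."]

bleu = BLEU()
print("system A:", bleu.corpus_score(sys_a, refs))
print("system B:", bleu.corpus_score(sys_b, refs))
```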

    Comparing Self-Supervised Pre-Training and Semi-Supervised Training for Speech Recognition in Languages with Weak Language Models

    No full text
    This paper investigates the potential of improving a hybrid automatic speech recognition model trained on 10 hours of transcribed data with 200 hours of untranscribed data in low-resource languages. First, we compare baseline methods of cross-lingual transfer with MFCC features and with features extracted by the multilingual self-supervised model XLSR-53. Subsequently, we compare two approaches that can leverage the untranscribed data: semi-supervised training with LF-MMI and continued self-supervised pre-training of XLSR-53. Our results on well-resourced English broadcast data derived from MGB show that the two methods achieve 18% and 27% relative improvements over the baseline, respectively. On the low-resource South African Soap Opera dataset, the relative improvement with semi-supervised training is only 3% due to the inherently weak language model. However, continued pre-training achieves an 8.6% relative improvement because it does not rely on any external information.
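    For clarity on what "relative improvement" means here, the standard computation is (baseline WER - new WER) / baseline WER. The WER values below are made up purely to illustrate the arithmetic behind figures such as the 18% and 27% reported above.

```python
def relative_improvement(wer_baseline, wer_new):
    """Relative WER improvement: (baseline - new) / baseline."""
    return (wer_baseline - wer_new) / wer_baseline

# Hypothetical WERs chosen so the arithmetic matches the quoted figures.
print(f"{relative_improvement(0.300, 0.246):.1%}")  # 18.0%
print(f"{relative_improvement(0.300, 0.219):.1%}")  # 27.0%
```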