
    A small Griko-Italian speech translation corpus

    This paper presents an extension to a very low-resource parallel corpus collected in an endangered language, Griko, making it useful for computational research. The corpus consists of 330 utterances (about 2 hours of speech) which have been transcribed and translated into Italian, with annotations for word-level speech-to-transcription and speech-to-translation alignments. The corpus also includes morphosyntactic tags and word-level glosses. Pseudo-phones were also generated by applying an automatic unit discovery method. We detail how the corpus was collected, cleaned and processed, and we illustrate its use on zero-resource tasks by presenting some baseline results for the tasks of speech-to-translation alignment and unsupervised word discovery. The dataset will be available online, aiming to encourage replicability and diversity in computational language documentation experiments.

    User-friendly automatic transcription of low-resource languages: Plugging ESPnet into Elpis

    This paper reports on progress integrating the speech recognition toolkit ESPnet into Elpis, a web front-end originally designed to provide access to the Kaldi automatic speech recognition toolkit. The goal of this work is to make end-to-end speech recognition models available to language workers via a user-friendly graphical interface. Encouraging results are reported on (i) development of an ESPnet recipe for use in Elpis, with preliminary results on data sets previously used for training acoustic models with the Persephone toolkit along with a new data set that had not previously been used in speech recognition, and (ii) incorporating ESPnet into Elpis along with UI enhancements and a CUDA-supported Dockerfile.