8,928 research outputs found
Automating the creation of speech recognition systems for under-resourced languages
© 2015 IEEE. More than 7,100 languages are spoken in the world, and a significant share of them lack speech services, so people cannot use modern information technologies in their native languages and must learn and use other languages instead. This paper describes an approach to automating the creation of speech recognition systems for under-resourced languages. The aim is to simplify and speed up this process by providing the necessary tools and organizing the development and testing of such systems. Results from building phoneme and speech recognition systems for the Tatar language (the third most spoken language in Russia) demonstrate that the proposed platform can be used for under-resourced languages.
UmobiTalk: Ubiquitous Mobile Speech Based Learning Language Translator for Sesotho Language
Published Thesis. The need to conserve under-resourced languages is becoming more urgent as some of them approach extinction; natural language processing can be used to redress this. Currently, most initiatives around language processing technologies focus on Western languages such as English and French, yet resources for such languages are already available. Sesotho is one of the under-resourced Bantu languages; it is mostly spoken in the Free State province of South Africa and in Lesotho. Like other parts of South Africa, the Free State has experienced a high number of migrants and non-Sesotho speakers from neighboring provinces and countries; such people face serious language-barrier problems, especially in informal settlements where everyone tends to speak only Sesotho. "Non-Sesotho speakers" refers to groups such as Xhosas, Zulus, Coloureds, Whites, and others for whom Sesotho is not a native language.
As a solution, we developed a parallel corpus with English as the source and Sesotho as the target language and packaged it in UmobiTalk, a ubiquitous mobile speech-based learning translator. UmobiTalk is a mobile tool for English speakers learning Sesotho. Its development combined automatic speech recognition, machine translation, and speech synthesis.
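The cascade this abstract describes (speech recognition, then machine translation, then speech synthesis) can be sketched as three composed stages. The function names, the placeholder lookup, and the toy English–Sesotho lexicon below are illustrative stand-ins, not UmobiTalk's actual components:

```python
# Hypothetical sketch of an ASR -> MT -> TTS cascade like the one UmobiTalk
# describes. Every component here is a stub; a real system would wrap
# trained models behind the same interfaces.

def recognize_speech(audio: bytes) -> str:
    """Stub ASR: map English audio to text via a placeholder lookup."""
    return {b"hello-audio": "hello"}.get(audio, "")

def translate_en_to_st(text: str) -> str:
    """Stub MT: English -> Sesotho via a toy parallel lexicon."""
    lexicon = {"hello": "lumela", "thank you": "kea leboha"}
    return lexicon.get(text, text)

def synthesize(text: str) -> bytes:
    """Stub TTS: return (fake) audio bytes for the Sesotho text."""
    return f"<audio:{text}>".encode()

def cascade_translate(audio: bytes) -> bytes:
    """Compose the three stages, as a cascade translator would."""
    return synthesize(translate_en_to_st(recognize_speech(audio)))
```

Each stage is independently replaceable, which is the usual argument for cascades over end-to-end models in low-resource settings: the MT component can be trained on the parallel corpus alone, without paired speech.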
NLP for Language Varieties of Italy: Challenges and the Path Forward
Italy is characterized by a one-of-a-kind linguistic diversity landscape in
Europe, which implicitly encodes local knowledge, cultural traditions, artistic
expression, and history of its speakers. However, over 30 language varieties in
Italy are at risk of disappearing within a few generations. Language technology
has a main role in preserving endangered languages, but it currently struggles
with such varieties as they are under-resourced and mostly lack standardized
orthography, being mainly used in spoken settings. In this paper, we introduce
the linguistic context of Italy and discuss challenges facing the development
of NLP technologies for Italy's language varieties. We provide potential
directions and advocate for a shift in the paradigm from machine-centric to
speaker-centric NLP. Finally, we propose building a local community towards
responsible, participatory development of speech and language technologies for
languages and dialects of Italy. Comment: 16 pages, 3 figures, 4 tables
A Very Low Resource Language Speech Corpus for Computational Language Documentation Experiments
Most speech and language technologies are trained with massive amounts of
speech and text information. However, most of the world languages do not have
such resources or stable orthography. Systems constructed under these almost
zero resource conditions are not only promising for speech technology but also
for computational language documentation. The goal of computational language
documentation is to help field linguists to (semi-)automatically analyze and
annotate audio recordings of endangered and unwritten languages. Example tasks
are automatic phoneme discovery or lexicon discovery from the speech signal.
This paper presents a speech corpus collected during a realistic language
documentation process. It is made up of 5k speech utterances in Mboshi (Bantu
C25) aligned to French text translations. Speech transcriptions are also made
available: they correspond to a non-standard graphemic form close to the
language phonology. We present how the data was collected, cleaned and
processed and we illustrate its use through a zero-resource task: spoken term
discovery. The dataset is made available to the community for reproducible
computational language documentation experiments and their evaluation. Comment: accepted to LREC 201
Bayesian Models for Unit Discovery on a Very Low Resource Language
Developing speech technologies for low-resource languages has become a very
active research field over the last decade. Among others, Bayesian models have
shown some promising results on artificial examples but still lack in situ
experiments. Our work applies state-of-the-art Bayesian models to unsupervised
Acoustic Unit Discovery (AUD) in a real low-resource language scenario. We also
show that Bayesian models can naturally integrate information from other,
better-resourced languages by means of an informative prior, leading to more consistent
discovered units. Finally, discovered acoustic units are used, either as the
1-best sequence or as a lattice, to perform word segmentation. Word
segmentation results show that this Bayesian approach clearly outperforms a
Segmental-DTW baseline on the same corpus. Comment: Accepted to ICASSP 201
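Dynamic time warping, the primitive behind Segmental-DTW baselines like the one mentioned above (and behind the spoken term discovery task in the Mboshi corpus paper), aligns two variable-length sequences by minimizing accumulated frame-wise cost. A minimal pure-Python version over 1-D features is sketched below; real systems compare multi-dimensional MFCC or posterior-feature frames, not scalars:

```python
def dtw(a, b):
    """Classic dynamic time warping distance between two 1-D sequences.

    Fills a (len(a)+1) x (len(b)+1) cost table where cell (i, j) holds the
    cheapest alignment of a[:i] with b[:j]; each step pays the local frame
    distance plus the best of insertion, deletion, or match.
    """
    n, m = len(a), len(b)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])  # frame-wise distance
            D[i][j] = cost + min(D[i - 1][j],      # insertion
                                 D[i][j - 1],      # deletion
                                 D[i - 1][j - 1])  # match
    return D[n][m]
```

Segmental variants slide this alignment over subsequences of unlabeled speech to find recurring low-distance fragments, which serve as candidate word or term discoveries with no transcription at all.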
Kosp2e: Korean Speech to English Translation Corpus
Most speech-to-text (S2T) translation studies use English speech as a source,
which makes it difficult for non-English speakers to take advantage of the S2T
technologies. For some languages, this problem has been tackled through corpus
construction, but the farther a language is linguistically from English, or the
more under-resourced it is, the more significant this deficiency and
underrepresentation become. In this paper, we introduce kosp2e (read as `kospi'), a corpus
that allows Korean speech to be translated into English text in an end-to-end
manner. We adopt an open-license speech recognition corpus, a translation corpus,
and spoken language corpora to make our dataset freely available to the public,
and evaluate performance through pipeline- and training-based approaches.
Using the pipeline and various end-to-end schemes, we obtain highest BLEU scores of
21.3 and 18.0, respectively, based on the English hypothesis, validating the
feasibility of our data. We plan to supplement annotations for other target
languages through community contributions in the future. Comment: Interspeech 2021 camera-ready
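BLEU, the metric reported above, scores a hypothesis translation by modified n-gram precision combined with a brevity penalty. A minimal single-sentence sketch follows; it omits the smoothing and standardized tokenization that toolkit implementations apply (so very short hypotheses score zero), and is meant only to make the formula concrete:

```python
import math
from collections import Counter

def bleu(hyp: str, ref: str, max_n: int = 4) -> float:
    """Unsmoothed single-sentence BLEU: geometric mean of modified
    1..max_n-gram precisions, scaled by a brevity penalty.

    Note: hypotheses shorter than max_n tokens score 0.0 because they
    contain no max_n-grams; real toolkits smooth this case.
    """
    hyp_toks, ref_toks = hyp.split(), ref.split()
    if not hyp_toks:
        return 0.0
    precisions = []
    for n in range(1, max_n + 1):
        h = Counter(tuple(hyp_toks[i:i + n]) for i in range(len(hyp_toks) - n + 1))
        r = Counter(tuple(ref_toks[i:i + n]) for i in range(len(ref_toks) - n + 1))
        overlap = sum((h & r).values())  # clipped n-gram matches
        if overlap == 0:
            return 0.0  # geometric mean collapses without smoothing
        precisions.append(overlap / sum(h.values()))
    # Brevity penalty: punish hypotheses shorter than the reference.
    bp = math.exp(min(0.0, 1.0 - len(ref_toks) / len(hyp_toks)))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)
```

Scores such as the 21.3 reported in the abstract are conventionally this quantity times 100, computed at corpus level over all sentence pairs rather than per sentence.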
Linguistic unit discovery from multi-modal inputs in unwritten languages: Summary of the "Speaking Rosetta" JSALT 2017 Workshop
We summarize the accomplishments of a multi-disciplinary workshop exploring
the computational and scientific issues surrounding the discovery of linguistic
units (subwords and words) in a language without orthography. We study the
replacement of orthographic transcriptions by images and/or translated text in
a well-resourced language to help unsupervised discovery from raw speech. Comment: Accepted to ICASSP 201