7,399 research outputs found

    MnTTS: An Open-Source Mongolian Text-to-Speech Synthesis Dataset and Accompanied Baseline

    This paper introduces a high-quality open-source text-to-speech (TTS) synthesis dataset for Mongolian, a low-resource language spoken by over 10 million people worldwide. The dataset, named MnTTS, consists of about 8 hours of transcribed audio recordings spoken by a 22-year-old professional female Mongolian announcer. It is the first publicly available dataset developed to promote Mongolian TTS applications in both academia and industry. In this paper, we share our experience by describing the dataset development procedures and the challenges we faced. To demonstrate the reliability of our dataset, we built a powerful non-autoregressive baseline system based on the FastSpeech2 model and the HiFi-GAN vocoder, and evaluated it using the subjective mean opinion score (MOS) and real time factor (RTF) metrics. Evaluation results show that the baseline system trained on our dataset achieves a MOS above 4 and an RTF of about 3.30 × 10⁻¹, which makes it applicable for practical use. The dataset, training recipe, and pretrained TTS models are freely available at https://github.com/walker-hyf/MnTTS. Comment: Accepted at the 2022 International Conference on Asian Language Processing (IALP2022)

    Book Reviews


    Text-to-speech system for low-resource language using cross-lingual transfer learning and data augmentation

    Deep learning techniques are currently being applied in automated text-to-speech (TTS) systems, resulting in significant improvements in performance. However, these methods require large amounts of text-speech paired data for model training, and collecting this data is costly. Therefore, in this paper, we propose a single-speaker TTS system containing both a spectrogram prediction network and a neural vocoder for the target language, using only 30 min of target language text-speech paired data for training. We evaluate three approaches for training the spectrogram prediction models of our TTS system, which produce mel-spectrograms from the input phoneme sequence: (1) cross-lingual transfer learning, (2) data augmentation, and (3) a combination of the previous two methods. In the cross-lingual transfer learning method, we used two high-resource language datasets, English (24 h) and Japanese (10 h). We also used 30 min of target language data for training in all three approaches, and for generating the augmented data used for training in methods 2 and 3. We found that using both cross-lingual transfer learning and augmented data during training resulted in the most natural synthesized target speech output. We also compare single-speaker and multi-speaker training methods, using sequential and simultaneous training, respectively. The multi-speaker models were found to be more effective for constructing a single-speaker, low-resource TTS model. In addition, we trained two Parallel WaveGAN (PWG) neural vocoders, one using 13 h of our augmented data with 30 min of target language data and one using the entire 12 h of the original target language dataset. Our subjective AB preference test indicated that the neural vocoder trained with augmented data achieved almost the same perceived speech quality as the vocoder trained with the entire target language dataset. 
Overall, we found that our proposed TTS system, consisting of a spectrogram prediction network and a PWG neural vocoder, was able to achieve reasonable performance using only 30 min of target language training data. We also found that by using 3 h of target language data, both for training the model and for generating augmented data, our proposed TTS model was able to achieve performance very similar to that of the baseline model, which was trained with 12 h of target language data.

    Globalising assessment: an ethnography of literacy assessment, camels and fast food in the Mongolian Gobi

    What happens when standardised literacy assessments travel globally? The paper presents an ethnographic account of adult literacy assessment events in rural Mongolia. It examines the dynamics of literacy assessment in terms of the movement and re-contextualisation of test items as they travel globally and are received locally by Mongolian respondents. The analysis of literacy assessment events is informed by Goodwin’s ‘participation framework’ on language as embodied and situated interactive phenomena and by Actor Network Theory (ANT). ANT is applied to examine literacy assessment events as processes of translation shaped by an ‘assemblage’ of human and non-human actors (including the assessment texts).

    A tribute to Elizaveta Ubryatova: professional life and personal destiny

    The article was submitted on 10.06.2015. Translated by Dr. Lilia Gorelova. In Russia, the name of the prominent Turkologist Elizaveta Ivanovna Ubryatova is at present known mostly to specialists who study the languages spoken by the Northern peoples of the country. However, the scientific research of a linguist of such calibre naturally includes an attentive and concerned attitude to the fate of the peoples residing in the North of Russia, which was especially important in the conditions of the Soviet era. The survival of the Northern peoples and their languages became for Ubryatova not only a scientific problem but also a mission of vital importance. Ubryatova’s scientific interests were not restricted to linguistic problems; she also purposefully studied the important monuments of folk literature and ethnography of indigenous peoples. This was due to her scientific breadth, social responsibility, and commitment to a supreme mastery of the research object. That is why she became the founder of the original linguistic and cultural school in the study of the history and structures of languages spoken by peoples living in the North of Russia. The scale of her bright personality, combined with her intelligence, patience, and feminine care about colleagues and students, made her a center of attraction for researchers in this field. She launched an extensive project of publishing works devoted to the folklore of the peoples who inhabited the Northern territories of Russia, and whose traditional culture became a part of world culture as a result. The languages of the Dolgans and Yakuts became the main topics of her research. In this article, we outline the major ideas proposed by Ubryatova in her works, viz., those concerning the origin of the Turkic languages, Dolgan and Yakut in particular, and the principles of the organization of Yakut syntax.
In her works devoted to syntactic problems, Ubryatova identified the fundamental characteristic of the systemic organization of Turkic languages, Yakut in particular, as the ability of these languages to link language units of different levels to each other by using the same grammatical means. In Turkic languages, almost all syntactic relations between clauses can be expressed grammatically, and this linguistic phenomenon entails the existence of a diverse and advanced system of non-finite verbal forms. These important findings can be successfully generalised to embrace all Altaic languages. In addressing a linguistic problem, Ubryatova combined her deep intuition with intensive field work and systematic theoretical investigation. Monographs and textbooks written by Ubryatova belong to the gold reserve of Turkology and cultural linguistics.
Among Russian linguists, the name of Elizaveta Ivanovna Ubryatova is known mainly to specialists who study the languages of the peoples of the North. However, the essence of the scholarly work of linguists in this field naturally included a concerned attitude to the fate of the Northern peoples under Soviet conditions, which for Ubryatova was not only a scholarly task but also a life task. Elizaveta Ivanovna, whose path crossed partly by chance with the study not only of language but also of monuments of folk literature and ethnography, became the founder of an original linguocultural approach to the study of the history and structure of the languages of the peoples living in the north of Russia. This happened not by chance at all, but as a result of her scholarly conscientiousness and her striving for thorough knowledge of the object of research. The scale of her personality and her vivid individuality, combined with genuine intelligence, patience and feminine care for the colleagues and students around her, made her a centre of attraction. These qualities allowed her to form a school and to plan and carry out a majestic project of publishing the folklore of the peoples of the North, in which the distinctive culture of this territory became part of world culture.
The Turkic languages Dolgan and Yakut became central to her research. The article sets out the main theses of Ubryatova's works on the topics "The origin of the Turkic languages" and "Principles of the organization of Yakut syntax". In her works on syntax, Elizaveta Ubryatova defines the distinctive feature of the systemic organization of the Turkic languages, and of Yakut in particular, as the ability of units of different levels to be joined by one and the same means. Ubryatova's idea that almost all syntactic relations have grammatical expression, which in turn determines the system of verbal forms, proved, as the article shows, to be of fundamental importance not only for Turkology. In her linguistic research, fine intuition was combined with enormous field work and systematic investigation of every question. Ubryatova's scholarly works constitute the golden fund of Turkology.

    Is literary language a development of ordinary language?

    Contemporary literary linguistics is guided by the 'Development Hypothesis', which says that literary language is formed and regulated by developing only the elements, rules and constraints of ordinary language. Six ways of differentiating literary language from ordinary language are tested against the Development Hypothesis, as are various kinds of superadded constraint, including metre, rhyme, alliteration and parallelism. Literary language differs formally, but is unlikely to differ semantically, from ordinary language. The article concludes by asking why the Development Hypothesis might hold.