2 research outputs found

    Dataset Suara dan Teks Berbahasa Indonesia Pada Rekaman Podcast dan Talk show (Indonesian-Language Speech and Text Dataset from Podcast and Talk Show Recordings)

    One factor in the success of a machine learning or deep learning model is the dataset it is trained on. This paper presents a speech dataset drawn from podcast and talk show recordings, together with Indonesian-language transcriptions. The dataset is provided because no publicly accessible Indonesian-language dataset has yet been available for training Text-to-Speech or Audio Speech Recognition models. It consists of 3270 recordings that were processed to obtain transcriptions in the form of Indonesian text or sentences. The dataset was built in several stages: pre-processing, a translation stage, a first validation stage, and a second validation stage. Its format follows that of the LJSpeech dataset, so that it is easy to process when used as input to a model. The dataset is expected to help improve training quality for Text-to-Speech models such as Tacotron2 and for Audio Speech Recognition in Indonesian.
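The abstract states only that the dataset follows the LJSpeech format. As a rough illustration, LJSpeech ships a `metadata.csv` whose lines hold three pipe-separated fields (clip ID, transcription, normalized transcription), with audio stored as `wavs/<ID>.wav`. The sketch below parses text in that layout; the function name and the sample clip IDs and sentences are hypothetical and are not taken from the dataset itself.

```python
def parse_ljspeech_metadata(text):
    """Parse LJSpeech-style metadata text: one 'id|transcript|normalized' row per line."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        fields = line.split("|")
        clip_id = fields[0]
        transcript = fields[1] if len(fields) > 1 else ""
        # Fall back to the raw transcript when no normalized column is present.
        normalized = fields[2] if len(fields) > 2 else transcript
        rows.append({
            "id": clip_id,
            "wav": f"wavs/{clip_id}.wav",  # LJSpeech keeps audio under wavs/
            "text": transcript,
            "normalized": normalized,
        })
    return rows


# Hypothetical sample rows, for illustration only.
sample = "clip_0001|Selamat pagi.|selamat pagi\nclip_0002|Apa kabar?|apa kabar\n"
rows = parse_ljspeech_metadata(sample)
```

Each returned row pairs a transcript with the audio path a Text-to-Speech or speech recognition training loader would typically expect.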

    A Realistic Visual Speech Synthesis for Indonesian Using A Combination of Morphing Viseme and Syllable Concatenation Approach to Support Pronunciation Learning

    This study aims to build a realistic visual speech synthesis system for Indonesian that can be used to learn Indonesian pronunciation. We used a combination of the morphing viseme and syllable concatenation methods. The morphing viseme method deforms one viseme into another so that the animation of the mouth shape looks smoother; it is used to create the animated transitions between visemes. The syllable concatenation method is used to assemble visemes based on certain syllable patterns. We built a syllable-based voice database as the basis for synchronization between syllables, speech, and viseme models. The proposed method consists of several stages: forming the Indonesian viseme models, designing the facial animation character, developing the speech database, a synchronization process, and subjective testing of the resulting application. Subjective tests were conducted with 30 respondents, who assessed the suitability and naturalness of the mouth movement when uttering Indonesian texts. The MOS (Mean Opinion Score) method was used to calculate the average of the respondents' scores. The MOS results for the synchronization and naturalness criteria are 4.283 and 4.107 on a scale of 1 to 5. These results indicate that the synchronization and naturalness of the synthesized visual speech are realistic. The system can therefore visualize phoneme pronunciation to support learning Indonesian pronunciation.
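As the abstract describes it, the MOS is the average of the respondents' scores on a 1 to 5 scale. A minimal sketch of that calculation follows; the function name and the ratings are made up for illustration and are not the study's data, and the study may have aggregated its scores differently (for example per text before per respondent).

```python
from statistics import mean


def mean_opinion_score(ratings, low=1, high=5):
    """Return the arithmetic mean of ratings, checking each lies on the scale."""
    if not ratings:
        raise ValueError("at least one rating is required")
    for r in ratings:
        if not low <= r <= high:
            raise ValueError(f"rating {r} is outside the {low}-{high} scale")
    return mean(ratings)


# Hypothetical ratings from five respondents for one criterion.
example_ratings = [4, 5, 4, 4, 5]
mos = mean_opinion_score(example_ratings)  # (4 + 5 + 4 + 4 + 5) / 5 = 4.4
```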