657 research outputs found

Automatic syllabification using segmental conditional random fields

Author: Demuynck Kris; Rogova Kseniya; Van Compernolle Dirk
Publication date: 01/01/2013

In this paper we present a statistical approach for the automatic syllabification of phonetic word transcriptions. A syllable bigram language model forms the core of the system. Given the large number of syllables in non-syllabic languages, sparsity is the main issue, especially since the available syllabified corpora tend to be small. Traditional back-off mechanisms give only a partial solution to the sparsity problem. In this work we use a set of features for back-off purposes: on the one hand, probabilities such as consonant cluster probabilities, and on the other hand, a set of rules based on generic syllabification principles such as legality, sonority and maximal onset. To combine these highly correlated features with the baseline bigram feature, we employ segmental conditional random fields (SCRFs) as the statistical framework. The resulting method is very versatile and can be used for any amount of data of any language. The method was tested on various datasets in English and Dutch with dictionary sizes varying between 1 and 60 thousand words. We obtained a 97.96% word accuracy for supervised syllabification and a 91.22% word accuracy for unsupervised syllabification for English. When including the top-2 generated syllabifications for a small fraction of the words, virtually perfect syllabification is obtained in supervised mode.

Ghent University Academic Bibliography

Syllable segmentation, normalization, and lexicographic ordering of Myanmar-language text using formal methods

Author: TIN HTAY HLAING
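Publication venue: Professor (chief examiner) 三上喜貴; Professor 湯川高志; Associate Professor 武井由智; Associate Professor 山本和英; Professor 町田和彦 (Tokyo University of Foreign Studies)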
Publication date: 31/08/2014

National University Corporation Nagaoka University of Technology

Nagaoka University of Technology Institutional Repository

MUST&P-SRL: Multi-lingual and Unified Syllabification in Text and Phonetic Domains for Speech Representation Learning

Author: Tits Noé
Publication date: 17/10/2023

In this paper, we present a methodology for linguistic feature extraction, focusing particularly on automatically syllabifying words in multiple languages, designed to be compatible with a forced-alignment tool, the Montreal Forced Aligner (MFA). In both the textual and phonetic domains, our method focuses on the extraction of phonetic transcriptions from text, stress marks, and a unified automatic syllabification (in text and phonetic domains). The system was built with open-source components and resources. Through an ablation study, we demonstrate the efficacy of our approach in automatically syllabifying words from several languages (English, French and Spanish). Additionally, we apply the technique to the transcriptions of the CMU ARCTIC dataset, generating valuable annotations available online (https://github.com/noetits/MUST_P-SRL) that are ideal for speech representation learning, speech unit discovery, and disentanglement of speech factors in several speech-related fields. Comment: Accepted for publication at EMNLP 202

arXiv.org e-Print Archive

Comparison between rule-based and data-driven natural language processing algorithms for Brazilian Portuguese speech synthesis

Author: Vecchietti Luiz Felipe Santos
Publication venue: 'Programa de Pos-graduacao em Ciencias Contabeis da UFRJ'
Publication date: 01/04/2017

Due to the exponential growth in the use of computers, personal digital assistants and smartphones, Text-to-Speech (TTS) systems have been in high demand in recent years. An important part of these systems is the Text Analysis block, which converts the input text into linguistic specifications that are used to generate the final speech waveform. The Natural Language Processing (NLP) algorithms in this block are crucial to the quality of the speech generated by synthesizers. These algorithms are responsible for important tasks such as Grapheme-to-Phoneme Conversion, Syllabification and Stress Determination. For Brazilian Portuguese (BP), solutions for the algorithms in the Text Analysis block have focused on rule-based approaches. These algorithms perform well for BP but have many disadvantages. On the other hand, there is still no research that evaluates and analyzes, for BP, the performance of data-driven approaches that reach state-of-the-art results for complex languages such as English. So, in this work, we compare different data-driven and rule-based approaches for the NLP algorithms in a TTS system. Moreover, we propose, as a novel application, the use of Sequence-to-Sequence models as a solution for the Syllabification and Stress Determination problems. In summary, we show that data-driven algorithms can achieve state-of-the-art performance for the NLP algorithms in the Text Analysis block of a BP TTS system.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Pantheon

VOICECONET: A Collaborative Framework for Speech-Based Computer Accessibility with a Case Study for Brazilian Portuguese

Author: Aldebaro Klautau; Nelson Neto; Pedro Batista
Publication venue: 'IntechOpen'
Publication date: 28/11/2012

IntechOpen

Phonetics of segmental F0 and machine recognition of Korean speech

Author: Jang Tae-Yeoub
Publication venue: The University of Edinburgh
Publication date: 01/01/2000

Edinburgh Research Archive

Data preparation and improvement of NLP software modules for parametric speech synthesis


Padua Thesis and Dissertation Archive

A Review on Multilingual Text to Speech Synthesis by Syllabifying the Words of Devanagari and Roman

Author: Mr. Shankar B Chaudhari, Dr. S. K. Yadav
Publication venue: 'Auricle Technologies, Pvt., Ltd.'
Publication date: 30/11/2014

Speech synthesis is the process of converting input text into speech waveforms. This paper describes a text-to-speech system for languages written in the Devanagari and Roman scripts. Many earlier TTS systems are available, but not for Devanagari and Roman scripts.

International Journal on Recent and Innovation Trends in Computing and Communication