4 research outputs found

    Grapheme-to-phoneme conversion using Long Short-Term Memory recurrent neural networks

    Grapheme-to-Phoneme Conversion with Convolutional Neural Networks

    Grapheme-to-phoneme (G2P) conversion is the process of generating the pronunciation of a word from its written form. It plays an essential role in natural language processing, text-to-speech synthesis and automatic speech recognition systems. In this paper, we investigate convolutional neural networks (CNN) for G2P conversion and propose a novel CNN-based sequence-to-sequence (seq2seq) architecture. Our approach includes an end-to-end CNN G2P converter with residual connections, as well as a model that uses a convolutional neural network (with and without residual connections) as the encoder and a Bi-LSTM as the decoder. We compare our approach with state-of-the-art methods, including encoder-decoder LSTM and encoder-decoder Bi-LSTM models. Training and inference times, phoneme error rates and word error rates were evaluated on the public CMUDict dataset for US English, and the best-performing convolutional architecture was also evaluated on the NetTalk dataset. Our method approaches the accuracy of previous state-of-the-art results in terms of phoneme error rate.
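    A minimal sketch (not the authors' code) of one of the architectures described in this abstract: a stack of residual 1-D convolutions over grapheme embeddings feeding a Bi-LSTM, with a linear projection over the phoneme vocabulary. The vocabulary sizes, layer widths, and the per-position output (instead of an autoregressive seq2seq decoder) are illustrative assumptions made to keep the example short.

```python
import torch
import torch.nn as nn

class ConvEncoderBiLSTM(nn.Module):
    """Residual 1-D conv encoder + Bi-LSTM over grapheme sequences (illustrative)."""

    def __init__(self, n_graphemes=30, n_phonemes=45, channels=128, n_blocks=3):
        super().__init__()
        self.embed = nn.Embedding(n_graphemes, channels)
        # Residual convolution blocks over the grapheme sequence.
        self.convs = nn.ModuleList(
            nn.Conv1d(channels, channels, kernel_size=3, padding=1)
            for _ in range(n_blocks)
        )
        # Bi-LSTM reads the convolutional features in both directions.
        self.bilstm = nn.LSTM(channels, channels, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * channels, n_phonemes)

    def forward(self, graphemes):            # graphemes: (batch, seq_len) integer ids
        x = self.embed(graphemes)            # (batch, seq_len, channels)
        x = x.transpose(1, 2)                # Conv1d expects (batch, channels, seq_len)
        for conv in self.convs:
            x = x + torch.relu(conv(x))      # residual connection around each block
        x = x.transpose(1, 2)
        h, _ = self.bilstm(x)                # (batch, seq_len, 2 * channels)
        return self.out(h)                   # per-position phoneme logits


if __name__ == "__main__":
    model = ConvEncoderBiLSTM()
    dummy = torch.randint(0, 30, (2, 12))    # two words of 12 grapheme ids each
    print(model(dummy).shape)                # torch.Size([2, 12, 45])
```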

    Structure Learning in Hidden Conditional Random Fields for Grapheme-to-Phoneme Conversion

    Accurate grapheme-to-phoneme (g2p) conversion is needed for several speech processing applications, such as automatic speech synthesis and recognition. For some languages, notably English, progress on g2p systems is slow because of the intricacy of the associations between letters and sounds. In recent years, improvements have been obtained either by using variable-length associations in generative models (joint n-grams) or by recasting the problem as a conventional sequence labeling task, which makes it possible to integrate rich dependencies into discriminative models. In this paper, we consider several ways to reconcile these two approaches. By introducing hidden variable-length alignments through latent variables, our Hidden Conditional Random Field (HCRF) models achieve performance comparable to strong generative and discriminative models on the CELEX database.
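    To make the latent-alignment idea concrete, an HCRF for g2p can be written (in generic notation, not necessarily this paper's exact parameterization) as a conditional model that marginalizes over hidden letter-to-phoneme alignments:

```latex
% x: letter sequence, y: phoneme sequence, a: hidden variable-length alignment,
% \mathcal{A}(x, y): set of admissible alignments, \Phi: feature vector, \theta: weights.
p_\theta(\mathbf{y} \mid \mathbf{x}) =
  \frac{\sum_{\mathbf{a} \in \mathcal{A}(\mathbf{x}, \mathbf{y})}
          \exp\!\big(\theta^\top \Phi(\mathbf{x}, \mathbf{a}, \mathbf{y})\big)}
       {\sum_{\mathbf{y}'} \sum_{\mathbf{a}' \in \mathcal{A}(\mathbf{x}, \mathbf{y}')}
          \exp\!\big(\theta^\top \Phi(\mathbf{x}, \mathbf{a}', \mathbf{y}')\big)}
```

    Training maximizes the log of this marginal likelihood; unlike a plain linear-chain CRF, the alignment a is never observed, which is how the variable-length letter-to-phoneme chunks of joint n-gram models can be combined with discriminative, feature-rich scoring.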