Search CORE

32 research outputs found

Grapheme-to-Phoneme Conversion with Convolutional Neural Networks

Author: Gyires-Tóth Bálint
Németh Géza
Yolchuyeva Sevinj
Publication venue: 'MDPI AG'
Publication date: 01/03/2019
Field of study

Grapheme-to-phoneme (G2P) conversion is the process of generating pronunciation for words based on their written form. It has a highly essential role for natural language processing, text-to-speech synthesis and automatic speech recognition systems. In this paper, we investigate convolutional neural networks (CNN) for G2P conversion. We propose a novel CNN-based sequence-to-sequence (seq2seq) architecture for G2P conversion. Our approach includes an end-to-end CNN G2P conversion with residual connections, furthermore, a model, which utilizes a convolutional neural network (with and without residual connections) as encoder and Bi-LSTM as a decoder. We compare our approach with state-of-the-art methods, including Encoder-Decoder LSTM and Encoder-Decoder Bi-LSTM. Training and inference times, phoneme and word error rates were evaluated on the public CMUDict dataset for US English, and the best performing convolutional neural network based architecture was also evaluated on the NetTalk dataset. Our method approaches the accuracy of previous state-of-the-art results in terms of phoneme error rate

Multidisciplinary Digital Publishing Institute

Directory of Open Access Journals

Repository of the Academy's Library

DNN-based speaker clustering for speaker diarisation

Author: Hain T.
Milner R.
Publication venue: 'International Speech Communication Association'
Publication date: 08/09/2016
Field of study

Speaker diarisation, the task of answering "who spoke when?", is often considered to consist of three independent stages: speech activity detection, speaker segmentation and speaker clustering. These represent the separation of speech and nonspeech, the splitting into speaker homogeneous speech segments, followed by grouping together those which belong to the same speaker. This paper is concerned with speaker clustering, which is typically performed by bottom-up clustering using the Bayesian information criterion (BIC). We present a novel semi-supervised method of speaker clustering based on a deep neural network (DNN) model. A speaker separation DNN trained on independent data is used to iteratively relabel the test data set. This is achieved by reconfiguration of the output layer, combined with fine tuning in each iteration. A stopping criterion involving posteriors as confidence scores is investigated. Results are shown on a meeting task (RT07) for single distant microphones and compared with standard diarisation approaches. The new method achieves a diarisation error rate (DER) of 14.8%, compared to a baseline of 19.9%

Crossref

White Rose Research Online

Learning Word Embeddings: Unsupervised Methods for Fixed-size Representations of Variable-length Speech Segments

Author: Du Mingxing
Dupoux Emmanuel
Holzenberger Nils
Karadayi Julien
Riad Rachid
Publication venue: 'International Speech Communication Association'
Publication date: 02/09/2018
Field of study

International audienceFixed-length embeddings of words are very useful for a variety of tasks in speech and language processing. Here we systematically explore two methods of computing fixed-length embeddings for variable-length sequences. We evaluate their susceptibility to phonetic and speaker-specific variability on English, a high resource language and Xitsonga, a low resource language, using two evaluation metrics: ABX word discrimination and ROC-AUC on same-different phoneme n-grams. We show that a simple downsampling method supplemented with length information can outperform the variable-length input feature representation on both evaluations. Recurrent autoencoders, trained without supervision, can yield even better results at the expense of increased computational complexity

INRIA a CCSD electronic archive server

A framework for using humanoid robots in the school learning environment

Author: Lugo Ricardo Gregorio
Mishra Deepti
Parish Karen
Wang Hao
Publication venue: 'MDPI AG'
Publication date: 01/01/2021
Field of study

With predictions of robotics and efficient machine learning being the building blocks of the Fourth Industrial Revolution, countries need to adopt a long-term strategy to deal with potential challenges of automation and education must be at the center of this long-term strategy. Education must provide students with a grounding in certain skills, such as computational thinking and an understanding of robotics, which are likely to be required in many future roles. Targeting an acknowledged gap in existing humanoid robot research in the school learning environment, we present a multidisciplinary framework that integrates the following four perspectives: technological, pedagogical, efficacy of humanoid robots and a consideration of the ethical implications of using humanoid robots. Further, this paper presents a proposed application, evaluation and a case study of how the framework can be used.publishedVersio

Brage INN