29 research outputs found

Articulatory feature classification using convolutional neural networks

Author: Merkx D.
Scharenborg O.
Publication venue: 'International Speech Communication Association'
Publication date: 01/10/2018
The ultimate goal of our research is to improve an existing speech-based computational model of human speech recognition on the task of simulating the role of fine-grained phonetic information in human speech processing. As part of this work we are investigating articulatory feature classifiers that are able to create reliable and accurate transcriptions of the articulatory behaviour encoded in the acoustic speech signal. Articulatory feature (AF) modelling of speech has received a considerable amount of attention in automatic speech recognition research. Different approaches have been used to build AF classifiers, most notably multi-layer perceptrons. Recently, deep neural networks have been applied to the task of AF classification. This paper aims to improve AF classification by investigating two different approaches: 1) investigating the usefulness of a deep convolutional neural network (CNN) for AF classification; 2) integrating the Mel filtering operation into the CNN architecture. The results showed a remarkable improvement in classification accuracy of the CNNs over state-of-the-art AF classification results for Dutch, most notably in the minority classes. Integrating the Mel filtering operation into the CNN architecture did not further improve classification performance.

Crossref

MPG.PuRe

Whispery Speech Recognition using Adapted Articulatory Features

Author: Jou Szu-Chen (Stan)
Schultz Tanja
Waibel Alex
Publication venue
Publication date: 16/06/2008
KITopen

Articulatory grounding of phonemic distinctions in English by means of electropalatography

Author: Krynicki Grzegorz
Publication venue
Publication date: 01/01/2014
The aim of the experiment described in this paper was to devise and test a procedure that would allow identification of a phoneme on the basis of only tongue-to-palate and labial contacts that accompanied its realization in continuous read speech. The hypothesis underlying this study was that the articulatory correlates of the phonemic distinctive features can be induced statistically from dimensionality-reduced electropalatographic data.

Adam Mickiewicz University Repository

Repozytorium Uniwersytetu im. Adama Mickiewicza (AMUR)

Speech recognition via phonetically-featured syllables

Author: Frankel Joe
King Simon
Richmond Korin
Taylor Paul
Publication venue: University of the Saarland
Publication date: 01/01/2000
We describe recent work on two new automatic speech recognition systems. The first part of this paper describes the components of a system based on phonological features (which we call Espresso-P), in which the values of these features are estimated from the speech signal before being used as the basis for recognition. The second part describes another system (which we call Espresso-A), in which articulatory parameters are used instead of phonological features and a linear dynamical system model is used to perform recognition from automatically estimated values of these articulatory parameters.

CiteSeerX

Edinburgh Research Archive

A review of Yorùbá Automatic Speech Recognition

Author: Atanda Abdulwahab F.
Hariharan M.
Mohd Yusof Shahrul Azmi
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2013
Automatic Speech Recognition (ASR) has recorded appreciable progress both in technology and application. Despite this progress, there still exists a wide performance gap between human speech recognition (HSR) and ASR, which has inhibited its full adoption in real-life situations. A brief review of research progress on Yorùbá ASR is presented in this paper, focusing on variability as a factor contributing to the performance gap between HSR and ASR, with a view to x-raying the advances recorded and the major obstacles, and charting a way forward for the development of ASR for Yorùbá that is comparable to that of other tone languages and of developed nations. This is done through an extensive survey of the literature on ASR with a focus on Yorùbá. Though appreciable progress has been recorded in the advancement of ASR in the developed world, the reverse is the case for most developing nations, especially those of Africa. Yorùbá, like most languages in Africa, lacks both the human and material resources needed for the development of a functional ASR system, much less for taking advantage of its potential benefits. Results reveal that attaining the ultimate goal of ASR performance comparable to human level requires a deep understanding of variability factors.

UUM Repository

Crossref

Rethinking classification results based on read speech, or: why improvements do not always transfer to other speaking styles

Author: A Field
A Juneja
A Salomon
AM Abdelatti Ali
B Schölkopf
Barbara Schuppler
C Cortes
CY Espy-Wilson
DMW Powers
F Metze
F Pernkopf
J Frankel
JM Kessens
K Johnson
K Kirchhoff
K Manjunath
KJ Kohler
M Saraçlar
O Scharenborg
P Niyogi
R Ogden
S Chang
S Greenberg
S King
SM Siniscalchi
T Pruthi
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017
Crossref

TUGraz OPEN Library

Speaker Independent Acoustic-to-Articulatory Inversion

Author: Ji An
Publication venue: e-Publications@Marquette
Publication date: 01/10/2014
Acoustic-to-articulatory inversion, the determination of articulatory parameters from acoustic signals, is a difficult but important problem for many speech processing applications, such as automatic speech recognition (ASR) and computer-aided pronunciation training (CAPT). In recent years, several approaches have been successfully implemented for speaker-dependent models with parallel acoustic and kinematic training data. However, in many practical applications inversion is needed for new speakers for whom no articulatory data is available. To address this problem, this dissertation introduces a novel speaker adaptation approach called Parallel Reference Speaker Weighting (PRSW), based on parallel acoustic and articulatory Hidden Markov Models (HMM). This approach uses a robust normalized articulatory space and palate-referenced articulatory features combined with speaker-weighted adaptation to form an inversion mapping for new speakers that can accurately estimate articulatory trajectories. The proposed PRSW method is evaluated on the newly collected Marquette electromagnetic articulography - Mandarin Accented English (EMA-MAE) corpus using 20 native English speakers. Cross-speaker inversion results show that, given a good selection of reference speakers with consistent acoustic and articulatory patterns, the PRSW approach gives good speaker-independent inversion performance even without kinematic training data.

epublications@Marquette