
1,295 research outputs found

Automatic Pronunciation Assessment -- A Review

Author: Ali Ahmed
Chowdhury Shammur Absar
Kheir Yassine El
Publication date: 21/10/2023

Pronunciation assessment and its application in computer-aided pronunciation training (CAPT) have seen impressive progress in recent years. Given the rapid growth of language processing and deep learning over the past few years, an updated review is needed. In this paper, we review methods employed in pronunciation assessment for both phonemic and prosodic aspects. We categorize the main challenges observed in prominent research trends and highlight existing limitations and available resources. This is followed by a discussion of the remaining challenges and possible directions for future work. Comment: 9 pages, accepted to EMNLP Findings.
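The review is a survey and prescribes no code. As one concrete illustration of the kind of phonemic assessment method such reviews cover, the sketch below computes a frame-averaged goodness of pronunciation (GOP) score from per-frame phone log posteriors. This is one common posterior-based variant, assumed here for illustration; it is not claimed to match any specific system in the review. The function name `gop_score`, the input layout, and the toy posteriors are hypothetical.

```python
import numpy as np


def gop_score(log_posteriors: np.ndarray, canonical_phone: int) -> float:
    """Frame-averaged goodness of pronunciation (GOP) for one phone segment.

    log_posteriors: (num_frames, num_phones) array of per-frame log posteriors
        log P(q | o_t), assumed to come from an acoustic model plus a forced
        alignment that isolates this segment.
    canonical_phone: index of the phone the learner was expected to produce.

    Returns a value <= 0: 0 means the expected phone was the most likely phone
    in every frame; more negative values suggest a likely mispronunciation.
    """
    if log_posteriors.ndim != 2 or log_posteriors.shape[0] == 0:
        raise ValueError("expected a non-empty (frames, phones) array")
    expected = log_posteriors[:, canonical_phone]  # log P(p | o_t)
    best = log_posteriors.max(axis=1)              # max_q log P(q | o_t)
    return float(np.mean(expected - best))


# Toy segment: 3 frames x 3 phones (posteriors invented for illustration).
posteriors = np.array([[0.7, 0.2, 0.1],
                       [0.6, 0.3, 0.1],
                       [0.8, 0.1, 0.1]])
log_post = np.log(posteriors)
print(gop_score(log_post, 0))  # expected phone wins every frame
print(gop_score(log_post, 1))  # expected phone never wins
```

Averaging over frames keeps the score comparable across phones of different durations; a real system would typically threshold it per phone to flag errors.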

arXiv.org e-Print Archive

An articulatory-functional approach to modeling Persian focus prosody

Author: Taheri-Ardali M
Xu Y
Publication venue: The 18th International Congress of Phonetic Sciences
Publication date: 14/08/2015

This paper attempts to test PENTA, an articulatory-functional model, on Persian focus prosody. The test used a corpus of utterances produced under different focus conditions and PENTAtrainer2, a trainable prosody synthesizer that optimizes categorical pitch targets, each corresponding to multiple communicative functions. The evaluation compared the F0 contours generated from the extracted pitch targets with those of natural utterances, both numerically and perceptually. Numerically, the synthesized F0 was close to the natural contours, with an RMSE of 1.94 and a Pearson's r of 0.84. Perceptually, native Persian listeners' focus identification rates and naturalness judgements were highly similar for synthetic and natural F0 contours.
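The two numerical measures reported above can be computed as in the minimal sketch below. It assumes both contours are already time-aligned and sampled at the same points in the same unit; the abstract states neither the F0 unit (Hz or semitones) nor the alignment procedure. Function names and the contour values are hypothetical.

```python
import numpy as np


def f0_rmse(synth: np.ndarray, natural: np.ndarray) -> float:
    """Root-mean-square error between two time-aligned F0 contours."""
    return float(np.sqrt(np.mean((synth - natural) ** 2)))


def f0_pearson_r(synth: np.ndarray, natural: np.ndarray) -> float:
    """Pearson correlation coefficient between two time-aligned F0 contours."""
    s = synth - synth.mean()
    n = natural - natural.mean()
    return float(np.sum(s * n) / np.sqrt(np.sum(s ** 2) * np.sum(n ** 2)))


# Invented contours sampled at the same time points.
natural = np.array([100.0, 110.0, 120.0, 115.0, 105.0])
synthetic = natural + np.array([2.0, -2.0, 2.0, -2.0, 2.0])
print(f0_rmse(synthetic, natural), f0_pearson_r(synthetic, natural))
```

RMSE measures absolute distance in the contour's unit, while r only measures shape similarity, which is why studies like this one report both.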

UCL Discovery

The listening talker: A review of human and algorithmic context-induced modifications of speech

Author: Martin Cooke
Simon King
Maëva Garnier
Vincent Aubanel
Publication venue: 'Elsevier BV'
Publication date: 01/01/2014

Speech output technology is finding widespread application, including in scenarios where intelligibility might be compromised, at least for some listeners, by adverse conditions. Unlike most current algorithms, talkers continually adapt their speech patterns in response to the immediate context of spoken communication, in which the type of interlocutor and the environment are the dominant situational factors influencing speech production. Observations of talker behaviour can motivate the design of more robust speech output algorithms. Starting with a listener-oriented categorisation of possible goals for speech modification, this review article summarises the extensive set of behavioural findings related to human speech modification, identifies which factors appear to be beneficial, and goes on to examine previous computational attempts to improve intelligibility in noise. The review concludes by tabulating 46 speech modifications, many of which have yet to be perceptually or algorithmically evaluated. Consequently, the review provides a roadmap for future work on improving the robustness of speech output.
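As an illustration of the kind of algorithmic, context-induced modification the review discusses, the sketch below mimics one well-documented property of Lombard speech, a flatter spectral tilt. It applies first-order pre-emphasis and then rescales the output to the input RMS. Equal output energy is a common constraint when comparing modification algorithms, so any benefit comes from redistributing energy across frequency rather than from making speech louder. The function name, `alpha`, the sample rate, and the test signal are hypothetical choices and are not taken from the review's table of 46 modifications.

```python
import numpy as np


def flatten_tilt_equal_rms(x: np.ndarray, alpha: float = 0.95) -> np.ndarray:
    """Boost high frequencies with first-order pre-emphasis, keeping RMS fixed.

    y[n] = x[n] - alpha * x[n - 1]; y is then rescaled so that its RMS equals
    the RMS of x, i.e. energy is redistributed across frequency, not added.
    """
    x = np.asarray(x, dtype=float)
    y = np.empty_like(x)
    y[0] = x[0]
    y[1:] = x[1:] - alpha * x[:-1]
    rms_in = np.sqrt(np.mean(x ** 2))
    rms_out = np.sqrt(np.mean(y ** 2))
    if rms_out == 0.0:
        return y
    return y * (rms_in / rms_out)


def band_ratio(sig: np.ndarray) -> float:
    """Magnitude at 3000 Hz relative to 200 Hz (1 Hz bins for a 1 s signal)."""
    spec = np.abs(np.fft.rfft(sig))
    return float(spec[3000] / spec[200])


fs = 16_000                      # sample rate in Hz (assumed)
t = np.arange(fs) / fs           # 1 s of signal
x = np.sin(2 * np.pi * 200 * t) + 0.1 * np.sin(2 * np.pi * 3000 * t)
y = flatten_tilt_equal_rms(x)
print(band_ratio(x), band_ratio(y))  # high band gains relative weight
```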

Crossref

HAL - Université Grenoble Alpes

Edinburgh Research Explorer

Western Sydney ResearchDirect

Pronunciation Variation Analysis and CycleGAN-Based Feedback Generation for CAPT

Author: 양승희
Publication venue: Graduate School, Seoul National University
Publication date: 01/02/2020

Doctoral dissertation, Graduate School of Seoul National University, Interdisciplinary Program in Cognitive Science, College of Humanities, February 2020. Advisor: Minhwa Chung (정민화).

Despite the growing popularity of learning Korean as a foreign language and the rapid development of language learning applications, existing Korean computer-assisted pronunciation training (CAPT) systems do not exploit the linguistic characteristics of non-native Korean speech. Pronunciation variations in non-native speech are far more diverse than those observed in native speech, which makes it difficult to incorporate such knowledge into an automatic system. Moreover, most existing methods rely on features extracted with signal processing, prosodic analysis, and natural language processing techniques. Such methods are limited because they depend on finding the right features for the task and on the accuracy with which those features are extracted. This thesis presents a new approach to corrective feedback generation in a CAPT system, in which pronunciation variation patterns and their linguistic correlates with accentedness are analyzed and combined with a deep neural network, so that feature engineering is minimized while the linguistically important factors for corrective feedback generation are retained. Investigations of non-native Korean speech characteristics, contrasted with those of native speakers, and of their correlation with accentedness judgements show that both segmental and prosodic variations are important factors in a Korean CAPT system. The thesis argues that feedback generation can be interpreted as a style transfer problem and proposes to evaluate this idea using a generative adversarial network. A corrective feedback generation model is trained on 65,100 read utterances by 217 non-native speakers from 27 native-language backgrounds. Features are learned automatically, in an unsupervised way, in an auxiliary classifier CycleGAN setting, in which the generator learns to map foreign-accented speech to the native speech distribution. To inject linguistic knowledge into the network, an auxiliary classifier is trained so that the feedback also identifies the linguistic error types defined in the first half of the thesis. The proposed approach generates a corrected version of the speech in the learner's own voice and outperforms the conventional Pitch-Synchronous Overlap-and-Add (PSOLA) method.

Interest in teaching Korean as a foreign language has grown, the number of Korean learners has increased substantially, and research on computer-assisted pronunciation training (CAPT) applications built on spoken language processing technology is being actively pursued. Nevertheless, existing Korean speaking-training systems do not make sufficient use of the linguistic characteristics of Korean spoken by non-native speakers, nor do they apply recent language processing technology. Possible reasons are that non-native Korean speech has not been analyzed sufficiently, and that even where relevant studies exist, further advanced research is needed before their findings can be reflected in an automated system. In addition, CAPT technology in general depends on feature extraction through signal processing, prosodic analysis, and natural language processing techniques, so finding suitable features and extracting them accurately requires considerable time and effort. This suggests that there is considerable room to improve this process with recent deep-learning-based language processing technology. This study therefore first analyzed pronunciation variation patterns and their linguistic correlates for the development of a CAPT system. Variation patterns in read speech by non-native speakers were contrasted with those of native Korean speakers to identify the major variations, and correlation analysis was then used to determine how strongly each affects communication. The results showed that coda deletion, confusion within the three-way laryngeal contrast, and suprasegmental errors should be given priority in feedback generation when they occur. Automatically generating corrective feedback is one of the key tasks of a CAPT system. This study viewed the task as a speech style transfer problem and proposed modeling it with a cycle-consistent generative adversarial network (CycleGAN). The generator learns a mapping between the distribution of non-native speech and that of native speech, and a cycle-consistency loss preserves the overall structure of the utterance while preventing over-correction. Because the required features are learned by the CycleGAN framework itself in an unsupervised way, with no separate feature extraction step, the method is easy to extend to other languages. The priorities among the major variations revealed by the linguistic analysis were modeled in an auxiliary classifier CycleGAN architecture. This method grafts linguistic knowledge onto the standard CycleGAN so that, while generating the feedback speech, it also classifies which type of error the feedback addresses. Its significance is that domain knowledge is retained, and remains controllable, all the way through the corrective feedback generation stage. To evaluate the proposed method, the automatic feedback generation model was trained on 65,100 utterances of meaningful words by 217 speakers with 27 native languages, and a perceptual evaluation of whether and how much pronunciation improved was carried out. With the proposed method, the learner's speech can be converted into corrected pronunciation while retaining the learner's own voice, and a relative improvement of 16.67% was observed over the conventional Pitch-Synchronous Overlap-and-Add (PSOLA) algorithm.

Chapter 1. Introduction
1.1. Motivation
1.1.1. An Overview of CAPT Systems
1.1.2. Survey of Existing Korean CAPT Systems
1.2. Problem Statement
1.3. Thesis Structure
Chapter 2. Pronunciation Analysis of Korean Produced by Chinese
2.1. Comparison between Korean and Chinese
2.1.1. Phonetic and Syllable Structure Comparisons
2.1.2. Phonological Comparisons
2.2. Related Works
2.3. Proposed Analysis Method
2.3.1. Corpus
2.3.2. Transcribers and Agreement Rates
2.4. Salient Pronunciation Variations
2.4.1. Segmental Variation Patterns
2.4.1.1. Discussions
2.4.2. Phonological Variation Patterns
2.4.2.1. Discussions
2.5. Summary
Chapter 3. Correlation Analysis of Pronunciation Variations and Human Evaluation
3.1. Related Works
3.1.1. Criteria Used in L2 Speech
3.1.2. Criteria Used in L2 Korean Speech
3.2. Proposed Human Evaluation Method
3.2.1. Reading Prompt Design
3.2.2. Evaluation Criteria Design
3.2.3. Raters and Agreement Rates
3.3. Linguistic Factors Affecting L2 Korean Accentedness
3.3.1. Pearson's Correlation Analysis
3.3.2. Discussions
3.3.3. Implications for Automatic Feedback Generation
3.4. Summary
Chapter 4. Corrective Feedback Generation for CAPT
4.1. Related Works
4.1.1. Prosody Transplantation
4.1.2. Recent Speech Conversion Methods
4.1.3. Evaluation of Corrective Feedback
4.2. Proposed Method: Corrective Feedback as a Style Transfer
4.2.1. Speech Analysis at Spectral Domain
4.2.2. Self-imitative Learning
4.2.3. An Analogy: CAPT System and GAN Architecture
4.3. Generative Adversarial Networks
4.3.1. Conditional GAN
4.3.2. CycleGAN
4.4. Experiment
4.4.1. Corpus
4.4.2. Baseline Implementation
4.4.3. Adversarial Training Implementation
4.4.4. Spectrogram-to-Spectrogram Training
4.5. Results and Evaluation
4.5.1. Spectrogram Generation Results
4.5.2. Perceptual Evaluation
4.5.3. Discussions
4.6. Summary
Chapter 5. Integration of Linguistic Knowledge in an Auxiliary Classifier CycleGAN for Feedback Generation
5.1. Linguistic Class Selection
5.2. Auxiliary Classifier CycleGAN Design
5.3. Experiment and Results
5.3.1. Corpus
5.3.2. Feature Annotations
5.3.3. Experiment Setup
5.3.4. Results
5.4. Summary
Chapter 6. Conclusion
6.1. Thesis Results
6.2. Thesis Contributions
6.3. Recommendations for Future Work
Bibliography
Appendix
Abstract in Korean
Acknowledgments
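The thesis abstract credits the cycle-consistency loss with preserving the overall structure of an utterance while preventing over-correction. Below is a minimal sketch of the standard L1 cycle-consistency term from CycleGAN. The toy affine "generators", the (frames, mel bins) shapes, and all names are hypothetical stand-ins for the thesis's trained networks, whose architecture and loss weighting are not given here.

```python
import numpy as np
from typing import Callable

Array = np.ndarray
Generator = Callable[[Array], Array]


def cycle_consistency_loss(x_l2: Array, y_l1: Array, g: Generator, f: Generator) -> float:
    """L1 cycle-consistency loss as used in CycleGAN-style training.

    g maps non-native (L2) features to the native (L1) domain; f maps back.
    Loss = mean|f(g(x)) - x| + mean|g(f(y)) - y|.
    """
    forward = np.mean(np.abs(f(g(x_l2)) - x_l2))
    backward = np.mean(np.abs(g(f(y_l1)) - y_l1))
    return float(forward + backward)


def g_toy(s: Array) -> Array:
    """Toy 'correction' generator (hypothetical): shift every bin up by 1."""
    return s + 1.0


def f_exact(s: Array) -> Array:
    """Exact inverse of g_toy: the round trip recovers the input."""
    return s - 1.0


def f_lossy(s: Array) -> Array:
    """Imperfect inverse: the round trip drifts by 0.5 in every bin."""
    return s - 0.5


rng = np.random.default_rng(0)
x_l2 = rng.normal(size=(100, 80))  # hypothetical learner spectrogram
y_l1 = rng.normal(size=(100, 80))  # hypothetical native-speaker spectrogram
print(cycle_consistency_loss(x_l2, y_l1, g_toy, f_exact))  # near zero
print(cycle_consistency_loss(x_l2, y_l1, g_toy, f_lossy))
```

In full training this term is added to the adversarial losses (and, in the auxiliary classifier variant, a classification loss); penalizing round-trip drift is what discourages the generator from rewriting more of the utterance than necessary.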

SNU Open Repository and Archive

From communicative functions to prosodic forms

Author: Kohler Klaus J
Publication venue: 'S. Karger AG'
Publication date: 01/01/2013

This is a proposal in favour of proceeding from communicative function to linguistic form, rather than the reverse, to give an insightful account of how humans communicate through speech in languages. A functional framework is developed that encompasses argumentation structures, declarative and interrogative functions, and expressive intensification. Such a function orientation can become a powerful tool in comparative prosodic research across the world's languages. The potential of this approach is shown by comparing the prosodic form of Mandarin Chinese data collected in functionally contextualized scenarios with corresponding data from English and German.

MACAU: Open Access Repository of Kiel University

Model-based Parametric Prosody Synthesis with Deep Neural Network

Author: Liu H
Lu H
Shao X
Xu Y
Publication venue: Interspeech 2016
Publication date: 01/09/2016

Conventional statistical parametric speech synthesis (SPSS) captures only frame-wise acoustic observations and computes probability densities at the HMM-state level, combined with decision trees, to obtain statistical acoustic models. It is therefore a purely statistical, data-driven approach with no explicit integration of the articulatory mechanisms identified in speech production research. The present study explores an alternative paradigm, model-based parametric prosody synthesis (MPPS), which integrates the dynamic mechanisms of human speech production as a core component of F0 generation. In this paradigm, contextual variations in prosody are processed in two separate yet integrated stages: linguistic to motor, and motor to acoustic. Here the motor model is target approximation (TA), which generates syllable-sized F0 contours from only three motor parameters associated with linguistic functions. In this study, we simulate this two-stage process by linking the TA model to a deep neural network (DNN), which learns the “linguistic-motor” mapping given the “motor-acoustic” mapping provided by TA-based syllable-wise F0 production. The proposed prosody modeling system outperforms the HMM-based baseline system in both objective and subjective evaluations.
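The abstract does not spell out the TA equations. Assuming the quantitative TA (qTA) formulation of Prom-on, Xu and Thipakorn (2009), the three motor parameters are the target slope m, target height b, and approximation strength λ. Within a syllable, F0 follows the target line m·t + b plus a critically damped third-order transient whose coefficients are fixed by the F0, velocity, and acceleration carried over from the previous syllable. The sketch below implements that formulation for the "motor-acoustic" stage only; the DNN that predicts (m, b, λ) is not shown, and the function name, units, and parameter values are illustrative.

```python
import numpy as np


def qta_syllable(m, b, lam, duration, init_state, num_points=50):
    """Generate one syllable's F0 with the qTA model (assumed formulation).

    Target:  x(t) = m*t + b
    F0:      f0(t) = x(t) + (c1 + c2*t + c3*t**2) * exp(-lam*t)
    init_state = (f0, velocity, acceleration) at syllable onset.
    Returns (t, f0, end_state) so consecutive syllables can be chained.
    """
    p0, v0, a0 = init_state
    # Coefficients that make f0, f0' and f0'' continuous at t = 0.
    c1 = p0 - b
    c2 = v0 + c1 * lam - m
    c3 = (a0 + 2 * c2 * lam - c1 * lam ** 2) / 2.0
    t = np.linspace(0.0, duration, num_points)
    decay = np.exp(-lam * t)
    poly = c1 + c2 * t + c3 * t ** 2
    f0 = m * t + b + poly * decay
    # Analytic first and second derivatives, used to hand over the end state.
    dpoly = c2 + 2 * c3 * t
    vel = m + (dpoly - lam * poly) * decay
    acc = (2 * c3 - 2 * lam * dpoly + lam ** 2 * poly) * decay
    return t, f0, (f0[-1], vel[-1], acc[-1])


# Two chained syllables: a static target, then a falling target.
t1, f1, state = qta_syllable(m=0.0, b=90.0, lam=40.0, duration=0.2,
                             init_state=(85.0, 0.0, 0.0))
t2, f2, _ = qta_syllable(m=-20.0, b=92.0, lam=30.0, duration=0.2,
                         init_state=state)
```

Passing the full end state into the next syllable is what lets a few per-syllable parameters produce smooth, coarticulated contours across syllable boundaries.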

UCL Discovery

FastGraphTTS: An Ultrafast Syntax-Aware Speech Synthesis Framework

Author: Cheng Ning
Sun Aolan
Wang Jianzong
Xiao Jing
Zhang Xulong
Publication date: 15/09/2023

This paper integrates graph-to-sequence modelling into an end-to-end text-to-speech framework, making synthesis syntax-aware by using the syntactic information of the input text. Specifically, the input text is parsed by a dependency parsing module to form a syntactic graph. The syntactic graph is then encoded by a graph encoder to extract hidden syntactic information, which is concatenated with the phoneme embeddings and fed to the alignment and flow-based decoding modules to generate the raw audio waveform. Experiments cover two languages, English and Mandarin, using single-speaker, few-shot target-speaker, and multi-speaker datasets. The results show better prosodic consistency between the input text and the generated audio, higher scores in subjective prosody evaluation, and the ability to perform voice conversion. In addition, the model's efficiency is substantially boosted through the design of an AI-chip operator, giving a 5x acceleration. Comment: Accepted by The 35th IEEE International Conference on Tools with Artificial Intelligence (ICTAI 2023).
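The abstract names a dependency parser, a graph encoder, and concatenation with phoneme embeddings, but not the encoder type or how word-level syntax is aligned with phonemes. The sketch below assumes a single graph convolutional network (GCN) layer over an undirected dependency graph, and assumes each word's syntactic vector is repeated across that word's phonemes before concatenation. Both are illustrative assumptions, not FastGraphTTS's published design; all names, sizes, and the example parse are hypothetical.

```python
import numpy as np


def dependency_adjacency(heads: list[int]) -> np.ndarray:
    """Symmetric, self-looped, degree-normalized adjacency from a dependency parse.

    heads[i] is the index of word i's syntactic head, or -1 for the root.
    Returns D^{-1/2} (A + I) D^{-1/2}, the normalization used by GCNs.
    """
    n = len(heads)
    a = np.eye(n)
    for i, h in enumerate(heads):
        if h >= 0:
            a[i, h] = a[h, i] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(a_hat: np.ndarray, x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """One graph-convolution layer: ReLU(A_hat X W)."""
    return np.maximum(a_hat @ x @ w, 0.0)


def fuse_with_phonemes(word_feats, phones_per_word, phoneme_emb):
    """Copy each word's syntactic vector onto its phonemes and concatenate."""
    expanded = np.repeat(word_feats, phones_per_word, axis=0)
    return np.concatenate([phoneme_emb, expanded], axis=1)


rng = np.random.default_rng(0)
heads = [1, -1, 1]                 # "cats sat down": both words attach to "sat"
phones_per_word = [4, 3, 3]        # hypothetical phoneme counts per word
a_hat = dependency_adjacency(heads)
word_in = rng.normal(size=(3, 16))                     # hypothetical word embeddings
syntax = gcn_layer(a_hat, word_in, rng.normal(size=(16, 8)))
phonemes = rng.normal(size=(sum(phones_per_word), 32))  # hypothetical phoneme embeddings
fused = fuse_with_phonemes(syntax, phones_per_word, phonemes)
print(fused.shape)  # (num_phonemes, 32 + 8)
```

Symmetrizing the parse lets syntactic information flow both from heads to dependents and back; a directed or edge-labelled encoder would be an equally plausible reading of "graph encoder".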

arXiv.org e-Print Archive

Generation of prosody and speech for Mandarin Chinese

Author: DONG MINGHUI
Publication date: 19/02/2004

Ph.D. (Doctor of Philosophy)

ScholarBank@NUS