15 research outputs found
Artificial Neural Network (ANN) in a Small Dataset to determine Neutrality in the Pronunciation of English as a Foreign Language in Filipino Call Center Agents
Artificial Neural Networks (ANNs) have continued to be efficient models for solving classification problems. In this paper, we explore the use of an ANN with a small dataset to accurately classify whether Filipino call center agents' pronunciations are neutral or not based on their employer's standards. Isolated utterances of the ten most commonly used words in the call center were recorded from eleven agents, creating a dataset of 110 utterances. Two learning specialists were consulted to establish ground truths, and Cohen's Kappa was computed as 0.82, validating the reliability of the dataset. The first thirteen Mel-Frequency Cepstral Coefficients (MFCCs) were then extracted from each word, and an ANN was trained with ten-fold stratified cross-validation. Experimental results recorded a classification accuracy of 89.60%, supported by an overall F-score of 0.92.
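The evaluation protocol described above (13 MFCC features per utterance, a small ANN, ten-fold stratified cross-validation) can be sketched as follows. The features here are random stand-ins for the real MFCCs (which an audio front end such as librosa would supply), and the network size is an illustrative assumption, not the paper's configuration.

```python
# Sketch of the paper's protocol on synthetic data: 13 MFCC-like features
# per utterance, binary "neutral vs. non-neutral" labels, and an ANN
# evaluated with ten-fold stratified cross-validation.
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score, f1_score

rng = np.random.default_rng(0)

# Stand-in for 110 utterances x 13 MFCCs (real features would come from
# an audio front end, e.g. librosa.feature.mfcc on each recording).
X = np.vstack([rng.normal(0.0, 1.0, (55, 13)),
               rng.normal(1.5, 1.0, (55, 13))])
y = np.array([0] * 55 + [1] * 55)

skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
accs, f1s = [], []
for train_idx, test_idx in skf.split(X, y):
    # A small MLP stands in for the paper's ANN; the hidden size is a guess.
    clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
    clf.fit(X[train_idx], y[train_idx])
    pred = clf.predict(X[test_idx])
    accs.append(accuracy_score(y[test_idx], pred))
    f1s.append(f1_score(y[test_idx], pred))

print(f"accuracy={np.mean(accs):.2f}  F1={np.mean(f1s):.2f}")
```

Stratification matters here because with only 110 utterances an unlucky split could leave a fold with almost no examples of one class.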
L2-ARCTIC: A Non-Native English Speech Corpus
In this paper, we introduce L2-ARCTIC, a speech corpus of non-native English that is intended for research in voice conversion, accent conversion, and mispronunciation detection. This initial release includes recordings from ten non-native speakers of English whose first languages (L1s) are Hindi, Korean, Mandarin, Spanish, and Arabic, each L1 containing recordings from one male and one female speaker. Each speaker recorded approximately one hour of read speech from the Carnegie Mellon University ARCTIC prompts, from which we generated orthographic and forced-aligned phonetic transcriptions. In addition, we manually annotated 150 utterances per speaker to identify three types of mispronunciation errors: substitutions, deletions, and additions, making it a valuable resource not only for research in voice conversion and accent conversion but also in computer-assisted pronunciation training. The corpus is publicly accessible at https://psi.engr.tamu.edu/l2-arctic-corpus/
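The three annotated error types (substitutions, deletions, additions) correspond to the edit operations recovered by aligning a canonical phone sequence against the phones a learner actually produced. A minimal sketch of such an alignment is shown below; the phone strings are illustrative examples, not taken from the corpus.

```python
# Levenshtein alignment of canonical vs. produced phone sequences,
# returning (operation, canonical_phone, produced_phone) tuples.
def align_errors(canonical, produced):
    n, m = len(canonical), len(produced)
    # dp[i][j] = edit distance between canonical[:i] and produced[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if canonical[i - 1] == produced[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # addition
                           dp[i - 1][j - 1] + cost)  # match / substitution
    ops, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + (canonical[i - 1] != produced[j - 1]):
            op = "match" if canonical[i - 1] == produced[j - 1] else "substitution"
            ops.append((op, canonical[i - 1], produced[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            ops.append(("deletion", canonical[i - 1], None))
            i -= 1
        else:
            ops.append(("addition", None, produced[j - 1]))
            j -= 1
    return list(reversed(ops))

# Hypothetical example: /N AO R TH/ realized as /N AO L TH/ -> one substitution.
errors = [op for op in align_errors(["N", "AO", "R", "TH"],
                                    ["N", "AO", "L", "TH"]) if op[0] != "match"]
```

The corpus's own annotations are manual; an alignment like this is only a starting point that human annotators would then correct.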
HMM-based Non-native Accent Assessment using Posterior Features
Automatic non-native accent assessment has many potential benefits in language learning and speech technologies. The three fundamental challenges in automatic accent assessment are to characterize, model, and assess individual variation in the speech of the non-native speaker. In our recent work, an accentedness score was automatically obtained by comparing two phone probability sequences obtained from instances of non-native and native speech. In this paper, we build on that work and obtain the native latent symbol probability sequence from the word hypothesis modeled as a hidden Markov model (HMM). The approach overcomes the need for a native human reference recording of the same sentence. Using HMMs trained on an auxiliary native speech corpus, the proposed approach achieves a correlation of 0.68 with the human accent ratings on the ISLE corpus. This is particularly noteworthy given that the approach does not use any non-native data or human accent ratings at any stage of system development.
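The core operation, comparing a non-native phone probability sequence against a native (here, HMM-derived) one, can be illustrated with a frame-wise divergence. This is an illustrative distance, not the paper's exact scoring rule; larger values would suggest stronger accentedness.

```python
# Frame-wise symmetric KL divergence between two phone-posterior sequences,
# averaged over frames, as a toy accentedness-style score.
import math

def sym_kl(p, q, eps=1e-10):
    """Symmetric KL divergence between two discrete distributions."""
    kl = lambda a, b: sum(ai * math.log((ai + eps) / (bi + eps))
                          for ai, bi in zip(a, b))
    return 0.5 * (kl(p, q) + kl(q, p))

def accent_score(nonnative_post, native_post):
    """Average frame-wise divergence between two posterior sequences."""
    return sum(sym_kl(p, q)
               for p, q in zip(nonnative_post, native_post)) / len(native_post)

# Two frames over a 3-phone inventory (made-up numbers).
native = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1]]
same = accent_score(native, native)                           # identical -> 0
shifted = accent_score([[0.1, 0.8, 0.1], [0.8, 0.1, 0.1]], native)
```

In the actual system, the native-side sequence comes from HMMs trained on an auxiliary native corpus rather than from a matched native recording.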
Directions for the future of technology in pronunciation research and teaching
This paper reports on the role of technology in state-of-the-art pronunciation research and instruction, and makes concrete suggestions for future developments. The point of departure for this contribution is that the goal of second language (L2) pronunciation research and teaching should be enhanced comprehensibility and intelligibility, as opposed to native-likeness. Three main areas are covered here. We begin with a presentation of advanced uses of pronunciation technology in research, with a special focus on the expertise required to carry out even small-scale investigations. Next, we discuss the nature of data in pronunciation research, pointing to ways in which future work can build on advances in corpus research and crowdsourcing. Finally, we consider how these insights pave the way for researchers and developers working to create research-informed, computer-assisted pronunciation teaching resources. We conclude with predictions for future developments.
Pronunciation Variation Analysis and CycleGAN-based Feedback Generation for CAPT
Doctoral dissertation, Seoul National University Graduate School, Interdisciplinary Program in Cognitive Science, February 2020 (advisor: Minhwa Chung). Despite the growing popularity of learning Korean as a foreign language and the rapid development of language-learning applications, existing computer-assisted pronunciation training (CAPT) systems for Korean do not utilize the linguistic characteristics of non-native Korean speech. Pronunciation variations in non-native speech are far more diverse than those observed in native speech, which may pose a difficulty in incorporating such knowledge into an automatic system. Moreover, most existing methods rely on feature extraction results from signal processing, prosodic analysis, and natural language processing techniques. Such methods entail limitations, since they necessarily depend on finding the right features for the task and on the extraction accuracies.
This thesis presents a new approach to corrective feedback generation in a CAPT system, in which pronunciation variation patterns and linguistic correlates of accentedness are analyzed and combined with a deep neural network approach, so that feature engineering efforts are minimized while the linguistically important factors for the corrective feedback generation task are maintained. Investigations of non-native Korean speech characteristics, in contrast with those of native speakers, and of their correlation with accentedness judgements show that both segmental and prosodic variations are important factors in a Korean CAPT system.
The present thesis argues that the feedback generation task can be interpreted as a style-transfer problem, and proposes to evaluate the idea using a generative adversarial network. A corrective feedback generation model is trained on 65,100 read utterances by 217 non-native speakers from 27 mother-tongue backgrounds. The features are learnt automatically, in an unsupervised way, in an auxiliary classifier CycleGAN setting, in which the generator learns to map foreign-accented speech to native speech distributions. In order to inject linguistic knowledge into the network, an auxiliary classifier is trained so that the feedback also identifies the linguistic error types defined in the first half of the thesis. The proposed approach generates a corrected version of the speech using the learner's own voice, outperforming the conventional Pitch-Synchronous Overlap-and-Add method.

Chapter 1. Introduction 1
1.1. Motivation 1
1.1.1. An Overview of CAPT Systems 3
1.1.2. Survey of existing Korean CAPT Systems 5
1.2. Problem Statement 7
1.3. Thesis Structure 7
Chapter 2. Pronunciation Analysis of Korean Produced by Chinese 9
2.1. Comparison between Korean and Chinese 11
2.1.1. Phonetic and Syllable Structure Comparisons 11
2.1.2. Phonological Comparisons 14
2.2. Related Works 16
2.3. Proposed Analysis Method 19
2.3.1. Corpus 19
2.3.2. Transcribers and Agreement Rates 22
2.4. Salient Pronunciation Variations 22
2.4.1. Segmental Variation Patterns 22
2.4.1.1. Discussions 25
2.4.2. Phonological Variation Patterns 26
2.4.2.1. Discussions 27
2.5. Summary 29
Chapter 3. Correlation Analysis of Pronunciation Variations and Human Evaluation 30
3.1. Related Works 31
3.1.1. Criteria used in L2 Speech 31
3.1.2. Criteria used in L2 Korean Speech 32
3.2. Proposed Human Evaluation Method 36
3.2.1. Reading Prompt Design 36
3.2.2. Evaluation Criteria Design 37
3.2.3. Raters and Agreement Rates 40
3.3. Linguistic Factors Affecting L2 Korean Accentedness 41
3.3.1. Pearson's Correlation Analysis 41
3.3.2. Discussions 42
3.3.3. Implications for Automatic Feedback Generation 44
3.4. Summary 45
Chapter 4. Corrective Feedback Generation for CAPT 46
4.1. Related Works 46
4.1.1. Prosody Transplantation 47
4.1.2. Recent Speech Conversion Methods 49
4.1.3. Evaluation of Corrective Feedback 50
4.2. Proposed Method: Corrective Feedback as a Style Transfer 51
4.2.1. Speech Analysis at Spectral Domain 53
4.2.2. Self-imitative Learning 55
4.2.3. An Analogy: CAPT System and GAN Architecture 57
4.3. Generative Adversarial Networks 59
4.3.1. Conditional GAN 61
4.3.2. CycleGAN 62
4.4. Experiment 63
4.4.1. Corpus 64
4.4.2. Baseline Implementation 65
4.4.3. Adversarial Training Implementation 65
4.4.4. Spectrogram-to-Spectrogram Training 66
4.5. Results and Evaluation 69
4.5.1. Spectrogram Generation Results 69
4.5.2. Perceptual Evaluation 70
4.5.3. Discussions 72
4.6. Summary 74
Chapter 5. Integration of Linguistic Knowledge in an Auxiliary Classifier CycleGAN for Feedback Generation 75
5.1. Linguistic Class Selection 75
5.2. Auxiliary Classifier CycleGAN Design 77
5.3. Experiment and Results 80
5.3.1. Corpus 80
5.3.2. Feature Annotations 81
5.3.3. Experiment Setup 81
5.3.4. Results 82
5.4. Summary 84
Chapter 6. Conclusion 86
6.1. Thesis Results 86
6.2. Thesis Contributions 88
6.3. Recommendations for Future Work 89
Bibliography 91
Appendix 107
Abstract in Korean 117
Acknowledgments 120
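The auxiliary-classifier CycleGAN described in the abstract above combines an adversarial term, a cycle-consistency term that keeps the overall utterance structure intact, and an auxiliary classification term that ties the feedback to a linguistic error type. A minimal numerical sketch of how such a composite loss is assembled is given below; the loss weights, shapes, and function names are illustrative, not the thesis's actual settings (where the generators and discriminators are deep networks over spectrograms).

```python
import numpy as np

def l1_cycle_loss(x, x_reconstructed):
    """Cycle-consistency: mapping to the native domain and back, F(G(x)),
    should reproduce the original spectrogram x."""
    return float(np.mean(np.abs(x - x_reconstructed)))

def cross_entropy(probs, label):
    """Auxiliary-classifier loss: predicted error-type distribution
    vs. the annotated linguistic error type."""
    return float(-np.log(probs[label] + 1e-10))

def total_loss(adv_loss, x, x_rec, cls_probs, error_type,
               lam_cyc=10.0, lam_cls=1.0):
    # Weighted sum in the style of CycleGAN training; lambda values
    # here are common illustrative defaults.
    return (adv_loss
            + lam_cyc * l1_cycle_loss(x, x_rec)
            + lam_cls * cross_entropy(cls_probs, error_type))

x = np.ones((4, 4))          # toy stand-in for a spectrogram patch
loss = total_loss(adv_loss=0.5, x=x, x_rec=x * 0.9,
                  cls_probs=np.array([0.7, 0.2, 0.1]), error_type=0)
```

The cycle term is what discourages over-correction: the generator cannot drift far from the learner's original utterance without paying an L1 penalty on the round trip.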
Resource-Aware Predictive Models in Cyber-Physical Systems
Cyber-Physical Systems (CPS) are composed of computing devices interacting with physical systems. Model-based design is a powerful methodology in CPS design for the implementation of control systems. For instance, Model Predictive Control (MPC) is typically implemented in CPS applications, e.g., in path tracking of autonomous vehicles. MPC deploys a model to estimate the behavior of the physical system at future time instants over a specific time horizon. Ordinary Differential Equations (ODEs) are the most commonly used models to emulate the behavior of continuous-time (non-)linear dynamical systems. A complex physical model may comprise thousands of ODEs, which pose scalability, performance, and power consumption challenges. One approach to addressing these model complexity challenges is frameworks that automate the development of model-to-model transformations. In this dissertation, a state-based model with tunable parameters is proposed to operate as a reconfigurable predictive model of the physical system. Moreover, we propose a run-time switching algorithm that selects the best model using machine learning, employing a metric that formulates the trade-off between the error and the computational savings due to model reduction. Building statistical models is constrained by the need for expert knowledge and an actual understanding of the modeled phenomenon or process. Statistical models may also fail to produce solutions that are robust in a real-world context, since factors outside the model, such as disruptions, are not taken into account. Machine learning models have emerged as a solution that accounts for the dynamic behavior of the environment and automates intelligence acquisition and refinement. Neural networks are machine learning models well known for their ability to learn linear and nonlinear relations between input and output variables without prior knowledge.
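The error-vs-savings trade-off used to pick a predictive model at run time can be sketched numerically. The two "models" below are a full and a coarsened forward-Euler integration of dx/dt = -x, and the weighting `alpha` is an illustrative parameter; the dissertation's actual metric and models may differ.

```python
# Trade-off sketch: accuracy lost by a reduced model vs. computation saved.
import math

def simulate(x0, dt, steps):
    """Forward-Euler rollout of dx/dt = -x; cost ~ number of steps."""
    x = x0
    for _ in range(steps):
        x += dt * (-x)
    return x, steps

def tradeoff(x0=1.0, horizon=1.0, alpha=0.5):
    ref = x0 * math.exp(-horizon)                    # exact solution
    full, cost_full = simulate(x0, horizon / 100, 100)  # "full" model
    red, cost_red = simulate(x0, horizon / 10, 10)      # "reduced" model
    err = abs(red - ref) - abs(full - ref)           # extra error of reduced
    savings = (cost_full - cost_red) / cost_full     # fraction of work saved
    return alpha * savings - (1 - alpha) * err       # higher = prefer reduced

score = tradeoff()
```

A run-time switcher would evaluate a metric of this shape for each candidate model and select the one with the best score under the currently available resources.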
However, the ability to efficiently exploit resource-hungry neural networks in embedded, resource-bound settings is a major challenge. Here, we propose the Priority Neuron Network (PNN), a resource-aware neural network model that can be reconfigured into smaller sub-networks at runtime. This approach enables a trade-off between the model's computation time and accuracy based on the available resources. The PNN model is memory-efficient, since it stores only one set of parameters to serve the various sub-network sizes. We propose a training algorithm that applies regularization techniques to constrain the activation values of neurons and assigns a priority to each one. We take the neuron's ordinal number as the priority criterion, so that the priority of a neuron is inversely proportional to its ordinal number in the layer; this imposes a relatively sorted order on the activation values. We conduct experiments that employ the PNN as the predictive model in a CPS application. The results show that our technique not only resolves the memory overhead of DNN architectures but also substantially reduces the computation overhead of the training process. Training time is a critical matter, especially in embedded systems where many NN models are trained on the fly.
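The priority-neuron idea can be illustrated with a single stored weight set whose hidden neurons are ordered by priority, so that a smaller sub-network is obtained at run time by truncating to the first k neurons. Sizes and names below are illustrative, not the dissertation's architecture, and the priority-inducing regularizer used during training is omitted.

```python
# One parameter set, many sub-network widths: truncate the hidden layer
# to its first (highest-priority) k neurons at inference time.
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(8, 3))   # hidden x input; rows sorted by priority
W2 = rng.normal(size=(1, 8))   # output x hidden

def forward(x, k):
    """Evaluate using only the k highest-priority hidden neurons."""
    h = np.maximum(0.0, W1[:k] @ x)     # ReLU on the truncated layer
    return (W2[:, :k] @ h).item()

x = np.array([0.5, -1.0, 2.0])
full = forward(x, 8)    # full network
small = forward(x, 4)   # half-size sub-network, same stored parameters
```

Because every sub-network is a prefix of the same weight matrices, memory holds one model regardless of how many widths are used, which is the memory advantage the abstract describes.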