Phonetics Learning Anxiety – Results of a Preliminary Study
The Phonetics Learning Anxiety Scale, a 44-item questionnaire based on a 6-point Likert scale and designed for the purpose of this research, sheds light on the nature of this peculiar type of apprehension experienced by advanced FL learners in a specific educational context (i.e. a traditional classroom rather than a language or computer laboratory) in which the major focus is on pronunciation practice. The quantitative data obtained imply that such factors as fear of negative evaluation (represented by general oral performance apprehension and concern over pronunciation mistakes, pronunciation self-image, pronunciation self-efficacy and self-assessment) and beliefs about the nature of FL pronunciation learning are significant sources of PhLA. Anxiety about the transcription test (IPA Test Anxiety), one of the other hypothesized determinants of PhLA, did not prove to be correlated with the general level of Phonetics Learning Anxiety.
Automatic assessment of spoken language proficiency of non-native children
This paper describes technology developed to automatically grade Italian
students (ages 9-16) on their English and German spoken language proficiency.
The students' spoken answers are first transcribed by an automatic speech
recognition (ASR) system and then scored using a feedforward neural network
(NN) that processes features extracted from the automatic transcriptions.
In-domain acoustic models, employing deep neural networks (DNNs), are derived
by adapting the parameters of an original out-of-domain DNN.
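The pipeline described above (features extracted from automatic transcriptions, scored by a feedforward network) can be sketched as follows. The feature names, network sizes, and random weights are illustrative assumptions, not the authors' actual configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

def extract_features(transcription, duration_s):
    """Toy proxies for ASR-derived proficiency features (illustrative only):
    speech rate, mean word length, and type/token ratio."""
    words = transcription.split()
    rate = len(words) / duration_s
    mean_len = float(np.mean([len(w) for w in words]))
    ttr = len(set(words)) / len(words)
    return np.array([rate, mean_len, ttr])

def feedforward_score(x, w1, b1, w2, b2):
    """Single hidden layer with ReLU and a linear output: a minimal
    stand-in for the scoring network described in the abstract."""
    h = np.maximum(0.0, x @ w1 + b1)   # hidden activations
    return float(h @ w2 + b2)          # scalar proficiency score

x = extract_features("the cat sat on the mat", duration_s=3.0)
w1, b1 = rng.normal(size=(3, 8)), np.zeros(8)   # untrained demo weights
w2, b2 = rng.normal(size=8), 0.0
score = feedforward_score(x, w1, b1, w2, b2)
print(round(score, 3))
```

In the actual system the weights would of course be trained against human-assigned proficiency grades rather than drawn at random.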
A computational model for studying L1's effect on L2 speech learning
Abstract: Much evidence has shown that the first language (L1) plays an important role in the formation of the L2 phonological system during second language (L2) learning. Together with the fact that different L1s have distinct phonological patterns, this suggests diverse L2 speech learning outcomes for speakers from different L1 backgrounds. This dissertation hypothesizes that phonological distances between accented speech and speakers' L1 speech are also correlated with perceived accentedness, and that the correlations are negative for some phonological properties. Moreover, contrastive phonological distinctions between L1s and the L2 will manifest themselves in the accented speech produced by speakers with these L1s. To test these hypotheses, this study develops a computational model to analyze accented speech properties in both the segmental (short-term speech measurements at the short-segment or phoneme level) and suprasegmental (long-term speech measurements at the word, long-segment, or sentence level) feature spaces. The benefit of using a computational model is that it enables quantitative analysis of the L1's effect on accent in terms of different phonological properties. The core parts of this computational model are feature extraction schemes that derive pronunciation and prosody representations of accented speech based on existing techniques in the speech processing field. Correlation analysis on both segmental and suprasegmental feature spaces is conducted to examine the relationship between acoustic measurements related to L1s and perceived accentedness across several L1s. Multiple regression analysis is employed to investigate how the L1's effect impacts the perception of foreign accent, and how accented speech produced by speakers from different L1s behaves distinctly in the segmental and suprasegmental feature spaces.
Results unveil the potential of the methodology in this study to provide quantitative analysis of accented speech and to extend current studies in L2 speech learning theory to a large scale. Practically, this study further shows that the proposed computational model can benefit automatic accentedness evaluation systems by adding features related to speakers' L1s. Doctoral dissertation, Speech and Hearing Science.
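The dissertation's two analysis steps (Pearson correlation between acoustic distance measures and perceived accentedness, then multiple regression over both feature spaces) can be illustrated with a minimal sketch. The data below are synthetic, and the negative relationship is built in purely for demonstration:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50

# Synthetic stand-ins: segmental and suprasegmental distances, and
# accentedness ratings that (by construction) decrease as distance grows.
seg_dist = rng.uniform(0, 1, n)      # e.g. phoneme-level spectral distance
supra_dist = rng.uniform(0, 1, n)    # e.g. prosodic (F0/duration) distance
accent = 5 - 2.0 * seg_dist - 1.0 * supra_dist + rng.normal(0, 0.2, n)

# Pearson correlation of each feature space with perceived accentedness
r_seg = np.corrcoef(seg_dist, accent)[0, 1]
r_supra = np.corrcoef(supra_dist, accent)[0, 1]

# Multiple regression: accentedness ~ intercept + segmental + suprasegmental
X = np.column_stack([np.ones(n), seg_dist, supra_dist])
coef, *_ = np.linalg.lstsq(X, accent, rcond=None)
print(r_seg, r_supra, coef)
```

The regression coefficients recover the (here artificial) contribution of each feature space, which is the same logic the dissertation uses to compare how L1-related measurements in the two spaces predict accentedness.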
Improving Phonemic Awareness in ESL Pronunciation Using Shadowing During Tutorials: Implications for ESL Teachers
Although there are numerous reasons to improve pronunciation instruction, the teaching of phonological structures in English has become less popular in K-12 classrooms. This study proposes that the use of a relatively new technique, known as shadowing, may positively improve ESL students' pronunciation of American Standard English. The data obtained were analyzed and evaluated in terms of phonological structures. The motivation for this particular study came from previous research concerning word boundaries and the phonological structures of consonants, in addition to my previous experience as an ESL tutor and instructor at SCSU, where students were making many phonemic errors. This study provides evidence for specific effects on phonemic awareness as well as on fluency and accuracy. To accomplish this, a shadowing methodology was used. The participants performed three types of audio-recorded speech samples both before and after their weekly tutorial sessions, each serving as a pre-test/post-test: spontaneous speech samples, rehearsed speech samples, and read-aloud activities. The model recordings were provided by four native speakers of English, two Caucasian males and two Caucasian females, generating the authentic speech samples necessary for data analysis. The activities stemmed from a modified activity from the St. Cloud State ESL Department's tutorial packet. The samples were assessed by native English speakers (speech sample raters) who listened to each sample and scored it using a speech rubric provided by the researcher. The raters' scores were analyzed and presented in the form of paired t-tests. Common problems associated with pronunciation, and whether the use of shadowing leads to an increased level of phonemic awareness, were the target objectives for the elicited data.
The students were divided into two groups: Student Group A used a written transcript while making the shadowing attempts, and Student Group B did not. Most of the comparisons did not yield statistically significant results (gender and language yielded no significance). Although two of the mean-score comparisons for Groups A and B (pre- vs. post-test) showed a difference, none of the differences were statistically significant, as none of the p-values fell below the alpha level of 0.05.
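A paired comparison of the kind reported (pre- vs. post-tutorial rubric scores for the same students) is typically run as a paired t-test on the score differences. The scores below are invented for illustration:

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical rubric scores for one group, before and after shadowing sessions
pre = [3.0, 2.5, 3.5, 2.0, 3.0, 2.5]
post = [3.5, 3.0, 3.5, 2.5, 3.5, 3.0]

# Paired t-test: t = mean(differences) / standard error of the differences
diffs = [b - a for a, b in zip(pre, post)]
t_stat = mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))
print(round(t_stat, 3))  # → 5.0
```

Significance is then judged by comparing the statistic (or its p-value) against the chosen alpha: with df = 5 the two-tailed critical t at alpha = 0.05 is about 2.571, so this made-up difference would count as significant, unlike the study's actual comparisons.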
The development of automatic speech evaluation system for learners of English
Degree system: new; report number: Kō 3183; degree type: Doctor of Education; date conferred: 2010/11/30; Waseda University degree record number: Shin 547.
Pronunciation Variation Analysis and CycleGAN-based Feedback Generation for CAPT
Thesis (Ph.D.) -- Seoul National University Graduate School: Interdisciplinary Program in Cognitive Science, College of Humanities, February 2020. Advisor: Minhwa Chung.
Despite the growing popularity of learning Korean as a foreign language and the rapid development of language learning applications, the existing computer-assisted pronunciation training (CAPT) systems for Korean do not utilize the linguistic characteristics of non-native Korean speech. Pronunciation variations in non-native speech are far more diverse than those observed in native speech, which may pose a difficulty in incorporating such knowledge into an automatic system. Moreover, most of the existing methods rely on feature extraction results from signal processing, prosodic analysis, and natural language processing techniques. Such methods entail limitations since they necessarily depend on finding the right features for the task and on the extraction accuracies.
This thesis presents a new approach to corrective feedback generation in a CAPT system, in which pronunciation variation patterns and linguistic correlates with accentedness are analyzed and combined with a deep neural network approach, so that feature engineering efforts are minimized while the linguistically important factors are maintained for the corrective feedback generation task. Investigations of non-native Korean speech characteristics, contrasted with those of native speakers, and of their correlation with accentedness judgements show that both segmental and prosodic variations are important factors in a Korean CAPT system.
The present thesis argues that the feedback generation task can be interpreted as a style transfer problem, and proposes to evaluate the idea using a generative adversarial network. A corrective feedback generation model is trained on 65,100 read utterances by 217 non-native speakers from 27 mother-tongue backgrounds. The features are learnt automatically in an unsupervised way in an auxiliary classifier CycleGAN setting, in which the generator learns to map foreign-accented speech to native speech distributions. In order to inject linguistic knowledge into the network, an auxiliary classifier is trained so that the feedback also identifies the linguistic error types defined in the first half of the thesis. The proposed approach generates a corrected version of the speech using the learner's own voice, outperforming the conventional Pitch-Synchronous Overlap-and-Add method.

Abstract in Korean (translated): Interest in Korean as a foreign language is growing, the number of Korean learners has increased sharply, and research on computer-assisted pronunciation training (CAPT) applications that apply spoken language processing technology is being actively pursued. Nevertheless, existing Korean speaking-education systems do not sufficiently exploit the linguistic characteristics of non-native Korean, nor do they apply the latest language processing techniques. Possible reasons are that the phenomena of foreigner-produced Korean have not been analyzed sufficiently, and that even where relevant research exists, more advanced work is needed before it can be reflected in an automated system. Moreover, CAPT technology in general relies on feature extraction through signal processing, prosodic analysis, and natural language processing techniques, so much time and effort are required to find suitable features and extract them accurately. This suggests that the process has much room for improvement through recent deep-learning-based language processing technology.

Accordingly, this study first analyzed pronunciation variation patterns and their linguistic correlations prior to developing a CAPT system. After contrasting the read-speech variation patterns of foreign learners with those of native Korean speakers and identifying the major variations, correlation analysis was used to determine their importance for communication. As a result, it was confirmed that feedback generation should give priority to errors involving deletion of syllable-final consonants, confusion among three-way phonemic contrasts, and segmentation-related errors.

Automatically generating corrective feedback is one of the important tasks of a CAPT system. This study regarded the task as interpretable as a problem of speech style transfer, and proposed modeling it with a cycle-consistent generative adversarial network (CycleGAN) architecture. The generator of the GAN learns a mapping from the distribution of non-native utterances to that of native utterances, and by using a cycle-consistency loss the overall structure of the utterance is preserved while excessive correction is prevented. Because the necessary features are learned by the CycleGAN framework itself in an unsupervised manner, without a separate feature extraction step, the method is easy to extend to other languages.

The priorities among the major variations revealed by the linguistic analysis were modeled in an auxiliary classifier CycleGAN architecture. This method grafts linguistic knowledge onto the existing CycleGAN so that, while generating the feedback speech, it also classifies which type of error the feedback addresses. Its significance lies in the advantage that domain knowledge is maintained, and remains controllable, through the corrective feedback generation stage.

To evaluate the proposed method, a feedback generation model was trained on 65,100 meaningful-word utterances from 217 speakers with 27 different mother tongues, and a perceptual evaluation of whether and how much pronunciation improved was conducted. With the proposed method it was possible to convert speech into corrected pronunciation while preserving the learner's own voice, with a relative improvement of about 16.67% over the conventional Pitch-Synchronous Overlap-and-Add (PSOLA) algorithm.

Chapter 1. Introduction 1
1.1. Motivation 1
1.1.1. An Overview of CAPT Systems 3
1.1.2. Survey of existing Korean CAPT Systems 5
1.2. Problem Statement 7
1.3. Thesis Structure 7
Chapter 2. Pronunciation Analysis of Korean Produced by Chinese 9
2.1. Comparison between Korean and Chinese 11
2.1.1. Phonetic and Syllable Structure Comparisons 11
2.1.2. Phonological Comparisons 14
2.2. Related Works 16
2.3. Proposed Analysis Method 19
2.3.1. Corpus 19
2.3.2. Transcribers and Agreement Rates 22
2.4. Salient Pronunciation Variations 22
2.4.1. Segmental Variation Patterns 22
2.4.1.1. Discussions 25
2.4.2. Phonological Variation Patterns 26
2.4.2.1. Discussions 27
2.5. Summary 29
Chapter 3. Correlation Analysis of Pronunciation Variations and Human Evaluation 30
3.1. Related Works 31
3.1.1. Criteria used in L2 Speech 31
3.1.2. Criteria used in L2 Korean Speech 32
3.2. Proposed Human Evaluation Method 36
3.2.1. Reading Prompt Design 36
3.2.2. Evaluation Criteria Design 37
3.2.3. Raters and Agreement Rates 40
3.3. Linguistic Factors Affecting L2 Korean Accentedness 41
3.3.1. Pearson's Correlation Analysis 41
3.3.2. Discussions 42
3.3.3. Implications for Automatic Feedback Generation 44
3.4. Summary 45
Chapter 4. Corrective Feedback Generation for CAPT 46
4.1. Related Works 46
4.1.1. Prosody Transplantation 47
4.1.2. Recent Speech Conversion Methods 49
4.1.3. Evaluation of Corrective Feedback 50
4.2. Proposed Method: Corrective Feedback as a Style Transfer 51
4.2.1. Speech Analysis at Spectral Domain 53
4.2.2. Self-imitative Learning 55
4.2.3. An Analogy: CAPT System and GAN Architecture 57
4.3. Generative Adversarial Networks 59
4.3.1. Conditional GAN 61
4.3.2. CycleGAN 62
4.4. Experiment 63
4.4.1. Corpus 64
4.4.2. Baseline Implementation 65
4.4.3. Adversarial Training Implementation 65
4.4.4. Spectrogram-to-Spectrogram Training 66
4.5. Results and Evaluation 69
4.5.1. Spectrogram Generation Results 69
4.5.2. Perceptual Evaluation 70
4.5.3. Discussions 72
4.6. Summary 74
Chapter 5. Integration of Linguistic Knowledge in an Auxiliary Classifier CycleGAN for Feedback Generation 75
5.1. Linguistic Class Selection 75
5.2. Auxiliary Classifier CycleGAN Design 77
5.3. Experiment and Results 80
5.3.1. Corpus 80
5.3.2. Feature Annotations 81
5.3.3. Experiment Setup 81
5.3.4. Results 82
5.4. Summary 84
Chapter 6. Conclusion 86
6.1. Thesis Results 86
6.2. Thesis Contributions 88
6.3. Recommendations for Future Work 89
Bibliography 91
Appendix 107
Abstract in Korean 117
Acknowledgments 120
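The thesis's framing of corrective feedback as style transfer rests on CycleGAN's cycle-consistency idea: a generator maps non-native speech toward the native distribution, while a reverse mapping constrains the output to stay close to the input, preventing over-correction. A minimal numerical sketch of the cycle-consistency loss on toy 1-D "feature" vectors (not real spectrograms; the linear generators stand in for the trained networks) is:

```python
import numpy as np

# Toy stand-ins for spectrogram feature vectors (illustrative, not real speech)
x_nonnative = np.array([0.2, 0.8, 0.5])   # learner utterance features
y_native = np.array([0.4, 0.6, 0.7])      # native reference features

# Toy linear "generators": G maps non-native -> native style, F the reverse.
# In the thesis these are neural networks trained adversarially.
A = np.diag([1.1, 0.9, 1.0])
G = lambda v: A @ v
F = lambda v: np.linalg.inv(A) @ v

def cycle_consistency_loss(x, y):
    """L1 reconstruction error after a full forward-backward cycle:
    penalizes mappings that do not preserve the input's structure."""
    return np.abs(F(G(x)) - x).sum() + np.abs(G(F(y)) - y).sum()

loss = cycle_consistency_loss(x_nonnative, y_native)
print(loss)  # ≈ 0.0, since F is here the exact inverse of G
```

In actual training the loss is not zero: it is added to the adversarial losses, so the generator is pushed toward the native distribution while the cycle term anchors the output to the learner's original utterance (and voice).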
Leveraging phone-level linguistic-acoustic similarity for utterance-level pronunciation scoring
Recent studies on pronunciation scoring have explored the effect of
introducing phone embeddings as reference pronunciation, but mostly in an
implicit manner, i.e., addition or concatenation of the reference phone
embedding and the actual pronunciation of the target phone as the phone-level
pronunciation quality representation. In this paper, we propose to use linguistic-acoustic
similarity to explicitly measure the deviation of non-native production from
its native reference for pronunciation assessment. Specifically, the deviation
is first estimated by the cosine similarity between reference phone embedding
and corresponding acoustic embedding. Next, a phone-level Goodness of
pronunciation (GOP) pre-training stage is introduced to guide this
similarity-based learning for better initialization of the aforementioned two
embeddings. Finally, a transformer-based hierarchical pronunciation scorer is
used to map a sequence of phone embeddings and acoustic embeddings, along with
their similarity measures, to the final utterance-level score.
Experimental results on the non-native databases suggest that the proposed
system significantly outperforms the baselines, where the acoustic and phone
embeddings are simply added or concatenated. A further examination shows that
the phone embeddings learned in the proposed approach are able to capture
linguistic-acoustic attributes of native pronunciation as reference. Comment: Accepted by ICASSP 202
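The paper's explicit deviation measure is the cosine similarity between a reference phone embedding and the corresponding acoustic embedding. A minimal sketch, with made-up embedding values in place of the learned ones:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity: near 1.0 for matching directions, lower as the
    actual production deviates from the reference phone embedding."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

phone_emb = np.array([0.5, 0.1, 0.8])        # reference ("canonical") phone
acoustic_native = np.array([0.5, 0.1, 0.8])  # production close to reference
acoustic_l2 = np.array([0.9, 0.4, 0.1])      # deviating non-native production

sim_native = cosine_similarity(phone_emb, acoustic_native)  # ≈ 1.0
sim_l2 = cosine_similarity(phone_emb, acoustic_l2)          # noticeably lower
print(sim_native, sim_l2)
```

In the proposed system these per-phone similarities, together with the two embedding sequences, feed the transformer-based scorer that predicts the utterance-level score.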