3,601 research outputs found
Unstressed Vowels in German Learner English: An Instrumental Study
This study investigates the production of vowels in unstressed syllables by advanced German learners of English in comparison with native speakers of Standard Southern British English. Two acoustic properties were measured: duration and formant structure. The results indicate that the duration of unstressed vowels is similar in the two groups, though it varies somewhat with phonetic context. In terms of formant structure, learners produce slightly higher F1 and considerably lower F2, and the difference in F2 is statistically significant for each learner. Formant values varied as a function of context and of the orthographic representation of the vowel.
Language-independent talker-specificity in first-language and second-language speech production by bilingual talkers: L1 speaking rate predicts L2 speaking rate
Second-language (L2) speech is consistently slower than first-language (L1) speech, and L1 speaking rate varies within and across talkers depending on many individual, situational, linguistic, and sociolinguistic factors. The study asks whether speaking rate is also determined by a language-independent, talker-specific trait such that, across a group of bilinguals, L1 speaking rate significantly predicts L2 speaking rate. Two measures of speaking rate were automatically extracted from recordings of read and spontaneous speech by English monolinguals (n = 27) and bilinguals from ten L1 backgrounds (n = 86): speech rate (syllables per second) and articulation rate (syllables per second, excluding silent pauses). Replicating prior work, L2 speaking rates were significantly slower than L1 speaking rates both across groups (monolinguals' L1 English vs. bilinguals' L2 English) and across L1 and L2 within bilinguals. Critically, within the bilingual group, L1 speaking rate significantly predicted L2 speaking rate. This suggests that a significant portion of inter-talker variation in L2 speech derives from inter-talker variation in L1 speech, and that individual variability in L2 spoken language production may be best understood in the context of individual variability in L1 spoken language production.
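The two speaking-rate measures differ only in their denominator: speech rate divides the syllable count by the total duration, whereas articulation rate first subtracts the silent pauses. A minimal Python sketch, assuming per-recording syllable counts and summed pause durations are already available; the `Recording` class and its field names are hypothetical and not the study's extraction pipeline:

```python
from dataclasses import dataclass


@dataclass
class Recording:
    """Hypothetical per-recording summary (field names are illustrative)."""
    n_syllables: int          # syllable count, e.g. from an automatic syllable detector
    total_duration_s: float   # total recording duration in seconds
    silent_pause_s: float     # summed duration of all silent pauses in seconds


def speech_rate(rec: Recording) -> float:
    """Syllables per second over the whole recording, pauses included."""
    return rec.n_syllables / rec.total_duration_s


def articulation_rate(rec: Recording) -> float:
    """Syllables per second after excluding silent pauses."""
    phonation_time = rec.total_duration_s - rec.silent_pause_s
    if phonation_time <= 0:
        raise ValueError("silent pauses cannot fill the whole recording")
    return rec.n_syllables / phonation_time


rec = Recording(n_syllables=120, total_duration_s=40.0, silent_pause_s=10.0)
print(speech_rate(rec))        # 3.0
print(articulation_rate(rec))  # 4.0
```

Because pause time is removed from the denominator, articulation rate is never lower than speech rate for the same recording; the gap between the two reflects how much a talker pauses.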
Pronunciation Variation Analysis and CycleGAN-Based Feedback Generation for CAPT
Thesis (Ph.D.), Seoul National University Graduate School, Interdisciplinary Program in Cognitive Science, College of Humanities, February 2020. Advisor: 정민화 (Minhwa Chung).
Despite the growing popularity of learning Korean as a foreign language and the rapid development of language learning applications, existing computer-assisted pronunciation training (CAPT) systems for Korean do not make use of the linguistic characteristics of non-native Korean speech. Pronunciation variations in non-native speech are far more diverse than those observed in native speech, which makes such knowledge difficult to incorporate into an automatic system. Moreover, most existing methods rely on features extracted through signal processing, prosodic analysis, and natural language processing techniques. Such methods are limited because they depend on finding the right features for the task and on how accurately those features are extracted.
This thesis presents a new approach to corrective feedback generation in a CAPT system, in which pronunciation variation patterns and the linguistic correlates of accentedness are analyzed and combined with a deep neural network approach, so that feature engineering effort is minimized while the linguistically important factors for corrective feedback generation are retained. Investigations of the characteristics of non-native Korean speech, in contrast with those of native speakers, and of their correlation with accentedness judgments show that both segmental and prosodic variations are important factors in a Korean CAPT system.
The present thesis argues that the feedback generation task can be interpreted as a style transfer problem and proposes to evaluate this idea using a generative adversarial network. A corrective feedback generation model is trained on 65,100 read utterances by 217 non-native speakers from 27 native-language backgrounds. The features are learned automatically, without supervision, in an auxiliary classifier CycleGAN setting, in which the generator learns to map foreign-accented speech to the native speech distribution. To inject linguistic knowledge into the network, an auxiliary classifier is trained so that the feedback also identifies the linguistic error types defined in the first half of the thesis. The proposed approach generates a corrected version of the speech in the learner's own voice and outperforms the conventional Pitch-Synchronous Overlap-and-Add method.
As interest in Korean as a foreign language has grown, the number of learners of Korean has increased sharply, and research on computer-assisted pronunciation training (CAPT) applications that apply spoken language processing technology has also been actively pursued. Nevertheless, existing Korean speaking-training systems do not make sufficient use of the linguistic characteristics of Korean spoken by foreigners, nor do they apply the latest language processing technology. Possible reasons are that non-native Korean speech has not been analyzed thoroughly enough, and that even where relevant studies exist, more advanced research is needed to incorporate their findings into an automated system. Moreover, CAPT technology in general depends on feature extraction through signal processing, prosodic analysis, and natural language processing techniques, so finding suitable features and extracting them accurately requires much time and effort. This suggests that there is considerable room to improve this process by using recent deep-learning-based language processing technology.
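This study therefore first analyzed pronunciation variation patterns and their linguistic correlates for the development of a CAPT system. After contrasting the variation patterns in read speech by non-native speakers with those in read speech by native Korean speakers and identifying the salient variations, it used correlation analysis to assess how much each affects communication. The results confirmed that coda deletion, confusion of the three-way laryngeal contrast, and suprasegmental (prosodic) errors should be given priority in feedback generation when they occur.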
Automatically generating corrective feedback is one of the key tasks of a CAPT system. This study regards the task as a problem of speech style transfer and proposes modeling it with a cycle-consistent generative adversarial network (CycleGAN). The generator of the GAN learns a mapping from the distribution of non-native speech to the distribution of native speech, and the cycle-consistency loss preserves the overall structure of the utterance while preventing over-correction. Because the required features are learned automatically and without supervision within the CycleGAN framework, with no separate feature extraction step, the method is easy to extend to other languages.
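For reference, the standard CycleGAN objective of Zhu et al. (2017) can be written as follows, reading X as the domain of non-native (learner) speech features, Y as the domain of native speech features, G: X → Y as the correcting generator, F: Y → X as the inverse generator, and D_X, D_Y as the two discriminators. This is a sketch of the generic formulation, not necessarily the exact loss used in the thesis; the feature representation (e.g. spectrograms) and the weight λ are assumptions.

```latex
\mathcal{L}_{\text{GAN}}(G, D_Y, X, Y) = \mathbb{E}_{y \sim p_{\text{data}}(y)}\big[\log D_Y(y)\big] + \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log\big(1 - D_Y(G(x))\big)\big]

\mathcal{L}_{\text{cyc}}(G, F) = \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\lVert F(G(x)) - x \rVert_1\big] + \mathbb{E}_{y \sim p_{\text{data}}(y)}\big[\lVert G(F(y)) - y \rVert_1\big]

\mathcal{L}(G, F, D_X, D_Y) = \mathcal{L}_{\text{GAN}}(G, D_Y, X, Y) + \mathcal{L}_{\text{GAN}}(F, D_X, Y, X) + \lambda\, \mathcal{L}_{\text{cyc}}(G, F)
```

The cycle term penalizes a corrected utterance G(x) that cannot be mapped back to the learner's original x, which is one way to read the claim that cycle consistency preserves utterance structure and prevents over-correction.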
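To capture the priorities among the salient variations revealed by the linguistic analysis, the study proposes modeling them in an Auxiliary Classifier CycleGAN. This method incorporates linguistic knowledge into the conventional CycleGAN so that it generates the feedback speech and, at the same time, classifies which type of error the feedback addresses. Its significance lies in the fact that domain knowledge is preserved, and remains controllable, all the way through the corrective feedback generation stage.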
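One plausible way to attach the auxiliary classifier, following the AC-GAN idea of Odena et al. (2017), is to train a classifier C that predicts an error-type label c (for example coda deletion, laryngeal-contrast confusion, or a prosodic error) and to add its cross-entropy to the CycleGAN objective. The exact formulation in the thesis is not given here; the symbols C, c, and λ_cls, and the choice to classify both the learner input x and its correction G(x), are assumptions for illustration. Here \mathcal{L}_{\text{CycleGAN}} stands for the standard CycleGAN objective (adversarial losses in both directions plus a weighted cycle-consistency loss).

```latex
\mathcal{L}_{\text{cls}} = \mathbb{E}_{(x, c)}\big[-\log C(c \mid x)\big] + \mathbb{E}_{(x, c)}\big[-\log C(c \mid G(x))\big]

\mathcal{L}_{\text{total}} = \mathcal{L}_{\text{CycleGAN}}(G, F, D_X, D_Y) + \lambda_{\text{cls}}\, \mathcal{L}_{\text{cls}}
```

Sharing the label c between generation and classification is what would let the linguistic error categories constrain the generator rather than being attached after the fact.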
To evaluate the proposed method, a feedback generation model was trained on 65,100 utterances of meaningful words by 217 speakers with 27 native languages, and a perceptual evaluation of whether, and how much, pronunciation improved was carried out. With the proposed method, learner speech could be converted into corrected pronunciation while retaining the learner's own voice, and a relative improvement of 16.67% was observed over the conventional Pitch-Synchronous Overlap-and-Add (PSOLA) algorithm.
Chapter 1. Introduction
1.1. Motivation
1.1.1. An Overview of CAPT Systems
1.1.2. Survey of existing Korean CAPT Systems
1.2. Problem Statement
1.3. Thesis Structure
Chapter 2. Pronunciation Analysis of Korean Produced by Chinese
2.1. Comparison between Korean and Chinese
2.1.1. Phonetic and Syllable Structure Comparisons
2.1.2. Phonological Comparisons
2.2. Related Works
2.3. Proposed Analysis Method
2.3.1. Corpus
2.3.2. Transcribers and Agreement Rates
2.4. Salient Pronunciation Variations
2.4.1. Segmental Variation Patterns
2.4.1.1. Discussions
2.4.2. Phonological Variation Patterns
2.4.2.1. Discussions
2.5. Summary
Chapter 3. Correlation Analysis of Pronunciation Variations and Human Evaluation
3.1. Related Works
3.1.1. Criteria used in L2 Speech
3.1.2. Criteria used in L2 Korean Speech
3.2. Proposed Human Evaluation Method
3.2.1. Reading Prompt Design
3.2.2. Evaluation Criteria Design
3.2.3. Raters and Agreement Rates
3.3. Linguistic Factors Affecting L2 Korean Accentedness
3.3.1. Pearson's Correlation Analysis
3.3.2. Discussions
3.3.3. Implications for Automatic Feedback Generation
3.4. Summary
Chapter 4. Corrective Feedback Generation for CAPT
4.1. Related Works
4.1.1. Prosody Transplantation
4.1.2. Recent Speech Conversion Methods
4.1.3. Evaluation of Corrective Feedback
4.2. Proposed Method: Corrective Feedback as a Style Transfer
4.2.1. Speech Analysis at Spectral Domain
4.2.2. Self-imitative Learning
4.2.3. An Analogy: CAPT System and GAN Architecture
4.3. Generative Adversarial Networks
4.3.1. Conditional GAN
4.3.2. CycleGAN
4.4. Experiment
4.4.1. Corpus
4.4.2. Baseline Implementation
4.4.3. Adversarial Training Implementation
4.4.4. Spectrogram-to-Spectrogram Training
4.5. Results and Evaluation
4.5.1. Spectrogram Generation Results
4.5.2. Perceptual Evaluation
4.5.3. Discussions
4.6. Summary
Chapter 5. Integration of Linguistic Knowledge in an Auxiliary Classifier CycleGAN for Feedback Generation
5.1. Linguistic Class Selection
5.2. Auxiliary Classifier CycleGAN Design
5.3. Experiment and Results
5.3.1. Corpus
5.3.2. Feature Annotations
5.3.3. Experiment Setup
5.3.4. Results
5.4. Summary
Chapter 6. Conclusion
6.1. Thesis Results
6.2. Thesis Contributions
6.3. Recommendations for Future Work
Bibliography
Appendix
Abstract in Korean
Acknowledgments
Recommended from our members
An exploratory study of foreign accent and phonological awareness in Korean learners of English
Communication in a second or multiple languages has become essential in the globalized world. However, acquiring a second language (L2) after a critical period is universally acknowledged to be challenging (Lenneberg, 1967). Late learners rarely reach a nativelike level in an L2, particularly in pronunciation, and their incomplete phonological acquisition manifests as a foreign accent, a common and persistent feature of otherwise fluent L2 speech. Although foreign-accented speech is widespread, it is subject to social constraints in L2-speaking communities, leading many learners and instructors to seek ways to reduce foreign accents. Accordingly, L2 speech research has continually examined various learner-external and learner-internal factors behind foreign accents, as well as the non-native speech characteristics that underlie judgments of accent degree. The current study aimed to expand the understanding of the characteristics and judgments of foreign accents by investigating phonological awareness, a construct pertaining to learners' phonological knowledge that has received little attention in research on foreign accents.
The current study was exploratory, non-experimental research targeting 40 adults with Korean-accented English living in the United States. The study first examined how 23 raters who speak American English as their native language detect, perceive, describe, and rate Korean-accented English. Through qualitative and quantitative analyses of the accent perception data, the study identified various phonological and phonetic deviations from nativelike sounds, which largely result from the influence of the first language (Korean) on the L2 (English). The study then probed the relationship between foreign accent and learners' awareness of the L2 phonological system, which was measured using production, perception, and verbalization tasks that tapped into knowledge of L2 phonology. The study found a significant inverse relationship between the degree of foreign accent and phonological awareness, particularly implicit knowledge of L2 segmentals. Further in-depth analyses revealed that explicit knowledge of L2 phonology alone was not sufficient for targetlike pronunciation. Findings suggest that L2 speakers experience varying degrees of difficulty in perceiving and producing different L2 segmentals, which may result in foreign-accented speech.
Measuring fluency: Temporal variables and pausing patterns in L2 English speech
This paper examines temporal variables and pausing patterns in L2 English speech to investigate fluency as a measurable component of oral proficiency. Fluency can be defined as "speed and smoothness of oral delivery". The speed of oral delivery can be measured by calculating temporal variables such as speech rate and mean syllables per run, where a "run" is the vocal chunk between silent pauses. The smoothness of oral delivery can be measured by examining pausing patterns, classifying the placement of each pause. Pauses may occur in expected positions, such as clause or phrase boundaries, or in unexpected positions, and pauses in unexpected positions may reduce the smoothness of oral delivery. The data sets are speech samples from the Oral English Proficiency Test (OEPT), comprising responses to two items (RAL: read aloud; NP: news passage). A total of 325 speakers from four language groups (native speakers of Korean, Chinese, Hindi, and English) are represented across 6 proficiency levels (rated holistically on the OEPT scale from 35 to 60). The speech samples were transcribed manually using a computer-assisted annotation tool that captured information about syllables, pause boundaries, and types of pause positions. Developing the annotation tool became a central concern of this study, because reliable and efficient methods are essential to fluency research. Speech rate, mean syllables per run, and number of pauses per second were selected to examine temporal variables; number of unexpected pauses per second and expected pausing ratio were selected to compare pausing patterns across proficiency levels and language backgrounds. The results show some linear relationships among the temporal and pausing variables.
High-proficiency speakers spoke at higher rates with expected pausing patterns, whereas low-proficiency speakers spoke at slower rates with almost no identifiable pausing patterns.
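The five measures named above can be computed from a run-and-pause annotation of one sample. A minimal Python sketch, assuming a hypothetical input format (syllable counts per run plus one expected/unexpected flag per silent pause) rather than the study's actual annotation tool; the function name is illustrative, and the expected pausing ratio is assumed here to be expected pauses divided by all pauses:

```python
from typing import Dict, List


def fluency_measures(syllables_per_run: List[int],
                     pause_is_expected: List[bool],
                     total_duration_s: float) -> Dict[str, float]:
    """Temporal and pausing measures for one speech sample.

    syllables_per_run: syllable count of each run (vocal chunk between silent pauses).
    pause_is_expected: one flag per silent pause, True if the pause falls at a
        clause or phrase boundary (hypothetical annotation format).
    total_duration_s: sample duration in seconds, pauses included.
    """
    if not syllables_per_run or total_duration_s <= 0:
        raise ValueError("need at least one run and a positive duration")
    n_syllables = sum(syllables_per_run)
    n_pauses = len(pause_is_expected)
    n_expected = sum(pause_is_expected)  # True counts as 1
    return {
        "speech_rate": n_syllables / total_duration_s,
        "mean_syllables_per_run": n_syllables / len(syllables_per_run),
        "pauses_per_second": n_pauses / total_duration_s,
        "unexpected_pauses_per_second": (n_pauses - n_expected) / total_duration_s,
        # Assumed definition: share of all silent pauses that are expected.
        "expected_pausing_ratio": n_expected / n_pauses if n_pauses else float("nan"),
    }


# Four runs separated by three pauses, one of them at an unexpected position.
m = fluency_measures([6, 8, 4, 6], [True, False, True], total_duration_s=10.0)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

Under this sketch, a fluent speaker would show a higher speech rate, longer runs, and an expected pausing ratio closer to 1.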
The development of automatic speech evaluation system for learners of English
Degree system: new; Report no.: Kō 3183; Degree: Doctor of Education; Date conferred: 2010/11/30; Waseda University degree no.: Shin 547
Heritage Languages: In the 'Wild' and in the Classroom
Heritage speakers are people raised in a home where one language is spoken who subsequently switch to another dominant language. The version of the home language that they have not completely acquired, the heritage language, has only recently been given the attention it deserves from linguists and language instructors. Despite the apparent variation among heritage speakers, they fall along a continuum based on their distance from the baseline language. Such a continuum-based model enables researchers and instructors to classify heritage speakers more accurately and readily. This article discusses the results of research on lower-proficiency speakers, identifying recurrent features of heritage languages in phonology, morphology, and syntax. Preliminary results indicate that different heritage languages share a number of structural similarities; this finding is important for understanding the general processes involved in language acquisition. The article also presents implications of the main findings for language education and identifies areas needing further study.
The role of HG in the analysis of temporal iteration and interaural correlation
Automatic Pronunciation Assessment -- A Review
Pronunciation assessment and its application in computer-aided pronunciation training (CAPT) have seen impressive progress in recent years. With the rapid growth in language processing and deep learning over the past few years, an updated review is needed. In this paper, we review methods employed in pronunciation assessment at both the phonemic and prosodic levels. We categorize the main challenges observed in prominent research trends and highlight existing limitations and available resources. This is followed by a discussion of the remaining challenges and possible directions for future work. Comment: 9 pages, accepted to EMNLP Findings
Text reconstruction activities and teaching language forms
Even though there is broad consensus that teaching language forms is facilitative or even necessary in some contexts, there are still disagreements about, among other things, how formal aspects of the target language should be taught. One important area of controversy is whether pedagogic intervention should be input-oriented, emphasizing comprehension of the form-meaning mappings represented by specific linguistic features, or output-based, requiring learners to produce these features accurately in increasingly communicative activities. The present paper focuses on the latter option and, based on the claims of Swain's (1985, 1995) output hypothesis, aims to demonstrate how text-reconstruction activities, in which learners collaboratively produce written output, trigger noticing, hypothesis testing, and metalinguistic reflection on language use. It presents a psycholinguistic and sociolinguistic rationale for such tasks, discusses the types of such activities, provides an overview of research projects investigating their application, and, finally, offers a set of implications for classroom use as well as suggestions for further research in this area.