Pronunciation Variation Analysis and CycleGAN-Based Feedback Generation for CAPT
Thesis (Ph.D.), Seoul National University Graduate School: College of Humanities, Interdisciplinary Program in Cognitive Science, February 2020. Minhwa Chung.
Despite the growing popularity of learning Korean as a foreign language and the rapid development of language learning applications, existing computer-assisted pronunciation training (CAPT) systems for Korean do not exploit the linguistic characteristics of non-native Korean speech. Pronunciation variations in non-native speech are far more diverse than those observed in native speech, which makes it difficult to incorporate such knowledge into an automatic system. Moreover, most existing methods rely on features extracted through signal processing, prosodic analysis, and natural language processing techniques. Such methods are limited because they depend on finding the right features for the task and on the accuracy with which those features are extracted.
This thesis presents a new approach for corrective feedback generation in a CAPT system, in which pronunciation variation patterns and linguistic correlates with accentedness are analyzed and combined with a deep neural network approach, so that feature engineering efforts are minimized while maintaining the linguistically important factors for the corrective feedback generation task. Investigations on non-native Korean speech characteristics in contrast with those of native speakers, and their correlation with accentedness judgement show that both segmental and prosodic variations are important factors in a Korean CAPT system.
The present thesis argues that the feedback generation task can be interpreted as a style transfer problem, and proposes to evaluate this idea using a generative adversarial network (GAN). A corrective feedback generation model is trained on 65,100 read utterances by 217 non-native speakers from 27 mother-tongue backgrounds. The features are learned automatically, in an unsupervised way, in an auxiliary classifier CycleGAN setting, in which the generator learns to map foreign-accented speech to the native speech distribution. To inject linguistic knowledge into the network, an auxiliary classifier is trained so that the feedback also identifies the linguistic error types defined in the first half of the thesis. The proposed approach generates a corrected version of the speech in the learner's own voice and outperforms the conventional Pitch-Synchronous Overlap-and-Add (PSOLA) method.
Interest in Korean as a foreign language has grown and the number of learners of Korean is increasing sharply. Research on computer-assisted pronunciation training (CAPT) applications that use spoken language processing technology is also active. Nevertheless, existing Korean speaking-education systems do not make sufficient use of the linguistic characteristics of Korean spoken by non-native speakers, nor do they apply state-of-the-art language processing technology. One possible reason is that non-native Korean speech has not been analyzed sufficiently. Another is that, even where relevant studies exist, incorporating their findings into an automatic system requires further, more advanced research. Moreover, CAPT technology in general relies on feature extraction through signal processing, prosodic analysis, and natural language processing techniques, so finding suitable features and extracting them accurately demands considerable time and effort. This suggests that recent deep-learning-based language processing technology leaves much room for improving this process.
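This study therefore first analyzed pronunciation variation patterns and their linguistic correlates for the development of a CAPT system. It contrasted the variation patterns in read speech by non-native speakers with those by native Korean speakers and identified the salient variations. A correlation analysis then determined how strongly each variation affects communication. The results show that coda deletion, confusion within the three-way laryngeal contrast, and suprasegmental errors should be given priority in feedback generation when they occur.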
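The correlation step above pairs each speaker's variation measures with human accentedness judgements. The table of contents names Pearson's correlation for this. The function below is a minimal, self-contained sketch of Pearson's r. The variable meanings (for example, per-speaker counts of one variation type against mean accentedness ratings) and the toy numbers are illustrative assumptions, not the thesis's data.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson's product-moment correlation between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-variation and variation sums (the 1/n factors cancel in the ratio).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


# Toy values only: hypothetical per-speaker error counts vs. accentedness scores.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0
```

A value near +1 would mean speakers who produce a variation more often are also judged more accented. In a real analysis one would also report significance and check for outliers.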
Automatically generating corrective feedback is one of the key tasks of a CAPT system. This study treats the task as a problem of speech style transfer and proposes to model it with a cycle-consistent generative adversarial network (CycleGAN). The generator learns a mapping from the distribution of non-native utterances to that of native utterances. The cycle-consistency loss preserves the overall structure of the utterance while preventing over-correction. The necessary features are learned in an unsupervised manner within the CycleGAN framework, without a separate feature extraction step, so the method extends easily to other languages.
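The interplay between the adversarial mapping and the cycle-consistency loss described above can be written out with the standard CycleGAN objective. The notation is ours:

- $X$: non-native (accented) spectrograms.
- $Y$: native spectrograms.
- $G: X \to Y$: the correcting generator.
- $F: Y \to X$: the reverse generator.
- $D_X$, $D_Y$: the discriminators.
- $\lambda$: a weighting hyperparameter.

This is a sketch of the generic formulation. The thesis's exact loss, for example whether it uses a least-squares adversarial term or adds an identity-mapping term, is not stated here.

```latex
\begin{aligned}
\mathcal{L}_{\mathrm{GAN}}(G, D_Y) &= \mathbb{E}_{y \sim p_Y}\big[\log D_Y(y)\big]
  + \mathbb{E}_{x \sim p_X}\big[\log\big(1 - D_Y(G(x))\big)\big] \\
\mathcal{L}_{\mathrm{GAN}}(F, D_X) &= \mathbb{E}_{x \sim p_X}\big[\log D_X(x)\big]
  + \mathbb{E}_{y \sim p_Y}\big[\log\big(1 - D_X(F(y))\big)\big] \\
\mathcal{L}_{\mathrm{cyc}}(G, F) &= \mathbb{E}_{x \sim p_X}\big[\lVert F(G(x)) - x \rVert_1\big]
  + \mathbb{E}_{y \sim p_Y}\big[\lVert G(F(y)) - y \rVert_1\big] \\
\mathcal{L}(G, F, D_X, D_Y) &= \mathcal{L}_{\mathrm{GAN}}(G, D_Y) + \mathcal{L}_{\mathrm{GAN}}(F, D_X)
  + \lambda\, \mathcal{L}_{\mathrm{cyc}}(G, F) \\
G^{*}, F^{*} &= \arg\min_{G, F}\, \max_{D_X, D_Y}\, \mathcal{L}(G, F, D_X, D_Y)
\end{aligned}
```

The adversarial terms push $G(x)$ toward the native distribution. The cycle term penalizes any "correction" that cannot be mapped back to the learner's original utterance. This is what keeps the utterance's overall structure and discourages over-correction.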
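To model the priorities among the salient variations revealed by the linguistic analysis, this study proposes an Auxiliary Classifier CycleGAN architecture. It adds linguistic knowledge to the conventional CycleGAN, so that the model generates the feedback speech and, at the same time, classifies which type of error the feedback addresses. The value of this approach is that domain knowledge is preserved, and remains controllable, all the way to the corrective feedback generation stage.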
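The auxiliary classifier described above adds a classification term to the generator's objective, in the style of AC-GAN. The snippet below is a minimal pure-Python sketch of that extra term: a softmax cross-entropy over error-type labels combined with the adversarial and cycle losses. Several parts are assumptions:

- The label set `ERROR_TYPES` is hypothetical, taken from the three error types the abstract says should be prioritized.
- The weights `lambda_cyc` and `lambda_cls` are illustrative defaults.
- The function names are ours.

None of these are values or APIs from the thesis. In AC-GAN-style training the classifier term is usually applied to both real and generated samples. How the thesis weights these is not stated here.

```python
import math
from typing import List, Sequence

# Hypothetical label set: the error types the abstract says to prioritize.
ERROR_TYPES = ["coda_deletion", "laryngeal_confusion", "suprasegmental"]


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one vector of classifier logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_z for z in logits]


def classification_loss(batch_logits: Sequence[Sequence[float]],
                        labels: Sequence[int]) -> float:
    """Mean cross-entropy of the auxiliary classifier over a batch.

    labels[i] indexes ERROR_TYPES for utterance i; batch_logits[i] holds the
    classifier's unnormalized scores for that utterance.
    """
    if not labels or len(batch_logits) != len(labels):
        raise ValueError("need exactly one label per item in a non-empty batch")
    total = 0.0
    for logits, label in zip(batch_logits, labels):
        total -= log_softmax(logits)[label]
    return total / len(labels)


def generator_objective(adv_loss: float, cyc_loss: float, cls_loss: float,
                        lambda_cyc: float = 10.0, lambda_cls: float = 1.0) -> float:
    """Weighted sum of adversarial, cycle-consistency and classification terms.

    The default weights are assumptions (10 is a common CycleGAN choice for the
    cycle term), not values reported in the thesis.
    """
    return adv_loss + lambda_cyc * cyc_loss + lambda_cls * cls_loss


# Uniform logits give log(3): a classifier that has learned nothing yet.
print(round(classification_loss([[0.0, 0.0, 0.0]], [1]), 4))  # 1.0986
```

Keeping the classification term separate from the adversarial term makes it easy to raise `lambda_cls` when the feedback must reliably name the error type, at some cost to naturalness.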
To evaluate the proposed method, the automatic feedback generation model was trained on 65,100 utterances of meaningful words by 217 speakers of 27 native languages. A perceptual evaluation then measured whether, and how much, pronunciation improved. With the proposed method, a learner's speech can be converted to corrected pronunciation while preserving the learner's own voice. It achieved a relative improvement of 16.67% over the conventional Pitch-Synchronous Overlap-and-Add (PSOLA) algorithm.
Chapter 1. Introduction
1.1. Motivation
1.1.1. An Overview of CAPT Systems
1.1.2. Survey of Existing Korean CAPT Systems
1.2. Problem Statement
1.3. Thesis Structure
Chapter 2. Pronunciation Analysis of Korean Produced by Chinese
2.1. Comparison between Korean and Chinese
2.1.1. Phonetic and Syllable Structure Comparisons
2.1.2. Phonological Comparisons
2.2. Related Works
2.3. Proposed Analysis Method
2.3.1. Corpus
2.3.2. Transcribers and Agreement Rates
2.4. Salient Pronunciation Variations
2.4.1. Segmental Variation Patterns
2.4.1.1. Discussions
2.4.2. Phonological Variation Patterns
2.4.2.1. Discussions
2.5. Summary
Chapter 3. Correlation Analysis of Pronunciation Variations and Human Evaluation
3.1. Related Works
3.1.1. Criteria used in L2 Speech
3.1.2. Criteria used in L2 Korean Speech
3.2. Proposed Human Evaluation Method
3.2.1. Reading Prompt Design
3.2.2. Evaluation Criteria Design
3.2.3. Raters and Agreement Rates
3.3. Linguistic Factors Affecting L2 Korean Accentedness
3.3.1. Pearson's Correlation Analysis
3.3.2. Discussions
3.3.3. Implications for Automatic Feedback Generation
3.4. Summary
Chapter 4. Corrective Feedback Generation for CAPT
4.1. Related Works
4.1.1. Prosody Transplantation
4.1.2. Recent Speech Conversion Methods
4.1.3. Evaluation of Corrective Feedback
4.2. Proposed Method: Corrective Feedback as a Style Transfer
4.2.1. Speech Analysis at Spectral Domain
4.2.2. Self-imitative Learning
4.2.3. An Analogy: CAPT System and GAN Architecture
4.3. Generative Adversarial Networks
4.3.1. Conditional GAN
4.3.2. CycleGAN
4.4. Experiment
4.4.1. Corpus
4.4.2. Baseline Implementation
4.4.3. Adversarial Training Implementation
4.4.4. Spectrogram-to-Spectrogram Training
4.5. Results and Evaluation
4.5.1. Spectrogram Generation Results
4.5.2. Perceptual Evaluation
4.5.3. Discussions
4.6. Summary
Chapter 5. Integration of Linguistic Knowledge in an Auxiliary Classifier CycleGAN for Feedback Generation
5.1. Linguistic Class Selection
5.2. Auxiliary Classifier CycleGAN Design
5.3. Experiment and Results
5.3.1. Corpus
5.3.2. Feature Annotations
5.3.3. Experiment Setup
5.3.4. Results
5.4. Summary
Chapter 6. Conclusion
6.1. Thesis Results
6.2. Thesis Contributions
6.3. Recommendations for Future Work
Bibliography
Appendix
Abstract in Korean
Acknowledgments
Acoustic Modelling for Under-Resourced Languages
Automatic speech recognition systems have so far been developed only for very few languages out of the 4,000-7,000 existing ones.
In this thesis we examine methods to create acoustic models rapidly, and in a time- and cost-effective manner, for new and possibly under-resourced languages. To this end, we investigate the use of multilingual models, the application of articulatory features across languages, and the automatic discovery of word-like units in unwritten languages.
How does dialect exposure affect learning to read and spell? An artificial orthography study
Correlational studies have demonstrated that exposure to a mismatch between a non-standard dialect at home and a mainstream variety at school has detrimental effects on children's literacy skills. However, dialect exposure is often confounded with reduced home literacy, negative teacher expectations and more limited educational opportunities. To provide proof of concept for a possible causal relationship between variety mismatch and literacy skills, we taught adult learners to read and spell an artificial language, with or without dialect variants, using an artificial orthography. In three experiments, we confirmed earlier findings that reading is more error-prone for contrastive words, i.e. words for which different variants exist in the input, especially when learners also acquire the joint meanings of these competing variants. Despite this contrastive deficit, variety mismatch caused no detriment to reading and spelling of untrained words, a task equivalent to the non-word reading tests routinely administered to young school children. With longer training, we even found a benefit of variety mismatch for reading and spelling untrained words. We suggest that such a dialect benefit in literacy learning can arise when competition between different variants leads learners to favour phonologically mediated decoding. Our findings should help to assuage educators' concerns about detrimental effects of linguistic diversity.
Phonological adaptation of English loanwords into Qassimi Arabic: an Optimality-Theoretic account
PhD Thesis
Within the field of loanword phonology, this study enhances our understanding of the role played by the contrastive features of the borrowing language in shaping the segmental adaptation patterns of loanwords from the source language. This has been achieved by performing a theoretical analysis of the segmental adaptation patterns of English loanwords into Qassimi Arabic (QA), a dialect spoken in the region of Qassim in central Saudi Arabia, using an Optimality-Theoretic framework.
The central argument of this study is that the inputs to QA are fully specified English outputs. The native grammar of QA then allows only those input features that are contrastive in QA to surface, so redundant or non-contrastive phonological features are eliminated from the outputs. The evidence that the contrastive features of QA segments play a main role in the adaptation process comes from the adaptation of English segments that are non-native in QA. For instance, the English lax vowels /ɪ/, /ʊ/, /æ/ are adapted as their tense QA counterparts [i], [u] and [a]. I have argued that the reason for this adaptation is that [ATR] is not a contrastive feature within the QA vowel inventory. Therefore, dispensing with the value of the input feature [-ATR] results in the tense vowels appearing at the surface level.
To identify the contrastive features of the QA phonological inventory, I rely on the Contrastive Hierarchy Theory proposed by Dresher (2009). This theory holds that phonological features should be ordered hierarchically so that only the contrastive features of a phonological inventory are obtained. This is achieved by successively dividing the inventory by features until each segment is distinguished contrastively from all others. The features of QA segments are therefore first built into a contrastive hierarchy model. Within this hierarchy, features are created and ordered according to one or more of the following motivations: Activity, Minimality and Universality.
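The successive division described above can be sketched as a short recursive procedure (Dresher's Successive Division Algorithm). Take the next feature in the given order. If it splits the current set, that feature becomes contrastive for every member of the set, and each half is divided further. If it does not split the set, it is redundant there and is skipped. The toy inventory, the feature order and the helper names below are hypothetical illustrations, not the QA inventory or the hierarchy proposed in the thesis.

```python
from typing import Dict, List

# segment -> {feature: "+" or "-"}; a missing value is treated as "-".
Specs = Dict[str, Dict[str, str]]


def successive_division(inventory: Specs, feature_order: List[str]) -> Specs:
    """Return contrastive specifications by dividing the inventory feature by feature."""
    contrastive: Specs = {seg: {} for seg in inventory}

    def divide(segments: List[str], features: List[str]) -> None:
        if len(segments) <= 1:
            return  # a singleton is already distinct from every other segment
        if not features:
            raise ValueError(f"feature order cannot distinguish {segments}")
        feature, remaining = features[0], features[1:]
        plus = [s for s in segments if inventory[s].get(feature) == "+"]
        minus = [s for s in segments if inventory[s].get(feature) != "+"]
        if plus and minus:
            # The feature splits this set, so it is contrastive for each member.
            for s in plus:
                contrastive[s][feature] = "+"
            for s in minus:
                contrastive[s][feature] = "-"
            divide(plus, remaining)
            divide(minus, remaining)
        else:
            # The feature does not split this set, so it is redundant here.
            divide(segments, remaining)

    divide(list(inventory), feature_order)
    return contrastive


def fs(son: str, lab: str, cont: str, voice: str) -> Dict[str, str]:
    """Full (non-contrastive) specification helper for the toy inventory."""
    return {"sonorant": son, "labial": lab, "continuant": cont, "voice": voice}


# Hypothetical toy inventory and feature order, for illustration only.
TOY = {"b": fs("-", "+", "-", "+"), "f": fs("-", "+", "+", "-"),
       "m": fs("+", "+", "-", "+"), "t": fs("-", "-", "-", "-"),
       "d": fs("-", "-", "-", "+")}
ORDER = ["sonorant", "labial", "continuant", "voice"]

spec = successive_division(TOY, ORDER)
print(spec["b"])  # {'sonorant': '-', 'labial': '+', 'continuant': '-'}
```

In this toy run, /b/ ends up with exactly [-sonorant, +labial, -continuant], like the QA specification quoted in the next paragraph. [voice] is never reached for /b/ because /b/ is already distinguished before that feature applies. Changing `ORDER` changes which features count as contrastive, which is why the ordering motivations matter.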
Finally, the contrastive hierarchy of the QA segment inventory is converted into OT constraints. The ranking of these constraints is sufficient to account for the segmental adaptation patterns of loanwords from English into QA. For instance, based on the contrastive hierarchy of QA, /b/ is contrastively specified as [-sonorant, +labial, -continuant]. In the adaptation of English consonants, the English input segment /p/ is consistently mapped to [b] in QA. In this case, the contrastive hierarchy of the QA consonant inventory contains the co-occurrence constraints *[αVoice, +labial] and *[αCoronal, +labial]. These constraints filter the input features when the input is the fully specified [-sonorant, +labial, -coronal, -continuant, -voiced, …], and permit only the contrastive features [-sonorant, +labial, -continuant] to surface.
Qassim University in Saudi Arabia
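The constraint evaluation mentioned above can be illustrated with a minimal OT EVAL. Each candidate gets a violation profile ordered by the ranking, and the winner is the lexicographically smallest profile, which is what strict domination amounts to. This is plain ranked-constraint OT, not the thesis's contrastive-filtering analysis. The markedness constraint `no_voiceless_labial_stop`, the candidate set and the ranking are hypothetical stand-ins, chosen only to show how /p/ can surface as [b].

```python
from typing import Callable, Dict, List, Tuple

Bundle = Dict[str, str]
# A constraint takes (input, output candidate) and returns its violation count.
Constraint = Callable[[Bundle, Bundle], int]


def ident(feature: str) -> Constraint:
    """Faithfulness IDENT(feature): one violation if the output changes the input value."""
    return lambda inp, out: int(inp[feature] != out[feature])


def no_voiceless_labial_stop(inp: Bundle, out: Bundle) -> int:
    """Hypothetical markedness constraint standing in for the absence of /p/ in QA."""
    return int(out["sonorant"] == "-" and out["labial"] == "+"
               and out["continuant"] == "-" and out["voice"] == "-")


def evaluate(inp: Bundle, candidates: Dict[str, Bundle],
             ranking: List[Constraint]) -> Tuple[str, Dict[str, Tuple[int, ...]]]:
    """Pick the optimal candidate under strict domination.

    Profiles are tuples ordered by the ranking; Python compares tuples
    lexicographically, so min() implements strict domination directly.
    """
    profiles = {name: tuple(c(inp, out) for c in ranking)
                for name, out in candidates.items()}
    winner = min(profiles, key=profiles.__getitem__)
    return winner, profiles


def fs(son: str, lab: str, cont: str, voice: str) -> Bundle:
    return {"sonorant": son, "labial": lab, "continuant": cont, "voice": voice}


english_p = fs("-", "+", "-", "-")
candidates = {"p": english_p, "b": fs("-", "+", "-", "+"), "f": fs("-", "+", "+", "-"),
              "t": fs("-", "-", "-", "-"), "m": fs("+", "+", "-", "+")}
# Hypothetical ranking: markedness >> IDENT(labial, continuant, sonorant) >> IDENT(voice).
ranking = [no_voiceless_labial_stop, ident("labial"), ident("continuant"),
           ident("sonorant"), ident("voice")]

winner, profiles = evaluate(english_p, candidates, ranking)
print(winner, profiles[winner])  # b (0, 0, 0, 0, 1)
```

[b] wins because its only violation is of low-ranked IDENT(voice). The faithful [p] fails the top-ranked constraint, and [f], [t] and [m] each change a higher-ranked feature.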
A corpus for large-scale phonetic typology
A major hurdle in data-driven research on typology is having sufficient data
in many languages to draw meaningful conclusions. We present VoxClamantis v1.0,
the first large-scale corpus for phonetic typology, with aligned segments and
estimated phoneme-level labels in 690 readings spanning 635 languages, along
with acoustic-phonetic measures of vowels and sibilants. Access to such data
can greatly facilitate investigation of phonetic typology at a large scale and
across many languages. However, it is non-trivial and computationally intensive
to obtain such alignments for hundreds of languages, many of which have few to
no resources presently available. We describe the methodology to create our
corpus, discuss caveats with current methods and their impact on the utility of
this data, and illustrate possible research directions through a series of case
studies on the 48 highest-quality readings. Our corpus and scripts are publicly
available for non-commercial use at https://voxclamantisproject.github.io.
Comment: Accepted to ACL202
Experiences of Dyslexic Students Learning a Second Language: A Review of the Literature
A systematic review of the literature was conducted to explore the experiences of college students with dyslexia learning a second language in the classroom while studying at a private institution in Central Virginia. This literature review offers an analysis of the scholarly research related to this topic. Processability theory is discussed in the first section. It is followed by a review of recent literature on how dyslexia affects the brain's processing, the specific experiences of students, and how best to support these students in second language acquisition (SLA). The review then covers phonological processing, working memory, specific struggles in the classroom, and motivation. Finally, a gap in the literature is identified: more research is needed on the experiences of college students with dyslexia in second language classrooms.
To be or not to be bilingual: cognitive processing skills and literacy development in monolingual English, emergent bilingual Zulu and English, as well as bilingual Afrikaans and English speaking children
A thesis submitted to the Faculty of Humanities,
Department of Psychology at the University of the Witwatersrand,
in fulfilment of the requirements for the degree of Doctor of Philosophy
October 2016.
Literacy in multilingual contexts includes social and cognitive dimensions
(GoPaul-McNicol & Armour-Thomas, 1997). Becoming literate carries with it the ability to develop
and access higher-order thinking skills that are the building blocks for cognitive academic language
proficiency, as well as the means that define educational opportunities (Bialystok, 2007). South Africa
has 11 official languages and a multilingual education policy, but South African schools can
choose a monolingual or multilingual language-of-instruction policy (Heugh, 2010).
This raises the question of whether monolingualism or bilingualism influences children's successful
acquisition of reading. It is important to investigate the effect this has on the reading processes and skills
of monolingual and bilingual children. The issue has received limited research attention, yet it
contributes to our understanding of how children's cognitive capacities for literacy
attainment are either constrained or promoted by broader social factors operating in a child's
literacy-learning environment (Bialystok, 2007; Vygotsky, 1978). Cognitive processing and reading
skills were assessed in monolingual and bilingual children at a public school in an urban area of
Johannesburg. An English-speaking monolingual group with English as the language of instruction
(N = 100) was compared with a Zulu-English bilingual group with Zulu as first language (L1) speaking
proficiency and English as second language (L2) literacy experience (N = 100) on measures of
reading, phonological awareness, vocabulary skills, and working memory. Performance in cognitive
processing and reading skills of these two groups was compared to an Afrikaans-English bilingual
group (N = 100) with dual medium instruction. Tests of language proficiency confirmed that the
Afrikaans-English bilinguals were balanced bilinguals and that the Zulu-English bilinguals were
partial bilinguals.
Aim and method: The purpose of this study was to expand knowledge in the field of second
language reading acquisition and language of instruction by examining the impact of language-related
factors on the cognitive development and literacy competence of monolingual and bilingual children
in the South African context. The central tenet of the bio-ecological approach to language, cognitive
and reading assessment is that language acquisition is inseparable from the context in which it is
learned (Armour-Thomas & GoPaul-McNicol, 1997). Drawing from this approach, the present
research project investigated the effects of the level of orthographic transparency on reading
development in the transparent L1 and opaque L2 of biliterate Afrikaans-English bilinguals learning
to read in a dual medium school setting. The effects of oral vs. written language proficiency in the L1
on the acquisition of L2 English reading were also investigated by examining whether reading
processes and skills transferred from one language to another and the direction or nature of this
transfer in partial and balanced bilinguals. Finally, whether a balanced bilingualism and biliteracy
experience had beneficial effects on cognitive tasks demanding high levels of working memory
capacity, was investigated.
Results: Reading in Afrikaans, the more transparent orthography, reached a higher
competency level than reading in the less transparent English. Dual medium learners and L1 English
monolingual learners acquired reading skills in their home language(s) at a higher level than L2
English with L1 Zulu speaking proficiency learners did. Dual medium learners outperformed both
monolingual learners and L2 English with L1 Zulu speaking proficiency learners on tests of
phonological awareness, working memory, and reading comprehension. They also reached similar
competency levels in tests of vocabulary knowledge to monolingual English (L1) learners. These
differences translated into different relationships and strengths for reading attainment in monolingual
and bilingual children. These findings provide support for a language-based and context-dependent
bio-ecological model of reading attainment for South African children.
Conclusions: Bilingual children who are exposed to dual medium reading instruction
programmes that value bilingualism philosophically and support it pedagogically create optimal
conditions for high levels of cognitive development and academic achievement, both in the first and
in the L2. Absence of mother tongue instruction and English-only instruction result in a reading
achievement gap between emergent Zulu-English bilinguals and English monolinguals. This effect is
not observed in the biliterate Afrikaans-English bilinguals; instead, these children performed better
than the English monolinguals on many English tasks and working memory tasks requiring high levels of
executive control and analysis of linguistic knowledge, despite English being their L2 while learning
to concurrently read in Afrikaans and English. Arguments for and (misguided) arguments against dual
medium education are examined to identify the consequences of translating this model of education
into effective schooling practices, given the socio-political contexts in which educational reforms take
place at local schools and in communities (Heugh, 2002). More broadly, good early childhood
education includes a rich language learning environment with skilled, responsive teachers who
facilitate children's literacy learning by providing intentional exposure to and support for vocabulary
and concept development. Classroom settings that provide extensive opportunities to build children's
reading competences are beneficial for young dual language learners no less than for children
acquiring literacy skills in a one-language environment (Cummins, 2000; Heugh, 2002).