
168 research outputs found

Pronunciation Variation Analysis and CycleGAN-Based Feedback Generation for CAPT

Author: 양승희
Publication venue: Seoul National University Graduate School
Publication date: 01/02/2020

Doctoral dissertation, Seoul National University Graduate School, College of Humanities, Interdisciplinary Program in Cognitive Science, February 2020. 정민화.
Despite the growing popularity of learning Korean as a foreign language and the rapid development of language learning applications, the existing computer-assisted pronunciation training (CAPT) systems for Korean do not utilize linguistic characteristics of non-native Korean speech. Pronunciation variations in non-native speech are far more diverse than those observed in native speech, which may make it difficult to incorporate such knowledge into an automatic system. Moreover, most of the existing methods rely on feature extraction results from signal processing, prosodic analysis, and natural language processing techniques. Such methods are limited because they depend on finding the right features for the task and on the accuracy of the extraction. This thesis presents a new approach to corrective feedback generation in a CAPT system, in which pronunciation variation patterns and linguistic correlates of accentedness are analyzed and combined with a deep neural network approach, so that feature engineering efforts are minimized while the linguistically important factors for the corrective feedback generation task are retained. Investigations of non-native Korean speech characteristics in contrast with those of native speakers, and of their correlation with accentedness judgements, show that both segmental and prosodic variations are important factors in a Korean CAPT system. The thesis argues that the feedback generation task can be interpreted as a style transfer problem, and proposes to evaluate the idea using a generative adversarial network. A corrective feedback generation model is trained on 65,100 read utterances by 217 non-native speakers of 27 mother tongue backgrounds. The features are learnt automatically and without supervision in an auxiliary classifier CycleGAN setting, in which the generator learns to map foreign-accented speech to native speech distributions. In order to inject linguistic knowledge into the network, an auxiliary classifier is trained so that the feedback also identifies the linguistic error types that were defined in the first half of the thesis. The proposed approach generates a corrected version of the speech using the learner's own voice, outperforming the conventional Pitch-Synchronous Overlap-and-Add method.
Interest in Korean as a foreign language has grown and the number of Korean learners has risen sharply, and research on computer-assisted pronunciation training (CAPT) applications that apply spoken language processing technology is also active. Nevertheless, existing Korean speaking-instruction systems do not make sufficient use of the linguistic characteristics of Korean as spoken by foreigners, nor do they apply the latest language processing technology. Possible reasons are that non-native Korean speech has not been analyzed sufficiently and that, even where relevant research exists, more advanced work is needed to reflect it in an automated system. Moreover, CAPT technology in general relies on feature extraction through signal processing, prosodic analysis, and natural language processing techniques, so finding suitable features and extracting them accurately takes considerable time and effort. This suggests that there is much room to improve this process by using recent deep-learning-based language processing technology. This study therefore first analyzed pronunciation variation patterns and their linguistic correlates for CAPT system development. Read-speech variation patterns of non-native speakers were contrasted with those of native Korean speakers to identify the main variations, and a correlation analysis then determined how strongly each affects communication. The results confirmed that coda deletion, confusion of the three-way contrast, and suprasegmental errors should be given priority in feedback generation. Automatically generating corrective feedback is one of the important tasks of a CAPT system. This study viewed the task as a problem of changing the style of an utterance and proposed modeling it in a cycle-consistent generative adversarial network (CycleGAN) structure. The generator of the GAN learns the mapping between the distribution of non-native utterances and that of native utterances, and the cycle consistency loss preserves the overall structure of the utterance while preventing excessive correction. Because the necessary features are learned by the CycleGAN framework itself in an unsupervised manner, without a separate feature extraction step, the method extends easily to other languages. The priorities among the main variations revealed by the linguistic analysis were modeled in an Auxiliary Classifier CycleGAN structure. This method incorporates knowledge into the existing CycleGAN so that it generates feedback speech and simultaneously classifies the type of error the feedback addresses. Its significance is that domain knowledge is preserved and remains controllable through the corrective feedback generation stage. To evaluate the proposed method, an automatic feedback generation model was trained on 65,100 meaningful-word utterances by 217 speakers of 27 mother tongues, and a perceptual evaluation of whether and how much the speech improved was conducted. With the proposed method, learners' speech can be converted to corrected pronunciation while retaining their own voice, and a relative improvement of 16.67% was observed over the conventional Pitch-Synchronous Overlap-and-Add algorithm.
Chapter 1. Introduction
1.1. Motivation
1.1.1. An Overview of CAPT Systems
1.1.2. Survey of existing Korean CAPT Systems
1.2. Problem Statement
1.3. Thesis Structure
Chapter 2. Pronunciation Analysis of Korean Produced by Chinese
2.1. Comparison between Korean and Chinese
2.1.1. Phonetic and Syllable Structure Comparisons
2.1.2. Phonological Comparisons
2.2. Related Works
2.3. Proposed Analysis Method
2.3.1. Corpus
2.3.2. Transcribers and Agreement Rates
2.4. Salient Pronunciation Variations
2.4.1. Segmental Variation Patterns
2.4.1.1. Discussions
2.4.2. Phonological Variation Patterns
2.4.1.2. Discussions
2.5. Summary
Chapter 3. Correlation Analysis of Pronunciation Variations and Human Evaluation
3.1. Related Works
3.1.1. Criteria used in L2 Speech
3.1.2. Criteria used in L2 Korean Speech
3.2. Proposed Human Evaluation Method
3.2.1. Reading Prompt Design
3.2.2. Evaluation Criteria Design
3.2.3. Raters and Agreement Rates
3.3. Linguistic Factors Affecting L2 Korean Accentedness
3.3.1. Pearson's Correlation Analysis
3.3.2. Discussions
3.3.3. Implications for Automatic Feedback Generation
3.4. Summary
Chapter 4. Corrective Feedback Generation for CAPT
4.1. Related Works
4.1.1. Prosody Transplantation
4.1.2. Recent Speech Conversion Methods
4.1.3. Evaluation of Corrective Feedback
4.2. Proposed Method: Corrective Feedback as a Style Transfer
4.2.1. Speech Analysis at Spectral Domain
4.2.2. Self-imitative Learning
4.2.3. An Analogy: CAPT System and GAN Architecture
4.3. Generative Adversarial Networks
4.3.1. Conditional GAN
4.3.2. CycleGAN
4.4. Experiment
4.4.1. Corpus
4.4.2. Baseline Implementation
4.4.3. Adversarial Training Implementation
4.4.4. Spectrogram-to-Spectrogram Training
4.5. Results and Evaluation
4.5.1. Spectrogram Generation Results
4.5.2. Perceptual Evaluation
4.5.3. Discussions
4.6. Summary
Chapter 5. Integration of Linguistic Knowledge in an Auxiliary Classifier CycleGAN for Feedback Generation
5.1. Linguistic Class Selection
5.2. Auxiliary Classifier CycleGAN Design
5.3. Experiment and Results
5.3.1. Corpus
5.3.2. Feature Annotations
5.3.3. Experiment Setup
5.3.4. Results
5.4. Summary
Chapter 6. Conclusion
6.1. Thesis Results
6.2. Thesis Contributions
6.3. Recommendations for Future Work
Bibliography
Appendix
Abstract in Korean
Acknowledgments
Docto
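
The cycle consistency loss mentioned in the abstract above can be illustrated with a small sketch. This is not the thesis implementation: the two "generators" are toy functions on lists of floats standing in for spectrogram frames, the names `to_native`, `to_nonnative`, and `drifting_inverse` are hypothetical, and the loss follows the usual CycleGAN form (L1 distance between an input and its round-trip reconstruction, summed over both directions), here averaged per element as an assumed normalization.

```python
from typing import Callable, List

Vector = List[float]
Generator = Callable[[Vector], Vector]


def l1_distance(a: Vector, b: Vector) -> float:
    """Mean absolute difference between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cycle_consistency_loss(
    g: Generator,            # maps non-native -> native (assumed direction)
    f: Generator,            # maps native -> non-native
    nonnative: List[Vector],
    native: List[Vector],
) -> float:
    """L1 round-trip error: x -> g -> f should return x, and y -> f -> g should return y."""
    forward = sum(l1_distance(f(g(x)), x) for x in nonnative) / len(nonnative)
    backward = sum(l1_distance(g(f(y)), y) for y in native) / len(native)
    return forward + backward


# Toy stand-ins for trained generators (hypothetical, for illustration only).
def to_native(v: Vector) -> Vector:
    return [x + 1.0 for x in v]


def to_nonnative(v: Vector) -> Vector:
    return [x - 1.0 for x in v]       # exact inverse of to_native


def drifting_inverse(v: Vector) -> Vector:
    return [x - 0.5 for x in v]       # imperfect inverse: round trip drifts by 0.5


nonnative_batch = [[0.5, 2.0], [1.0, 3.0]]
native_batch = [[1.0, 4.0], [2.5, 0.5]]

perfect = cycle_consistency_loss(to_native, to_nonnative, nonnative_batch, native_batch)
drifting = cycle_consistency_loss(to_native, drifting_inverse, nonnative_batch, native_batch)
print(perfect, drifting)
```

When the two mappings invert each other the loss is zero; any drift in the round trip is penalized, which is the sense in which the loss discourages changes that go beyond what is needed to reach the target distribution.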

SNU Open Repository and Archive

Acoustic Modelling for Under-Resourced Languages

Author: Stüker Sebastian
Publication venue: KIT-Bibliothek, Karlsruhe
Publication date: 01/01/2009

Automatic speech recognition systems have so far been developed for only a small fraction of the 4,000-7,000 existing languages. In this thesis we examine methods to rapidly create acoustic models in new, possibly under-resourced languages, in a time- and cost-effective manner. For this we examine the use of multilingual models, the application of articulatory features across languages, and the automatic discovery of word-like units in unwritten languages.

KITopen

How does dialect exposure affect learning to read and spell? An artificial orthography study

Author: Kempe Vera
Panayotov Nikolay
Williams Glenn P.
Publication date: 30/04/2020

Correlational studies have demonstrated detrimental effects of exposure to a mismatch between a non-standard dialect at home and a mainstream variety at school on children’s literacy skills. However, dialect exposure is often confounded with reduced home literacy, negative teacher expectation and more limited educational opportunities. To provide proof of concept for a possible causal relationship between variety mismatch and literacy skills, we taught adult learners to read and spell an artificial language with or without dialect variants using an artificial orthography. In three experiments, we confirmed earlier findings that reading is more error-prone for contrastive words, i.e. words for which different variants exist in the input, especially when learners also acquire the joint meanings of these competing variants. Despite this contrastive deficit, no detriment from variety mismatch emerged for reading and spelling of untrained words, a task equivalent to non-word reading tests routinely administered to young school children. With longer training, we even found a benefit from variety mismatch on reading and spelling of untrained words. We suggest that such a dialect benefit in literacy learning can arise when competition between different variants leads learners to favour phonologically mediated decoding. Our findings should help to assuage educators’ concerns about detrimental effects of linguistic diversity.

Abertay Research Portal

Sunderland University Institutional Repository

Phonological adaptation of English loanwords into Qassimi Arabic: an optimality-theoretic account

Author: Alhoody Metab Mohammad A
Publication venue: Newcastle University
Publication date: 01/01/2019

IPhD Thesis. Within the field of loanword phonology, this study enhances our understanding of the role played by the contrastive features of the borrowing language in shaping the segmental adaptation patterns of loanwords from the source language. This has been achieved by performing a theoretical analysis of the segmental adaptation patterns of English loanwords into Qassimi Arabic (QA), a dialect spoken in the region of Qassim in central Saudi Arabia, using an Optimality-Theoretic (OT) framework. The central argument of this study assumes that fully-specified English outputs serve as the inputs to QA. The native grammar of QA then allows only those phonological features of the inputs that are contrastive in QA to surface. Thus, features that are redundant or noncontrastive in QA are eliminated from the outputs. The evidence that the contrastive features of QA segments play a main role in the adaptation process comes from the adaptation of English segments that are non-native in QA. For instance, the English lax vowels /ɪ/, /ʊ/, /æ/ are adapted as their tense counterparts in QA, [i], [u] and [a]. I have argued that the reason for this adaptation lies in the fact that [ATR] is not a contrastive feature within the QA vowel inventory. Therefore, dispensing with the value of the input feature [-ATR] results in the tense vowels appearing at the surface level. To identify the contrastive features of the QA phonological inventory, I rely on the Contrastive Hierarchy Theory proposed by Dresher (2009). This theory suggests that phonological features should be ordered hierarchically to obtain only the contrastive features of any phonological inventory. This is achieved by dividing any inventory into subsets of features until each segment is distinguished contrastively from all others. Therefore, the features of QA segments are built initially into a contrastive hierarchy model. Within this hierarchy, features are created and ordered according to one or more of the following motivations: Activity, Minimality and Universality. Finally, the contrastive hierarchy of the QA segment inventory is converted into OT constraints. The ranking of these constraints is sufficient to account for the evaluations of the segmental adaptation patterns of loanwords from English into QA. For instance, based on the contrastive hierarchy of QA, /b/ is contrastively specified as [-sonorant, +labial, -continuant]. In the adaptation of English consonants, the English input segment /p/ is mapped consistently to [b] in QA. In this case, the contrastive hierarchy of the QA consonant inventory contains the co-occurrence constraints *[αVoice, +labial] and *[αCoronal, +labial], which filter the input features if the input is fully-specified [-sonorant, +labial, -coronal, -continuant, -voiced, …], and permit only the contrastive features [-sonorant, +labial, -continuant] to surface. Qassim University in Saudi Arabi
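
The filtering effect described in the /p/ to [b] example above can be sketched in a few lines. This is only an illustration of the outcome, not an Optimality-Theoretic evaluation: it does not rank constraints or compare candidates, and the names `ENGLISH_P`, `QA_CONTRASTIVE_LABIAL_STOP`, and `surface_features` are hypothetical. Only the feature values explicitly listed in the abstract are included; the elided features ("…") are left out.

```python
from typing import Dict, Iterable

# Fully specified English input /p/, using only the features listed in the abstract.
ENGLISH_P: Dict[str, str] = {
    "sonorant": "-",
    "labial": "+",
    "coronal": "-",
    "continuant": "-",
    "voiced": "-",
}

# Features that are contrastive for the QA labial stop according to the abstract.
QA_CONTRASTIVE_LABIAL_STOP = ("sonorant", "labial", "continuant")


def surface_features(
    input_features: Dict[str, str], contrastive: Iterable[str]
) -> Dict[str, str]:
    """Keep only the input features that are contrastive in the borrowing language."""
    allowed = set(contrastive)
    return {name: value for name, value in input_features.items() if name in allowed}


adapted = surface_features(ENGLISH_P, QA_CONTRASTIVE_LABIAL_STOP)
print(adapted)
```

The surviving bundle [-sonorant, +labial, -continuant] is the contrastive specification the abstract gives for QA /b/, which is why the input /p/ surfaces as [b] once [voiced] and [coronal] are filtered out.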

Newcastle University eTheses

A corpus for large-scale phonetic typology

Author: Black Alan W
Chodroff Eleanor
Cotterell Ryan
Eisner Jason
Pimentel Tiago
Salesky Elizabeth
Wiesner Matthew
Publication date: 01/01/2020

A major hurdle in data-driven research on typology is having sufficient data in many languages to draw meaningful conclusions. We present VoxClamantis v1.0, the first large-scale corpus for phonetic typology, with aligned segments and estimated phoneme-level labels in 690 readings spanning 635 languages, along with acoustic-phonetic measures of vowels and sibilants. Access to such data can greatly facilitate investigation of phonetic typology at a large scale and across many languages. However, it is non-trivial and computationally intensive to obtain such alignments for hundreds of languages, many of which have few to no resources presently available. We describe the methodology to create our corpus, discuss caveats with current methods and their impact on the utility of this data, and illustrate possible research directions through a series of case studies on the 48 highest-quality readings. Our corpus and scripts are publicly available for non-commercial use at https://voxclamantisproject.github.io. Comment: Accepted to ACL202

arXiv.org e-Print Archive

Repository for Publications and Research Data

Crossref

White Rose Research Online

Experiences of Dyslexic Students Learning a Second Language: A Review of the Literature

Author: Ricci Lauren
Publication venue: Scholars Crossing
Publication date: 05/05/2024

A systematic review of the literature was conducted to explore the experiences of college students with dyslexia who are learning a second language in the classroom setting while studying at a private institution in Central Virginia. This literature review offers an analysis of the scholarly research related to this topic. The processability theory is discussed in the first section, followed by a review of recent literature on how dyslexia affects the brain’s processing, specific experiences of students, and how to best support these students in second language acquisition (SLA). The review then covers the literature on phonological processing, working memory, specific struggles in the classroom, and motivation. Finally, a gap in the literature is identified regarding the need for more research concerning the experiences of college students with dyslexia within a second language classroom.

Liberty University Digital Commons

To be or not to be bilingual: cognitive processing skills and literacy development in monolingual English, emergent bilingual Zulu and English, as well as bilingual Afrikaans and English speaking children

Author: De Sousa Diana Soares
Publication date: 01/01/2016

A thesis submitted to the Faculty of Humanities, Department of Psychology at the University of the Witwatersrand, in fulfilment of the requirements for the degree of Doctor of Philosophy, October 2016.
Literacy in multilingual contexts includes social and cognitive dimensions (GoPaul-McNicol & Armour-Thomas, 1997). Becoming literate carries with it the ability to develop and access higher-order thinking skills that are the building blocks for cognitive academic language proficiency, as well as the means that define educational opportunities (Bialystok, 2007). South Africa has 11 official languages and a multilingual education policy, but South African schools are able to determine their own language of instruction policy of monolingualism or multilingualism (Heugh, 2010). This raises the question of whether monolingualism or bilingualism influences children’s successful acquisition of reading. It is important to investigate the effect this has on the reading processes and skills of monolingual and bilingual children because the issue has received limited research attention, while it contributes to our greater understanding of how children’s cognitive capacities for literacy attainment are either constrained or promoted through broader social factors operating in a child’s literacy-learning environment (Bialystok, 2007; Vygotsky, 1978). Cognitive processing and reading skills were assessed in monolingual and bilingual children at a public school in an urban area of Johannesburg. An English-speaking monolingual group with English as the language of instruction (N = 100) was compared with a Zulu-English bilingual group with Zulu as first language (L1) speaking proficiency and English as second language (L2) literacy experience (N = 100) on measures of reading, phonological awareness, vocabulary skills, and working memory. Performance in cognitive processing and reading skills of these two groups was compared to an Afrikaans-English bilingual group (N = 100) with dual medium instruction. Tests of language proficiency confirmed that the Afrikaans-English bilinguals were balanced bilinguals and that the Zulu-English bilinguals were partial bilinguals.
Aim and method: The purpose of this study was to expand knowledge in the field of second language reading acquisition and language of instruction by examining the impact of language-related factors on the cognitive development and literacy competence of monolingual and bilingual children in the South African context. The central tenet of the bio-ecological approach to language, cognitive and reading assessment is that language acquisition is inseparable from the context in which it is learned (Armour-Thomas & Go-Paul-McNicol, 1997). Drawing from this approach, the present research project investigated the effects of the level of orthographic transparency on reading development in the transparent L1 and opaque L2 of biliterate Afrikaans-English bilinguals learning to read in a dual medium school setting. The effect of oral vs. written language proficiency in the L1 on the acquisition of L2 English reading was also investigated by examining whether reading processes and skills transferred from one language to another and the direction or nature of this transfer in partial and balanced bilinguals. Finally, whether a balanced bilingualism and biliteracy experience had beneficial effects on cognitive tasks demanding high levels of working memory capacity was investigated.
Results: Reading in Afrikaans – the more transparent orthography – reached a higher competency level than reading in the less transparent English. Dual medium learners and L1 English monolingual learners acquired reading skills in their home language(s) at a higher level than L2 English with L1 Zulu speaking proficiency learners did. Dual medium learners outperformed both monolingual learners and L2 English with L1 Zulu speaking proficiency learners on tests of phonological awareness, working memory, and reading comprehension. They also reached competency levels in tests of vocabulary knowledge similar to those of monolingual English (L1) learners. These differences translated into different relationships and strengths for reading attainment in monolingual and bilingual children. These findings provide support for a language-based and context-dependent bio-ecological model of reading attainment for South African children.
Conclusions: Dual medium reading instruction programmes that value bilingualism philosophically and support it pedagogically create optimal conditions for bilingual children to reach high levels of cognitive development and academic achievement, both in the L1 and in the L2. Absence of mother tongue instruction and English-only instruction result in a reading achievement gap between emergent Zulu-English bilinguals and English monolinguals. This effect is not observed in the biliterate Afrikaans-English bilinguals; instead, these children performed better than the English monolinguals on many English tasks and working memory tasks requiring high levels of executive control and analysis of linguistic knowledge, despite English being their L2 while learning to read concurrently in Afrikaans and English. Arguments for and (misguided) arguments against dual medium education are examined to identify the consequences of translating this model of education into effective schooling practices, given the socio-political contexts in which educational reforms take place at local schools and in communities (Heugh, 2002). More broadly, good early childhood education includes a rich language learning environment with skilled, responsive teachers who facilitate children’s literacy learning by providing intentional exposure to and support for vocabulary and concept development. Classroom settings that provide extensive opportunities to build children’s reading competences are beneficial for young dual language learners no less than for children acquiring literacy skills in a one-language environment (Cummins, 2000; Heugh, 2002). GR201

Wits Institutional Repository on DSPACE