1,659 research outputs found

Self-imitating Feedback Generation Using GAN for Computer-Assisted Pronunciation Training

Authors: Chung Minhwa; Yang Seung Hee
Publication date: 20/04/2019

Self-imitating feedback is an effective and learner-friendly method for non-native learners in Computer-Assisted Pronunciation Training. Acoustic characteristics of native utterances are extracted, transplanted onto the learner's own speech input, and given back to the learner as corrective feedback. Previous work focused on speech conversion using prosodic transplantation techniques based on the PSOLA algorithm. Motivated by the visual differences found in spectrograms of native and non-native speech, we investigated applying a GAN to generate self-imitating feedback by utilizing the generator's ability through adversarial training. Because this mapping is highly under-constrained, we also adopt a cycle consistency loss to encourage the output to preserve the global structure, which is shared by native and non-native utterances. Trained on 97,200 spectrogram images of short utterances produced by native and non-native speakers of Korean, the generator successfully transforms the non-native spectrogram input into a spectrogram with the properties of self-imitating feedback. Furthermore, the transformed spectrogram shows segmental corrections that cannot be obtained by prosodic transplantation. A perceptual test comparing the self-imitating and correcting abilities of our method with the baseline PSOLA method shows that the generative approach with cycle consistency loss is promising.
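
The cycle consistency idea mentioned in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the generators `g` (non-native to native) and `f_exact` / `f_lossy` (native to non-native) are hypothetical toy functions on short lists of numbers rather than trained networks on spectrograms, and the mean absolute (L1) distance is assumed because it is the usual choice in CycleGAN-style training.

```python
from typing import Callable, List

Vector = List[float]
Generator = Callable[[Vector], Vector]


def l1_distance(a: Vector, b: Vector) -> float:
    """Mean absolute difference between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cycle_consistency_loss(x: Vector, y: Vector, g: Generator, f: Generator) -> float:
    """Penalize round trips that do not return to the starting point.

    x is a sample from the source domain, y a sample from the target domain.
    g maps source -> target, f maps target -> source.
    """
    forward_cycle = l1_distance(f(g(x)), x)   # x -> g(x) -> f(g(x)) should equal x
    backward_cycle = l1_distance(g(f(y)), y)  # y -> f(y) -> g(f(y)) should equal y
    return forward_cycle + backward_cycle


# Hypothetical toy generators (assumptions for illustration only).
def g(v: Vector) -> Vector:
    return [2.0 * t for t in v]


def f_exact(v: Vector) -> Vector:
    # Exact inverse of g: the round trip preserves the input.
    return [0.5 * t for t in v]


def f_lossy(v: Vector) -> Vector:
    # Not an inverse of g: the round trip drifts away from the input.
    return [0.5 * t + 1.0 for t in v]


x = [1.0, 2.0, 3.0]  # stand-in for a non-native sample
y = [2.0, 4.0, 6.0]  # stand-in for a native sample

loss_exact = cycle_consistency_loss(x, y, g, f_exact)
loss_lossy = cycle_consistency_loss(x, y, g, f_lossy)
print(loss_exact, loss_lossy)
```

When the two mappings invert each other the loss is zero; when one of them distorts the input the loss grows, which is the pressure that keeps the converted output tied to the structure of the original utterance.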

arXiv.org e-Print Archive

Crossref

SNU Open Repository and Archive

Directions for the future of technology in pronunciation research and teaching

Authors: Cucchiarini Catia; Derwing Tracey M.; Foote Jennifer A.; Hardison Debra M.; Levis Greta M.; Levis John M.; Mixdorff Hansjorg; Munro Murray J.; O'Brien Mary G.; Strik Helmer; Thomson Ron I.
Publication venue: Iowa State University Digital Repository
Publication date: 01/02/2019

This paper reports on the role of technology in state-of-the-art pronunciation research and instruction, and makes concrete suggestions for future developments. The point of departure for this contribution is that the goal of second language (L2) pronunciation research and teaching should be enhanced comprehensibility and intelligibility as opposed to native-likeness. Three main areas are covered here. We begin with a presentation of advanced uses of pronunciation technology in research, with a special focus on the expertise required to carry out even small-scale investigations. Next, we discuss the nature of data in pronunciation research, pointing to ways in which future work can build on advances in corpus research and crowdsourcing. Finally, we consider how these insights pave the way for researchers and developers working to create research-informed, computer-assisted pronunciation teaching resources. We conclude with predictions for future developments.

Digital Repository @ Iowa State University (ISU)

Pronunciation Variation Analysis and CycleGAN-Based Feedback Generation for CAPT (CAPT를 위한 발음 변이 분석 및 CycleGAN 기반 피드백 생성)

Author: Yang Seung Hee (양승희)
Publication venue: Seoul National University Graduate School (서울대학교 대학원)
Publication date: 01/02/2020

Thesis (Ph.D.), Seoul National University Graduate School, College of Humanities, Interdisciplinary Program in Cognitive Science, 2020. 2. Chung Minhwa (정민화).

Despite the growing popularity of learning Korean as a foreign language and the rapid development of language learning applications, the existing computer-assisted pronunciation training (CAPT) systems for Korean do not utilize the linguistic characteristics of non-native Korean speech. Pronunciation variations in non-native speech are far more diverse than those observed in native speech, which may make it difficult to incorporate such knowledge into an automatic system. Moreover, most of the existing methods rely on feature extraction results from signal processing, prosodic analysis, and natural language processing techniques. Such methods are limited because they necessarily depend on finding the right features for the task and on the accuracy of their extraction. This thesis presents a new approach to corrective feedback generation in a CAPT system, in which pronunciation variation patterns and linguistic correlates of accentedness are analyzed and combined with a deep neural network approach, so that feature engineering efforts are minimized while the linguistically important factors for the corrective feedback generation task are maintained. Investigations of non-native Korean speech characteristics in contrast with those of native speakers, and of their correlation with accentedness judgement, show that both segmental and prosodic variations are important factors in a Korean CAPT system. The present thesis argues that the feedback generation task can be interpreted as a style transfer problem, and proposes to evaluate the idea using a generative adversarial network. A corrective feedback generation model is trained on 65,100 read utterances by 217 non-native speakers of 27 mother tongue backgrounds. The features are learnt automatically and without supervision in an auxiliary classifier CycleGAN setting, in which the generator learns to map foreign-accented speech to native speech distributions. In order to inject linguistic knowledge into the network, an auxiliary classifier is trained so that the feedback also identifies the linguistic error types that were defined in the first half of the thesis. The proposed approach generates a corrected version of the speech using the learner's own voice, outperforming the conventional Pitch-Synchronous Overlap-and-Add method.

Interest in Korean as a foreign language has grown, the number of learners of Korean has risen sharply, and research on computer-assisted pronunciation training (CAPT) applications that apply spoken language processing technology is also being actively pursued. Nevertheless, existing Korean speaking-education systems do not make sufficient use of the linguistic characteristics of Korean as spoken by foreigners, nor do they apply the latest language processing technology. Possible reasons are that Korean as spoken by foreigners has not been analyzed sufficiently and that, even where relevant studies exist, more advanced research is needed to incorporate them into an automated system. In addition, CAPT technology as a whole relies on feature extraction through signal processing, prosodic analysis, and natural language processing, so finding suitable features and extracting them accurately takes considerable time and effort. This suggests that the process has considerable room for improvement through the latest deep-learning-based language processing technology. This study therefore first analyzes pronunciation variation patterns and their linguistic correlates for CAPT system development. Variation patterns in read speech by non-native speakers were contrasted with those of native Korean speakers to identify the salient variations, and a correlation analysis was then used to determine how strongly each affects communication. The results confirm that coda deletion, confusion of the three-way contrast, and suprasegmental errors should be given priority in feedback generation. Automatically generating corrected feedback is one of the key tasks of a CAPT system. This study interprets the task as a speech style-transfer problem and proposes modeling it with a cycle-consistent generative adversarial network (CycleGAN). The GAN generator learns a mapping between the distributions of non-native and native speech, and the cycle consistency loss preserves the overall structure across utterances while preventing over-correction. Because the necessary features are learned without supervision within the CycleGAN framework, with no separate feature extraction step, the method extends easily to other languages. The priorities among the salient variations revealed by the linguistic analysis are modeled in an auxiliary classifier CycleGAN architecture. This method incorporates linguistic knowledge into the standard CycleGAN, generating feedback speech while also classifying the type of error that the feedback addresses. Its significance is that domain knowledge is retained, and remains controllable, through to the corrective feedback generation stage. To evaluate the proposed method, an automatic feedback generation model was trained on 65,100 utterances of meaningful words by 217 speakers of 27 native languages, and a perceptual evaluation assessed whether, and by how much, pronunciation improved. The proposed method converts speech to a corrected pronunciation while preserving the learner's own voice, with a relative improvement of 16.67% over the conventional Pitch-Synchronous Overlap-and-Add (PSOLA) method.

Table of contents:
Chapter 1. Introduction: 1.1. Motivation; 1.1.1. An Overview of CAPT Systems; 1.1.2. Survey of existing Korean CAPT Systems; 1.2. Problem Statement; 1.3. Thesis Structure
Chapter 2. Pronunciation Analysis of Korean Produced by Chinese: 2.1. Comparison between Korean and Chinese; 2.1.1. Phonetic and Syllable Structure Comparisons; 2.1.2. Phonological Comparisons; 2.2. Related Works; 2.3. Proposed Analysis Method; 2.3.1. Corpus; 2.3.2. Transcribers and Agreement Rates; 2.4. Salient Pronunciation Variations; 2.4.1. Segmental Variation Patterns; 2.4.1.1. Discussions; 2.4.2. Phonological Variation Patterns; 2.4.1.2. Discussions; 2.5. Summary
Chapter 3. Correlation Analysis of Pronunciation Variations and Human Evaluation: 3.1. Related Works; 3.1.1. Criteria used in L2 Speech; 3.1.2. Criteria used in L2 Korean Speech; 3.2. Proposed Human Evaluation Method; 3.2.1. Reading Prompt Design; 3.2.2. Evaluation Criteria Design; 3.2.3. Raters and Agreement Rates; 3.3. Linguistic Factors Affecting L2 Korean Accentedness; 3.3.1. Pearson's Correlation Analysis; 3.3.2. Discussions; 3.3.3. Implications for Automatic Feedback Generation; 3.4. Summary
Chapter 4. Corrective Feedback Generation for CAPT: 4.1. Related Works; 4.1.1. Prosody Transplantation; 4.1.2. Recent Speech Conversion Methods; 4.1.3. Evaluation of Corrective Feedback; 4.2. Proposed Method: Corrective Feedback as a Style Transfer; 4.2.1. Speech Analysis at Spectral Domain; 4.2.2. Self-imitative Learning; 4.2.3. An Analogy: CAPT System and GAN Architecture; 4.3. Generative Adversarial Networks; 4.3.1. Conditional GAN; 4.3.2. CycleGAN; 4.4. Experiment; 4.4.1. Corpus; 4.4.2. Baseline Implementation; 4.4.3. Adversarial Training Implementation; 4.4.4. Spectrogram-to-Spectrogram Training; 4.5. Results and Evaluation; 4.5.1. Spectrogram Generation Results; 4.5.2. Perceptual Evaluation; 4.5.3. Discussions; 4.6. Summary
Chapter 5. Integration of Linguistic Knowledge in an Auxiliary Classifier CycleGAN for Feedback Generation: 5.1. Linguistic Class Selection; 5.2. Auxiliary Classifier CycleGAN Design; 5.3. Experiment and Results; 5.3.1. Corpus; 5.3.2. Feature Annotations; 5.3.3. Experiment Setup; 5.3.4. Results; 5.4. Summary
Chapter 6. Conclusion: 6.1. Thesis Results; 6.2. Thesis Contributions; 6.3. Recommendations for Future Work
Bibliography; Appendix; Abstract in Korean; Acknowledgments
Docto

SNU Open Repository and Archive

On the (un)conditionality of automatic attitude activation: the valence proportion effect

Authors: De Houwer Jan; Everaert Tom; Spruyt Adriaan
Publication venue: American Psychological Association (APA)
Publication date: 01/01/2011

Affective priming studies have shown that participants are faster to pronounce affectively polarized target words that are preceded by affectively congruent prime words than affectively polarized target words that are preceded by affectively incongruent prime words. We examined whether affective priming of naming responses depends on the valence proportion (i.e., the proportion of stimuli that are affectively polarized). In one group of participants, experimental trials were embedded in a context of filler trials that consisted of affectively polarized stimulus materials (i.e., high valence proportion condition). In a second group, the same set of experimental trials was embedded in a context of filler trials consisting of neutral stimuli (i.e., low valence proportion condition). Results showed that affective priming of naming responses was significantly stronger in the high valence proportion condition than in the low valence proportion condition. We conclude that (a) subtle aspects of the procedure can influence affective priming of naming responses, (b) finding affective priming of naming responses does not allow for the conclusion that affective stimulus processing is unconditional, and (c) affective stimulus processing depends on selective attention for affective stimulus information.

Crossref

Ghent University Academic Bibliography

The locus of post-lexical semantic matching effects on semantic priming: biasing a binary response or a binary decision?

Author: VanVoorhis Bart Aaron
Publication venue: Iowa State University Digital Repository
Publication date: 01/01/1995

Two hundred introductory-level students at the University of Wisconsin-La Crosse and at Iowa State University served as participants. Forty participants were assigned to each level of the between-subjects manipulation of task type (standard lexical decision, standard pronunciation, single-response lexical decision, keypress go/no-go, and pronunciation go/no-go). Type of priming (forward, mediated, and backward) served as a within-subjects manipulation. The pattern of priming across tasks failed to support predictions derived from Neely and Keefe's three-process theory (1989), which predicts mediated priming only for pronunciation and backward priming only for lexical decision. The data showed a reliable mediated priming effect for all tasks except pronunciation. Pronunciation did not show a reliable backward priming effect. Reliable forward priming was found for all tasks. The data were inconclusive regarding the locus of the effects of the three processes outlined by Neely and Keefe. Theories of semantic priming need to be reexamined to incorporate these unexpected patterns.

Digital Repository @ Iowa State University (ISU)

Experience with foreign accent influences non-native (L2) word recognition: The case of th-substitutions [Abstract]

Authors: Hanulikova A.; Weber A.
Publication date: 01/04/2009

MPG.PuRe

The Effect of Bilingual Proficiency in Indian English on Bilabial Plosive

Author: Chawla Taniya
Publication venue: The Research Repository @ WVU
Publication date: 01/01/2021

Background: Bilingual speech production studies have highlighted that level of proficiency influences the acoustic-phonetic representation of phonemes in both languages (MacKay, Flege, Piske, & Schirru 2001; Zárate-Sández, 2015). The results for bilingual speech production reveal that proficient/early bilinguals produce distinct acoustic properties for the same phoneme in each language, whereas less proficient/late bilinguals produce acoustic properties for a phoneme that are closer to those of the native language (Flege et al., 2003; Fowler et al., 2008). Acoustic-phonetic properties of Hindi (L1) and Indian English (L2) in bilingual speakers have been understudied, and the level of proficiency has not been considered in Hindi and Indian English bilingual speakers. The present study aimed to measure the acoustic differences produced by bilingual speakers of varying proficiencies in Indian English on the bilabial plosive and to determine how these bilabial plosives differ from American English bilabial plosives. Methods: The planned sample size for this study was twenty-four. However, only twenty participants (eleven females) between the ages of eighteen and fifty, with normal speech and hearing, were recruited. The remaining four participants could not be recruited because of the difficulty of finding bilingual speakers who spoke Hindi as their first language and Indian English as their second language, and because of COVID-19 restrictions imposed on recruitment. The participants were divided into three groups based on language and proficiency: a monolingual American English group, a proficient bilingual Hindi-Indian English group, and a less-proficient bilingual Hindi-Indian English group. The bilinguals were divided into a proficient and a less-proficient group based on the Language Experience and Proficiency Questionnaire (Marian, Blumenfeld, & Kaushanskaya, 2007). Following the screening, participants took part in a Nonword Repetition Task. Data were analyzed using Praat and VoiceSauce software. A linear mixed-effects model in R was used for the statistical analysis. Results: Data from 20 participants (seven proficient bilingual speakers, five less-proficient bilingual speakers, and eight monolingual speakers) were included in the data analysis. Approximately four thousand repetitions were evaluated across the remaining participants. There were no significant main effects across the four dependent variables, but there was an interaction effect between group and phoneme on two dependent variables. Closure duration for proficient bilingual speakers compared to less-proficient bilingual speakers differed significantly between the voiceless unaspirated bilabial plosive (VLE) and the voiceless aspirated bilabial plosive (VLH), as well as between the voiced unaspirated bilabial plosive (VE) and the voiced aspirated bilabial plosive (VH). For spectral tilt, there was a significant difference between the VLE and VLH for proficient bilingual speakers compared to less-proficient bilingual speakers. Discussion: The results of this study suggest that proficient bilingual speakers have a faster rate of speech in both their first language and second language. Therefore, it is difficult to determine whether this group has separate acoustic-phonetic characteristics for each phoneme in each language. In contrast, the less-proficient bilingual speakers seem to have a unidirectional relationship (i.e., the first language influences the second language). Furthermore, the acoustic results for the control group (i.e., monolingual American English speakers) suggest that they may have a single acoustic-phonetic representation of the bilabial plosive with its voicing contrast.

The Research Repository @ WVU (West Virginia University)

A Sound Approach to Language Matters: In Honor of Ocke-Schwen Bohn

Authors: Avesani Cinzia; Baker Brett Joseph; Balling Laura Winther; Behne Dawn M.; Best Catherine; Bundgaard-Nielsen Rikke; Carlet Angélica; Cebrian Juli; Christensen Ken Ramshøj; Cooper Angela; Flege James Emil; Hejná Michaela; Hejná Mísa; Horslund Camilla Søballe; Hua Congehao; Højen Anders; Jespersen Anna; Jespersen Anna Bothe; Jongman Allard; Jørgensen Henrik; Karmeli Sophia; Kizach Johannes; Kluge Denise Cristina; Lee Goun; Li Bin; Li Yingjie; Masapollo Matthew; Mooshammer Christine; Mora Joan C.; Mora-Plaza Ingrid; Niebuhr Oliver; Nyvad Anne Mette; Piske Thorsten; Polka Linda; Rasmussen Sidsel; Ruan Yufang; Sereno Joan A.; Steinlen Anja; Sørensen Mette Hjortshøj; Tyler Michael; Vayra Mario; Vikner Sten; Wang Yue; Wayland Ratree; Whalen D. H.; Wood Johanna; Yan Mengzhu
Publication venue: Aarhus University Library
Publication date: 16/05/2019

The contributions in this Festschrift were written by Ocke’s current and former PhD students, colleagues and research collaborators. The Festschrift is divided into six sections, moving from the smallest building blocks of language, through gradually expanding objects of linguistic inquiry, to the highest levels of description, all of which have formed a part of Ocke’s career in connection with his teaching and/or his academic output: “Segments”, “Perception of Accent”, “Between Sounds and Graphemes”, “Prosody”, “Morphology and Syntax” and “Second Language Acquisition”. Each one of these illustrates a sound approach to language matters.

AU Library Scholarly Publishing Services: E-books (Aarhus University)

Phonetic detail in the developing lexicon

Author: Swingley D.
Publication date: 01/01/2003

Although infants show remarkable sensitivity to linguistically relevant phonetic variation in speech, young children sometimes appear not to make use of this sensitivity. Here, children's knowledge of the sound-forms of familiar words was assessed using a visual fixation task. Dutch 19-month-olds were shown pairs of pictures and heard correct pronunciations and mispronunciations of familiar words naming one of the pictures. Mispronunciations were word-initial in Experiment 1 and word-medial in Experiment 2, and in both experiments involved substituting one segment with [d] (a common sound in Dutch) or [g] (a rare sound). In both experiments, word recognition performance was better for correct pronunciations than for mispronunciations involving either substituted consonant. These effects did not depend upon children's knowledge of lexical or nonlexical phonological neighbors of the tested words. The results indicate the encoding of phonetic detail in words at 19 months.

MPG.PuRe

Adult cochlear implant users versus typical hearing persons: an automatic analysis of acoustic–prosodic parameters

Authors: Arias-Vergara Tomás; Batliner Anton; Högerle Catalina; Müller Joachim; Nöth Elmar; Orozco-Arroyave Juan-Rafael; Polterauer Daniel; Rader Tobias; Schuster Maria
Publication venue: American Speech Language Hearing Association
Publication date: 01/01/2022

Purpose: The aim of this study was to investigate the speech prosody of postlingually deaf cochlear implant (CI) users compared with control speakers without hearing or speech impairment. Method: Speech recordings of 74 CI users (37 males and 37 females) and 72 age-balanced control speakers (36 males and 36 females) are considered. All participants are native speakers of German and read Der Nordwind und die Sonne (The North Wind and the Sun), a standard text in pathological speech analysis and phonetic transcriptions. Automatic acoustic analysis is performed considering pitch, loudness, and duration features, including speech rate and rhythm. Results: In general, duration and rhythm features differ between CI users and control speakers. CI users read more slowly and have a lower voiced segment ratio than control speakers. A lower voiced ratio goes along with a prolongation of the voiced segments' duration in male CI users and with a prolongation of pauses in female CI users. Rhythm features in CI users show higher variability in the duration of vowels and consonants than in control speakers. The use of bilateral CIs showed no advantages concerning speech prosody features in comparison to unilateral use of a CI. Conclusions: Even after cochlear implantation and rehabilitation, the speech of postlingually deaf adults deviates from the speech of control speakers, which might be due to changed auditory feedback. We suggest considering changes in temporal aspects of speech in future rehabilitation strategies.

OPUS Augsburg