Computational Sociolinguistics: A Survey
Language is a social phenomenon and variation is inherent to its social
nature. Recently, there has been a surge of interest within the computational
linguistics (CL) community in the social dimension of language. In this article
we present a survey of the emerging field of "Computational Sociolinguistics"
that reflects this increased interest. We aim to provide a comprehensive
overview of CL research on sociolinguistic themes, featuring topics such as the
relation between language and social identity, language use in social
interaction and multilingual communication. Moreover, we demonstrate the
potential for synergy between the research communities involved, by showing how
the large-scale data-driven methods that are widely used in CL can complement
existing sociolinguistic studies, and how sociolinguistics can inform and
challenge the methods and assumptions employed in CL studies. We hope to convey
the possible benefits of a closer collaboration between the two communities and
conclude with a discussion of open challenges.Comment: To appear in Computational Linguistics. Accepted for publication:
18th February, 201
An integrated theory of language production and comprehension
Currently, production and comprehension are regarded as quite distinct in accounts of language processing. In rejecting this dichotomy, we instead assert that producing and understanding are interwoven, and that this interweaving is what enables people to predict themselves and each other. We start by noting that production and comprehension are forms of action and action perception. We then consider the evidence for interweaving in action, action perception, and joint action, and explain such evidence in terms of prediction. Specifically, we assume that actors construct forward models of their actions before they execute those actions, and that perceivers of others' actions covertly imitate those actions, then construct forward models of those actions. We use these accounts of action, action perception, and joint action to develop accounts of production, comprehension, and interactive language. Importantly, they incorporate well-defined levels of linguistic representation (such as semantics, syntax, and phonology). We show (a) how speakers and comprehenders use covert imitation and forward modeling to make predictions at these levels of representation, (b) how they interweave production and comprehension processes, and (c) how they use these predictions to monitor the upcoming utterances. We show how these accounts explain a range of behavioral and neuroscientific data on language processing and discuss some of the implications of our proposal.
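The forward-model account sketched above can be caricatured in a few lines of code. This is an illustrative toy only, assuming an invented linear mapping from motor commands to sensory outcomes; it is not the authors' model:

```python
import numpy as np

# Toy illustration: an actor issues a motor command, a forward model
# predicts its sensory outcome before execution, and monitoring compares
# the prediction with the actual (noisy) outcome.
rng = np.random.default_rng(1)

true_plant = np.array([[1.0, 0.2], [0.0, 0.9]])  # how commands map to outcomes
forward_model = true_plant + 0.01 * rng.standard_normal((2, 2))  # learned approximation

def execute(command):
    # The actual outcome, only available after execution.
    return true_plant @ command + 0.001 * rng.standard_normal(2)

def predict(command):
    # The covert prediction, available before execution.
    return forward_model @ command

command = np.array([0.5, -0.3])
predicted = predict(command)
actual = execute(command)
monitoring_error = np.linalg.norm(predicted - actual)

# A small error means the utterance unfolded as predicted; a large error
# would trigger correction in this account.
print(monitoring_error < 0.1)  # prints: True
```

The same comparison applies on the comprehension side, where the perceiver covertly imitates the speaker and runs the forward model on the imitated command.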
Pronunciation Variation Analysis and CycleGAN-Based Feedback Generation for CAPT
Thesis (Ph.D.) -- Seoul National University Graduate School: Interdisciplinary Program in Cognitive Science, College of Humanities, February 2020. Advisor: Minhwa Chung.
Despite the growing popularity of learning Korean as a foreign language and the rapid development in language learning applications, the existing computer-assisted pronunciation training (CAPT) systems in Korean do not utilize linguistic characteristics of non-native Korean speech. Pronunciation variations in non-native speech are far more diverse than those observed in native speech, which may pose a difficulty in combining such knowledge in an automatic system. Moreover, most of the existing methods rely on feature extraction results from signal processing, prosodic analysis, and natural language processing techniques. Such methods entail limitations since they necessarily depend on finding the right features for the task and on the accuracy of their extraction.
This thesis presents a new approach for corrective feedback generation in a CAPT system, in which pronunciation variation patterns and linguistic correlates with accentedness are analyzed and combined with a deep neural network approach, so that feature engineering efforts are minimized while maintaining the linguistically important factors for the corrective feedback generation task. Investigations on non-native Korean speech characteristics in contrast with those of native speakers, and their correlation with accentedness judgement show that both segmental and prosodic variations are important factors in a Korean CAPT system.
The present thesis argues that the feedback generation task can be interpreted as a style transfer problem, and proposes to evaluate the idea using a generative adversarial network. A corrective feedback generation model is trained on 65,100 read utterances by 217 non-native speakers of 27 mother tongue backgrounds. The features are automatically learnt in an unsupervised way in an auxiliary classifier CycleGAN setting, in which the generator learns to map foreign-accented speech to native speech distributions. In order to inject linguistic knowledge into the network, an auxiliary classifier is trained so that the feedback also identifies the linguistic error types that were defined in the first half of the thesis. The proposed approach generates a corrected version of the speech using the learner's own voice, outperforming the conventional Pitch-Synchronous Overlap-and-Add method.
Interest in Korean as a foreign language is rising and the number of Korean learners has grown substantially, and research on computer-assisted pronunciation training (CAPT) applications that apply speech and language processing technology is also being actively pursued. Nevertheless, existing Korean speaking-education systems neither make sufficient use of the linguistic characteristics of foreigners' Korean nor apply recent language processing techniques. Possible causes are that the phenomena of foreign-accented Korean have not been analyzed in sufficient depth, and that even where relevant research exists, more advanced work is needed before it can be reflected in an automated system. Moreover, CAPT technology in general relies on feature extraction based on signal processing, prosodic analysis, and natural language processing techniques, so much time and effort are required to find suitable features and to extract them accurately. This suggests that this process, too, has much room for improvement through recent deep-learning-based language processing technology.
Accordingly, this study first analyzed pronunciation variation patterns and their linguistic correlates for the development of a CAPT system. Read-speech variation patterns of foreign learners were contrasted with those of native Korean speakers to identify the salient variations, and a correlation analysis was used to determine their importance for communication. The results confirmed that feedback generation should give priority to deletion of syllable-final consonants, confusion of the three-way consonant contrast, and errors related to diphthongs.
Automatically generating corrective feedback is one of the important tasks of a CAPT system. This study regards the task as interpretable as a problem of speech style transfer, and proposes to model it in a cycle-consistent generative adversarial network (CycleGAN) architecture. The generator of the GAN learns a mapping from the distribution of non-native speech to that of native speech, and the cycle consistency loss preserves the overall structure of the utterance while preventing excessive correction. Because the necessary features are learnt by the CycleGAN framework itself in an unsupervised manner, without a separate feature extraction step, the method is easy to extend to other languages.
The priorities among the salient variations revealed by the linguistic analysis are modelled in an auxiliary classifier CycleGAN architecture. This method grafts knowledge onto the existing CycleGAN so that it generates the feedback speech and at the same time classifies which type of error the feedback addresses. Its significance lies in the advantage that domain knowledge is maintained, and can be controlled, through to the corrective feedback generation stage.
To evaluate the proposed method, an automatic feedback generation model was trained on 65,100 meaningful word utterances by 217 speakers of 27 mother tongues, and a perceptual evaluation of whether, and to what degree, the speech improved was carried out. With the proposed method, a learner's pronunciation can be converted into a corrected pronunciation while preserving the learner's own voice, and a relative improvement of 16.67% was confirmed over the conventional Pitch-Synchronous Overlap-and-Add (PSOLA) algorithm.
Chapter 1. Introduction 1
1.1. Motivation 1
1.1.1. An Overview of CAPT Systems 3
1.1.2. Survey of existing Korean CAPT Systems 5
1.2. Problem Statement 7
1.3. Thesis Structure 7
Chapter 2. Pronunciation Analysis of Korean Produced by Chinese 9
2.1. Comparison between Korean and Chinese 11
2.1.1. Phonetic and Syllable Structure Comparisons 11
2.1.2. Phonological Comparisons 14
2.2. Related Works 16
2.3. Proposed Analysis Method 19
2.3.1. Corpus 19
2.3.2. Transcribers and Agreement Rates 22
2.4. Salient Pronunciation Variations 22
2.4.1. Segmental Variation Patterns 22
2.4.1.1. Discussions 25
2.4.2. Phonological Variation Patterns 26
2.4.2.1. Discussions 27
2.5. Summary 29
Chapter 3. Correlation Analysis of Pronunciation Variations and Human Evaluation 30
3.1. Related Works 31
3.1.1. Criteria used in L2 Speech 31
3.1.2. Criteria used in L2 Korean Speech 32
3.2. Proposed Human Evaluation Method 36
3.2.1. Reading Prompt Design 36
3.2.2. Evaluation Criteria Design 37
3.2.3. Raters and Agreement Rates 40
3.3. Linguistic Factors Affecting L2 Korean Accentedness 41
3.3.1. Pearson's Correlation Analysis 41
3.3.2. Discussions 42
3.3.3. Implications for Automatic Feedback Generation 44
3.4. Summary 45
Chapter 4. Corrective Feedback Generation for CAPT 46
4.1. Related Works 46
4.1.1. Prosody Transplantation 47
4.1.2. Recent Speech Conversion Methods 49
4.1.3. Evaluation of Corrective Feedback 50
4.2. Proposed Method: Corrective Feedback as a Style Transfer 51
4.2.1. Speech Analysis at Spectral Domain 53
4.2.2. Self-imitative Learning 55
4.2.3. An Analogy: CAPT System and GAN Architecture 57
4.3. Generative Adversarial Networks 59
4.3.1. Conditional GAN 61
4.3.2. CycleGAN 62
4.4. Experiment 63
4.4.1. Corpus 64
4.4.2. Baseline Implementation 65
4.4.3. Adversarial Training Implementation 65
4.4.4. Spectrogram-to-Spectrogram Training 66
4.5. Results and Evaluation 69
4.5.1. Spectrogram Generation Results 69
4.5.2. Perceptual Evaluation 70
4.5.3. Discussions 72
4.6. Summary 74
Chapter 5. Integration of Linguistic Knowledge in an Auxiliary Classifier CycleGAN for Feedback Generation 75
5.1. Linguistic Class Selection 75
5.2. Auxiliary Classifier CycleGAN Design 77
5.3. Experiment and Results 80
5.3.1. Corpus 80
5.3.2. Feature Annotations 81
5.3.3. Experiment Setup 81
5.3.4. Results 82
5.4. Summary 84
Chapter 6. Conclusion 86
6.1. Thesis Results 86
6.2. Thesis Contributions 88
6.3. Recommendations for Future Work 89
Bibliography 91
Appendix 107
Abstract in Korean 117
Acknowledgments 120
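The feedback-generation objective described in the thesis abstract above combines an adversarial term, a cycle-consistency term, and an auxiliary classification term. The following is a minimal numpy sketch of how those three losses compose; the linear "generators", the discriminator scores, and the three error classes are invented stand-ins for illustration, not the thesis implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the two generators: in the thesis these map spectrograms
# between the accented (A) and native (B) speech domains; here they are
# near-identity linear maps, purely to illustrate the loss structure.
def make_linear(dim):
    w = np.eye(dim) + 0.01 * rng.standard_normal((dim, dim))
    return lambda x: x @ w

dim = 8
G_ab = make_linear(dim)   # accented -> "native-like"
G_ba = make_linear(dim)   # native   -> "accented-like"

def adversarial_loss(d_scores_fake):
    # Least-squares GAN form: the generator wants D(fake) close to 1.
    return np.mean((d_scores_fake - 1.0) ** 2)

def cycle_loss(x, x_reconstructed):
    # L1 cycle-consistency: keeps the overall utterance structure
    # and discourages over-correction.
    return np.mean(np.abs(x - x_reconstructed))

def aux_class_loss(logits, labels):
    # Cross-entropy of the auxiliary classifier that predicts the
    # linguistic error type (e.g. final-consonant deletion).
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    return -np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12))

# One batch of fake "spectrogram frames" from the accented domain.
x_a = rng.standard_normal((4, dim))
fake_b = G_ab(x_a)        # corrected-feedback candidate
cycled_a = G_ba(fake_b)   # reconstruction back to the accented domain

d_fake = rng.uniform(0.4, 0.6, size=4)   # stand-in discriminator scores
logits = rng.standard_normal((4, 3))     # 3 hypothetical error classes
labels = np.array([0, 2, 1, 0])

lambda_cyc, lambda_cls = 10.0, 1.0       # illustrative weights
total = (adversarial_loss(d_fake)
         + lambda_cyc * cycle_loss(x_a, cycled_a)
         + lambda_cls * aux_class_loss(logits, labels))
```

In an actual training loop, `total` would be backpropagated through the generators while the discriminator and classifier are trained on their own objectives; the weights on the cycle and classification terms control how strongly structure preservation and error-type identity constrain the correction.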
Towards a complete multiple-mechanism account of predictive language processing [Commentary on Pickering & Garrod]
Although we agree with Pickering & Garrod (P&G) that prediction-by-simulation and prediction-by-association are important mechanisms of anticipatory language processing, this commentary suggests that they: (1) overlook other potential mechanisms that might underlie prediction in language processing, (2) overestimate the importance of prediction-by-association in early childhood, and (3) underestimate the complexity and significance of several factors that might mediate prediction during language processing.
Recognizing Speech in a Novel Accent: The Motor Theory of Speech Perception Reframed
The motor theory of speech perception holds that we perceive the speech of
another in terms of a motor representation of that speech. However, when we
have learned to recognize a foreign accent, it seems plausible that recognition
of a word rarely involves reconstruction of the speech gestures of the speaker
rather than the listener. To better assess the motor theory and this
observation, we proceed in three stages. Part 1 places the motor theory of
speech perception in a larger framework based on our earlier models of the
adaptive formation of mirror neurons for grasping, and for viewing extensions
of that mirror system as part of a larger system for neuro-linguistic
processing, augmented by the present consideration of recognizing speech in a
novel accent. Part 2 then offers a novel computational model of how a listener
comes to understand the speech of someone speaking the listener's native
language with a foreign accent. The core tenet of the model is that the
listener uses hypotheses about the word the speaker is currently uttering to
update probabilities linking the sound produced by the speaker to phonemes in
the native language repertoire of the listener. This, on average, improves the
recognition of later words. This model is neutral regarding the nature of the
representations it uses (motor vs. auditory). It serves as a reference point for
the discussion in Part 3, which proposes a dual-stream neuro-linguistic
architecture to revisit claims for and against the motor theory of speech
perception and the relevance of mirror neurons, and extracts some implications
for the reframing of the motor theory.
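The probability-updating mechanism at the core of this model can be illustrated with a small sketch. Everything below (the `AccentAdapter` class, the add-one smoothing, the one-to-one phoneme/sound alignment) is a hypothetical simplification for illustration, not the paper's computational model:

```python
from collections import defaultdict

# The listener keeps counts linking sounds produced by the accented
# speaker to phonemes of the listener's native repertoire, and updates
# them whenever a word hypothesis is accepted.
class AccentAdapter:
    def __init__(self, smoothing=1.0):
        self.counts = defaultdict(lambda: defaultdict(float))
        self.smoothing = smoothing  # add-one style smoothing

    def accept_word(self, native_phonemes, heard_sounds):
        # Assume a one-to-one segment alignment for simplicity.
        for ph, snd in zip(native_phonemes, heard_sounds):
            self.counts[ph][snd] += 1.0

    def p_sound_given_phoneme(self, ph, snd, sound_inventory):
        c = self.counts[ph]
        total = sum(c.values()) + self.smoothing * len(sound_inventory)
        return (c[snd] + self.smoothing) / total

    def score_word(self, native_phonemes, heard_sounds, sound_inventory):
        # Product of per-segment probabilities; higher = better match,
        # so adapted mappings improve recognition of later words.
        p = 1.0
        for ph, snd in zip(native_phonemes, heard_sounds):
            p *= self.p_sound_given_phoneme(ph, snd, sound_inventory)
        return p

sounds = ["i", "I", "e"]
adapter = AccentAdapter()
# Suppose the speaker systematically produces /i/ as [I].
for _ in range(10):
    adapter.accept_word(["i"], ["I"])

before_unseen = adapter.score_word(["e"], ["I"], sounds)  # unadapted pairing
adapted = adapter.score_word(["i"], ["I"], sounds)        # adapted pairing
print(adapted > before_unseen)  # prints: True
```

The sketch is deliberately agnostic about whether the "sounds" are motor or auditory representations, mirroring the model's neutrality on that point.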
Innovative technologies for under-resourced language documentation: The BULB Project
The project Breaking the Unwritten Language Barrier (BULB), which brings together linguists and computer scientists, aims at supporting linguists in documenting unwritten languages. In order to achieve this we will develop tools tailored to the needs of documentary linguists by building upon technology and expertise from the area of natural language processing, most prominently automatic speech recognition and machine translation. As a development and test bed for this we have chosen three less-resourced African languages from the Bantu family: Basaa, Myene and Embosi. Work within the project is divided into three main steps: 1) Collection of a large corpus of speech (100h per language) at a reasonable cost. After initial recording, the data is re-spoken by a reference speaker to enhance the signal quality and orally translated into French. 2) Automatic transcription of the Bantu languages at phoneme level and the French translation at word level. The recognized Bantu phonemes and French words will then be automatically aligned. 3) Tool development. In close cooperation and discussion with the linguists, the speech and language technologists will design and implement tools that will support the linguists in their work, taking into account the linguists' needs and technology's capabilities. The data collection has begun for the three languages. For this we use standard mobile devices and a dedicated software tool, LIG-AIKUMA, which offers a range of different speech collection modes (recording, respeaking, translation and elicitation). LIG-AIKUMA's improved features include a smart generation and handling of speaker metadata as well as respeaking and parallel audio data mapping.
The listening talker: A review of human and algorithmic context-induced modifications of speech
Speech output technology is finding widespread application, including in scenarios where intelligibility might be compromised - at least for some listeners - by adverse conditions. Unlike most current algorithms, talkers continually adapt their speech patterns as a response to the immediate context of spoken communication, where the type of interlocutor and the environment are the dominant situational factors influencing speech production. Observations of talker behaviour can motivate the design of more robust speech output algorithms. Starting with a listener-oriented categorisation of possible goals for speech modification, this review article summarises the extensive set of behavioural findings related to human speech modification, identifies which factors appear to be beneficial, and goes on to examine previous computational attempts to improve intelligibility in noise. The review concludes by tabulating 46 speech modifications, many of which have yet to be perceptually or algorithmically evaluated. Consequently, the review provides a roadmap for future work in improving the robustness of speech output.
Automatic Pronunciation Assessment -- A Review
Pronunciation assessment and its application in computer-aided pronunciation
training (CAPT) have seen impressive progress in recent years. With the rapid
growth in language processing and deep learning over the past few years, there
is a need for an updated review. In this paper, we review methods employed in
pronunciation assessment for both phonemic and prosodic aspects. We categorize
the main challenges observed in prominent research trends, highlight existing
limitations, and point to available resources. This is followed by a discussion
of the remaining challenges and possible directions for future work.
Comment: 9 pages, accepted to EMNLP Findings