
    Computational Sociolinguistics: A Survey

    Language is a social phenomenon and variation is inherent to its social nature. Recently, there has been a surge of interest within the computational linguistics (CL) community in the social dimension of language. In this article we present a survey of the emerging field of "Computational Sociolinguistics" that reflects this increased interest. We aim to provide a comprehensive overview of CL research on sociolinguistic themes, featuring topics such as the relation between language and social identity, language use in social interaction, and multilingual communication. Moreover, we demonstrate the potential for synergy between the research communities involved, by showing how the large-scale data-driven methods that are widely used in CL can complement existing sociolinguistic studies, and how sociolinguistics can inform and challenge the methods and assumptions employed in CL studies. We hope to convey the possible benefits of a closer collaboration between the two communities and conclude with a discussion of open challenges. Comment: To appear in Computational Linguistics. Accepted for publication: 18th February, 201

    An integrated theory of language production and comprehension

    Currently, production and comprehension are regarded as quite distinct in accounts of language processing. In rejecting this dichotomy, we instead assert that producing and understanding are interwoven, and that this interweaving is what enables people to predict themselves and each other. We start by noting that production and comprehension are forms of action and action perception. We then consider the evidence for interweaving in action, action perception, and joint action, and explain such evidence in terms of prediction. Specifically, we assume that actors construct forward models of their actions before they execute those actions, and that perceivers of others' actions covertly imitate those actions, then construct forward models of those actions. We use these accounts of action, action perception, and joint action to develop accounts of production, comprehension, and interactive language. Importantly, they incorporate well-defined levels of linguistic representation (such as semantics, syntax, and phonology). We show (a) how speakers and comprehenders use covert imitation and forward modeling to make predictions at these levels of representation, (b) how they interweave production and comprehension processes, and (c) how they use these predictions to monitor the upcoming utterances. We show how these accounts explain a range of behavioral and neuroscientific data on language processing and discuss some of the implications of our proposal.

    Pronunciation Variation Analysis and CycleGAN-based Feedback Generation for CAPT

    Doctoral dissertation, Seoul National University Graduate School, Interdisciplinary Program in Cognitive Science, College of Humanities, February 2020. Advisor: Minhwa Chung. Despite the growing popularity of learning Korean as a foreign language and the rapid development of language learning applications, existing computer-assisted pronunciation training (CAPT) systems for Korean do not utilize the linguistic characteristics of non-native Korean speech. Pronunciation variations in non-native speech are far more diverse than those observed in native speech, which may pose a difficulty in incorporating such knowledge into an automatic system. Moreover, most existing methods rely on feature extraction results from signal processing, prosodic analysis, and natural language processing techniques. Such methods entail limitations, since they necessarily depend on finding the right features for the task and on the extraction accuracies. This thesis presents a new approach to corrective feedback generation in a CAPT system, in which pronunciation variation patterns and linguistic correlates of accentedness are analyzed and combined with a deep neural network approach, so that feature engineering effort is minimized while the linguistically important factors for the corrective feedback generation task are retained. Investigations of non-native Korean speech characteristics, contrasted with those of native speakers, and of their correlation with accentedness judgements show that both segmental and prosodic variations are important factors in a Korean CAPT system. The thesis argues that the feedback generation task can be interpreted as a style transfer problem, and proposes to evaluate the idea using a generative adversarial network. A corrective feedback generation model is trained on 65,100 read utterances by 217 non-native speakers from 27 mother tongue backgrounds.
The features are learnt automatically, in an unsupervised way, in an auxiliary classifier CycleGAN setting, in which the generator learns to map foreign-accented speech to native speech distributions. To inject linguistic knowledge into the network, an auxiliary classifier is trained so that the feedback also identifies the linguistic error types defined in the first half of the thesis. The proposed approach generates a corrected version of the speech using the learner's own voice, outperforming the conventional Pitch-Synchronous Overlap-and-Add method. Interest in Korean as a foreign language has grown considerably and the number of Korean learners is increasing sharply, and research on computer-assisted pronunciation training (CAPT) applications employing speech and language processing technology is being actively pursued. Nevertheless, existing Korean speaking-training systems neither make sufficient use of the linguistic characteristics of non-native Korean nor apply recent language processing techniques. Possible reasons are that non-native Korean speech phenomena have not yet been analyzed in sufficient depth, and that, even where relevant studies exist, more advanced research is needed before their findings can be built into an automated system. Furthermore, CAPT technology in general relies on feature extraction such as signal processing, prosodic analysis, and natural language processing techniques, so much time and effort are spent finding the right features and extracting them accurately. This suggests that recent deep-learning-based language processing technology leaves considerable room for improving this process as well. Accordingly, this study first analyzed pronunciation variation patterns and their linguistic correlates for the development of a CAPT system.
ģ™øźµ­ģø ķ™”ģžė“¤ģ˜ ė‚­ė…ģ²“ ė³€ģ“ ģ–‘ģƒź³¼ ķ•œźµ­ģ–“ ģ›ģ–“ėƼ ķ™”ģžė“¤ģ˜ ė‚­ė…ģ²“ ė³€ģ“ ģ–‘ģƒģ„ ėŒ€ģ”°ķ•˜ź³  ģ£¼ģš”ķ•œ ė³€ģ“ė„¼ ķ™•ģøķ•œ ķ›„, ģƒź“€ź“€ź³„ ė¶„ģ„ģ„ ķ†µķ•˜ģ—¬ ģ˜ģ‚¬ģ†Œķ†µģ— ģ˜ķ–„ģ„ ėÆøģ¹˜ėŠ” ģ¤‘ģš”ė„ė„¼ ķŒŒģ•…ķ•˜ģ˜€ė‹¤. ź·ø ź²°ź³¼, ģ¢…ģ„± ģ‚­ģ œģ™€ 3ģ¤‘ ėŒ€ė¦½ģ˜ ķ˜¼ė™, ģ“ˆė¶„ģ ˆ ź“€ė Ø ģ˜¤ė„˜ź°€ ė°œģƒķ•  ź²½ģš° ķ”¼ė“œė°± ģƒģ„±ģ— ģš°ģ„ ģ ģœ¼ė”œ ė°˜ģ˜ķ•˜ėŠ” ź²ƒģ“ ķ•„ģš”ķ•˜ė‹¤ėŠ” ź²ƒģ“ ķ™•ģøė˜ģ—ˆė‹¤. źµģ •ėœ ķ”¼ė“œė°±ģ„ ģžė™ģœ¼ė”œ ģƒģ„±ķ•˜ėŠ” ź²ƒģ€ CAPT ģ‹œģŠ¤ķ…œģ˜ ģ¤‘ģš”ķ•œ ź³¼ģ œ ģ¤‘ ķ•˜ė‚˜ģ“ė‹¤. ė³ø ģ—°źµ¬ėŠ” ģ“ ź³¼ģ œź°€ ė°œķ™”ģ˜ ģŠ¤ķƒ€ģ¼ ė³€ķ™”ģ˜ ė¬øģ œė”œ ķ•“ģ„ģ“ ź°€ėŠ„ķ•˜ė‹¤ź³  ė³“ģ•˜ģœ¼ė©°, ģƒģ„±ģ  ģ ėŒ€ ģ‹ ź²½ė§ (Cycle-consistent Generative Adversarial Network; CycleGAN) źµ¬ģ”°ģ—ģ„œ ėŖØėøė§ķ•˜ėŠ” ź²ƒģ„ ģ œģ•ˆķ•˜ģ˜€ė‹¤. GAN ė„¤ķŠøģ›Œķ¬ģ˜ ģƒģ„±ėŖØėøģ€ ė¹„ģ›ģ–“ėƼ ė°œķ™”ģ˜ ė¶„ķ¬ģ™€ ģ›ģ–“ėƼ ė°œķ™” ė¶„ķ¬ģ˜ ė§¤ķ•‘ģ„ ķ•™ģŠµķ•˜ė©°, Cycle consistency ģ†ģ‹¤ķ•Øģˆ˜ė„¼ ģ‚¬ģš©ķ•Øģœ¼ė”œģØ ė°œķ™”ź°„ ģ „ė°˜ģ ģø źµ¬ģ”°ė„¼ ģœ ģ§€ķ•Øź³¼ ė™ģ‹œģ— ź³¼ė„ķ•œ źµģ •ģ„ ė°©ģ§€ķ•˜ģ˜€ė‹¤. ė³„ė„ģ˜ ķŠ¹ģ§• ģ¶”ģ¶œ ź³¼ģ •ģ“ ģ—†ģ“ ķ•„ģš”ķ•œ ķŠ¹ģ§•ė“¤ģ“ CycleGAN ķ”„ė ˆģž„ģ›Œķ¬ģ—ģ„œ ė¬“ź°ė… ė°©ė²•ģœ¼ė”œ ģŠ¤ģŠ¤ė”œ ķ•™ģŠµė˜ėŠ” ė°©ė²•ģœ¼ė”œ, ģ–øģ–“ ķ™•ģž„ģ“ ģš©ģ“ķ•œ ė°©ė²•ģ“ė‹¤. ģ–øģ–“ķ•™ģ  ė¶„ģ„ģ—ģ„œ ė“œėŸ¬ė‚œ ģ£¼ģš”ķ•œ ė³€ģ“ė“¤ ź°„ģ˜ ģš°ģ„ ģˆœģœ„ėŠ” Auxiliary Classifier CycleGAN źµ¬ģ”°ģ—ģ„œ ėŖØėøė§ķ•˜ėŠ” ź²ƒģ„ ģ œģ•ˆķ•˜ģ˜€ė‹¤. ģ“ ė°©ė²•ģ€ źø°ģ”“ģ˜ CycleGANģ— ģ§€ģ‹ģ„ ģ ‘ėŖ©ģ‹œģ¼œ ķ”¼ė“œė°± ģŒģ„±ģ„ ģƒģ„±ķ•Øź³¼ ė™ģ‹œģ— ķ•“ė‹¹ ķ”¼ė“œė°±ģ“ ģ–“ė–¤ ģœ ķ˜•ģ˜ ģ˜¤ė„˜ģøģ§€ ė¶„ė„˜ķ•˜ėŠ” ė¬øģ œė„¼ ģˆ˜ķ–‰ķ•œė‹¤. ģ“ėŠ” ė„ė©”ģø ģ§€ģ‹ģ“ źµģ • ķ”¼ė“œė°± ģƒģ„± ė‹Øź³„ź¹Œģ§€ ģœ ģ§€ė˜ź³  ķ†µģ œź°€ ź°€ėŠ„ķ•˜ė‹¤ėŠ” ģž„ģ ģ“ ģžˆė‹¤ėŠ” ė°ģ— ź·ø ģ˜ģ˜ź°€ ģžˆė‹¤. 
ė³ø ģ—°źµ¬ģ—ģ„œ ģ œģ•ˆķ•œ ė°©ė²•ģ„ ķ‰ź°€ķ•˜źø° ģœ„ķ•“ģ„œ 27ź°œģ˜ ėŖØźµ­ģ–“ė„¼ ź°–ėŠ” 217ėŖ…ģ˜ ģœ ģ˜ėÆø ģ–“ķœ˜ ė°œķ™” 65,100ź°œė”œ ķ”¼ė“œė°± ģžė™ ģƒģ„± ėŖØėøģ„ ķ›ˆė Øķ•˜ź³ , ź°œģ„  ģ—¬ė¶€ ė° ģ •ė„ģ— ėŒ€ķ•œ ģ§€ź° ķ‰ź°€ė„¼ ģˆ˜ķ–‰ķ•˜ģ˜€ė‹¤. ģ œģ•ˆėœ ė°©ė²•ģ„ ģ‚¬ģš©ķ•˜ģ˜€ģ„ ė•Œ ķ•™ģŠµģž ė³øģøģ˜ ėŖ©ģ†Œė¦¬ė„¼ ģœ ģ§€ķ•œ ģ±„ źµģ •ėœ ė°œģŒģœ¼ė”œ ė³€ķ™˜ķ•˜ėŠ” ź²ƒģ“ ź°€ėŠ„ķ•˜ė©°, ģ „ķ†µģ ģø ė°©ė²•ģø ģŒė†’ģ“ ė™źø°ģ‹ ģ¤‘ģ²©ź°€ģ‚° (Pitch-Synchronous Overlap-and-Add) ģ•Œź³ ė¦¬ģ¦˜ģ„ ģ‚¬ģš©ķ•˜ėŠ” ė°©ė²•ģ— ė¹„ķ•“ ģƒėŒ€ ź°œģ„ ė„  16.67%ģ“ ķ™•ģøė˜ģ—ˆė‹¤.Chapter 1. Introduction 1 1.1. Motivation 1 1.1.1. An Overview of CAPT Systems 3 1.1.2. Survey of existing Korean CAPT Systems 5 1.2. Problem Statement 7 1.3. Thesis Structure 7 Chapter 2. Pronunciation Analysis of Korean Produced by Chinese 9 2.1. Comparison between Korean and Chinese 11 2.1.1. Phonetic and Syllable Structure Comparisons 11 2.1.2. Phonological Comparisons 14 2.2. Related Works 16 2.3. Proposed Analysis Method 19 2.3.1. Corpus 19 2.3.2. Transcribers and Agreement Rates 22 2.4. Salient Pronunciation Variations 22 2.4.1. Segmental Variation Patterns 22 2.4.1.1. Discussions 25 2.4.2. Phonological Variation Patterns 26 2.4.1.2. Discussions 27 2.5. Summary 29 Chapter 3. Correlation Analysis of Pronunciation Variations and Human Evaluation 30 3.1. Related Works 31 3.1.1. Criteria used in L2 Speech 31 3.1.2. Criteria used in L2 Korean Speech 32 3.2. Proposed Human Evaluation Method 36 3.2.1. Reading Prompt Design 36 3.2.2. Evaluation Criteria Design 37 3.2.3. Raters and Agreement Rates 40 3.3. Linguistic Factors Affecting L2 Korean Accentedness 41 3.3.1. Pearsons Correlation Analysis 41 3.3.2. Discussions 42 3.3.3. Implications for Automatic Feedback Generation 44 3.4. Summary 45 Chapter 4. Corrective Feedback Generation for CAPT 46 4.1. Related Works 46 4.1.1. Prosody Transplantation 47 4.1.2. Recent Speech Conversion Methods 49 4.1.3. Evaluation of Corrective Feedback 50 4.2. 
Proposed Method: Corrective Feedback as a Style Transfer 51 4.2.1. Speech Analysis at Spectral Domain 53 4.2.2. Self-imitative Learning 55 4.2.3. An Analogy: CAPT System and GAN Architecture 57 4.3. Generative Adversarial Networks 59 4.3.1. Conditional GAN 61 4.3.2. CycleGAN 62 4.4. Experiment 63 4.4.1. Corpus 64 4.4.2. Baseline Implementation 65 4.4.3. Adversarial Training Implementation 65 4.4.4. Spectrogram-to-Spectrogram Training 66 4.5. Results and Evaluation 69 4.5.1. Spectrogram Generation Results 69 4.5.2. Perceptual Evaluation 70 4.5.3. Discussions 72 4.6. Summary 74 Chapter 5. Integration of Linguistic Knowledge in an Auxiliary Classifier CycleGAN for Feedback Generation 75 5.1. Linguistic Class Selection 75 5.2. Auxiliary Classifier CycleGAN Design 77 5.3. Experiment and Results 80 5.3.1. Corpus 80 5.3.2. Feature Annotations 81 5.3.3. Experiment Setup 81 5.3.4. Results 82 5.4. Summary 84 Chapter 6. Conclusion 86 6.1. Thesis Results 86 6.2. Thesis Contributions 88 6.3. Recommendations for Future Work 89 Bibliography 91 Appendix 107 Abstract in Korean 117 Acknowledgments 120Docto
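The interplay of the cycle-consistency loss and the auxiliary classifier loss described in the abstract can be sketched numerically. This is a toy illustration only: the "generators" below are stand-in linear maps over flattened spectrogram frames, not the thesis's trained networks, and the three-class error set (coda deletion, three-way contrast confusion, suprasegmental error) is used here purely as a hypothetical label inventory drawn from the analysis above.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 16
W_l2n = rng.normal(scale=0.1, size=(dim, dim))  # learner -> native mapping
W_n2l = rng.normal(scale=0.1, size=(dim, dim))  # native -> learner mapping

x_learner = rng.normal(size=dim)                # one accented frame

fake_native = W_l2n @ x_learner                 # corrected feedback frame
cycled = W_n2l @ fake_native                    # mapped back to learner domain

# Cycle-consistency (L1) loss: keeps the utterance's overall structure
# and discourages over-correction.
cycle_loss = float(np.mean(np.abs(cycled - x_learner)))

# Auxiliary classifier head: the feedback must also identify the linguistic
# error type (hypothetical 3-class set: coda deletion, three-way contrast
# confusion, suprasegmental error).
n_classes = 3
W_cls = rng.normal(scale=0.1, size=(n_classes, dim))
logits = W_cls @ fake_native
probs = np.exp(logits - logits.max())
probs /= probs.sum()                            # softmax over error classes
true_class = 0
cls_loss = float(-np.log(probs[true_class]))    # cross-entropy term

total_loss = cycle_loss + cls_loss              # joint objective (weights omitted)
```

In the full model these terms would be weighted and combined with the adversarial losses of both generators; the sketch shows only how the cycle and classification terms coexist on one generated frame.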

    Towards a complete multiple-mechanism account of predictive language processing [Commentary on Pickering & Garrod]

    Although we agree with Pickering & Garrod (P&G) that prediction-by-simulation and prediction-by-association are important mechanisms of anticipatory language processing, this commentary suggests that they: (1) overlook other potential mechanisms that might underlie prediction in language processing, (2) overestimate the importance of prediction-by-association in early childhood, and (3) underestimate the complexity and significance of several factors that might mediate prediction during language processing.

    Recognizing Speech in a Novel Accent: The Motor Theory of Speech Perception Reframed

    The motor theory of speech perception holds that we perceive the speech of another in terms of a motor representation of that speech. However, when we have learned to recognize a foreign accent, it seems plausible that recognition of a word rarely involves reconstruction of the speech gestures of the speaker rather than the listener. To better assess the motor theory and this observation, we proceed in three stages. Part 1 places the motor theory of speech perception in a larger framework based on our earlier models of the adaptive formation of mirror neurons for grasping, and for viewing extensions of that mirror system as part of a larger system for neuro-linguistic processing, augmented by the present consideration of recognizing speech in a novel accent. Part 2 then offers a novel computational model of how a listener comes to understand the speech of someone speaking the listener's native language with a foreign accent. The core tenet of the model is that the listener uses hypotheses about the word the speaker is currently uttering to update probabilities linking the sound produced by the speaker to phonemes in the native language repertoire of the listener. This, on average, improves the recognition of later words. This model is neutral regarding the nature of the representations it uses (motor vs. auditory). It serves as a reference point for the discussion in Part 3, which proposes a dual-stream neuro-linguistic architecture to revisit claims for and against the motor theory of speech perception and the relevance of mirror neurons, and extracts some implications for the reframing of the motor theory.
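The model's core tenet lends itself to a minimal sketch: the listener credits the phoneme implied by the currently hypothesised word, and the sound-to-phoneme mapping shifts accordingly. All sounds, phonemes, and counts below are invented for illustration; this is a simple count-based (Dirichlet-style) update, not the paper's actual implementation.

```python
# Sound-to-phoneme mapping as normalised counts, starting from a uniform prior.
phonemes = ["t", "d"]
counts = {"t_accented": {p: 1.0 for p in phonemes}}

def update(sound, hypothesised_phoneme, weight=1.0):
    """Credit the phoneme implied by the currently hypothesised word."""
    counts[sound][hypothesised_phoneme] += weight

def posterior(sound):
    """P(native phoneme | accented sound) from accumulated counts."""
    total = sum(counts[sound].values())
    return {p: c / total for p, c in counts[sound].items()}

# If successive word hypotheses imply that this accented sound realises /d/,
# the mapping shifts toward /d/, which on average aids later word recognition:
for _ in range(8):
    update("t_accented", "d")
post = posterior("t_accented")
```

The point of the sketch is that nothing in the update depends on whether the stored representations are motor or auditory, matching the model's stated neutrality.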

    Innovative technologies for under-resourced language documentation: The BULB Project

    The project Breaking the Unwritten Language Barrier (BULB), which brings together linguists and computer scientists, aims at supporting linguists in documenting unwritten languages. In order to achieve this we will develop tools tailored to the needs of documentary linguists by building upon technology and expertise from the area of natural language processing, most prominently automatic speech recognition and machine translation. As a development and test bed for this we have chosen three less-resourced African languages from the Bantu family: Basaa, Myene and Embosi. Work within the project is divided into three main steps: 1) Collection of a large corpus of speech (100h per language) at a reasonable cost. After initial recording, the data is re-spoken by a reference speaker to enhance the signal quality and orally translated into French. 2) Automatic transcription of the Bantu languages at phoneme level and the French translation at word level. The recognized Bantu phonemes and French words will then be automatically aligned. 3) Tool development. In close cooperation and discussion with the linguists, the speech and language technologists will design and implement tools that will support the linguists in their work, taking into account the linguists' needs and technology's capabilities. The data collection has begun for the three languages. For this we use standard mobile devices and a dedicated piece of software, LIG-AIKUMA, which offers a range of different speech collection modes (recording, respeaking, translation and elicitation). LIG-AIKUMA's improved features include smart generation and handling of speaker metadata as well as respeaking and parallel audio data mapping.
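The phoneme-to-word alignment in step 2 can be sketched with an IBM-Model-1-style EM procedure on toy data. The real project aligns recognised Bantu phoneme strings to French word sequences; the token pairs below are invented for illustration and do not come from the BULB corpora.

```python
from collections import defaultdict

# Toy parallel data: (phoneme string, word sequence) pairs.
pairs = [
    (["b", "a", "s"], ["eau"]),
    (["b", "a", "s", "o"], ["eau", "grand"]),
    (["m", "o"], ["grand"]),
]

t = defaultdict(lambda: 1.0)  # t(phoneme | word), uniform initialisation

for _ in range(10):  # EM iterations
    count = defaultdict(float)
    total = defaultdict(float)
    # E-step: distribute each phoneme's mass over the words in its sentence.
    for phones, words in pairs:
        for p in phones:
            z = sum(t[(p, w)] for w in words)
            for w in words:
                c = t[(p, w)] / z
                count[(p, w)] += c
                total[w] += c
    # M-step: renormalise the fractional counts per word.
    for (p, w), c in count.items():
        t[(p, w)] = c / total[w]
```

Phonemes that consistently co-occur with a word accumulate probability mass on it, which is the basic signal the project's alignment step exploits.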

    The listening talker: A review of human and algorithmic context-induced modifications of speech

    Speech output technology is finding widespread application, including in scenarios where intelligibility might be compromised - at least for some listeners - by adverse conditions. Unlike most current algorithms, talkers continually adapt their speech patterns as a response to the immediate context of spoken communication, where the type of interlocutor and the environment are the dominant situational factors influencing speech production. Observations of talker behaviour can motivate the design of more robust speech output algorithms. Starting with a listener-oriented categorisation of possible goals for speech modification, this review article summarises the extensive set of behavioural findings related to human speech modification, identifies which factors appear to be beneficial, and goes on to examine previous computational attempts to improve intelligibility in noise. The review concludes by tabulating 46 speech modifications, many of which have yet to be perceptually or algorithmically evaluated. Consequently, the review provides a roadmap for future work in improving the robustness of speech output.
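One of the simplest algorithmic speech modifications in this literature is boosting high frequencies (flattening spectral tilt), a property noise-adapted Lombard speech also exhibits. A first-order pre-emphasis filter is a crude sketch of that idea; the coefficient 0.97 is a common signal-processing convention, not a value taken from this review.

```python
import numpy as np

def pre_emphasis(signal, coeff=0.97):
    """y[n] = x[n] - coeff * x[n-1]: attenuates low, boosts high frequencies."""
    signal = np.asarray(signal, dtype=float)
    return np.concatenate(([signal[0]], signal[1:] - coeff * signal[:-1]))

# A slowly varying (low-frequency) component is attenuated while a rapidly
# varying (high-frequency) component is amplified:
fs = 8000.0
t = np.arange(0, 1, 1 / fs)
low = np.sin(2 * np.pi * 100 * t)     # 100 Hz tone
high = np.sin(2 * np.pi * 3000 * t)   # 3 kHz tone
low_out = pre_emphasis(low)
high_out = pre_emphasis(high)
```

A practical intelligibility-enhancing system would reallocate energy under an equal-power constraint rather than simply filtering, but the spectral reshaping principle is the same.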

    Automatic Pronunciation Assessment -- A Review

    Pronunciation assessment and its application in computer-aided pronunciation training (CAPT) have seen impressive progress in recent years. With the rapid growth of language processing and deep learning over the past few years, there is a need for an updated review. In this paper, we review methods employed in both phonemic and prosodic pronunciation assessment. We categorize the main challenges observed in prominent research trends, highlight existing limitations, and point to available resources. This is followed by a discussion of the remaining challenges and possible directions for future work. Comment: 9 pages, accepted to EMNLP Findings
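A recurring building block in the phonemic assessment methods such reviews cover is the Goodness of Pronunciation (GOP) score: the average log posterior of the canonical phoneme over the frames of its segment. The sketch below assumes per-frame phoneme posteriors are already available (e.g. from an acoustic model); the posterior values are invented for illustration.

```python
import numpy as np

def gop(frame_posteriors, phoneme_index):
    """GOP score: mean log posterior of the canonical phoneme over its frames.

    frame_posteriors: (frames, phonemes) array of per-frame posteriors.
    """
    post = np.asarray(frame_posteriors, dtype=float)
    return float(np.mean(np.log(post[:, phoneme_index] + 1e-10)))

# Two hypothetical two-frame segments for canonical phoneme 0:
good = np.array([[0.9, 0.1], [0.8, 0.2]])  # acoustic model agrees with canon
poor = np.array([[0.2, 0.8], [0.3, 0.7]])  # model prefers a competitor
score_good = gop(good, 0)
score_poor = gop(poor, 0)
# A mispronounced segment yields a lower (more negative) GOP, and a threshold
# on the score flags the phone for feedback.
```

Deep-learning variants replace the GMM-HMM posteriors of the original formulation with neural ones, but the scoring rule itself is unchanged.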