
    Class-Level Spectral Features for Emotion Recognition

    The most common approaches to automatic emotion recognition rely on utterance-level prosodic features. Recent studies have shown that utterance-level statistics of segmental spectral features also contain rich information about expressivity and emotion. In this work we introduce a more fine-grained yet robust set of spectral features: statistics of Mel-Frequency Cepstral Coefficients computed over three phoneme type classes of interest: stressed vowels, unstressed vowels, and consonants in the utterance. We investigate the performance of these features in speaker-independent emotion recognition using two publicly available datasets. Our experimental results clearly indicate that both the richer set of spectral features and the differentiation between phoneme type classes are beneficial for the task. Classification accuracies are consistently higher for our features than for prosodic or utterance-level spectral features, and combining the phoneme class features with prosodic features leads to a further improvement. Given the large number of class-level spectral features, we expected feature selection to improve results even further, but none of several selection methods led to clear gains. Further analyses reveal that spectral features computed from consonant regions of the utterance contain more information about emotion than either stressed or unstressed vowel features. We also explore how emotion recognition accuracy depends on utterance length, and show that, while there is no significant dependence for utterance-level prosodic features, the accuracy of emotion recognition using class-level spectral features increases with utterance length.
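    As a rough illustration of the class-level features described above, the sketch below (a minimal example, not the authors' code) computes MFCC statistics separately for stressed vowels, unstressed vowels, and consonants, assuming a phone-level alignment with stress marks is already available; the segment format, frame parameters, and the mean/standard-deviation statistics are illustrative assumptions.

```python
# Minimal sketch: class-level MFCC statistics, assuming phone-level alignments
# (e.g., from a forced aligner). Segment format and statistics are assumptions.
import numpy as np
import librosa

PHONE_CLASSES = ("stressed_vowel", "unstressed_vowel", "consonant")

def class_level_mfcc_stats(wav_path, segments, sr=16000, n_mfcc=13):
    """segments: list of (start_sec, end_sec, phone_class) tuples,
    with phone_class one of PHONE_CLASSES."""
    y, sr = librosa.load(wav_path, sr=sr)
    # 25 ms window, 10 ms hop
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc,
                                n_fft=400, hop_length=160)
    hop_sec = 160 / sr
    blocks = []
    for cls in PHONE_CLASSES:
        # gather all MFCC frames falling inside segments of this phoneme class
        cls_frames = [mfcc[:, int(s / hop_sec):int(e / hop_sec)]
                      for s, e, c in segments if c == cls]
        cls_frames = (np.concatenate(cls_frames, axis=1)
                      if cls_frames else np.empty((n_mfcc, 0)))
        if cls_frames.shape[1] > 0:
            # class-restricted utterance-level statistics (mean and std per coefficient)
            blocks.append(np.concatenate([cls_frames.mean(axis=1),
                                          cls_frames.std(axis=1)]))
        else:
            blocks.append(np.zeros(2 * n_mfcc))
    # final feature vector: one statistics block per phoneme class
    return np.concatenate(blocks)
```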

    Dealing with linguistic mismatches for automatic speech recognition

    Recent breakthroughs in automatic speech recognition (ASR) have resulted in a word error rate (WER) on par with human transcribers on the English Switchboard benchmark. However, dealing with linguistic mismatches between the training and testing data remains a significant, unsolved challenge. In the monolingual setting, it is well known that ASR performance degrades significantly when the system is presented with speech from speakers whose accents, dialects, and speaking styles differ from those encountered during training. In the multilingual setting, ASR systems trained on a source language perform even worse when tested on another target language because of mismatches in the number of phonemes, lexical ambiguity, and the power of the phonotactic constraints provided by phone-level n-grams. To address these linguistic mismatches in current ASR systems, my dissertation investigates both knowledge-gnostic and knowledge-agnostic solutions. In the first part, classic theories from acoustics and articulatory phonetics whose insights can be transferred across a dialect continuum, from local dialects to a standardized language, are revisited. Experiments demonstrate the potential of acoustic correlates in the vicinity of landmarks to bridge mismatches across different local or global varieties in a dialect continuum. In the second part, we design an end-to-end acoustic modeling approach based on the connectionist temporal classification loss and propose to link the training of acoustic and accent modeling, in a manner similar to the learning process in human speech perception. This joint model not only performs well on ASR with multiple accents but also boosts the accuracy of accent identification compared to separately trained models.
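    The joint acoustic-and-accent training described in the second part could be set up roughly as follows. This PyTorch sketch is an assumption-laden illustration (the encoder type, layer sizes, pooling, and loss weighting alpha are not taken from the dissertation); it shows only how a CTC objective and an utterance-level accent classifier can share one encoder.

```python
# Sketch of a joint CTC + accent-identification model (illustrative sizes only).
import torch
import torch.nn as nn

class JointCTCAccentModel(nn.Module):
    def __init__(self, n_feats=80, n_tokens=50, n_accents=8, hidden=320):
        super().__init__()
        self.encoder = nn.LSTM(n_feats, hidden, num_layers=3,
                               batch_first=True, bidirectional=True)
        self.ctc_head = nn.Linear(2 * hidden, n_tokens)      # per-frame token posteriors
        self.accent_head = nn.Linear(2 * hidden, n_accents)  # utterance-level accent ID

    def forward(self, feats):                  # feats: (B, T, n_feats)
        enc, _ = self.encoder(feats)           # (B, T, 2*hidden)
        ctc_logits = self.ctc_head(enc)        # (B, T, n_tokens)
        accent_logits = self.accent_head(enc.mean(dim=1))  # mean-pool over time
        return ctc_logits, accent_logits

def joint_loss(ctc_logits, accent_logits, targets, in_lens, tgt_lens, accents, alpha=0.3):
    # CTCLoss expects (T, B, C) log-probabilities
    log_probs = ctc_logits.log_softmax(-1).transpose(0, 1)
    ctc = nn.CTCLoss(blank=0, zero_infinity=True)(log_probs, targets, in_lens, tgt_lens)
    acc = nn.CrossEntropyLoss()(accent_logits, accents)
    return ctc + alpha * acc                   # multi-task objective
```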

    Pronunciation Variation Analysis and CycleGAN-Based Feedback Generation for CAPT

    Thesis (Ph.D.)--Seoul National University Graduate School, College of Humanities, Interdisciplinary Program in Cognitive Science, February 2020. Advisor: Minhwa Chung.
    Despite the growing popularity of learning Korean as a foreign language and the rapid development of language learning applications, existing computer-assisted pronunciation training (CAPT) systems for Korean do not utilize the linguistic characteristics of non-native Korean speech. Pronunciation variations in non-native speech are far more diverse than those observed in native speech, which makes it difficult to incorporate such knowledge into an automatic system. Moreover, most existing methods rely on feature extraction results from signal processing, prosodic analysis, and natural language processing techniques. Such methods are limited because they depend on finding the right features for the task and on the accuracy of their extraction. This thesis presents a new approach to corrective feedback generation in a CAPT system, in which pronunciation variation patterns and their linguistic correlates with accentedness are analyzed and combined with a deep neural network approach, so that feature engineering effort is minimized while the linguistically important factors for the corrective feedback generation task are retained. Investigation of non-native Korean speech characteristics, in contrast with those of native speakers, and of their correlation with accentedness judgements shows that both segmental and prosodic variations are important factors in a Korean CAPT system. The thesis argues that the feedback generation task can be interpreted as a style transfer problem and proposes to evaluate the idea using a generative adversarial network. A corrective feedback generation model is trained on 65,100 read utterances by 217 non-native speakers from 27 mother tongue backgrounds. The features are learnt automatically, in an unsupervised way, in an auxiliary classifier CycleGAN setting, in which the generator learns to map foreign-accented speech to native speech distributions. To inject linguistic knowledge into the network, an auxiliary classifier is trained so that the feedback also identifies the linguistic error types defined in the first half of the thesis. The proposed approach generates a corrected version of the speech in the learner's own voice, outperforming the conventional Pitch-Synchronous Overlap-and-Add method.
    Abstract in Korean (translated): Interest in Korean as a foreign language has grown considerably, the number of Korean learners has increased sharply, and research on computer-assisted pronunciation training (CAPT) applications that apply speech and language processing technology is being pursued actively. Nevertheless, existing Korean pronunciation training systems neither make sufficient use of the linguistic characteristics of foreign-accented Korean nor apply recent language processing technology. Possible reasons are that the analysis of foreign-accented Korean speech phenomena has been insufficient and that, even where related research exists, further work is needed to reflect it in an automated system. In addition, CAPT technology in general relies on feature extraction based on signal processing, prosodic analysis, and natural language processing, so considerable time and effort are needed to find suitable features and extract them accurately; this suggests there is much room for improvement by applying recent deep-learning-based language processing technology. This study therefore first analyzed pronunciation variation patterns and their linguistic correlates for CAPT system development. Read-speech variation patterns of foreign speakers were contrasted with those of native Korean speakers, the salient variations were identified, and correlation analysis was used to determine how strongly each affects communication. The results show that coda deletion, confusion of the three-way laryngeal contrast, and suprasegmental errors should be given priority in feedback generation. Automatically generating corrective feedback is one of the key tasks of a CAPT system. This study interpreted the task as a speech style transfer problem and proposed to model it in a cycle-consistent generative adversarial network (CycleGAN) framework: the generator learns a mapping between the distributions of non-native and native speech, and the cycle-consistency loss preserves the overall structure of the utterance while preventing over-correction. Because the necessary features are learnt unsupervised within the CycleGAN framework, without a separate feature extraction step, the method extends easily to other languages. The priorities among the salient variations revealed by the linguistic analysis are modelled in an Auxiliary Classifier CycleGAN architecture, which grafts this knowledge onto the CycleGAN so that, while generating feedback speech, the model also classifies the error type the feedback addresses; the advantage is that domain knowledge remains available and controllable through the feedback generation stage. To evaluate the proposed method, a feedback generation model was trained on 65,100 meaningful-word utterances from 217 speakers of 27 mother tongues, and a perceptual evaluation of the degree of improvement was carried out. With the proposed method, the learner's pronunciation can be corrected while the learner's own voice is preserved, with a relative improvement of 16.67% over the conventional Pitch-Synchronous Overlap-and-Add (PSOLA) method.
    Table of contents (page numbers omitted): Chapter 1. Introduction — 1.1 Motivation (1.1.1 An Overview of CAPT Systems; 1.1.2 Survey of Existing Korean CAPT Systems); 1.2 Problem Statement; 1.3 Thesis Structure. Chapter 2. Pronunciation Analysis of Korean Produced by Chinese — 2.1 Comparison between Korean and Chinese (2.1.1 Phonetic and Syllable Structure Comparisons; 2.1.2 Phonological Comparisons); 2.2 Related Works; 2.3 Proposed Analysis Method (2.3.1 Corpus; 2.3.2 Transcribers and Agreement Rates); 2.4 Salient Pronunciation Variations (2.4.1 Segmental Variation Patterns, with Discussions; 2.4.2 Phonological Variation Patterns, with Discussions); 2.5 Summary. Chapter 3. Correlation Analysis of Pronunciation Variations and Human Evaluation — 3.1 Related Works (3.1.1 Criteria Used in L2 Speech; 3.1.2 Criteria Used in L2 Korean Speech); 3.2 Proposed Human Evaluation Method (3.2.1 Reading Prompt Design; 3.2.2 Evaluation Criteria Design; 3.2.3 Raters and Agreement Rates); 3.3 Linguistic Factors Affecting L2 Korean Accentedness (3.3.1 Pearson's Correlation Analysis; 3.3.2 Discussions; 3.3.3 Implications for Automatic Feedback Generation); 3.4 Summary. Chapter 4. Corrective Feedback Generation for CAPT — 4.1 Related Works (4.1.1 Prosody Transplantation; 4.1.2 Recent Speech Conversion Methods; 4.1.3 Evaluation of Corrective Feedback); 4.2 Proposed Method: Corrective Feedback as a Style Transfer (4.2.1 Speech Analysis at Spectral Domain; 4.2.2 Self-imitative Learning; 4.2.3 An Analogy: CAPT System and GAN Architecture); 4.3 Generative Adversarial Networks (4.3.1 Conditional GAN; 4.3.2 CycleGAN); 4.4 Experiment (4.4.1 Corpus; 4.4.2 Baseline Implementation; 4.4.3 Adversarial Training Implementation; 4.4.4 Spectrogram-to-Spectrogram Training); 4.5 Results and Evaluation (4.5.1 Spectrogram Generation Results; 4.5.2 Perceptual Evaluation; 4.5.3 Discussions); 4.6 Summary. Chapter 5. Integration of Linguistic Knowledge in an Auxiliary Classifier CycleGAN for Feedback Generation — 5.1 Linguistic Class Selection; 5.2 Auxiliary Classifier CycleGAN Design; 5.3 Experiment and Results (5.3.1 Corpus; 5.3.2 Feature Annotations; 5.3.3 Experiment Setup; 5.3.4 Results); 5.4 Summary. Chapter 6. Conclusion — 6.1 Thesis Results; 6.2 Thesis Contributions; 6.3 Recommendations for Future Work. Bibliography. Appendix. Abstract in Korean. Acknowledgments.
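    The auxiliary classifier CycleGAN objective described in this record might combine its terms roughly as in the sketch below; the generator, discriminator, and error-type classifier definitions, as well as the loss weights, are placeholders rather than the thesis implementation.

```python
# Sketch of an auxiliary-classifier CycleGAN generator objective for
# spectrogram-to-spectrogram feedback generation (all networks are placeholders).
import torch
import torch.nn as nn

adv = nn.MSELoss()         # least-squares adversarial loss
cyc = nn.L1Loss()          # cycle-consistency loss
cls = nn.CrossEntropyLoss()

def generator_step(G_ab, G_ba, D_b, C, x_nonnative, err_type,
                   lam_cyc=10.0, lam_cls=1.0):
    """x_nonnative: batch of learner (mel-)spectrograms; err_type: linguistic
    error-type labels for that batch (e.g., coda deletion)."""
    fake_native = G_ab(x_nonnative)                 # accented -> native-like speech
    rec_nonnative = G_ba(fake_native)               # map back for cycle consistency

    d_out = D_b(fake_native)
    loss_adv = adv(d_out, torch.ones_like(d_out))   # fool the native-domain critic
    loss_cyc = cyc(rec_nonnative, x_nonnative)      # keep overall utterance structure
    loss_cls = cls(C(fake_native), err_type)        # feedback must expose the error type
    return loss_adv + lam_cyc * loss_cyc + lam_cls * loss_cls
```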

    A computational model for studying L1's effect on L2 speech learning

    Much evidence has shown that the first language (L1) plays an important role in the formation of the L2 phonological system during second language (L2) learning. Combined with the fact that different L1s have distinct phonological patterns, this points to diverse L2 speech learning outcomes for speakers from different L1 backgrounds. This dissertation hypothesizes that phonological distances between accented speech and speakers' L1 speech are also correlated with perceived accentedness, and that the correlations are negative for some phonological properties. Moreover, contrastive phonological distinctions between L1s and the L2 will manifest themselves in the accented speech produced by speakers from those L1s. To test these hypotheses, this study develops a computational model to analyze accented speech properties in both the segmental (short-term speech measurements at the short-segment or phoneme level) and suprasegmental (long-term speech measurements at the word, long-segment, or sentence level) feature spaces. The benefit of using a computational model is that it enables quantitative analysis of L1's effect on accent in terms of different phonological properties. The core parts of this computational model are feature extraction schemes that derive pronunciation and prosody representations of accented speech from existing techniques in the speech processing field. Correlation analysis in both the segmental and suprasegmental feature spaces examines the relationship between L1-related acoustic measurements and perceived accentedness across several L1s. Multiple regression analysis is employed to investigate how L1 affects the perception of foreign accent, and how accented speech produced by speakers from different L1s behaves distinctly in the segmental and suprasegmental feature spaces. The results reveal the potential of this methodology to provide quantitative analysis of accented speech and to extend current studies in L2 speech learning theory to a larger scale. Practically, this study further shows that the proposed computational model can benefit automatic accentedness evaluation systems by adding features related to speakers' L1s. Doctoral Dissertation, Speech and Hearing Science, 201
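    The correlation and multiple-regression analysis described above could be carried out along the following lines; the feature layout and variable names are illustrative assumptions rather than the dissertation's actual pipeline.

```python
# Sketch: per-measure correlation with accentedness, plus a joint regression fit.
import numpy as np
from scipy import stats
from sklearn.linear_model import LinearRegression

def l1_distance_analysis(features, accentedness, names):
    """features: (n_speakers, n_measures) segmental and suprasegmental
    phonological-distance measures; accentedness: perceived accent ratings."""
    # per-measure Pearson correlation with perceived accentedness
    for j, name in enumerate(names):
        r, p = stats.pearsonr(features[:, j], accentedness)
        print(f"{name:>25s}  r = {r:+.2f}  p = {p:.3f}")
    # multiple regression: variance in accentedness explained jointly
    model = LinearRegression().fit(features, accentedness)
    print(f"multiple regression R^2 = {model.score(features, accentedness):.2f}")
    return model
```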

    Methods for pronunciation assessment in computer aided language learning

    Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2011. Cataloged from the PDF version of the thesis. Includes bibliographical references (p. 149-176). Learning a foreign language is a challenging endeavor that entails acquiring a wide range of new knowledge, including words, grammar, gestures, sounds, etc. Mastering these skills requires extensive practice by the learner, and opportunities may not always be available. Computer Aided Language Learning (CALL) systems provide non-threatening environments where foreign language skills can be practiced wherever and whenever a student desires. These systems often combine several technologies to identify the different types of errors made by a student. This thesis focuses on the problem of identifying mispronunciations made by a foreign language student using a CALL system. We make several assumptions about the nature of the learning activity: it takes place using a dialogue system, it is a task- or game-oriented activity, the student should not be interrupted by the pronunciation feedback system, and the goal of the feedback system is to identify severe mispronunciations with high reliability. Detecting mispronunciations requires a corpus of speech with human judgements of pronunciation quality. Typical approaches to collecting such a corpus use an expert phonetician to both phonetically transcribe and assign judgements of quality to each phone in the corpus. This is time consuming and expensive, and it places an extra burden on the transcriber. We describe a novel method for obtaining phone-level judgements of pronunciation quality by utilizing non-expert, crowd-sourced, word-level judgements of pronunciation. Foreign language learners typically exhibit high variation and pronunciation patterns distinct from those of native speakers, which makes analysis for mispronunciation difficult. We detail a simple but effective method for transforming the vowel space of non-native speakers to make mispronunciation detection more robust and accurate. We show that this transformation not only enhances performance on a simple classification task, but also results in distributions that can be better exploited for mispronunciation detection. This transformation of the vowel space is exploited to train a mispronunciation detector using a variety of features derived from acoustic model scores and vowel class distributions. We confirm that the transformation technique results in more robust and accurate identification of mispronunciations than traditional acoustic models. by Mitchell A. Peabody. Ph.D.
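    One simple way to realize a vowel-space transformation of the kind described above is a per-speaker standardization toward a pooled native reference, as in the sketch below; the thesis's exact transformation may differ, and the data layout here is assumed purely for illustration.

```python
# Sketch: map a learner's vowel tokens into a native reference vowel space
# (per-speaker location/scale normalization; an assumption, not the thesis method).
import numpy as np

def normalize_vowel_space(speaker_tokens, native_tokens):
    """speaker_tokens: (n_tokens, n_dims) vowel measurements (e.g., F1/F2) for one learner;
    native_tokens: pooled native-speaker measurements in the same space."""
    spk_mu, spk_sd = speaker_tokens.mean(0), speaker_tokens.std(0) + 1e-8
    nat_mu, nat_sd = native_tokens.mean(0), native_tokens.std(0) + 1e-8
    # remove speaker-specific location and scale, then re-project into the native frame
    return (speaker_tokens - spk_mu) / spk_sd * nat_sd + nat_mu
```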

    Articulatory-WaveNet: Deep Autoregressive Model for Acoustic-to-Articulatory Inversion

    Acoustic-to-Articulatory Inversion, the estimation of articulatory kinematics from speech, is an important problem that has received significant attention in recent years. Articulatory movements estimated by such models can be used in many applications, including speech synthesis, automatic speech recognition, and facial kinematics for talking-head animation devices. Knowledge about the position of the articulators can also be extremely useful in speech therapy systems and in Computer-Aided Language Learning (CALL) and Computer-Aided Pronunciation Training (CAPT) systems for second language learners. Acoustic-to-Articulatory Inversion is challenging because of the complexity of articulation patterns and significant inter-speaker differences, and it is even more challenging when applied to non-native speakers without any kinematic training data. This dissertation addresses these problems through the development of upgraded architectures for articulatory inversion. The proposed Articulatory-WaveNet architecture is based on a dilated causal convolutional layer structure that improves the inversion results in both speaker-dependent and speaker-independent scenarios. The system has been evaluated on the ElectroMagnetic Articulography corpus of Mandarin-Accented English (EMA-MAE), consisting of 39 speakers including both native English speakers and Mandarin-accented English speakers. Results show that Articulatory-WaveNet significantly improves the performance of speaker-dependent and speaker-independent Acoustic-to-Articulatory Inversion systems compared to previously reported results.
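    A dilated causal convolutional stack for acoustic-to-articulatory inversion can be sketched as follows in PyTorch; the channel widths, layer count, and feature dimensions are illustrative assumptions, not the Articulatory-WaveNet configuration, and WaveNet-style gating and conditioning are omitted for brevity.

```python
# Sketch: dilated causal convolution stack mapping acoustic frames to
# articulator coordinates (e.g., EMA sensor positions). Sizes are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DilatedCausalBlock(nn.Module):
    def __init__(self, channels, dilation):
        super().__init__()
        self.pad = dilation            # left padding keeps the kernel-2 conv causal
        self.conv = nn.Conv1d(channels, channels, kernel_size=2, dilation=dilation)

    def forward(self, x):              # x: (B, C, T)
        y = self.conv(F.pad(x, (self.pad, 0)))
        return x + torch.tanh(y)       # residual connection

class ArticulatoryInversionNet(nn.Module):
    def __init__(self, n_acoustic=40, n_articulators=12, channels=64, n_layers=8):
        super().__init__()
        self.inp = nn.Conv1d(n_acoustic, channels, kernel_size=1)
        self.blocks = nn.ModuleList(
            [DilatedCausalBlock(channels, 2 ** i) for i in range(n_layers)])
        self.out = nn.Conv1d(channels, n_articulators, kernel_size=1)

    def forward(self, feats):          # feats: (B, T, n_acoustic)
        x = self.inp(feats.transpose(1, 2))
        for block in self.blocks:
            x = block(x)
        return self.out(x).transpose(1, 2)   # (B, T, n_articulators)
```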

    A computational model of the relationship between speech intelligibility and speech acoustics

    Speech intelligibility measures how well a speaker can be understood by a listener. Traditional measures of intelligibility, such as word accuracy, are not sufficient to reveal the reasons for intelligibility degradation. This dissertation investigates the underlying sources of intelligibility degradation from the perspectives of both the speaker and the listener. Segmental phoneme errors and suprasegmental lexical boundary errors are developed to reveal the perceptual strategies of the listener. A comprehensive set of automated acoustic measures is developed to quantify variations in the acoustic signal along three perceptual dimensions: articulation, prosody, and vocal quality. The developed measures have been validated on a dysarthric speech dataset covering a range of severity levels. Multiple regression analysis shows that the developed measures can predict perceptual ratings reliably. The relationship between the acoustic measures and the listening errors is then investigated to show the interaction between speech production and perception. The hypothesis is that segmental phoneme errors are mainly caused by imprecise articulation, while suprasegmental lexical boundary errors are due to unreliable phonemic information as well as abnormal rhythm and prosody patterns. To test the hypothesis, within-speaker variations are simulated in different speaking modes. Significant changes are detected in both the acoustic signals and the listening errors. Results of the regression analysis support the hypothesis by showing that changes in articulation-related acoustic features are important in predicting changes in phoneme errors, while changes in both articulation- and prosody-related features are important in predicting changes in lexical boundary errors. Moreover, a significant correlation is achieved in the cross-validation experiment, which indicates that it is possible to predict intelligibility variations from the acoustic signal. Doctoral Dissertation, Speech and Hearing Science, 201
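    The cross-validated regression linking acoustic measures to listening errors could look roughly like the sketch below; the feature groupings, model choice, and fold count are assumptions for illustration, not the dissertation's exact setup.

```python
# Sketch: cross-validated prediction of listening errors from acoustic measures.
import numpy as np
from scipy.stats import pearsonr
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict

def predict_listening_errors(articulation_feats, prosody_feats, errors, cv=5):
    """Predict per-utterance lexical-boundary (or phoneme) error rates from
    articulation- and prosody-related acoustic measures."""
    X = np.hstack([articulation_feats, prosody_feats])
    preds = cross_val_predict(LinearRegression(), X, errors, cv=cv)
    # correlation between cross-validated predictions and observed errors
    r, p = pearsonr(preds, errors)
    return r, p
```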

    Models and Analysis of Vocal Emissions for Biomedical Applications

    The International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications (MAVEBA) came into being in 1999 from the strongly felt need to share know-how, objectives, and results between areas that until then had seemed quite distinct, such as bioengineering, medicine, and singing. MAVEBA deals with all aspects of the study of the human voice, with applications ranging from the neonate to the adult and elderly. Over the years the initial topics have grown and spread into other areas of research, such as occupational voice disorders, neurology, rehabilitation, and image and video analysis. MAVEBA takes place every two years, always in Firenze, Italy. This edition celebrates twenty years of uninterrupted and successful research in the field of voice analysis.