    Tone classification of syllable-segmented Thai speech based on multilayer perceptron

    Thai is a monosyllabic, tonal language that uses tone to convey lexical information about the meaning of a syllable. Thai has five distinctive tones, and each tone is well represented by a single F0 contour pattern. In general, the same syllable spoken with a different tone has a different lexical meaning. Thus, to completely recognize a spoken Thai syllable, a speech recognition system must not only recognize the base syllable but also correctly identify its tone. Hence, tone classification of Thai speech is an essential part of a Thai speech recognition system.
In this study, a tone classifier for syllable-segmented Thai speech that incorporates the effects of tonal coarticulation, stress and intonation was developed. An automatic syllable segmentation method, which segments the training and test utterances into syllable units, was also developed. Acoustic features, namely fundamental frequency (F0), duration, and energy, extracted from the syllable being classified and from its neighboring syllables, were used as the main discriminating features. A multilayer perceptron (MLP) trained with backpropagation was employed to classify these features. The proposed system was evaluated on 920 test utterances spoken by five male and three female Thai speakers, who also uttered the training speech. The proposed system achieved an average accuracy of 91.36%.
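The abstract says the classifier sees F0, duration and energy from the current syllable and its neighbors. A minimal sketch of how such a context feature vector might be assembled follows, assuming each syllable is given as a voiced-frame F0 contour in Hz plus a duration and an energy value. The resampling to k points, the log-F0 normalization against the utterance mean, the zero padding at utterance edges and all function names are illustrative assumptions, not the paper's documented recipe.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical input format: (F0 contour in Hz over voiced frames, duration in s, energy)
Syllable = Tuple[Sequence[float], float, float]


def resample(contour: Sequence[float], k: int) -> List[float]:
    """Linearly resample a contour to k equally spaced points (k >= 2)."""
    if k < 2:
        raise ValueError("k must be at least 2")
    if len(contour) == 1:
        return [float(contour[0])] * k
    out = []
    for i in range(k):
        pos = i * (len(contour) - 1) / (k - 1)
        lo = int(math.floor(pos))
        hi = min(lo + 1, len(contour) - 1)
        frac = pos - lo
        out.append(contour[lo] * (1 - frac) + contour[hi] * frac)
    return out


def syllable_features(syl: Syllable, ref_log_f0: float, k: int) -> List[float]:
    """k normalized log-F0 points followed by duration and energy."""
    f0, duration, energy = syl
    # Assumed speaker normalization: log F0 relative to the utterance mean.
    log_f0 = [math.log(f) - ref_log_f0 for f in f0]
    return resample(log_f0, k) + [float(duration), float(energy)]


def context_features(syllables: Sequence[Syllable], k: int = 5) -> List[List[float]]:
    """One vector per syllable: [previous | current | next] features,
    zero-padded at the utterance boundaries (an assumption)."""
    all_f0 = [f for syl in syllables for f in syl[0]]
    ref = sum(math.log(f) for f in all_f0) / len(all_f0)
    per_syl = [syllable_features(s, ref, k) for s in syllables]
    pad = [0.0] * (k + 2)
    vectors = []
    for i, cur in enumerate(per_syl):
        prev = per_syl[i - 1] if i > 0 else pad
        nxt = per_syl[i + 1] if i + 1 < len(per_syl) else pad
        vectors.append(prev + cur + nxt)
    return vectors
```

With k = 5, each vector has 3 × (5 + 2) = 21 entries, which could serve as the input layer of an MLP with five outputs, one per Thai tone. Including the neighbors is what lets the classifier account for tonal coarticulation across syllable boundaries.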

    Linear prediction of the one-sided autocorrelation sequence for noisy speech recognition

    The article presents a robust representation of speech based on autoregressive (AR) modeling of the causal part of the autocorrelation sequence. In noisy speech recognition, this new representation achieves better results than several other related techniques. Peer reviewed; postprint (published version).
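The idea named in the title can be sketched in two steps: compute the causal (one-sided) part of a frame's autocorrelation, then fit an AR model to that sequence as if it were a signal, using the standard autocorrelation method and the Levinson-Durbin recursion. This is a minimal sketch under stated assumptions: the number of retained lags, the model order, the absence of windowing and the function names are all hypothetical choices, not the article's settings.

```python
from typing import List, Sequence, Tuple


def autocorrelation(x: Sequence[float], max_lag: int) -> List[float]:
    """Biased autocorrelation r[k] = sum_n x[n] * x[n + k], for k = 0..max_lag."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def levinson_durbin(r: Sequence[float], order: int) -> Tuple[List[float], float]:
    """Solve the LP normal equations; returns A(z) coefficients [1, a1..ap]
    and the final prediction-error power."""
    if r[0] <= 0:
        raise ValueError("zero-energy input")
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def osalpc(frame: Sequence[float], order: int, num_lags: int) -> Tuple[List[float], float]:
    """Hypothetical helper: AR model of the one-sided autocorrelation sequence."""
    # Step 1: causal part r[0..num_lags-1] of the frame's autocorrelation.
    one_sided = autocorrelation(frame, num_lags - 1)
    # Step 2: treat that sequence as a signal and apply autocorrelation-method LP.
    r2 = autocorrelation(one_sided, order)
    return levinson_durbin(r2, order)
```

For example, `osalpc(frame, order=3, num_lags=6)` returns four A(z) coefficients starting with 1.0. In a recognizer these coefficients would typically be converted to cepstral features; that step is omitted here.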

    Generation of prosody and speech for Mandarin Chinese

    Ph.D. (Doctor of Philosophy)

    Methods and Effects of Shadowing Using Online Authentic Videos on L2 Acquisition of Mandarin Chinese Tones

    Mandarin Chinese tones are notoriously difficult for second language (L2) learners. Previous research has focused on tone training methods that help learners produce monosyllabic lexical tones, while studies on the production of multisyllabic lexical tones at the sentence level in spontaneous speech remain limited. This study applies shadowing, a method in which learners repeat what they hear with as little delay as possible, to tone training, and compares the effects of authentic videos and textbook audio as shadowing materials on beginner L2 Mandarin learners’ tone improvement at the sentence level. Fourteen students in elementary Chinese classes at an American university participated in the tone training activity for four weeks. Participants in the “authentic video” group were trained with authentic videos, while the “textbook audio” group was trained with textbook audio. Participants shadowed the materials at home in their free time, twice a week and six times per session. Native Mandarin speakers rated tone accuracy in a pre-test and a post-test, each consisting of a read-aloud task and a one-on-one conversation. Qualitative and quantitative surveys were conducted to analyze learners’ attitudes toward the shadowing activity and the materials. The results indicate that learners in both groups significantly improved their tone accuracy in spontaneous speech, with no significant difference between the two groups. As for learners’ attitudes, although participants reported overall positive feedback on the shadowing activity regardless of the materials, the authentic materials generated great interest and were more appealing to the learners. A strong correlation was also found between learners’ confidence in speaking and the flexibility of the activity. Based on these findings, pedagogical implications are discussed, including how to select suitable materials and how to give shadowing instructions.
For example, educators could introduce textbook audio first and gradually add authentic materials. The findings provide Mandarin Chinese instructors with an effective and engaging way to improve learners’ tone production in spontaneous speaking. Incorporating shadowing activities into a course has great potential to encourage learner autonomy without occupying precious class time. The findings not only contribute to research on teaching Chinese as a second language and the related pedagogy but also shed light on the use of authentic materials in second language teaching and learning.

    Sequential grouping constraints on across-channel auditory processing


    Fundamental frequency modelling: an articulatory perspective with target approximation and deep learning

    Current statistical parametric speech synthesis (SPSS) approaches typically model acoustics at the state or frame level, which leads to the problem of frame-by-frame independence. Moreover, whichever learning technique is used, be it a hidden Markov model (HMM), a deep neural network (DNN) or a recurrent neural network (RNN), the fundamental idea is to set up a direct mapping from linguistic to acoustic features. Although progress is frequently reported, this idea is questionable in terms of biological plausibility. This thesis addresses these issues by integrating the dynamic mechanisms of human speech production as a core component of F0 generation, thereby developing a more human-like F0 modelling paradigm. By introducing an articulatory F0 generation model, target approximation (TA), between text and speech to control syllable-synchronised F0 generation, contextual F0 variations are processed in two separate yet integrated stages: linguistic to motor, and motor to acoustic. To demonstrate that human speech movement can be considered a dynamic process of target approximation, and that the TA model is a valid F0 generation model for the motor-to-acoustic stage, a TA-based pitch control experiment is first conducted to simulate the subtle human behaviour of online compensation for pitch-shifted auditory feedback. Then, at the linguistic-to-motor stage, the TA parameters are collectively controlled by linguistic features via a deep or recurrent neural network (DNN/RNN). We trained the systems on a Mandarin Chinese dataset containing both statements and questions. The TA-based systems generally outperformed the baseline systems in both objective and subjective evaluations. Furthermore, the set of required linguistic features was reduced, first to syllable-level features only (with the DNN) and then with all positional information removed (with the RNN).
With fewer linguistic features as input and a limited number of TA parameters as output, the systems required less training data and had lower model complexity, which in turn led to more efficient training and faster synthesis.
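The motor-to-acoustic stage can be illustrated with the widely used quantitative target approximation (qTA) formulation, in which F0 in each syllable approaches a linear pitch target m·t + b through a third-order critically damped system with strength λ, and F0, velocity and acceleration are carried over from the end of one syllable to the start of the next. This is a sketch that assumes the qTA equations; the thesis may parameterize TA differently, and the function names, sampling density and example values are hypothetical. F0 units follow whatever unit the targets use (semitones are common).

```python
import math
from typing import List, Sequence, Tuple

# Articulatory state carried across syllable boundaries: (F0, velocity, acceleration).
State = Tuple[float, float, float]
# Pitch target for one syllable: (slope m, height b, strength lam, duration in s).
Target = Tuple[float, float, float, float]


def qta_syllable(m: float, b: float, lam: float, dur: float,
                 state: State, n_samples: int = 20) -> Tuple[List[float], State]:
    """F0 approaching the linear target m*t + b through a third-order
    critically damped system, starting from the inherited state."""
    f0, v0, a0 = state
    # Coefficients fixed by continuity of F0, velocity and acceleration at t = 0.
    c1 = f0 - b
    c2 = v0 + lam * c1 - m
    c3 = (a0 + 2.0 * lam * c2 - lam * lam * c1) / 2.0

    def at(t: float) -> State:
        e = math.exp(-lam * t)
        p = c1 + c2 * t + c3 * t * t
        dp = c2 + 2.0 * c3 * t
        f = m * t + b + p * e
        df = m + (dp - lam * p) * e
        ddf = (2.0 * c3 - 2.0 * lam * dp + lam * lam * p) * e
        return f, df, ddf

    # Samples cover [0, dur); the state at dur starts the next syllable.
    samples = [at(dur * i / n_samples)[0] for i in range(n_samples)]
    return samples, at(dur)


def qta_contour(targets: Sequence[Target], initial_f0: float,
                n_samples: int = 20) -> List[float]:
    """Concatenate syllable-synchronised F0 segments with state transfer."""
    state: State = (initial_f0, 0.0, 0.0)
    contour: List[float] = []
    for m, b, lam, dur in targets:
        samples, state = qta_syllable(m, b, lam, dur, state, n_samples)
        contour.extend(samples)
    return contour
```

In the thesis, the per-syllable (m, b, λ) values would come from the DNN or RNN at the linguistic-to-motor stage; here they are supplied by hand. A small number of such parameters per syllable, rather than one value per frame, is what keeps the network's output layer compact.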

    Teaching Chinese as a Foreign Language to English Speaking Language Learners: Teachers’ Handbook

    This project developed a handbook to assist teachers in the instruction of Chinese as a foreign language. The handbook provides teachers with practical lessons for teaching Chinese to adult beginning language learners. It is based on autoethnographic analyses of my own experiences and stories related to learning foreign languages and teaching the Chinese language. Lesson topics were developed from these stories. The handbook puts forward six lesson plans corresponding to six specific topics. It is grounded in two teaching approaches: the audio-lingual and the communicative approach to foreign language teaching. Based on these two approaches, the main idea embedded in the handbook is that teaching spoken language before Chinese writing and grammar rules can help adult novices learn Chinese more effectively and apply the language in practical situations. Thus, the lesson plans in the handbook are designed to develop adult learners’ speaking skills for communicative purposes. Unlike many current Chinese teaching materials, in which spoken and written Chinese are taught together, this handbook presents an innovative teaching method that emphasizes spoken Chinese for beginner learners. The lesson plans, as examples, are expected to inspire more Chinese teachers to explore and promote innovative lessons and teaching methods.

    Analysis on Using Synthesized Singing Techniques in Assistive Interfaces for Visually Impaired to Study Music

    Touch and hearing are the primary senses through which visually impaired people perceive the world, and their interaction with assistive technologies likewise relies mainly on tactile and auditory interfaces. This research paper discusses the validity of using the most appropriate singing synthesis techniques as a mediator in assistive technologies built specifically to address their music learning needs when working with music scores and lyrics. Music scores with notation and lyrics are the main mediators in the musical communication channel between a composer and a performer. Visually impaired music lovers have less opportunity to access this main mediator, since most scores are available only in visual format. In a music score, the vocal performer’s melody is bound to the sound that can be produced by singing it. Singing, a format in the temporal domain, is therefore a better fit than a tactile format in the spatial domain. Consequently, converting the existing visual format to a sung output is the most appropriate lossless transition, as shown by the initial research on an adaptive music score trainer for the visually impaired [1]. To extend this initial research, this study reviews existing singing synthesis techniques and research on auditory interfaces.
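Converting a score into sung output starts by turning each note and its lyric syllable into a pitch, a start time and a duration that a singing synthesizer can render. A minimal sketch follows, assuming the score has already been parsed into MIDI note numbers, lengths in beats, lyric syllables and a fixed tempo. The class and function names are hypothetical, and the synthesizer back end itself is out of scope. The pitch conversion uses the standard equal-temperament mapping with A4 (MIDI note 69) at 440 Hz.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Note:
    midi: int      # MIDI note number (69 = A4)
    beats: float   # note length in beats
    lyric: str     # syllable sung on this note


@dataclass
class SungEvent:   # hypothetical input record for a singing synthesizer
    lyric: str
    f0_hz: float
    start_s: float
    dur_s: float


def midi_to_hz(n: int) -> float:
    """Equal-temperament pitch with A4 = 440 Hz."""
    return 440.0 * 2.0 ** ((n - 69) / 12.0)


def score_to_events(notes: Sequence[Note], bpm: float) -> List[SungEvent]:
    """Lay notes end to end at a constant tempo (an assumption: no rests or tempo changes)."""
    sec_per_beat = 60.0 / bpm
    t = 0.0
    events = []
    for note in notes:
        dur = note.beats * sec_per_beat
        events.append(SungEvent(note.lyric, midi_to_hz(note.midi), t, dur))
        t += dur
    return events
```

At 120 beats per minute, a one-beat A4 on "la" followed by a two-beat B4 on "di" yields events starting at 0 s and 0.5 s, lasting 0.5 s and 1.0 s respectively.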