398 research outputs found
Tone classification of syllable-segmented Thai speech based on multilayer perceptron
Thai is a monosyllabic and tonal language. Thai makes use of tone to convey lexical information about the meaning of a syllable. Thai has five distinctive tones and each tone is well represented by a single F0 contour pattern. In general, a Thai syllable with a different tone has a different lexical meaning. Thus, to completely recognize a spoken Thai syllable, a speech recognition system has not only to recognize a base syllable but also to correctly identify a tone. Hence, tone classification of Thai speech is an essential part of a Thai speech recognition system.
In this study, a tone classification system for syllable-segmented Thai speech that incorporates the effects of tonal coarticulation, stress and intonation was developed. Automatic syllable segmentation, which segments the training and test utterances into syllable units, was also developed. Acoustic features including fundamental frequency (F0), duration, and energy, extracted from the syllable being processed and its neighboring syllables, were used as the main discriminating features. A multilayer perceptron (MLP) trained by the backpropagation method was employed to classify these features. The proposed system was evaluated on 920 test utterances spoken by five male and three female Thai speakers who also uttered the training speech. The proposed system achieved an average accuracy rate of 91.36%.
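The abstract describes feeding an MLP with F0, duration, and energy taken from the current syllable and its neighbors. A minimal sketch of how such a context feature vector might be assembled is given below. The `Syllable` layout, the five-point F0 resampling, and the zero padding at utterance boundaries are all assumptions made for illustration; the abstract does not specify any of them.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Syllable:
    """Hypothetical container for one segmented syllable."""
    f0: List[float]   # F0 samples across the syllable (must be non-empty)
    duration: float   # syllable duration, e.g. in seconds
    energy: float     # a single energy value, e.g. mean log energy


def resample(contour: List[float], n_points: int) -> List[float]:
    """Linearly interpolate a contour to exactly n_points values."""
    if len(contour) == 1 or n_points == 1:
        return [contour[0]] * n_points
    out = []
    for i in range(n_points):
        pos = i * (len(contour) - 1) / (n_points - 1)
        lo = int(pos)
        hi = min(lo + 1, len(contour) - 1)
        frac = pos - lo
        out.append(contour[lo] * (1 - frac) + contour[hi] * frac)
    return out


def syllable_features(s: Optional[Syllable], n_points: int) -> List[float]:
    """Fixed-length features for one syllable; zeros if there is no neighbor."""
    if s is None:
        return [0.0] * (n_points + 2)
    return resample(s.f0, n_points) + [s.duration, s.energy]


def context_vector(syllables: List[Syllable], index: int,
                   n_points: int = 5) -> List[float]:
    """Concatenate features of the previous, current, and next syllable."""
    prev = syllables[index - 1] if index > 0 else None
    nxt = syllables[index + 1] if index + 1 < len(syllables) else None
    return (syllable_features(prev, n_points)
            + syllable_features(syllables[index], n_points)
            + syllable_features(nxt, n_points))
```

Each resulting vector has a fixed length of `3 * (n_points + 2)`, so it could serve as the input layer of an MLP with five output units, one per Thai tone.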
Linear prediction of the one-sided autocorrelation sequence for noisy speech recognition
The article presents a robust representation of speech based on AR modeling of the causal part of the autocorrelation sequence. In noisy speech recognition, this new representation achieves better results than several other related techniques. Peer Reviewed. Postprint (published version)
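The idea of AR modeling of the causal part of the autocorrelation sequence can be sketched as a two-step procedure: estimate the autocorrelation at non-negative lags, then fit a linear-prediction model to that sequence as if it were a signal. The code below is a minimal pure-Python illustration under those assumptions. The function names, the biased autocorrelation estimator, and the choice of lag count are hypothetical, and details of the article's method (such as any special weighting of the zero lag) are not reproduced here.

```python
from typing import List, Tuple


def autocorrelation(x: List[float], max_lag: int) -> List[float]:
    """Biased autocorrelation estimate r[0..max_lag] of a real sequence."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) / n
            for k in range(max_lag + 1)]


def levinson_durbin(r: List[float], order: int) -> Tuple[List[float], float]:
    """Solve the Yule-Walker equations.

    Returns (a, err), where a[0] == 1 and the predictor is
    x[n] ~ -sum(a[j] * x[n - j] for j in 1..order).
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] + sum(a[j] * r[m - j] for j in range(1, m))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, m):
            new_a[j] = a[j] + k * a[m - j]
        new_a[m] = k
        a = new_a
        err *= (1.0 - k * k)
    return a, err


def osa_lpc(x: List[float], order: int, n_lags: int) -> Tuple[List[float], float]:
    """Fit an AR model to the one-sided autocorrelation of x (sketch only)."""
    # Step 1: causal (one-sided) part of the autocorrelation, lags 0..n_lags.
    one_sided = autocorrelation(x, n_lags)
    # Step 2: treat that sequence as a signal and run ordinary LPC on it.
    r_of_one_sided = autocorrelation(one_sided, order)
    return levinson_durbin(r_of_one_sided, order)
```

For a decaying exponential such as `x[n] = 0.9 ** n`, the one-sided autocorrelation decays at the same rate, so a first-order fit should recover a coefficient close to `-0.9`.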
Knocking tones off their perch: investigating the intelligibility of Anglophone beginner learners of Mandarin Chinese at two secondary schools in the North of England
Robert Neal
Abstract
Set within the context of teaching and learning Chinese at two secondary schools in the North of England and adopting a case study research design, the aim of this PhD study is to explore the intelligibility of young Anglophone beginner learners of Chinese in order to make a contribution towards the creation of a more evidence-informed Chinese as a Second Language (CSL) pedagogy.
Data collection activities included recording the spoken Chinese of 20 L2 learners during a variety of speaking tasks – from reading aloud single words and sentences to speaking extemporaneously in role plays. 40 L1 raters were subsequently interviewed as they tried to comprehend the learners’ randomised speech samples. I also made use of stimulated recall interviews in which learners listened to selected audio extracts of their own L2 Chinese spoken data and were invited to comment upon any perceived pronunciation errors.
Distinguishing between the key constructs of accentedness, comprehensibility and intelligibility, I found that heavily accented tones did not necessarily lead to lower levels of comprehensibility and intelligibility. Furthermore, many intelligibility breakdowns – i.e. when raters failed to correctly transcribe the learners’ intended utterances – could be traced to problems with individual words which usually implicated segmental sounds as well as tone. All learners demonstrated low levels of awareness of their own pronunciation errors both during and after speech production, while learners who were more intelligible were generally more aware of their own pronunciation errors.
The majority of findings were interpreted in terms of indicating a need for more explicit forms of instruction, particularly in light of the low levels of awareness surrounding learners’ own pronunciation errors. Nevertheless, I also recognised the need to provide a healthy balance of more implicit forms of instruction to cater for more incidental learning. In light of the case study nature of the research design, the pedagogical suggestions were framed with reference to the learners who participated in this study. However, it is hoped that they will also be useful for wider application within the context of teaching Chinese as an L2 to young beginners in Anglophone settings. In terms of methodology, the coding systems developed to investigate listeners’ responses to the L2 Chinese speech signal and the learners’ awareness of their own pronunciation errors provide a new tool for other researchers in the field.
ESRC (Award 1088044
Methods and Effects of Shadowing Using Online Authentic Videos on L2 Acquisition of Mandarin Chinese Tones
Mandarin Chinese tones are notoriously difficult for second language (L2) learners. Previous research has focused on tone training methods that can help learners produce monosyllabic lexical tones, and studies of the production of multisyllabic lexical tones at the sentence level in spontaneous speech are limited. This study applies shadowing (a method in which learners repeat what they hear with as little delay as possible) to tone training and compares the effects of using authentic videos and textbook audio as shadowing materials for beginner L2 Mandarin learners’ tone improvement at the sentence level. Fourteen students in elementary Chinese classes at an American university participated in the tone training activity for four weeks. The participants in the “authentic video” group received authentic videos as their training materials, while the “textbook audio” group was trained with textbook audio. The participants shadowed the materials twice a week, six times per session, at home in their free time. Tone accuracy was rated by Mandarin native speakers according to the pre-test and the post-test consisting of a read-aloud task and a one-on-one conversation. Qualitative and quantitative surveys were conducted to analyze learners’ attitudes toward the shadowing activity and the materials.
The results indicate that learners in both groups showed significant improvements in their accuracy in spontaneous speech, with no significant differences between the two groups. As for learners’ attitudes, although the participants reported overall positive feedback on the shadowing activity regardless of the materials, authentic materials generated great interest from the participants and were more appealing to the learners. A strong correlation between learners’ confidence in speaking and flexibility of the activity was also found. Based on the findings, pedagogical implications are discussed, including how to select suitable materials and shadowing instructions. For example, educators could introduce textbook audio first and gradually add authentic materials. The findings provide Mandarin Chinese instructors with an effective and engaging way to improve learners’ tone production in spontaneous speaking. Incorporating shadowing activities into class has great potential to encourage learners’ autonomy without occupying precious class time. The findings not only contribute to research on teaching Chinese as a second language and the related pedagogy but also shed light on the use of authentic materials in second language teaching and learning.
Fundamental frequency modelling: an articulatory perspective with target approximation and deep learning
Current statistical parametric speech synthesis (SPSS) approaches typically aim at state/frame-level acoustic modelling, which leads to a problem of frame-by-frame independence. Moreover, whichever learning technique is used, whether hidden Markov model (HMM), deep neural network (DNN) or recurrent neural network (RNN), the fundamental idea is to set up a direct mapping from linguistic to acoustic features. Although progress is frequently reported, this idea is questionable in terms of biological plausibility. This thesis aims at addressing the above issues by integrating dynamic mechanisms of human speech production as a core component of F0 generation and thus developing a more human-like F0 modelling paradigm. By introducing an articulatory F0 generation model – target approximation (TA) – between text and speech that controls syllable-synchronised F0 generation, contextual F0 variations are processed in two separate yet integrated stages: linguistic to motor, and motor to acoustic. With the goal of demonstrating that human speech movement can be considered as a dynamic process of target approximation and that the TA model is a valid F0 generation model to be used at the motor-to-acoustic stage, a TA-based pitch control experiment is conducted first to simulate the subtle human behaviour of online compensation for pitch-shifted auditory feedback. Then, the TA parameters are collectively controlled by linguistic features via a deep or recurrent neural network (DNN/RNN) at the linguistic-to-motor stage. We trained the systems on a Mandarin Chinese dataset consisting of both statements and questions. The TA-based systems generally outperformed the baseline systems in both objective and subjective evaluations. Furthermore, the number of required linguistic features was reduced, first to syllable-level features only (with the DNN) and then further by removing all positional information (with the RNN).
Using fewer linguistic features as input and a limited number of TA parameters as output required less training data and lower model complexity, which in turn led to more efficient training and faster synthesis.
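The abstract does not state the TA equations. As a rough illustration of what syllable-synchronised target approximation can look like, the sketch below uses one commonly cited formulation, often called quantitative target approximation (qTA), in which F0 approaches a linear pitch target as the response of a third-order critically damped system. Whether the thesis uses exactly this form is an assumption, and the function name, parameter names, and units are hypothetical.

```python
import math


def qta_f0(t: float, m: float, b: float, lam: float,
           f0_start: float, d1_start: float = 0.0,
           d2_start: float = 0.0) -> float:
    """F0 at time t (from syllable onset) under a qTA-style model (sketch).

    The pitch target is the line m * t + b. lam controls how quickly the
    target is approached. f0_start, d1_start, and d2_start are the F0 value,
    velocity, and acceleration carried over from the end of the previous
    syllable.
    """
    # Coefficients chosen so that F0 and its first two derivatives
    # are continuous at the syllable onset (t = 0).
    c1 = f0_start - b
    c2 = d1_start + c1 * lam - m
    c3 = (d2_start + 2.0 * c2 * lam - c1 * lam ** 2) / 2.0
    target = m * t + b
    transient = (c1 + c2 * t + c3 * t * t) * math.exp(-lam * t)
    return target + transient
```

In a setup like the one described, a network at the linguistic-to-motor stage would predict per-syllable values such as `m`, `b`, and `lam`, and a function of this kind would turn them into a continuous F0 contour at the motor-to-acoustic stage.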
Teaching Chinese as a Foreign Language to English Speaking Language Learners: Teachers’ Handbook
This project developed a handbook for teachers to assist in the instruction of Chinese as a foreign language. The handbook provides teachers with practical lessons for teaching Chinese to adult beginning language learners. The handbook is based on autoethnographic analyses of my own experiences or stories related to foreign language learning and teaching the Chinese language. Lesson topics were developed based on these stories. The handbook puts forward 6 lesson plans corresponding to 6 specific topics. The handbook is supported by 2 theories: the audio-lingual and communicative foreign language teaching approaches. Based on these 2 teaching approaches, the main idea embedded in the handbook is that teaching spoken language before teaching Chinese writing and grammar rules can help adult novices to learn Chinese more effectively and apply the language in practical situations. Thus, the lesson plans in the handbook are designed to develop the speaking skills of adult learners for communicative purposes. Unlike many current Chinese teaching materials in which spoken and written Chinese are taught together, this handbook creates an innovative teaching method that emphasizes spoken-Chinese language learning for beginner learners. The lesson plans, as examples, are expected to inspire more Chinese teachers to explore and promote innovative teaching lessons and methods.
Analysis on Using Synthesized Singing Techniques in Assistive Interfaces for Visually Impaired to Study Music
Tactile and auditory senses are the basic means by which visually impaired people sense the world. Their interaction with assistive technologies also focuses mainly on tactile and auditory interfaces. This research paper discusses the validity of using the most appropriate singing synthesis techniques as a mediator in assistive technologies specifically built to address their music learning needs when engaging with music scores and lyrics. Music scores with notations and lyrics are considered the main mediators in the musical communication channel that lies between a composer and a performer. Visually impaired music lovers have less opportunity to access this main mediator since most scores are in visual format. If we consider a music score, the vocal performer’s melody is married to all the pleasant sound producible in the form of singing. Singing is a format in the temporal domain and fits better than a tactile format in the spatial domain. Therefore, conversion of the existing visual format to a singing output will be the most appropriate nonlossy transition, as proved by the initial research on an adaptive music score trainer for the visually impaired [1]. In order to extend this initial research, this study surveys existing singing synthesis techniques and research on auditory interfaces.