499 research outputs found

    Coupling between the laryngeal and supralaryngeal systems

    "A dissertation submitted in partial fulfillment of the requirements for the Bachelor of Science (Speech and Hearing Sciences), The University of Hong Kong, June 30, 2010." Thesis (B.Sc.)--University of Hong Kong, 2010. Includes bibliographical references (p. 27-30). The present study investigated the coupling between the laryngeal and supralaryngeal systems in speech production. The interrelationship between the two systems was examined by studying the possible interaction between tone production (laryngeal system) and articulation (supralaryngeal system). Sixty native Cantonese speakers (30 male, 30 female) participated in the study. The first and second formant frequencies (F1 and F2) of the four vowels /i, u, ?, ?/ produced at the six Cantonese lexical tones (high-level, high-rising, mid-level, low-falling, low-rising, and low-level) were obtained. Results revealed that, regardless of vowel, significant articulatory changes occurred across tones, although the pattern of change was not systematic across vowels. A gender difference was also noted: male and female speakers showed different patterns of articulatory change. These findings reveal a coupling effect between the laryngeal and supralaryngeal systems. Published or final version. Speech and Hearing Sciences; Bachelor of Science in Speech and Hearing Science.
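
    As a rough illustration of the formant measurements described above, the sketch below pulls F1 and F2 at a recording's temporal midpoint using the parselmouth interface to Praat; the file name, the midpoint choice, and the 5.5 kHz formant ceiling are illustrative assumptions, not the study's actual protocol.

        # Sketch: extract F1/F2 at the temporal midpoint of a vowel recording
        # using parselmouth (Praat). Settings here are illustrative only.
        import parselmouth

        def midpoint_formants(wav_path, max_formant_hz=5500.0):
            """Return (F1, F2) in Hz measured at the midpoint of the file."""
            snd = parselmouth.Sound(wav_path)
            formant = snd.to_formant_burg(maximum_formant=max_formant_hz)
            t_mid = snd.duration / 2.0
            return (formant.get_value_at_time(1, t_mid),
                    formant.get_value_at_time(2, t_mid))

        if __name__ == "__main__":
            f1, f2 = midpoint_formants("vowel_i_high_level.wav")  # hypothetical file
            print(f"F1 = {f1:.0f} Hz, F2 = {f2:.0f} Hz")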

    A kinematic study of coarticulation of Cantonese fricative /s/ using electromagnetic articulography (EMA)

    "A dissertation submitted in partial fulfilment of the requirements for the Bachelor of Science (Speech and Hearing Sciences), The University of Hong Kong, June 30, 2009." Thesis (B.Sc.)--University of Hong Kong, 2009. Includes bibliographical references (p. 25-29). Published or final version. Speech and Hearing Sciences; Bachelor of Science in Speech and Hearing Science.

    An EMA study of the articulatory-acoustic relationship of Cantonese corner vowels

    "A dissertation submitted in partial fulfilment of the requirements for the Bachelor of Science (Speech and Hearing Sciences), The University of Hong Kong, June 30, 2009."Includes bibliographical references (p. 21-23).Thesis (B.Sc)--University of Hong Kong, 2009.published_or_final_versionSpeech and Hearing SciencesBachelorBachelor of Science in Speech and Hearing Science

    Acoustic Features and Perceptive Cues of Songs and Dialogues in Whistled Speech: Convergences with Sung Speech

    Whistled speech is a little-studied local use of language, shaped by several cultures of the world either for dialogues at a distance or for rendering traditional songs. The practice emulates the voice with a simple modulated pitch; it is therefore a transformation of the vocal signal that implies simplifications in the frequency domain. Whistlers adapt their productions to the way each language combines the qualities of height that the human ear perceives simultaneously in the complex frequency spectrum of the spoken or sung voice (pitch, timbre). As a consequence, the practice highlights key acoustic cues for the intelligibility of the languages concerned. The present study analyses the acoustic and phonetic features selected by whistled speech in several traditions, whether in purely oral whistles (Spanish, Turkish, Mazatec) or in whistles produced with an instrument such as a leaf (Akha, Hmong). It underlines the convergences with the strategies of the singing voice for reaching the audience, for rendering the phonetic information carried by the vowel (tone, identity), and for aesthetic effects such as ornamentation.
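
    To make the idea of "emulating the voice with a simple modulated pitch" concrete, the sketch below renders a whistle-like signal as a pure tone whose frequency follows a time-varying contour; the contour and all parameters are invented for illustration and are not drawn from the study.

        # Sketch: a whistle-like emulation as a pure tone following a pitch
        # contour. The contour below is invented; real whistlers derive it
        # from the spoken or sung utterance.
        import numpy as np
        from scipy.io import wavfile

        fs = 44100
        duration = 1.0
        t = np.linspace(0.0, duration, int(fs * duration), endpoint=False)
        # Illustrative contour: oscillation between 1.5 kHz and 2.5 kHz.
        freq_hz = 2000.0 + 500.0 * np.sin(2.0 * np.pi * 1.0 * t)
        phase = 2.0 * np.pi * np.cumsum(freq_hz) / fs   # integrate frequency over time
        signal = 0.5 * np.sin(phase)
        wavfile.write("whistle_demo.wav", fs, signal.astype(np.float32))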

    Expression and perception of gender in prepubertal children's voice: an acoustic study

    The present study described the fundamental frequency (f0) and the first two formant frequencies (F1 and F2) obtained from voice samples produced by 25 male and 26 female Cantonese-speaking prepubertal children under a natural (neutral) condition and upon request to mimic the opposite-gender voice (imitation condition), in order to investigate sexual dimorphism in prepubertal children's voices and to assess their implicit knowledge of voice gender. Average accuracy of voice gender identification by adult listeners was 81.7% for voices produced under the neutral condition and 41.1% under the imitation condition. No significant difference in f0 was found between genders under the neutral condition, suggesting a similar vocal mechanism for prepubertal boys and girls. Average F1 and F2 of boys were lower than those of girls. It is suggested that both differences in vocal tract length and sex-specific articulatory behaviors contributed to the differences in formant frequencies, thus enhancing the sexual dimorphism used for voice gender identification. Under the imitation condition, boys exhibited significantly higher f0 than girls, and F1 of boys was also higher than that of girls. It can be concluded that prepubertal children have implicit knowledge of the sexually dimorphic acoustic correlates (f0 and F1) and are capable of altering the vibration rate of the vocal folds and the effective vocal tract length on request, so as to conform to the vocal characteristics of the opposite gender. Published or final version. Speech and Hearing Sciences; Bachelor of Science in Speech and Hearing Science.
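
    Because the abstract attributes part of the formant differences to vocal tract length, the sketch below shows a common back-of-the-envelope estimate: modelling the vocal tract as a uniform tube closed at the glottis and open at the lips, adjacent formants are spaced by about c/2L, so L ≈ c/(2·ΔF). The formant values used are invented for illustration.

        # Sketch: estimate apparent vocal tract length from formant spacing,
        # assuming a uniform closed-open tube (L ~ c / (2 * mean spacing)).
        # The formant values below are invented for illustration.
        import numpy as np

        SPEED_OF_SOUND_CM_S = 35000.0  # approx. speed of sound in warm moist air

        def vocal_tract_length_cm(formants_hz):
            """Estimate VTL in cm from a list of successive formant frequencies."""
            spacing = np.diff(np.sort(formants_hz))      # gaps F2-F1, F3-F2, ...
            return SPEED_OF_SOUND_CM_S / (2.0 * np.mean(spacing))

        print(f"{vocal_tract_length_cm([900.0, 2100.0, 3400.0]):.1f} cm")  # child-like toy values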

    Phone-based speech synthesis using neural network with articulatory control.

    by Lo Wai Kit. Thesis (M.Phil.)--Chinese University of Hong Kong, 1996. Includes bibliographical references (leaves 151-160). Table of contents:
    Chapter 1, Introduction: Applications of Speech Synthesis (Human-Machine Interface; Speech Aids; Text-To-Speech (TTS) System; Speech Dialogue System); Current Status in Speech Synthesis (Concatenation Based; Parametric Based; Articulatory Based; Application of Neural Network in Speech Synthesis); The Proposed Neural Network Speech Synthesis (Motivation; Objectives); Thesis Outline.
    Chapter 2, Linguistic Basics for Speech Synthesis: Relations between Linguistics and Speech Synthesis; Basic Phonology and Phonetics (Phonology; Phonetics; Prosody); Transcription Systems (The Employed Transcription System); Cantonese Phonology (Some Properties of Cantonese; Initial; Final; Lexical Tone; Variations); The Vowel Quadrilaterals.
    Chapter 3, Speech Synthesis Technology: The Human Speech Production; Important Issues in Speech Synthesis Systems (Controllability; Naturalness; Complexity; Information Storage); Units for Synthesis; Types of Synthesizer (Copy Concatenation; Vocoder; Articulatory Synthesis).
    Chapter 4, Neural Network Speech Synthesis with Articulatory Control: Neural Network Approximation (The Approximation Problem; Network Approach for Approximation); Artificial Neural Network for Phone-based Speech Synthesis (Network Approximation for Speech Signal Synthesis; Feedforward Backpropagation Neural Network; Radial Basis Function Network; Parallel Operating Synthesizer Networks); Template Storage and Control for the Synthesizer Network (Implicit Template Storage; Articulatory Control Parameters); Summary.
    Chapter 5, Prototype Implementation of the Synthesizer Network: Implementation of the Synthesizer Network (Network Architectures; Spectral Templates for Training; System Requirements); Subjective Listening Test (Sample Selection; Test Procedure; Results; Analysis); Summary.
    Chapter 6, Simplified Articulatory Control for the Synthesizer Network: Coarticulatory Effect in Speech Production (Acoustic Effect; Prosodic Effect); Control in Various Synthesis Techniques (Copy Concatenation; Formant Synthesis; Articulatory Synthesis); Articulatory Control Model Based on the Vowel Quad (Modeling of Variations with the Articulatory Control Model); Voice Correspondence (For Nasal Sounds: Inter-Network Correspondence; In Flat-Tongue Space: Intra-Network Correspondence); Summary.
    Chapter 7, Pause Duration Properties in Cantonese Phrases: The Prosodic Feature: Inter-Syllable Pause; Experiment for Measuring Inter-Syllable Pause of Cantonese Phrases (Speech Material Selection; Experimental Procedure; Results); Characteristics of Inter-Syllable Pause in Cantonese Phrases (Pause Duration Characteristics for Initials after Pause; Pause Duration Characteristics for Finals before Pause; General Observations; Other Observations); Application of Pause-Duration Statistics to the Synthesis System; Summary.
    Chapter 8, Conclusion and Further Work: Conclusion; Further Extension Work (Regularization Network Optimized on ISD; Incorporation of Non-Articulatory Parameters into the Control Space; Experiments on Other Prosodic Features; Application of Voice Correspondence to Cantonese Coda Discrimination).
    Appendix A, Cantonese Initials and Finals: Tables of All Cantonese Initials and Finals.
    Appendix B, Using Distortion Measures as Error Functions in Neural Networks: Formulation of the Itakura-Saito Distortion Measure as a Neural Network Error Function; Formulation of a Modified Itakura-Saito Distortion (MISD) Measure as a Neural Network Error Function.
    Appendix C, Orthogonal Least Squares Algorithm for RBFNet Training: Orthogonal Least Squares Learning Algorithm for Radial Basis Function Network Training.
    Appendix D, Phrase Lists: Two-Syllable Phrase List for the Pause Duration Experiment (兩字詞); Three/Four-Syllable Phrase List for the Pause Duration Experiment (片語).
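
    The thesis lists radial basis function (RBF) networks among its synthesizer architectures. As a generic textbook sketch, not the thesis's actual network or training data, the code below fits a Gaussian RBF approximator to a toy mapping by solving a linear least-squares problem for the output weights.

        # Sketch: a generic Gaussian RBF approximator with least-squares output
        # weights. Toy data stands in for the thesis's spectral templates.
        import numpy as np

        def rbf_design_matrix(x, centers, width):
            """Gaussian RBF activations for 1-D inputs x against the centers."""
            d = x[:, None] - centers[None, :]
            return np.exp(-(d ** 2) / (2.0 * width ** 2))

        # Toy data: a scalar "control" input mapped to one spectral value.
        x_train = np.linspace(0.0, 1.0, 50)
        y_train = np.sin(2.0 * np.pi * x_train)          # stand-in target trajectory

        centers = np.linspace(0.0, 1.0, 10)
        Phi = rbf_design_matrix(x_train, centers, width=0.1)
        weights, *_ = np.linalg.lstsq(Phi, y_train, rcond=None)

        # Evaluate the trained network on new control inputs.
        x_test = np.linspace(0.0, 1.0, 5)
        y_hat = rbf_design_matrix(x_test, centers, width=0.1) @ weights
        print(np.round(y_hat, 3))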

    Cross-language Differences in Fricative Processing and Their Influence on Non-native Fricative Categorisation

    Studies have shown that native speakers of Mandarin Chinese and Hong Kong Cantonese tend to have difficulty perceiving the English fricative /θ/. Although both languages have /f/ and /s/ categories, Mandarin speakers tend to assimilate /θ/ to their /s/ category, whilst Cantonese speakers tend to assimilate it to their /f/ category. Over three studies, this thesis investigated various factors that may lead to this difference, while enhancing our understanding of the acoustics and perception of the fricatives of these languages. Study 1 explored acoustic properties of the target fricatives in the three languages (Mandarin, Cantonese, English) using audio recordings from native speakers, and compared the fricatives within and across languages. The results showed that the phonemes /f s/, even though shared by the three languages, were produced differently in each language, likely owing to the effects of the different fricative inventories. Moreover, different acoustic cues were more or less effective in distinguishing between the fricatives of each language, indicating that native speakers of these languages likely rely on these cues to different degrees. Study 2 examined how transition cues affect the identification of /f/ and /s/ by native speakers of the respective languages, combining a phoneme monitoring task with EEG measures. Target fricatives were spliced with vowels to create stimuli with congruent or incongruent transitions. In contrast to previous studies (e.g., Wagner, Ernestus & Cutler, 2006), the results revealed that all groups attended to formant transitions when processing fricatives, despite their differing native fricative inventory sizes. Study 3 investigated cross-language differences in the categorisation boundaries of target fricative pairs using a behavioural identification task. Pairs of stimuli were interpolated to create a frication continuum and a vowel continuum, forming a two-dimensional stimulus grid. The results indicated that frication was the primary cue for fricative identification for native English, Cantonese, and Mandarin speakers, but also revealed cross-language differences in fricative boundaries. Overall, these studies demonstrate that fricative processing was largely driven by the frication section, and that the differential assimilation of /θ/ was likely due to the different acoustics of the same fricative category across languages. The results also motivate a reconsideration of the role of coarticulatory cues in fricative perception.
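
    Study 3's boundary estimates can be illustrated with a standard psychometric fit: identification proportions along a stimulus continuum are fitted with a logistic function whose midpoint is taken as the category boundary. The continuum steps and response proportions below are invented, and this is not necessarily the analysis used in the thesis.

        # Sketch: estimate a category boundary along a fricative continuum by
        # fitting a logistic psychometric function; the responses are invented.
        import numpy as np
        from scipy.optimize import curve_fit

        def logistic(x, midpoint, slope):
            return 1.0 / (1.0 + np.exp(-slope * (x - midpoint)))

        steps = np.arange(1, 8)                                        # 7-step /f/-/s/ continuum
        prop_s = np.array([0.02, 0.05, 0.15, 0.45, 0.80, 0.95, 0.99])  # proportion of "s" responses

        (midpoint, slope), _ = curve_fit(logistic, steps, prop_s, p0=[4.0, 1.0])
        print(f"Estimated category boundary at step {midpoint:.2f} (slope {slope:.2f})")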

    Long-term average spectral characteristics of Cantonese alaryngeal speech

    Objective: In Hong Kong, esophageal (SE), tracheoesophageal (TE), electrolaryngeal (EL), and pneumatic artificial laryngeal (PA) speech are commonly used by laryngectomees as a means to regain verbal communication after total laryngectomy. While SE and TE speech have been studied to some extent, little is known about EL and PA sound quality. The present study examined the sound quality associated with SE, TE, EL, and PA speech, and compared it with that of laryngeal (NL) speech using long-term average speech spectra (LTAS). Methods: Continuous speech samples of a 136-word reading passage were obtained from NL, SE, TE, EL, and PA speakers of Cantonese. The alaryngeal speakers were all superior speakers selected from the New Voice Club of Hong Kong, a self-help organization for laryngectomees in Hong Kong. TE speakers were fitted with a Provox valve, and EL speakers used a Servox-type electrolarynx. Speech samples were digitized at 20 kHz and 16 bits/sample using Praat, from which LTAS contours were derived. The first spectral peak (FSP), mean spectral energy (MSE), and spectral tilt (ST) derived from the LTAS contours of the different speaker groups were compared. Results: All speaker groups generally exhibited similar LTAS contours. However, PA speakers exhibited the lowest average FSP value and the greatest average MSE value. NL phonation was associated with a significantly greater ST value than alaryngeal speech of Cantonese. Conclusion: The differences in FSP, MSE, and ST values across speaker groups may be related to the different sound sources used by the laryngectomees and to differences in how the sound source is coupled with the vocal tract system. © 2009 Elsevier Ireland Ltd. All rights reserved. Postprint.
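
    As an illustration of the kind of measure reported here, the sketch below computes a long-term average spectrum with Welch's method and a crude spectral tilt as the slope of a line fitted to the dB spectrum over 0-8 kHz; the file name, window length, and band limits are assumptions rather than the study's settings.

        # Sketch: long-term average spectrum (LTAS) via Welch's method, plus a
        # crude spectral tilt in dB per kHz. Settings are illustrative only.
        import numpy as np
        from scipy.io import wavfile
        from scipy.signal import welch

        fs, x = wavfile.read("passage_reading.wav")      # hypothetical recording
        x = x.astype(np.float64)
        if x.ndim > 1:
            x = x.mean(axis=1)                           # mix down to mono

        freqs, psd = welch(x, fs=fs, nperseg=4096)
        ltas_db = 10.0 * np.log10(psd + 1e-12)

        band = (freqs >= 0.0) & (freqs <= 8000.0)
        slope_db_per_khz = np.polyfit(freqs[band] / 1000.0, ltas_db[band], 1)[0]
        print(f"Spectral tilt ~ {slope_db_per_khz:.2f} dB/kHz")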