24 research outputs found

    Unit selection and waveform concatenation strategies in Cantonese text-to-speech.

    Oey Sai Lok. Thesis (M.Phil.), Chinese University of Hong Kong, 2005. Includes bibliographical references. Abstracts in English and Chinese.
    Contents (each chapter ends with its own reference list):
    Chapter 1. Introduction: 1.1 An overview of Text-to-Speech technology (1.1.1 Text processing; 1.1.2 Acoustic synthesis; 1.1.3 Prosody modification); 1.2 Trends in Text-to-Speech technologies; 1.3 Objectives of this thesis; 1.4 Outline of the thesis.
    Chapter 2. Cantonese Speech: 2.1 The Cantonese dialect; 2.2 Phonology of Cantonese (2.2.1 Initials; 2.2.2 Finals; 2.2.3 Tones); 2.3 Acoustic-phonetic properties of Cantonese syllables.
    Chapter 3. Cantonese Text-to-Speech: 3.1 General overview (3.1.1 Text processing; 3.1.2 Corpus based acoustic synthesis; 3.1.3 Prosodic control); 3.2 Syllable based Cantonese Text-to-Speech system; 3.3 Sub-syllable based Cantonese Text-to-Speech system (3.3.1 Definition of sub-syllable units; 3.3.2 Acoustic inventory; 3.3.3 Determination of the concatenation points); 3.4 Problems.
    Chapter 4. Waveform Concatenation for Sub-syllable Units: 4.1 Previous work in concatenation methods (4.1.1 Determination of concatenation point; 4.1.2 Waveform concatenation); 4.2 Problems and difficulties in concatenating sub-syllable units (4.2.1 Mismatch of acoustic properties; 4.2.2 Allophone problem of Initials /z/, /c/ and /s/); 4.3 General procedures in concatenation strategies (4.3.1 Concatenation of unvoiced segments; 4.3.2 Concatenation of voiced segments; 4.3.3 Measurement of spectral distance); 4.4 Detailed procedures in concatenation points determination (4.4.1 Unvoiced segments; 4.4.2 Voiced segments); 4.5 Selected examples in concatenation strategies (4.5.1 Concatenation at Initial segments: 4.5.1.1 Plosives, 4.5.1.2 Fricatives; 4.5.2 Concatenation at Final segments: 4.5.2.1 V group (long vowel), 4.5.2.2 D group (diphthong)).
    Chapter 5. Unit Selection for Sub-syllable Units: 5.1 Basic requirements in unit selection process (5.1.1 Availability of multiple copies of sub-syllable units: 5.1.1.1 Levels of "identical", 5.1.1.2 Statistics on the availability; 5.1.2 Variations in acoustic parameters: 5.1.2.1 Pitch level, 5.1.2.2 Duration, 5.1.2.3 Intensity level); 5.2 Selection process: availability check on sub-syllable units (5.2.1 Multiple copies found; 5.2.2 Unique copy found; 5.2.3 No matched copy found; 5.2.4 Illustrative examples); 5.3 Selection process: acoustic analysis on candidate units.
    Chapter 6. Performance Evaluation: 6.1 General information (6.1.1 Objective test; 6.1.2 Subjective test; 6.1.3 Test materials); 6.2 Details of the objective test (6.2.1 Testing method; 6.2.2 Results; 6.2.3 Analysis); 6.3 Details of the subjective test (6.3.1 Testing method; 6.3.2 Results; 6.3.3 Analysis); 6.4 Summary.
    Chapter 7. Conclusions and Future Works: 7.1 Conclusions; 7.2 Suggested future works.
    Appendices: 1 Mean pitch level of Initials and Finals stored in the inventory; 2 Mean durations of Initials and Finals stored in the inventory; 3 Mean intensity level of Initials and Finals stored in the inventory; 4 Test word used in performance evaluation; 5 Test paragraph used in performance evaluation; 6 Pitch profile used in the Text-to-Speech system; 7 Duration model used in Text-to-Speech system.

    Generation of prosody and speech for Mandarin Chinese

    Ph.D. (Doctor of Philosophy)

    Phone-based speech synthesis using neural network with articulatory control.

    by Lo Wai Kit. Thesis (M.Phil.), Chinese University of Hong Kong, 1996. Includes bibliographical references (leaves 151-160).
    Contents:
    Chapter 1. Introduction: 1.1 Applications of Speech Synthesis (1.1.1 Human Machine Interface; 1.1.2 Speech Aids; 1.1.3 Text-To-Speech (TTS) system; 1.1.4 Speech Dialogue System); 1.2 Current Status in Speech Synthesis (1.2.1 Concatenation Based; 1.2.2 Parametric Based; 1.2.3 Articulatory Based; 1.2.4 Application of Neural Network in Speech Synthesis); 1.3 The Proposed Neural Network Speech Synthesis (1.3.1 Motivation; 1.3.2 Objectives); 1.4 Thesis outline.
    Chapter 2. Linguistic Basics for Speech Synthesis: 2.1 Relations between Linguistic and Speech Synthesis; 2.2 Basic Phonology and Phonetics (2.2.1 Phonology; 2.2.2 Phonetics; 2.2.3 Prosody); 2.3 Transcription Systems (2.3.1 The Employed Transcription System); 2.4 Cantonese Phonology (2.4.1 Some Properties of Cantonese; 2.4.2 Initial; 2.4.3 Final; 2.4.4 Lexical Tone; 2.4.5 Variations); 2.5 The Vowel Quadrilaterals.
    Chapter 3. Speech Synthesis Technology: 3.1 The Human Speech Production; 3.2 Important Issues in Speech Synthesis System (3.2.1 Controllability; 3.2.2 Naturalness; 3.2.3 Complexity; 3.2.4 Information Storage); 3.3 Units for Synthesis; 3.4 Type of Synthesizer (3.4.1 Copy Concatenation; 3.4.2 Vocoder; 3.4.3 Articulatory Synthesis).
    Chapter 4. Neural Network Speech Synthesis with Articulatory Control: 4.1 Neural Network Approximation (4.1.1 The Approximation Problem; 4.1.2 Network Approach for Approximation); 4.2 Artificial Neural Network for Phone-based Speech Synthesis (4.2.1 Network Approximation for Speech Signal Synthesis; 4.2.2 Feed forward Backpropagation Neural Network; 4.2.3 Radial Basis Function Network; 4.2.4 Parallel Operating Synthesizer Networks); 4.3 Template Storage and Control for the Synthesizer Network (4.3.1 Implicit Template Storage; 4.3.2 Articulatory Control Parameters); 4.4 Summary.
    Chapter 5. Prototype Implementation of the Synthesizer Network: 5.1 Implementation of the Synthesizer Network (5.1.1 Network Architectures; 5.1.2 Spectral Templates for Training; 5.1.3 System requirement); 5.2 Subjective Listening Test (5.2.1 Sample Selection; 5.2.2 Test Procedure; 5.2.3 Result; 5.2.4 Analysis); 5.3 Summary.
    Chapter 6. Simplified Articulatory Control for the Synthesizer Network: 6.1 Coarticulatory Effect in Speech Production (6.1.1 Acoustic Effect; 6.1.2 Prosodic Effect); 6.2 Control in various Synthesis Techniques (6.2.1 Copy Concatenation; 6.2.2 Formant Synthesis; 6.2.3 Articulatory synthesis); 6.3 Articulatory Control Model based on Vowel Quad (6.3.1 Modeling of Variations with the Articulatory Control Model); 6.4 Voice Correspondence (6.4.1 For Nasal Sounds - Inter-Network Correspondence; 6.4.2 In Flat-Tongue Space - Intra-Network Correspondence); 6.5 Summary.
    Chapter 7. Pause Duration Properties in Cantonese Phrases: 7.1 The Prosodic Feature - Inter-Syllable Pause; 7.2 Experiment for Measuring Inter-Syllable Pause of Cantonese Phrases (7.2.1 Speech Material Selection; 7.2.2 Experimental Procedure; 7.2.3 Result); 7.3 Characteristics of Inter-Syllable Pause in Cantonese Phrases (7.3.1 Pause Duration Characteristics for Initials after Pause; 7.3.2 Pause Duration Characteristic for Finals before Pause; 7.3.3 General Observations; 7.3.4 Other Observations); 7.4 Application of Pause-duration Statistics to the Synthesis System; 7.5 Summary.
    Chapter 8. Conclusion and Further Work: 8.1 Conclusion; 8.2 Further Extension Work (8.2.1 Regularization Network Optimized on ISD; 8.2.2 Incorporation of Non-Articulatory Parameters to Control Space; 8.2.3 Experiment on Other Prosodic Features; 8.2.4 Application of Voice Correspondence to Cantonese Coda Discrimination).
    Appendix A. Cantonese Initials and Finals: A.1 Tables of All Cantonese Initials and Finals.
    Appendix B. Using Distortion Measure as Error Function in Neural Network: B.1 Formulation of Itakura-Saito Distortion Measure for Neural Network Error Function; B.2 Formulation of a Modified Itakura-Saito Distortion (MISD) Measure for Neural Network Error Function.
    Appendix C. Orthogonal Least Square Algorithm for RBFNet Training: C.1 Orthogonal Least Squares Learning Algorithm for Radial Basis Function Network Training.
    Appendix D. Phrase Lists: D.1 Two-Syllable Phrase List for the Pause Duration Experiment (D.1.1 兩字詞 [two-character words]); D.2 Three/Four-Syllable Phrase List for the Pause Duration Experiment (D.2.1 片語 [phrases]).

    Fundamental frequency modelling: an articulatory perspective with target approximation and deep learning

    Current statistical parametric speech synthesis (SPSS) approaches typically aim at state/frame-level acoustic modelling, which leads to a problem of frame-by-frame independence. Moreover, whichever learning technique is used, whether hidden Markov model (HMM), deep neural network (DNN) or recurrent neural network (RNN), the fundamental idea is to set up a direct mapping from linguistic to acoustic features. Although progress is frequently reported, this idea is questionable in terms of biological plausibility. This thesis addresses these issues by integrating the dynamic mechanisms of human speech production as a core component of F0 generation, thereby developing a more human-like F0 modelling paradigm. An articulatory F0 generation model, target approximation (TA), is introduced between text and speech to control syllable-synchronised F0 generation, so that contextual F0 variations are processed in two separate yet integrated stages: linguistic to motor, and motor to acoustic. To demonstrate that human speech movement can be considered a dynamic process of target approximation, and that the TA model is a valid F0 generation model for the motor-to-acoustic stage, a TA-based pitch control experiment is first conducted to simulate the subtle human behaviour of online compensation for pitch-shifted auditory feedback. Then, at the linguistic-to-motor stage, the TA parameters are collectively controlled by linguistic features via a deep or recurrent neural network (DNN/RNN). The systems were trained on a Mandarin Chinese dataset consisting of both statements and questions. The TA-based systems generally outperformed the baseline systems in both objective and subjective evaluations. Furthermore, the number of required linguistic features was reduced, first to syllable level only (with DNN) and then with all positional information removed (with RNN). Fewer linguistic features as input, together with a limited number of TA parameters as output, meant less training data and lower model complexity, which in turn led to more efficient training and faster synthesis.

    Conveying expressivity and vocal effort transformation in synthetic speech with Harmonic plus Noise Models

    This thesis was conducted in the Grup en Tecnologies Mèdia (GTM) at the Escola d'Enginyeria i Arquitectura la Salle. The group has a long trajectory in the speech synthesis field and has developed its own Unit-Selection Text-To-Speech (US-TTS) system, which conveys multiple expressive styles using multiple expressive corpora, one for each style. Thus, to convey aggressive speech, the US-TTS uses an aggressive corpus, whereas for a sensual speech style it uses a sensual corpus. Unlike that approach, this dissertation presents a new schema that enhances the flexibility of the US-TTS system so that it can produce multiple expressive styles from a single neutral corpus. The approach is based on applying Digital Signal Processing (DSP) techniques to modify the synthesized speech so that it conveys the desired expressive style. The Harmonics plus Noise Model (HNM) was chosen for these modifications because of its flexibility in signal modification. Voice Quality (VoQ) has been shown to play an important role in different expressive styles, so low-level VoQ acoustic parameters were first explored for conveying multiple emotions. That study revealed several limitations, which set the objectives for the rest of the thesis, among them finding a single parameter with a strong impact on the expressive style conveyed. Vocal Effort (VE) was selected because of its salient role in expressive speech. The first approach transferred VE between two parallel utterances (two realizations of the same word with different VE) using the Adaptive Pre-emphasis Linear Prediction (APLP) technique. This approach transferred VE correctly, but it was limited in its ability to generate new intermediate VE levels. To improve the flexibility and control of the conveyed VE, a new VE model based on linear polynomials was proposed. This model not only allowed VE to be transferred between two different utterances, but also allowed the generation of VE levels other than those present in the speech corpus. This is aligned with the general goal of the thesis: allowing US-TTS systems to convey multiple expressive styles from a single neutral corpus. Moreover, the proposed methodology introduces a parameter for controlling the degree of VE in the synthesized speech signal. This opens new possibilities for controlling the synthesis process through simple and intuitive graphical interfaces, as was done in the CreaVeu project, also conducted in the GTM group. The dissertation concludes with a review of the work carried out and a proposal for modifying the schema of a US-TTS system to include the VE modification blocks designed in this dissertation.

    EMG-to-Speech: Direct Generation of Speech from Facial Electromyographic Signals

    The general objective of this work is the design, implementation, improvement and evaluation of a system that takes surface electromyographic (EMG) signals as input and directly synthesizes audible speech output: EMG-to-speech.

    PHONOLOGICAL PROCESSING UNIT TRANSFER: THE IMPACT OF FIRST LANGUAGE SYLLABLE STRUCTURE AND ITS IMPLICATIONS FOR PREFERRED SUBSYLLABIC DIVISION UNITS

    This study investigated the potential transfer of first language (L1) phonological processing units to second language processing. English and Chinese phonology differ mainly in the complexity of their syllable structures. English phonology allows highly complex syllable structures, whereas Chinese has been characterized primarily as a core syllable language, i.e., its syllables typically consist only of a consonant and a vowel (CV). This sharp contrast is hypothesized to entail different phonological processing units in the two languages and, through L1 transfer, to result in the poor phonological awareness often observed in Chinese speakers learning English as a second language (ESL). This hypothesis was tested by examining the performance patterns of Chinese ESL fourth graders on phoneme deletion and phoneme isolation tasks. The results suggest that, due to L1 transfer, Chinese ESL children do seem to process an English syllable as an intact core syllable plus its appendices. This supports a developmental account of subsyllabic division unit preference, which holds that the core syllable is universally preferred in the initial stages of language development, and that only afterwards do speakers of different languages diverge in their division unit preferences due to the linguistic characteristics of their respective L1s. The presence of transfer meant that Chinese ESL children performed differently on two item types: core-syllable items (requiring segmentation of an element within the core syllable) and non-core-syllable items (requiring segmentation of any appendices from the core syllable). As phonological awareness involves the ability to segment cohesive sound units, it was hypothesized that only performance on core-syllable items should represent phonological awareness. This hypothesis was tested by analyzing the item types' respective contributions to decoding skills. Phonological awareness has long been established as a strong predictor of decoding skills; thus the analyses served to test the two item types' respective criterion validity in tapping phonological awareness. The results confirmed the hypothesis. Methodologically, this implies that the phonological awareness of Chinese ESL children could be measured more reliably if future studies employed only core-syllable segmentation items. Educationally, instruction in phonological awareness might emphasize core-syllable segmentation, which alone appears to reflect Chinese ESL children's phonological awareness.

    Exploiting the GPU power for intensive geometric and imaging data computation.

    Wang Jianqing. Thesis (M.Phil.), Chinese University of Hong Kong, 2004. Includes bibliographical references (leaves 81-86). Abstracts in English and Chinese.
    Contents:
    Chapter 1. Introduction: 1.1 Overview; 1.2 Thesis; 1.3 Contributions; 1.4 Organization.
    Chapter 2. Programmable Graphics Hardware: 2.1 Introduction; 2.2 Why Use GPU?; 2.3 Programmable Graphics Hardware Architecture; 2.4 Previous Work on GPU Computation.
    Chapter 3. Multilingual Virtual Performer: 3.1 Overview; 3.2 Previous Work; 3.3 System Overview; 3.4 Facial Animation (3.4.1 Facial Animation using Face Space; 3.4.2 Face Set Selection for Lip Synchronization; 3.4.3 The Blending Weight Function Generation and Coarticulation; 3.4.4 Expression Overlay; 3.4.5 GPU Algorithm); 3.5 Character Animation (3.5.1 Skeletal Animation Primer; 3.5.2 Mathematics of Kinematics; 3.5.3 Animating with Motion Capture Data; 3.5.4 Skeletal Subspace Deformation; 3.5.5 GPU Algorithm); 3.6 Integration of Skeletal and Facial Animation; 3.7 Result (3.7.1 Summary).
    Chapter 4. Discrete Wavelet Transform On GPU: 4.1 Introduction (4.1.1 Previous Works; 4.1.2 Our Solution); 4.2 Multiresolution Analysis with Wavelets; 4.3 Fragment Processor for Pixel Processing; 4.4 DWT Pipeline (4.4.1 Convolution Versus Lifting; 4.4.2 DWT Pipeline); 4.5 Forward DWT; 4.6 Inverse DWT; 4.7 Results and Applications (4.7.1 Geometric Deformation in Wavelet Domain; 4.7.2 Stylish Image Processing and Texture-illuminance Decoupling; 4.7.3 Hardware-Accelerated JPEG2000 Encoding); 4.8 Web Information.
    Chapter 5. Conclusion.
    Bibliography.

    Methods in Contemporary Linguistics

    The present volume is a broad overview of methods and methodologies in linguistics, illustrated with examples from concrete research. It collects insights from many linguistic sub-disciplines, ranging from core disciplines to topics in cross-linguistic and language-internal diversity and to contributions on language, space and society. Given its critical and innovative nature, the volume is a valuable source for students and researchers with a wide variety of linguistic interests.