96 research outputs found
Vowel Sound Disambiguation for Intelligible Korean Speech Synthesis
PACLIC 19 / Taipei, Taiwan / December 1-3, 200
Segmental foreign accent
200 pp. Traditionally, foreign accent has been studied from a holistic perspective, that is, treated as a whole rather than as a series of individual features that occur simultaneously. Previous studies that have focused on any of these individual features have generally done so at the suprasegmental level (Tajima et al., 1997, Munro & Derwing, 2001, Hahn, 2004, etc.). This thesis analyses foreign accent from a segmental point of view. Given that there is little research in this field, our main objective is to determine whether the results of previous holistic studies can be extrapolated to the segmental level. In order to analyse the segmental level in detail, this thesis presents techniques that make use of new technologies. To gather as much information as possible, the perceptual experiments are carried out with listeners with very different linguistic profiles, in terms of first language or knowledge of the second language, and are compared with the existing literature. Our results show that some important effects relating to the production and perception of accented segments can go unnoticed in a holistic analysis, and they confirm the need to continue studying minimal units in order to understand in depth the effects of foreign accent on communication
A study on reusing resources of speech synthesis for closely-related languages
This thesis describes research on building a text-to-speech (TTS) framework that can accommodate the lack of linguistic information for under-resourced languages by using existing resources from another language. It describes the adaptation process required when such limited resources are used. The main natural languages involved in this research are Malay and Iban.
The thesis includes a study on grapheme-to-phoneme mapping and the substitution of phonemes. A set of substitution matrices is presented which shows the phoneme confusion in terms of perception among respondents. The experiments conducted study intelligibility as well as perception based on the context of utterances.
The study on phonetic prosody is then presented and compared to the Klatt duration model. This is to identify similarities that would point to a cross-language duration model, if one exists. Then a comparative study of an Iban native speaker and an Iban polyglot TTS system using Malay resources is presented. This is to confirm that the prosody of Malay can be used to generate Iban synthesised speech.
The central hypothesis of this thesis is that, by using a closely-related language resource, natural-sounding speech can be produced. The aim of this research was to show that, by sticking to the indigenous language characteristics, it is possible to build a polyglot synthesised speech system even with insufficient speech resources
Investigating the build-up of precedence effect using reflection masking
The auditory processing level involved in the build‐up of precedence [Freyman et al., J. Acoust. Soc. Am. 90, 874–884 (1991)] has been investigated here by employing reflection masked threshold (RMT) techniques. Given that RMT techniques are generally assumed to address lower levels of the auditory signal processing, such an approach represents a bottom‐up approach to the buildup of precedence. Three conditioner configurations measuring a possible buildup of reflection suppression were compared to the baseline RMT for four reflection delays ranging from 2.5–15 ms. No buildup of reflection suppression was observed for any of the conditioner configurations. Buildup of template (decrease in RMT for two of the conditioners), on the other hand, was found to be delay dependent. For five of six listeners, with reflection delay=2.5 and 15 ms, RMT decreased relative to the baseline. For 5‐ and 10‐ms delay, no change in threshold was observed. It is concluded that the low‐level auditory processing involved in RMT is not sufficient to realize a buildup of reflection suppression. This confirms suggestions that higher level processing is involved in PE buildup. The observed enhancement of reflection detection (RMT) may contribute to active suppression at higher processing levels
Speech rhythm: the language-specific integration of pitch and duration
Experimental phonetic research on speech rhythm seems to have reached an impasse. Recently, this research field has tended to investigate produced (rather than perceived) rhythm, focussing on timing, i.e. duration as an acoustic cue, and has not considered that rhythm perception might be influenced by native language. Yet evidence from other areas of phonetics, and other disciplines, suggests that an investigation of rhythm is needed which (i) focuses on listeners’ perception, (ii) acknowledges the role of several acoustic cues, and (iii) explores whether the relative significance of these cues differs between languages. This thesis, the originality of which derives from its adoption of these three perspectives combined, indicates new directions for progress. A series of perceptual experiments investigated the interaction of duration and f0 as perceptual cues to prosody in languages with different prosodic structures – Swiss German, Swiss French, and French (i.e. from France). The first experiment demonstrated that a dynamic f0 increases perceived syllable duration in contextually isolated pairs of monosyllables, for all three language groups. The second experiment found that dynamic f0 and increased duration interact as cues to rhythmic groups in series of monosyllabic digits and letters; the two cues were significantly more effective than one when heard simultaneously, but significantly less effective than one when heard in conflicting positions around the rhythmic-group boundary location, and native language influenced whether f0 or duration was the more effective cue.
These two experiments laid the basis for the third, which directly addressed rhythm. Listeners were asked to judge the rhythmicality of sentences with systematic duration and f0 manipulations; the results provide evidence that duration and f0 are interdependent cues in rhythm perception, and that the weighting of each cue varies in different languages. A fourth experiment applied the perceptual results to production data, to develop a rhythm metric which captures the multi-dimensional and language-specific nature of perceived rhythm in speech production. These findings have the important implication that if future phonetic research on rhythm follows these new perspectives, it may circumvent the impasse and advance our knowledge and model of speech rhythm. This work was funded by an AHRC doctoral award to the author
Perceptual learning of context-sensitive phonetic detail
[Abstract abbreviated due to inability of DSpace@Cambridge to display phonetic symbols. Please see the full abstract in the attached pdf file.]
Although familiarity with a talker or accent is known to facilitate perception, it is not clear what underlies this phenomenon. Previous research has focused primarily on whether listeners can learn to associate novel phonetic characteristics with low-level units such as features or phonemes. However, this neglects the potential role of phonetic information at many other levels of representation. To address this shortcoming, this thesis investigated perceptual learning of systematic phonetic detail relating to higher levels of linguistic structure, including prosodic, grammatical and morphological contexts. Furthermore, in contrast to many previous studies, this research used relatively natural stimuli and tasks, thus maximising its relevance to perceptual learning in ordinary listening situations.
This research shows that listeners can update their phonetic representations in response to incoming information and its relation to linguistic-structural context. In addition, certain patterns of systematic phonetic detail were more learnable than others. These findings are used to inform an account of how new information is integrated with prior experience in speech processing, within a framework that emphasises the importance of phonetic detail at multiple levels of representation. This work was funded by an AHRC grant
The effects of English proficiency on the processing of Bulgarian-accented English by Bulgarian-English bilinguals
This dissertation explores the potential benefit of listening to and with one’s first-language accent, as suggested by the Interspeech Intelligibility Benefit Hypothesis (ISIB). Previous studies have not consistently supported this hypothesis. According to major second language learning theories, the listener’s second language proficiency determines the extent to which the listener relies on their first language phonetics. Hence, this thesis provides a novel approach by focusing on the role of English proficiency in the understanding of Bulgarian-accented English for Bulgarian-English bilinguals.
The first experiment investigated whether evoking the listeners’ L1 Bulgarian phonetics would improve the speed of processing Bulgarian-accented English words, compared to Standard British English words, and vice versa. Listeners with lower English proficiency processed Bulgarian-accented English faster than SBE, while high proficiency listeners tended to have an advantage with SBE over Bulgarian accent.
The second experiment measured the accuracy and reaction times (RT) in a lexical decision task with single-word stimuli produced by two L1 English speakers and two Bulgarian-English bilinguals. Listeners with high proficiency in English responded slower and less accurately to Bulgarian-accented speech compared to L1 English speech and compared to lower proficiency listeners. These accent preferences were also supported by the listener’s RT adaptation across the first experimental block.
A follow-up investigation compared the results of L1 UK English listeners to the bilingual listeners with the highest proficiency in English. The L1 English listeners and the bilinguals processed both accents with similar speed, accuracy and adaptation patterns, showing no advantage or disadvantage for the bilinguals.
These studies support existing models of second language phonetics. Higher proficiency in L2 is associated with lesser reliance on L1 phonetics during speech processing. In addition, the listeners with the highest English proficiency had no advantage when understanding Bulgarian-accented English compared to L1 English listeners, contrary to ISIB.
Keywords:
Bulgarian-English bilinguals, bilingual speech processing, L2 phonetic development, lexical decision, proficienc
- …