18 research outputs found

    An exploration of the rhythm of Malay

    In recent years there has been a surge of interest in speech rhythm. However, we still lack a clear understanding of the nature of rhythm and rhythmic differences across languages. Various metrics have been proposed as means for measuring rhythm on the phonetic level and making typological comparisons between languages (Ramus et al., 1999; Grabe & Low, 2002; Dellwo, 2006), but the debate is ongoing on the extent to which these metrics capture the rhythmic basis of speech (Arvaniti, 2009; Fletcher, in press). Furthermore, cross-linguistic studies of rhythm have covered a relatively small number of languages, and research on previously unclassified languages is necessary to fully develop the typology of rhythm. This study examines the rhythmic features of Malay, for which, to date, relatively little work has been carried out on aspects of rhythm and timing. The material for the analysis comprised 10 sentences produced by 20 speakers of standard Malay (10 males and 10 females). The recordings were first analysed using rhythm metrics proposed by Ramus et al. (1999) and Grabe & Low (2002). These metrics (∆C, %V, rPVI, nPVI) are based on durational measurements of vocalic and consonantal intervals. The results indicated that Malay clustered with other so-called syllable-timed languages like French and Spanish on the basis of all metrics. However, underlying the overall findings for these metrics there was a large degree of variability in values across speakers and sentences, with some speakers having values in the range typical of stress-timed languages like English.
Further analysis has been carried out in light of Fletcher’s (in press) argument that measurements based on duration do not wholly reflect speech rhythm, as there are many other factors that can influence values of consonantal and vocalic intervals, and Arvaniti’s (2009) suggestion that other features of speech should also be considered in descriptions of rhythm to discover what contributes to listeners’ perception of regularity. Spectrographic analysis of the Malay recordings brought to light two parameters that displayed consistency and regularity for all speakers and sentences: the duration of individual vowels and the duration of intervals between intensity minima. This poster presents the results of these investigations and points to connections between the features which seem to be consistently regulated in the timing of Malay connected speech and aspects of Malay phonology. The results are discussed in light of current debate on the descriptions of rhythm.
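
The four duration-based metrics named in the abstract above can be illustrated with a minimal Python sketch. It follows the commonly cited definitions: %V is the share of total duration that is vocalic, ∆C is the standard deviation of consonantal interval durations, rPVI is the mean absolute difference between successive intervals, and nPVI normalizes each difference by the mean of the pair. The function names, the use of the population standard deviation, and the interval durations are assumptions made for illustration; they are not taken from the study.

```python
from statistics import mean, pstdev


def percent_v(vocalic, consonantal):
    """%V: percentage of total utterance duration made up of vocalic intervals."""
    total = sum(vocalic) + sum(consonantal)
    return 100.0 * sum(vocalic) / total


def delta_c(consonantal):
    """Delta-C: standard deviation of consonantal interval durations.

    The population form (pstdev) is an assumption of this sketch.
    """
    return pstdev(consonantal)


def rpvi(durations):
    """Raw PVI: mean absolute difference between successive intervals."""
    diffs = [abs(a - b) for a, b in zip(durations, durations[1:])]
    return mean(diffs)


def npvi(durations):
    """Normalized PVI: each difference is divided by the mean of its pair,
    and the average is scaled by 100."""
    terms = [abs(a - b) / ((a + b) / 2) for a, b in zip(durations, durations[1:])]
    return 100.0 * mean(terms)


# Hypothetical interval durations in milliseconds (not data from the study).
vocalic_ms = [100, 200, 100]
consonantal_ms = [50, 150, 100]

print(percent_v(vocalic_ms, consonantal_ms))
print(delta_c(consonantal_ms))
print(rpvi(consonantal_ms))
print(npvi(vocalic_ms))
```

In the usual application, rPVI is computed over consonantal intervals and nPVI over vocalic intervals, which is how the sketch calls them.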

    An MRI-based articulatory and acoustic study of American English liquid sounds /r/ and /l/

    In American English, the liquid sounds /r/ and /l/ are the most articulatorily variable and complex sounds. They can be produced by several distinct types of tongue configurations and are the most troublesome sounds for children and nonnative English speakers to learn. Better understanding of this many-to-one mapping between articulation and acoustics would be beneficial to other areas such as speech pathology, speaker verification, speech recognition and speech synthesis. In this dissertation, two articulatory configurations for each liquid sound were studied (a "retroflex" /r/ vs. a "bunched" /r/, and a light /l/ vs. a dark /l/). Unlike previous work on liquids, finite element analysis has been performed to obtain the acoustic responses of the three-dimensional (3-D) vocal tract models, which are based on volumetric magnetic resonance (MR) imaging. Area function models were derived based on the wave propagation property inside the vocal tract. The retroflex /r/ and the bunched /r/ show similar patterns of F1-F3 but very different spacing between F4 and F5. The results from the formant acoustic sensitivity functions and simple-tube vocal tract models suggested that this F4/F5 difference can be explained largely by differences in whether the long cavity behind the palatal constriction acts as a half- or a quarter-wavelength resonator. For both the retroflex /r/ and the bunched /r/, F4 and F5 (along with F3 for the particular speakers studied in this research) come from the long back cavity. However, these formants are half-wavelength resonances for the retroflex /r/, but quarter-wavelength resonances for the bunched /r/. While both the dark /l/ and the light /l/ have a linguo-alveolar contact and two lateral channels, they differ in the length of the linguo-alveolar contact and in the presence of the linguopalatal contacts caused by raising the sides of the tongue.
Both have similar patterns in F1-F3, but differ in the number and locations of zeros in the spectrum. For the dark /l/, only one zero occurs below 6 kHz and it is produced by the cross mode posterior to the linguo-alveolar contact. For the light /l/, three zeros below 6 kHz are produced by the asymmetrical channels, the supralingual cavity and the cross mode posterior to the linguo-alveolar contact. The results from two simple vocal tract models show that the lateral channels have to be asymmetrical with an effective length between 3 and 6 cm to get a zero in the region of F3-F5. Based on the Buckeye database, the acoustic variability and discriminative power of liquids were studied with the mel-frequency band energy coefficients as the acoustic parameter. Analysis of variance shows that the inter-speaker variability of /r/ is larger than that of any other phoneme except /sh/, /s/ and /zh/. On average, /r/ and /l/ have larger inter-speaker variability than any other broad phonetic class. The F-ratio averages of liquids are larger than those of glides, fricatives, affricates and stops, but smaller than those of nasals. The speaker identification experiments show that the ranking of the average discriminative power for liquids and other broad phonetic classes is: /r/ > Glides > /l/ > Affricates > Fricatives > Stops > Nasals > Vowels.
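
The half- versus quarter-wavelength distinction mentioned in the abstract above can be made concrete with the textbook formulas for an idealized uniform tube. A tube closed at one end and open at the other resonates at F_n = (2n - 1)c / 4L, while a tube with matching end conditions resonates at F_n = nc / 2L. The sketch below only illustrates those formulas; the cavity length and the speed of sound (35000 cm/s) are assumed values, not measurements from the dissertation, and the function names are hypothetical.

```python
SPEED_OF_SOUND_CM_S = 35000.0  # assumed approximate value for warm, moist air


def quarter_wave_resonances(length_cm, count, c=SPEED_OF_SOUND_CM_S):
    """Resonances (Hz) of a uniform tube closed at one end and open at the other."""
    return [(2 * n - 1) * c / (4.0 * length_cm) for n in range(1, count + 1)]


def half_wave_resonances(length_cm, count, c=SPEED_OF_SOUND_CM_S):
    """Resonances (Hz) of a uniform tube with the same condition at both ends."""
    return [n * c / (2.0 * length_cm) for n in range(1, count + 1)]


# Hypothetical tube length chosen for round numbers, not a measured back cavity.
length_cm = 17.5
print(quarter_wave_resonances(length_cm, 3))  # [500.0, 1500.0, 2500.0]
print(half_wave_resonances(length_cm, 3))     # [1000.0, 2000.0, 3000.0]
```

For the same tube length the two resonator types have the same spacing between adjacent resonances but sit at different absolute frequencies, so which type a cavity approximates changes where its resonances fall relative to the other formants.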

    Spoken English discrimination (SED) training with multilingual Malaysians: effect of adaptive staircase procedure and background babble in high variability phonetic training.

    High variability phonetic training (HVPT) has been shown to improve non-native speakers’ perceptual performance in discriminating difficult second language phonemic contrasts (Bradlow, Akahane-Yamada, Pisoni, & Tohkura, 1999; Bradlow, Pisoni, Akahane-Yamada, & Tohkura, 1997; Lively, Logan, & Pisoni, 1993; Lively, Pisoni, Yamada, Tohkura, & Yamada, 1994; Logan, Lively, & Pisoni, 1991). The perceptual learning can be generalized to novel words (Wang & Munro, 2004), novel speakers (Nishi & Kewley-Port, 2007; Richie & Kewley-Port, 2008) and even to speech production (Bradlow et al., 1997). However, the rigidity of the laboratory training settings has limited applications to real-life situations. The current thesis examined the effectiveness of a new phonetic training program - the Spoken English Discrimination (SED) training. SED training is a computerized individual training program designed to improve non-native speakers’ bottom-up perceptual sensitivity to discriminate difficult second language (L2) phonemic contrasts. It combines a number of key training features including 1) natural spoken stimuli, 2) highly variable stimuli spoken by multiple speakers, 3) multi-talker babble as background noise and 4) an adaptive staircase procedure that individualizes the level of background babble. The first experiment investigated the potential benefits of different versions of the SED training program. The effects of stimulus variability (single speaker vs. multiple speakers) and design of background babble (constant vs. adaptive staircase) were examined using the English voiceless-voiced plosive /t/-/d/ phonemic contrast as the training material. No improvements were found in the identification accuracy on the /t/-/d/ contrast in post-test, but identification improvements were found on the untrained English /ε/-/æ/ phonemic contrast. The effectiveness of SED training was re-examined in Chapter 3 using the English /ε/-/æ/ phonemic contrast as the training material.
Three experiments were conducted to compare the SED training paradigms that had the background babble implemented either at a constant level (Constant SED) or using the adaptive staircase procedure (Adaptive Staircase SED), and the longevity of the training effects. Results revealed that the Adaptive Staircase SED was the more effective paradigm as it generated greater training benefits and its effect generalized better to the untrained /t/-/d/ phonemic contrast. Training effects from both SED paradigms were retained six months after the last training session. Before examining whether SED training leads to improvements in speech production, Chapter 4 investigated the phonetic perception patterns of L1 Mandarin Malaysian speakers, L1 Malaysian English speakers and native British English speakers. The production intelligibility of the L1 Mandarin speakers was also evaluated by the L1 Malaysian English speakers and native British English speakers. Single category assimilation was observed in both L1 Mandarin and L1 Malaysian English speakers whereby the /ε/ and /æ/ phonetic sounds were assimilated to a single /æ/ category (Best, McRoberts, & Goodell, 2001). While the British English speakers showed ceiling performance for all phonetic categories involved, the L1 Malaysian English speakers had difficulty identifying the British English /ε/ phoneme and the L1 Mandarin speakers had difficulty identifying the /d/ final, /ε/ and /æ/ phonemes. As seen by their perceptual performance, the L1 Mandarin speakers also had difficulty producing distinct /d/ final, /ε/ and /æ/ phonemes. Two experiments in Chapter 5 examined whether the effects of SED training generalize to speech production. The results showed that L1 Malaysian English speakers and native British English speakers found different SED paradigms to be more effective in inducing the production improvement. Only the production intelligibility of the /æ/ phoneme improved as a result of SED training.
Collectively, the seven experiments in this thesis showed that SED training was effective in improving Malaysian speakers’ perception and production performance of difficult English phonemic contrasts. Further research should be conducted to examine the efficacy of SED training in improving speech perception and production across different training materials and in speakers who come from different language backgrounds.
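
The abstract above does not state which staircase rule the SED program uses, so the following Python sketch shows only the general idea of an adaptive staircase that individualizes the level of background babble. It assumes a generic two-down/one-up rule applied to the speech-to-babble ratio in dB: two consecutive correct responses make the task harder, and any error makes it easier. The function name, starting level, step size and rule are all assumptions for illustration.

```python
def run_staircase(responses, start_snr_db=10.0, step_db=2.0, n_down=2):
    """Track the speech-to-babble ratio (dB) across trials.

    responses: iterable of booleans, True for a correct discrimination.
    Returns the starting level followed by the level after each response.
    The 2-down/1-up rule and all default values are assumptions of this sketch.
    """
    snr = start_snr_db
    streak = 0  # consecutive correct responses since the last level change
    track = [snr]
    for correct in responses:
        if correct:
            streak += 1
            if streak == n_down:
                snr -= step_db  # harder: babble louder relative to speech
                streak = 0
        else:
            snr += step_db  # easier: babble quieter relative to speech
            streak = 0
        track.append(snr)
    return track


# Hypothetical response sequence for one listener.
print(run_staircase([True, True, False, True, True, True, True]))
```

Because the level moves with each listener's own responses, two listeners given the same stimuli can end up training at different babble levels, in contrast to a paradigm where the babble stays at a constant level.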

    Development of isiXhosa text-to-speech modules to support e-Services in marginalized rural areas

    Information and Communication Technology (ICT) projects are being initiated and deployed in marginalized areas to help improve the standard of living for community members. This has led to a new field, which is responsible for information processing and knowledge development in rural areas, called Information and Communication Technology for Development (ICT4D). An ICT4D project has been implemented in a marginalized area called Dwesa; this is a rural area situated on the Wild Coast of the former homeland of Transkei, in the Eastern Cape Province of South Africa. In this rural community there are e-Service projects which have been developed and deployed to support the already existent ICT infrastructure. Some of these projects include the e-Commerce platform, e-Judiciary service, e-Health and e-Government portal. Although these projects are deployed in this area, community members face a language and literacy barrier because these services are typically accessed through English textual interfaces. This becomes a challenge because their language of communication is isiXhosa and some of the community members are illiterate. Most of the rural areas consist of illiterate people who cannot read and write isiXhosa but can only speak the language. This problem of illiteracy in rural areas affects both the youth and the elderly. This research seeks to design, develop and implement software modules that can be used to convert isiXhosa text into natural sounding isiXhosa speech. Such an application is called a Text-to-Speech (TTS) system. The main objective of this research is to improve ICT4D eServices’ usability through the development of an isiXhosa Text-to-Speech system. This research is undertaken within the context of Siyakhula Living Lab (SLL), an ICT4D intervention towards improving the lives of rural communities of South Africa in an attempt to bridge the digital divide.
The developed TTS modules were subsequently tested to determine their applicability to improve eServices usability. The results show acceptable levels of usability for the audio utterances produced by the isiXhosa Text-To-Speech system for marginalized areas.

    Temporal integration of loudness as a function of level
