
    Temporal Variability and Stability in Infant-Directed Sung Speech: Evidence for Language-specific Patterns.

    In this paper, sung speech is used as a methodological tool to explore temporal variability in the timing of word-internal consonants and vowels. It is hypothesized that temporal variability/stability becomes clearer under the varying rhythmical conditions induced by song. This is explored crosslinguistically in German, a language that exhibits a potential vocalic quantity distinction, and in the non-quantity languages French and Russian. Songs by non-professional singers, i.e. parents who sang to their infants aged 2 to 13 months in a non-laboratory setting, were recorded and analyzed. Vowel and consonant durations at syllable contacts of trochaic word types with ˈCVCV or ˈCVːCV structure were measured under varying rhythmical conditions. Evidence is provided that in German non-professional singing, the two syllable structures can be differentiated by two distinct temporal variability patterns: vocalic variability (and consonantal stability) was found to be dominant in ˈCVːCV structures, whereas consonantal variability (and vocalic stability) was characteristic of ˈCVCV structures. In French and Russian, however, only vocalic variability seemed to apply. Additionally, findings suggest that the different temporal patterns found in German were also supported by the stability pattern at the tonal level. These results point to subtle (supra)segmental timing mechanisms in sung speech that affect temporal targets according to the specific prosodic nature of the language in question.

    The phonetics and phonology of some syllabic consonants in southern british english

    This article presents new experimental data on the phonetics of syllabic /l/ and syllabic /n/ in Southern British English and then proposes a new phonological account of their behaviour. Previous analyses (Chomsky and Halle 1968:354, Gimson 1989, Gussmann 1991 and Wells 1995) have proposed that syllabic /l/ and syllabic /n/ should be analysed in a uniform manner. Data presented here, however, shows that syllabic /l/ and syllabic /n/ behave in very different ways, and in light of this, a unitary analysis is not justified. Instead, a proposal is made that syllabic /l/ and syllabic /n/ have different phonological structures, and that these different phonological structures explain their different phonetic behaviours. This article is organised as follows: First, a general background is given to the phenomenon of syllabic consonants, both cross-linguistically and specifically in Southern British English. In §3 a set of experiments designed to elicit syllabic consonants is described, and in §4 the results of these experiments are presented. §5 contains a discussion of data published by earlier authors concerning syllabic consonants in English. In §6 a theoretical phonological framework is set out, and in §7 the results of the experiments are analysed in the light of this framework. In the concluding section, some outstanding issues are addressed and several areas for further research are suggested.

    Can ultrasonic doppler help detecting nasality for silent speech interfaces?: An exploratory analysis based on alignement of the doppler signal with velum aperture information from real-time MRI

    This paper describes an exploratory analysis of the usefulness of the information made available by Ultrasonic Doppler signal data, collected from a single speaker, for detecting velum movement associated with European Portuguese nasal vowels. This is directly related to the unsolved problem of detecting nasality in silent speech interfaces. The applied procedure uses Real-Time Magnetic Resonance Imaging (RT-MRI), collected from the same speaker, providing a method to interpret the reflected ultrasonic data. By ensuring compatible scenario conditions and proper time alignment between the Ultrasonic Doppler signal data and the RT-MRI data, we are able to accurately estimate the time when the velum moves and the type of movement under a nasal vowel occurrence. The combination of these two sources revealed a moderate relation between the average energy of frequency bands around the carrier, indicating a probable presence of velum information in the Ultrasonic Doppler signal.

    Recognizing Speech in a Novel Accent: The Motor Theory of Speech Perception Reframed

    The motor theory of speech perception holds that we perceive the speech of another in terms of a motor representation of that speech. However, when we have learned to recognize a foreign accent, it seems plausible that recognition of a word rarely involves reconstruction of the speech gestures of the speaker rather than the listener. To better assess the motor theory and this observation, we proceed in three stages. Part 1 places the motor theory of speech perception in a larger framework based on our earlier models of the adaptive formation of mirror neurons for grasping, and for viewing extensions of that mirror system as part of a larger system for neuro-linguistic processing, augmented by the present consideration of recognizing speech in a novel accent. Part 2 then offers a novel computational model of how a listener comes to understand the speech of someone speaking the listener's native language with a foreign accent. The core tenet of the model is that the listener uses hypotheses about the word the speaker is currently uttering to update probabilities linking the sound produced by the speaker to phonemes in the native language repertoire of the listener. This, on average, improves the recognition of later words. This model is neutral regarding the nature of the representations it uses (motor vs. auditory). It serves as a reference point for the discussion in Part 3, which proposes a dual-stream neuro-linguistic architecture to revisit claims for and against the motor theory of speech perception and the relevance of mirror neurons, and extracts some implications for the reframing of the motor theory.

    Global and detailed speech representations in early language acquisition

    We review data and hypotheses dealing with the mental representations for perceived and produced speech that infants build and use over the course of learning a language. In the early stages of speech perception and vocal production, before the emergence of a receptive or a productive lexicon, the dominant picture emerging from the literature suggests rather non-analytic representations based on units of the size of the syllable: young children seem to parse speech into syllable-sized units in spite of their ability to detect sound equivalence based on shared phonetic features. Once a productive lexicon has emerged, word form representations are initially rather underspecified phonetically but gradually become more specified with lexical growth, up to the phoneme level. The situation is different for the receptive lexicon, in which phonetic specification for consonants and vowels seems to follow different developmental paths. Consonants in stressed syllables are somewhat well specified already at the first signs of a receptive lexicon, and become even better specified with lexical growth. Vowels seem to follow a different developmental path, with increasing flexibility throughout lexical development. Thus, children come to exhibit a consonant-vowel asymmetry in lexical representations, which is clear in adult representations.

    Strength of forensic voice comparison evidence from the acoustics of filled pauses

    This study investigates the evidential value of filled pauses (FPs, i.e. um, uh) as variables in forensic voice comparison. FPs for 60 young male speakers of standard southern British English were analysed. The following acoustic properties were analysed: midpoint frequencies of the first three formants in the vocalic portion; ‘dynamic’ characterisations of formant trajectories (i.e. quadratic polynomial equations fitted to nine measurement points over the entire vowel); vowel duration; and nasal duration for um. Likelihood ratio (LR) scores were computed using the Multivariate Kernel Density formula (MVKD; Aitken and Lucy, 2004) and converted to calibrated log10 LRs (LLRs) using logistic regression (Brümmer et al., 2007). System validity was assessed using both equal error rate (EER) and the log LR cost function (Cllr; Brümmer and du Preez, 2006). The system with the best performance combines dynamic measurements of all three formants with vowel and nasal duration for um, achieving an EER of 4.08% and Cllr of 0.12. In terms of general patterns, um consistently outperformed uh. For um, the formant dynamic systems generated better validity than those based on midpoints, presumably reflecting the additional degree of formant movement in um caused by the transition from vowel to nasal. By contrast, midpoints outperformed dynamics for the more monophthongal uh. Further, the addition of duration (vowel or vowel and nasal) consistently improved system performance. The study supports the view that FPs have excellent potential as variables in forensic voice comparison cases.
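
    The log LR cost function named in this abstract can be illustrated with a minimal sketch. It uses the standard Cllr definition: the mean of log2(1 + 1/LR) over same-speaker comparisons and the mean of log2(1 + LR) over different-speaker comparisons, averaged together. The function name `cllr` and the log10 LR values below are hypothetical and serve only as an illustration; they are not data or code from the study.

```python
import math


def cllr(same_speaker_llrs, diff_speaker_llrs):
    """Log LR cost (Cllr) computed from log10 likelihood ratios.

    same_speaker_llrs: log10 LRs from same-speaker comparisons
    diff_speaker_llrs: log10 LRs from different-speaker comparisons
    """
    # Penalty for same-speaker trials: large when the LR is small.
    ss_cost = sum(math.log2(1 + 10 ** (-llr)) for llr in same_speaker_llrs)
    ss_cost /= len(same_speaker_llrs)
    # Penalty for different-speaker trials: large when the LR is large.
    ds_cost = sum(math.log2(1 + 10 ** llr) for llr in diff_speaker_llrs)
    ds_cost /= len(diff_speaker_llrs)
    return 0.5 * (ss_cost + ds_cost)


# Hypothetical log10 LRs, invented for illustration only.
toy_same = [1.5, 2.0, 0.8]
toy_diff = [-2.1, -0.9, -1.7]
print(f"Cllr on toy data: {cllr(toy_same, toy_diff):.3f}")
```

    A system that always outputs LR = 1 (log10 LR = 0) carries no information and scores exactly 1, so values well below 1, such as the 0.12 reported above, indicate a well-performing, well-calibrated system.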

    The perception of English-accented Polish – a pilot study

    Does familiarity with a specific foreign language facilitate the recognition and identification of that accent in foreign-accented Polish?
