Temporal Variability and Stability in Infant-Directed Sung Speech: Evidence for Language-specific Patterns.
In this paper, sung speech is used as a methodological tool to explore temporal variability in the timing of word-internal consonants and vowels. It is hypothesized that temporal variability/stability becomes clearer under the varying rhythmical conditions induced by song. This is explored crosslinguistically in German, a language that exhibits a potential vocalic quantity distinction, and in the non-quantity languages French and Russian. Songs by non-professional singers, i.e. parents who sang to their infants aged 2 to 13 months in a non-laboratory setting, were recorded and analyzed. Vowel and consonant durations at syllable contacts of trochaic word types with ˈCVCV or ˈCVːCV structure were measured under varying rhythmical conditions. Evidence is provided that in German non-professional singing, the two syllable structures can be differentiated by two distinct temporal variability patterns: vocalic variability (and consonantal stability) was found to be dominant in ˈCVːCV structures whereas consonantal variability (and vocalic stability) was characteristic for ˈCVCV structures. In French and Russian, however, only vocalic variability seemed to apply. Additionally, findings suggest that the different temporal patterns found in German were also supported by the stability pattern at the tonal level. These results point to subtle (supra)segmental timing mechanisms in sung speech that affect temporal targets according to the specific prosodic nature of the language in question.
The phonetics and phonology of some syllabic consonants in Southern British English
This article presents new experimental data on the phonetics of syllabic /l/ and syllabic /n/ in Southern British English and then proposes a new phonological account of their behaviour. Previous analyses (Chomsky and Halle 1968:354, Gimson 1989, Gussmann 1991 and Wells 1995) have proposed that syllabic /l/ and syllabic /n/ should be analysed in a uniform manner. Data presented here, however, shows that syllabic /l/ and syllabic /n/ behave in very different ways, and in light of this, a unitary analysis is not justified. Instead, a proposal is made that syllabic /l/ and syllabic /n/ have different phonological structures, and that these different phonological structures explain their different phonetic behaviours.
This article is organised as follows: First a general background is given to the phenomenon of syllabic consonants, both cross-linguistically and specifically in Southern British English. In §3 a set of experiments designed to elicit syllabic consonants is described, and in §4 the results of these experiments are presented. §5 contains a discussion of data published by earlier authors concerning syllabic consonants in English. In §6 a theoretical phonological framework is set out, and in §7 the results of the experiments are analysed in the light of this framework. In the concluding section, some outstanding issues are addressed and several areas for further research are suggested.
Can ultrasonic doppler help detecting nasality for silent speech interfaces?: An exploratory analysis based on alignement of the doppler signal with velum aperture information from real-time MRI
This paper describes an exploratory analysis on the usefulness of the information made available from
Ultrasonic Doppler signal data collected from a single speaker, to detect velum movement associated with
European Portuguese nasal vowels. This is directly related to the unsolved problem of detecting nasality in
silent speech interfaces. The applied procedure uses Real-Time Magnetic Resonance Imaging (RT-MRI),
collected from the same speaker, providing a method to interpret the reflected ultrasonic data. By ensuring
compatible scenario conditions and proper time alignment between the Ultrasonic Doppler signal data and
the RT-MRI data, we are able to accurately estimate the time when the velum moves and the type of
movement when a nasal vowel occurs. The combination of these two sources revealed a moderate
relation between the average energy of frequency bands around the carrier, indicating a probable presence
of velum information in the Ultrasonic Doppler signal.
Bengali nasal vowels: Lexical representation and listener perception
Recognizing Speech in a Novel Accent: The Motor Theory of Speech Perception Reframed
The motor theory of speech perception holds that we perceive the speech of
another in terms of a motor representation of that speech. However, when we
have learned to recognize a foreign accent, it seems plausible that recognition
of a word rarely involves reconstruction of the speech gestures of the speaker
rather than the listener. To better assess the motor theory and this
observation, we proceed in three stages. Part 1 places the motor theory of
speech perception in a larger framework based on our earlier models of the
adaptive formation of mirror neurons for grasping, and for viewing extensions
of that mirror system as part of a larger system for neuro-linguistic
processing, augmented by the present consideration of recognizing speech in a
novel accent. Part 2 then offers a novel computational model of how a listener
comes to understand the speech of someone speaking the listener's native
language with a foreign accent. The core tenet of the model is that the
listener uses hypotheses about the word the speaker is currently uttering to
update probabilities linking the sound produced by the speaker to phonemes in
the native language repertoire of the listener. This, on average, improves the
recognition of later words. This model is neutral regarding the nature of the
representations it uses (motor vs. auditory). It serves as a reference point for
the discussion in Part 3, which proposes a dual-stream neuro-linguistic
architecture to revisit claims for and against the motor theory of speech
perception and the relevance of mirror neurons, and extracts some implications
for the reframing of the motor theory.
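The core idea in Part 2 of the abstract above, using word hypotheses to update probabilities that link heard sounds to native phonemes, can be illustrated with a minimal sketch. This is not the authors' model: the class name `AccentMap`, the one-to-one alignment of sounds with phonemes, the add-one smoothing, and the ASCII sound labels are all assumptions made for illustration.

```python
from collections import defaultdict


class AccentMap:
    """Hypothetical table of P(native phoneme | heard sound), learned by counting."""

    def __init__(self, inventory, prior=1.0):
        self.inventory = list(inventory)  # listener's native phonemes (assumed fixed)
        self.prior = prior                # smoothing pseudo-count (an assumption)
        self.counts = defaultdict(lambda: defaultdict(float))

    def update(self, sounds, hypothesized_phonemes):
        # Assumes the heard sounds align one-to-one with the phonemes of the
        # word the listener currently hypothesizes.
        for sound, phoneme in zip(sounds, hypothesized_phonemes):
            self.counts[sound][phoneme] += 1.0

    def prob(self, sound, phoneme):
        # Smoothed relative frequency of `phoneme` given `sound`.
        row = self.counts[sound]
        total = sum(row.values()) + self.prior * len(self.inventory)
        return (row.get(phoneme, 0.0) + self.prior) / total


# A speaker says "zis" where the listener hypothesizes the word "this".
accent = AccentMap(["dh", "z", "ih", "s"])
before = accent.prob("z", "dh")
accent.update(["z", "ih", "s"], ["dh", "ih", "s"])
after = accent.prob("z", "dh")
print(before, after)
```

After a single hypothesized word, the probability that a heard "z" stands for the listener's "dh" rises, which is the sense in which earlier hypotheses can help the recognition of later words.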
Global and detailed speech representations in early language acquisition
We review data and hypotheses dealing with the mental representations for perceived and produced speech that infants build and use over the course of learning a language. In the early stages of speech perception and vocal production, before the emergence of a receptive or a productive lexicon, the dominant picture emerging from the literature suggests rather non-analytic representations based on units of the size of the syllable: Young children seem to parse speech into syllable-sized units in spite of their ability to detect sound equivalence based on shared phonetic features. Once a productive lexicon has emerged, word form representations are initially rather underspecified phonetically but gradually become more specified with lexical growth, up to the phoneme level. The situation is different for the receptive lexicon, in which phonetic specification for consonants and vowels seems to follow different developmental paths. Consonants in stressed syllables are somewhat well specified already at the first signs of a receptive lexicon, and become even better specified with lexical growth. Vowels seem to follow a different developmental path, with increasing flexibility throughout lexical development. Thus, children come to exhibit a consonant-vowel asymmetry in lexical representations, which is clear in adult representations.
Strength of forensic voice comparison evidence from the acoustics of filled pauses
This study investigates the evidential value of filled pauses (FPs, i.e. um, uh) as variables in forensic voice comparison. FPs for 60 young male speakers of standard southern British English were analysed. The following acoustic properties were analysed: midpoint frequencies of the first three formants in the vocalic portion; "dynamic" characterisations of formant trajectories (i.e. quadratic polynomial equations fitted to nine measurement points over the entire vowel); vowel duration; and nasal duration for um. Likelihood ratio (LR) scores were computed using the Multivariate Kernel Density formula (MVKD; Aitken and Lucy, 2004) and converted to calibrated log10 LRs (LLRs) using logistic-regression (Brümmer et al., 2007). System validity was assessed using both equal error rate (EER) and the log LR cost function (Cllr; Brümmer and du Preez, 2006). The system with the best performance combines dynamic measurements of all three formants with vowel and nasal duration for um, achieving an EER of 4.08% and Cllr of 0.12. In terms of general patterns, um consistently outperformed uh. For um, the formant dynamic systems generated better validity than those based on midpoints, presumably reflecting the additional degree of formant movement in um caused by the transition from vowel to nasal. By contrast, midpoints outperformed dynamics for the more monophthongal uh. Further, the addition of duration (vowel or vowel and nasal) consistently improved system performance. The study supports the view that FPs have excellent potential as variables in forensic voice comparison cases.
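For readers unfamiliar with the two validity metrics named in the abstract above, the following sketch shows how Cllr and an approximate EER can be computed from log10 LRs. It is an illustration only, not the study's pipeline: the function names and toy scores are hypothetical, and the EER is approximated by a simple threshold sweep over the observed scores.

```python
import math


def cllr(same_llrs, diff_llrs):
    """Log-LR cost from log10 LRs of same-speaker and different-speaker pairs."""
    # Penalty for same-speaker pairs grows as the LR falls below 1.
    ss = sum(math.log2(1 + 10 ** (-x)) for x in same_llrs) / len(same_llrs)
    # Penalty for different-speaker pairs grows as the LR rises above 1.
    ds = sum(math.log2(1 + 10 ** x) for x in diff_llrs) / len(diff_llrs)
    return 0.5 * (ss + ds)


def eer(same_llrs, diff_llrs):
    """Approximate equal error rate via a sweep over observed score thresholds."""
    best = None
    for t in sorted(set(same_llrs) | set(diff_llrs)):
        miss = sum(x < t for x in same_llrs) / len(same_llrs)
        false_alarm = sum(x >= t for x in diff_llrs) / len(diff_llrs)
        gap = abs(miss - false_alarm)
        if best is None or gap < best[0]:
            best = (gap, (miss + false_alarm) / 2)
    return best[1]


# Toy scores (made up): positive LLRs support "same speaker".
same = [1.0, 2.0]
diff = [-1.0, -2.0]
print(cllr(same, diff), eer(same, diff))
```

A system that always outputs an LR of 1 (LLR of 0) has a Cllr of exactly 1, so values well below 1, such as the 0.12 reported above, indicate informative and well-calibrated LRs.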
The perception of English-accented Polish: a pilot study
Does familiarity with a specific foreign language facilitate the recognition and identification of that accent in foreign-accented Polish?
- …