Contextual effects on the perception of duration
In the experiments reported here, listeners categorized and discriminated speech and non-speech analogue stimuli in which the durations of a vowel and a following consonant or their analogues were varied orthogonally. The listeners' native languages differed in how these durations covary in speakers' productions of such sequences. Because auditorist and autonomous models of speech perception hypothesize that the auditory qualities evoked by both kinds of stimuli determine their initial perceptual evaluation, they both predict that listeners from all the languages will respond similarly to non-speech analogues and to speech in both tasks. Because neither direct realist nor interactive models hypothesize such a processing stage, they predict instead that the way in which vowel and consonant duration covary in the listeners' native languages will determine how they categorize and discriminate the speech stimuli, and that all listeners will categorize and discriminate the non-speech differently from the speech stimuli. Listeners' categorization of the speech stimuli did differ as a function of how these durations covary in their native languages, but all listeners discriminated the speech stimuli in the same way, and they all categorized and discriminated the non-speech stimuli in the same way, too. These similarities could arise from listeners adding the durations of the vowel and consonant intervals (or their analogues) in these tasks with these stimuli; they do so when linguistic experience does not influence them to perceive these durations otherwise. These results support an autonomous rather than interactive model in which listeners either add or apply their linguistic experience at a post-perceptual stage of processing. They do not, however, support an auditorist over a direct realist model because they provide no evidence that the signal's acoustic properties are transformed during the hypothesized prior perceptual stage.
Acquisition of Japanese quantity contrasts by L1 Cantonese speakers
This paper explores the acquisition of Japanese vowel and consonant quantity contrasts by Cantonese learners. Our goal is to examine whether transfer from L1 is possible when L1 experience is phonemic but restricted to a small set of sounds (short vs. long vowels) and when the experience is non-phonemic, derived only at morpheme boundaries (short vs. long consonants). We recruited 20 Cantonese learners (beginner and advanced learners) and 5 native speakers of Japanese, who produced target stimuli varying in consonant and vowel quantity framed in a carrier sentence. The resultant data were converted into several durational ratios for analyses. Results showed that both the beginners and advanced learners were able to distinguish between short vs. long vowels and consonants in Japanese, but only the native speakers enhanced the contrasts in slower speech. It was also found that in most cases the learners were able to lengthen the vowel before a geminate (i.e. long consonant), a secondary cue to Japanese consonant quantity known to be rare across languages. These results are discussed in terms of current theories of second language acquisition.
Speaking rate normalization across different talkers in the perception of Japanese stop and vowel length contrasts
Perception of duration is critically influenced by the speaking rate of the surrounding context. However, to what extent this speaking rate normalization is talker-specific is understudied. This experiment investigated whether Japanese listeners’ perception of temporally contrastive phonemes is influenced by the speaking rate of the surrounding context, and more importantly, whether the effect of the contextual speaking rate persists across different talkers for different types of contrasts: a singleton-geminate stop contrast and short-long vowel contrast in Japanese. The results suggest that listeners generalized their rate-based adjustments to different talkers’ speech regardless of whether the target contrasts depended on silent closure duration or vowel duration. Our results thus support the view that speaking rate normalization is an obligatory process that happens in the early phase of perception.
Not all geminates are created equal: evidence from Maltese glottal consonants
Many languages distinguish short and long consonants or singletons and geminates. At a phonetic level, research has established that duration is the main cue to such distinctions but that other, sometimes language-specific, cues contribute to the distinction as well. Different proposals for representing geminates share one assumption: The difference between a singleton and a geminate is relatively uniform for all consonants in a given language. In this paper, Maltese glottal consonants are shown to challenge this view. In production, secondary cues, such as the amount of voicing during closure and the spectral properties of frication noises, are stronger for glottal consonants than for oral ones, and, in perception, the role of secondary cues and duration also varies across consonants. Contrary to the assumption that gemination is a uniform process in a given language, the results show that the relative role of secondary cues and duration may differ across consonants and that gemination may involve language-specific phonetic knowledge that is specific to each consonant. These results question the idea that lexical access in speech processing can be achieved through features.
Compensation for complete assimilation in speech perception: The case of Korean labial-to-velar assimilation
In connected speech, phonological assimilation to neighboring words can lead to pronunciation variants (e.g., 'garden bench' → "gardem bench"). A large body of literature suggests that listeners use the phonetic context to reconstruct the intended word for assimilation types that often lead to incomplete assimilations (e.g., a pronunciation of "garden" that carries cues for both a labial [m] and an alveolar [n]). In the current paper, we show that a similar context effect is observed for an assimilation that is often complete, Korean labial-to-velar place assimilation. In contrast to the context effects for partial assimilations, however, the context effects seem to rely completely on listeners' experience with the assimilation pattern in their native language.
Reconstructing Phonological Change: Duration and Syllable Structure in Latin Vowel Reduction
During the fixed initial-stress period of Latin (sixth to fifth centuries BC), internal open syllable vowels were totally neutralised, usually raising to /i/ (*per.fa.ki.oː>perficiō ‘I complete’), whereas in closed syllables /a/ was raised to /e/, but the other vowels remained distinct (*per.fak.tos>perfectus ‘completed’). Miller (1972) explains closed syllable resistance by positing internal secondary stress on closed syllables. However, evidence from vowel reduction and syncope suggests that internal syllables never bore stress in early archaic times. A typologically unusual alternative is proposed: contrary to the pattern normally found (Maddieson 1985), vowels had longer duration in closed syllables than in open syllables, as in Turkish and Finnish, thus permitting speakers to attain the targets for non-high vowels in closed syllables. This durational pattern is manifested not only in vowel reduction, but also in the quantitative changes seen in ‘classical’ and ‘inverse’ compensatory lengthenings, the development CVːCV > CVC and ‘superheavy’ degemination (VːCCV > VːCV).
Emergent consonantal quantity contrast and context-dependence of gestural phasing
Embodied Task Dynamics is a modeling platform combining task dynamical implementation of articulatory phonology with an optimization approach based on adjustable trade-offs between production efficiency and perception efficacy. Within this platform we model a consonantal quantity contrast in bilabial stops as emerging from local adjustment of demands on relative prominence of the consonantal gesture conceptualized in terms of closure duration. The contrast is manifested in the form of two distinct, stable inter-gestural coordination patterns characterized by quantitative differences in relative phasing between the consonant and the coproduced vocalic gesture. Furthermore, the model generates a set of qualitative predictions regarding dependence of kinematic characteristics and inter-gestural coordination on consonant quantity and gestural context. To evaluate these predictions, we collected articulatory data for Finnish speakers uttering singletons and geminates in the same context as explored by the model. Statistical analysis of the data shows strong agreement with model predictions. This result provides support for the hypothesis that speech articulation is guided by efficiency principles that underlie many other types of embodied skilled action.
Speech rhythm: the language-specific integration of pitch and duration
Experimental phonetic research on speech rhythm seems to have reached an impasse. Recently, this research field has tended to investigate produced (rather than perceived) rhythm, focussing on timing, i.e. duration as an acoustic cue, and has not considered that rhythm perception might be influenced by native language. Yet evidence from other areas of phonetics, and other disciplines, suggests that an investigation of rhythm is needed which (i) focuses on listeners’ perception, (ii) acknowledges the role of several acoustic cues, and (iii) explores whether the relative significance of these cues differs between languages. This thesis, the originality of which derives from its adoption of these three perspectives combined, indicates new directions for progress. A series of perceptual experiments investigated the interaction of duration and f0 as perceptual cues to prosody in languages with different prosodic structures – Swiss German, Swiss French, and French (i.e. from France). The first experiment demonstrated that a dynamic f0 increases perceived syllable duration in contextually isolated pairs of monosyllables, for all three language groups. The second experiment found that dynamic f0 and increased duration interact as cues to rhythmic groups in series of monosyllabic digits and letters; the two cues were significantly more effective than one when heard simultaneously, but significantly less effective than one when heard in conflicting positions around the rhythmic-group boundary location, and native language influenced whether f0 or duration was the more effective cue.
These two experiments laid the basis for the third, which directly addressed rhythm. Listeners were asked to judge the rhythmicality of sentences with systematic duration and f0 manipulations; the results provide evidence that duration and f0 are interdependent cues in rhythm perception, and that the weighting of each cue varies in different languages. A fourth experiment applied the perceptual results to production data, to develop a rhythm metric which captures the multi-dimensional and language-specific nature of perceived rhythm in speech production. These findings have the important implication that if future phonetic research on rhythm follows these new perspectives, it may circumvent the impasse and advance our knowledge and model of speech rhythm. This work was funded by an AHRC doctoral award to the author.
Preferential early attribution in segmental parsing
This dissertation investigates parsing in segmental perception, or the process by which listeners map the continuous acoustic signal that reaches their ears to the linguistic representations over which phonology operates. It addresses questions of when listeners decide that they have heard acoustic evidence about the identity of one speech sound, versus evidence about the identity of a following sound, and when this linguistic knowledge is applied relative to when it is received during the course of on-line perception and processing. The central argument advanced here is that the beginnings of answers to these questions require the recognition of a domain-general perceptual bias to continue attributing incrementally-received input to a previously-recognized event, rather than posit that first event's completion and the beginning of a second event before it is necessary to do so. An outline of a new model of general segmental perception that includes this bias is then advanced. This approach has implications for our understanding of the evolution of the typology of the world's languages, in particular for the ways that the acoustic qualities of cues to phonological contrast can determine which potential processes are or are not phonologized.