Cue Integration and Contrast Shifts: Experimental and Typological Studies
Auditory Enhancement has been put forth as an explanation for why certain acoustic phonetic cues co-vary to signal phonological contrasts more often than others. Under this account, listeners more readily associate two cues if they produce the same auditory effect, making the cues perceptually inseparable. Traditionally, evidence for enhancement has come from studies showing perceptual integration between enhancing cues, but even cues that do not share the same auditory effect have been shown to perceptually integrate. Further, language experience with co-variation between cues is often a confound in these studies. In this dissertation, I present new evidence in favour of auditory enhancement from four experiments and one typological study. In the first set of experiments, I use a modified cue weighting paradigm that mimics diachronic contrast shifts. Listeners categorizing synthesized speech stimuli were forced to shift their attention between a pair of acoustic cues based on how informative each cue was to the contrast. This was done for a pair of enhancing cues, pitch and breathiness, and a pair of non-enhancing cues, pitch and vowel duration, both of which have been shown to perceptually integrate. For each pair of cues, I tested two groups of listeners: English listeners, who had no phonemic experience with either cue pair, and Hani (Tibeto-Burman) listeners, who had experience with both pairs of cues co-varying in the same contrast. The extent to which listeners were able to shift attention between non-enhancing cues was predicted to reflect their language experience. For enhancing cues, attentional shift was predicted to also be conditioned by whether the cues were in an enhancing relationship. These predictions were borne out, but there was an unpredicted finding that shifting between the enhancing cues was asymmetric. This asymmetry was further explored in two experiments.
The first of these investigated whether the asymmetry could be caused by both listener groups having more linguistic experience with pitch than with breathiness. Two additional groups of listeners were thus tested using the same paradigm: Tone listeners, who used pitch phonemically, and Phonation listeners, who used breathiness phonemically. Both of these groups also exhibited the same asymmetry, showing that the phenomenon is language-general. In the final experiment, I tested the hypothesis that the asymmetry in attentional shift was caused by an asymmetric perceptual dependency between pitch and breathiness. Listeners categorized stimuli for which one cue was informative but the other was completely neutralized. The amount of attention listeners paid to the uninformative cue was predicted to differ if the percept of one cue was dependent on the other but not vice versa. Results from this experiment provided weak evidence in favour of the hypothesis. Finally, I conducted a cross-linguistic typological survey of the synchronic co-variation and diachronic contrast transfer between the cue pairs I tested experimentally. While the cues in both pairs co-vary synchronically, only the enhancing cues participate in contrast transfer. Furthermore, the transfer of phonological contrast between the enhancing cues occurs overwhelmingly in the direction that matches the asymmetry in attentional shift observed in the lab. The experimental and typological studies in this dissertation provide support for Auditory Enhancement, demonstrating that cues that converge on the same auditory effect are treated differently by listeners compared to cues that do not. Based on the results, I argue that i) auditory enhancement and perceptual integration should remain separate notions, and ii) perceptual associations that are not learned through experience may be asymmetric, but learned associations are necessarily symmetric.
Pitch and spectral analysis of speech based on an auditory synchrony model
Also issued as Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1985. Includes bibliographical references (p. 228-235). Supported in part by the National Institutes of Health (5 T32 NS07040). Stephanie Seneff
Using naive listener imitations of native speaker productions to investigate mechanisms of listener-based sound change
This study was designed to test whether two mechanisms of listener-based sound change, listener misperception (Ohala, 1981, 1993) and perceptual cue re-weighting (Beddor, 2009, 2012), can be observed synchronically in a laboratory setting. Co-registered articulatory data (degree of nasalization, tongue height, breathiness) and acoustic data (F1 frequency) related to the productions of phonemic oral and nasal vowels of Southern French were first collected from four native speakers, and the acoustic recordings were subsequently presented to nine naive Australian English listeners, who were instructed to imitate the native productions. During these imitations, similar articulatory and acoustic data were collected in order to compare the articulatory strategies used by the two groups. The results suggest that the imitators successfully reproduced the acoustic distinctions made by the native speakers, but that they did so using different articulatory strategies. The articulatory strategies for the vowel pair /a/-/a/ suggest that listeners (at least partially) misperceived F1-lowering due to nasalization and breathiness as being due to tongue height. Additional evidence supports perceptual cue re-weighting, in that the naive imitators employed nasalance less, and tongue height more, in order to obtain the same F1 nasal-oral distinctions that the native speakers had originally produced.
Information density and phonetic structure: Explaining segmental variability
There is growing evidence that information-theoretic principles influence linguistic structures. Regarding speech, several studies have found that phonetic structures lengthen in duration and strengthen in their spectral features when they are difficult to predict from their context, whereas easily predictable phonetic structures are shortened and reduced spectrally. Most of this evidence comes from studies on American English; only some studies have shown similar tendencies in Dutch, Finnish, or Russian. In this context, the Smooth Signal Redundancy hypothesis (Aylett and Turk 2004, Aylett and Turk 2006) emerged, claiming that the effect of information-theoretic factors on the segmental structure is moderated through the prosodic structure. In this thesis, we investigate the impact and interaction of information density and prosodic structure on segmental variability in production analyses, mainly based on German read speech, and also listeners' perception of differences in phonetic detail caused by predictability effects. Information density (ID) is defined as contextual predictability or surprisal (S(unit_i) = -log2 P(unit_i|context)) and estimated from language models based on large text corpora. In addition to surprisal, we include word frequency and prosodic factors, such as primary lexical stress, prosodic boundary, and articulation rate, as predictors of segmental variability in our statistical analysis. As acoustic-phonetic measures, we investigate segment duration and deletion, voice onset time (VOT), vowel dispersion, global spectral characteristics of vowels, dynamic formant measures, and voice quality metrics. Vowel dispersion is analyzed in the context of German learners' speech and in a cross-linguistic study. In our results, we replicate previous findings of reduced segment duration (and VOT), higher likelihood of deletion, and less vowel dispersion for easily predictable segments.
Easily predictable German vowels have less formant change in their vowel section length (VSL), F1 slope and velocity, are less curved in their F2, and show increased breathiness values in cepstral peak prominence (smoothed) than vowels that are difficult to predict from their context. Results for word frequency show similar tendencies: German segments in high-frequency words are shorter, more likely to be deleted, less dispersed, and show less magnitude in formant change, less F2 curvature, as well as less harmonic richness in open quotient smoothed than German segments in low-frequency words. These effects are found even though we control for the expected and much stronger effects of stress, boundary, and speech rate. In the cross-linguistic analysis of vowel dispersion, the effect of ID is robust across almost all of the six languages and the three intended speech rates. Surprisal does not affect vowel dispersion of non-native German speakers. Surprisal and prosodic factors interact in explaining segmental variability. In particular, stress and surprisal complement each other in their positive effect on segment duration, vowel dispersion, and magnitude in formant change. Regarding perception, we observe that listeners are sensitive to differences in phonetic detail stemming from high and low surprisal contexts for the same lexical target.
Information-theoretic factors influence the variability of spoken language. Phonetic structures are longer and show increased spectral distinctiveness when they are difficult to predict from their context than structures that are easy to predict. Most studies are based on data from American English. Only a few emphasize the need for more linguistic diversity.
As a result of these findings, Aylett and Turk (2004, 2006) proposed the Smooth Signal Redundancy hypothesis, which states that the effect of predictability on phonetic structures is implemented not directly but only through the prosodic structure. This thesis investigates the influence and interaction of information density and prosodic structure on segmental variability in German, as well as listeners' ability to perceive differences in phonetic detail that arise from predictability. Information density (ID) is defined as contextual predictability or surprisal (S(unit_i) = -log2 P(unit_i|context)). In addition to surprisal, we use word frequency and prosodic factors, such as primary lexical stress, prosodic boundary, and speech rate, as variables in the statistical analysis. The acoustic-phonetic measures are segment duration and deletion, voice onset time (VOT), vowel dispersion, global and dynamic vowel characteristics, and voice quality. Vowel dispersion is examined not only in German but also in a cross-linguistic analysis and in the context of L2 speech. We replicate for German earlier results that were based on American English. Reduced segment duration and VOT, a higher likelihood of deletion, and less vowel dispersion are also observed for easily predictable segments in German. These segments also show less formant movement, reduced F2 curvature, and increased breathiness values compared with vowels that are difficult to predict. The results for word frequency show similar tendencies: German segments in high-frequency words are shorter, more likely to be deleted, and show reduced values for vowel dispersion, formant movement, and periodicity compared with German segments in low-frequency words.
Although we observe the known effects of stress, boundary, and tempo on segmental variability in the models, the effects of ID are significant. The cross-linguistic analysis further shows that these effects are robust for most of the languages examined and appear at all intended speech rates. Surprisal, however, has no influence on the vowel dispersion of language learners. We also find interaction effects between surprisal and the prosodic factors. For lexical stress in particular, a stable positive interaction effect with surprisal can be observed. In perception, listeners are able to detect differences between manipulated and unmanipulated stimuli when the manipulation consists solely of phonetic detail in the target word that results from predictability.
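The surprisal definition given in this abstract, S(unit_i) = -log2 P(unit_i|context), can be illustrated with a minimal sketch. The abstract states only that surprisal is estimated from language models based on large text corpora; the bigram model, the add-k smoothing, the toy corpus, and the function names below are all assumptions made for illustration, not the thesis's actual method.

```python
import math
from collections import Counter


def train_bigram_counts(tokens):
    """Count contexts (preceding units) and bigrams in a token sequence."""
    context_counts = Counter(tokens[:-1])
    bigram_counts = Counter(zip(tokens[:-1], tokens[1:]))
    return context_counts, bigram_counts


def surprisal(unit, context, context_counts, bigram_counts, vocab_size, k=1.0):
    """Return -log2 P(unit | context) under an add-k smoothed bigram model.

    The one-unit context and the smoothing scheme are illustrative
    assumptions; the source does not specify the language model.
    """
    numerator = bigram_counts[(context, unit)] + k
    denominator = context_counts[context] + k * vocab_size
    return -math.log2(numerator / denominator)


# Hypothetical toy "corpus": letters stand in for phonetic units.
toy_corpus = "a b a b a c".split()
context_counts, bigram_counts = train_bigram_counts(toy_corpus)
vocab_size = len(set(toy_corpus))

# "b" follows "a" more often than "c" does, so it is more predictable
# and should receive the lower surprisal.
low = surprisal("b", "a", context_counts, bigram_counts, vocab_size)
high = surprisal("c", "a", context_counts, bigram_counts, vocab_size)
print(f"S(b|a) = {low:.3f} bits, S(c|a) = {high:.3f} bits")
```

In this toy setting the frequent continuation receives the lower surprisal, which matches the sense in which the abstract calls a unit "easily predictable"; a realistic estimate would need a much larger corpus and a longer context.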