7 research outputs found

    Perception of pitch in glottalizations of varying duration by German listeners

    Previous studies have shown that glottalization is not necessarily perceived as lower pitch but that pitch perception in glottalization can be influenced by the different size of prosodic domains relevant in the native language of the listener. Speakers of intonation languages were influenced by the preceding pitch context when judging the pitch of longer creaky voice stretches, while speakers of pitch-accent or tone languages were not. The current study investigates pitch perception by German listeners in glottalized stretches of speech whose duration varied along a 10-step continuum. We found that the duration of the glottalized stretches affected the categorization of the stimuli, and that the German listeners were not influenced by the preceding pitch context, unlike in a previous study on longer stretches of glottalization of constant duration. Possibly shorter stretches of glottalization are interpreted as segmental word-boundary phenomena rather than as intonation.

    Perception of Glottalization in Varying Pitch Contexts in Mandarin Chinese

    Although glottalization has often been associated with low pitch, evidence from a number of sources supports the assertion that this association is not obligatory, and is likely to be language-specific. Following a previous study testing perception of glottalization by German, English, and Swedish listeners, the current research investigates the influence of pitch context on the perception of glottalization by native speakers of a tone language, Mandarin Chinese. Listeners heard AXB sets in which they were asked to match glottalized stimuli with pitch contours. We find that Mandarin listeners tend not to be influenced by the pitch context when judging the pitch of glottalized stretches of speech. These data lend support to the idea that the perception of glottalization varies in relation to language-specific prosodic structure.

    Automatic detection of disfluencies in a corpus of university lectures

    This dissertation focuses on the identification of disfluent sequences and their distinct structural regions. Reported experiments are based on audio segmentation and prosodic features, calculated from a corpus of university lectures in European Portuguese, containing about 32 hours of speech and about 7.7% of disfluencies. The set of features automatically extracted from the forced alignment corpus proved to discriminate among the regions contained in the production of a disfluency. The best results concern the detection of the interregnum, followed by the detection of the interruption point. Several machine learning methods have been applied, but experiments show that Classification and Regression Trees usually outperform the other methods. The set of most informative features for cross-region identification encompasses word duration ratios, word confidence score, silence ratios, and pitch and energy slopes. Features such as the number of phones and syllables per word proved to be more useful for the identification of the interregnum, whereas energy slopes were most suited for identifying the interruption point. We have also conducted initial experiments on automatically detecting filled pauses, the most frequent disfluency type. For now, only force-aligned transcripts were used, since the ASR system is not well adapted to this domain. This study is a step towards automatic detection of filled pauses for European Portuguese using prosodic features. Future work will extend this study to fully automatic transcripts and tackle other domains, exploring extended sets of linguistic features.

    On the Appearance of the Comedy LP, 1957–1973

    Many observers of contemporary comedy in the United States during the 1960s referred to musical aspects of extra-musical performances. Comedy LP records furnish important artifacts for the study of the musical appearances these observers produced for themselves. Where contemporaries described appearances characterized by printable words and polemics as “satirical,” the musical appearances discussed in this dissertation can instead be described as “comic”: instead of mocking persons or ideas, they show people and things becoming involved with one another in absurdly triumphant ways. These two different sorts of appearances correspond to two different uses for comedy in a class society, one consolidating a hegemonic middle-class “consensus” against ridiculous adversaries, the other exploring surprising potentials in even the most ridiculous circumstances. A history of antagonistic ways of listening to sixties comedy can be read as a history of the making of class relations in an advanced capitalist society. This dissertation discusses four case studies selected with two complementary aims: to produce an appearance of the comedy LP as a densely varied form and to produce knowledge of the political stakes involved in historical conflicts over formal appearances. In each study, a musical appearance becomes involved in the making of class. The jazz critic Nat Hentoff insisted on musical appearances of the iconic sixties comedian Lenny Bruce over and against what he derided as “liberal” readings for printable messages. His chief artifacts were comedy LP records. Elaine May and Mike Nichols—television stars, dinner club sensations, and luminaries of the most popularly influential improvisatory theater in the United States—used a tangled musical texture associated with affluent social circles. 
By invoking descriptions of the self as she might have found them in her widely reported readings of Freud, May seems to undermine the ethical significance of the tangled texture as previously determined by Katharine Hepburn’s films. The “blue record” or “party record” produced by and for black Americans in the 1970s was advertised in middle-class periodicals as a genre characterized by “dirty words.” But Tramp Time Volume 1 (La Val LVP 901, 1967), a purportedly early example of the party record featuring an itinerant Midwestern performer named Jimmy “Mr. Motion” Lynch, instead seems characterized most importantly by features of blues music. The Firesign Theatre, a Californian comedy troupe popular with the “dormitory debauchee set,” performed a peculiar involvement in history using a quasi-musical style based upon the characteristics of radio as a broadcast medium. This radiophonic style places observers “inside” history after the perceived closures of 1968. Art-critical, archival, and philological methods shape this dissertation’s argument. Formalistic descriptions based upon vocabularies critically adapted from modern and contemporary writings produce “abstract” appearances. Artifacts collected through archival research ground these abstract appearances as “historically possible appearances.” As a formalism, this historical method uses its thickening self-referential vocabulary to invent its own critical universe. As a historical method, this formalism produces knowledge of appearances which, because they are grounded in activities, leave no self-contained artifacts.

    Examining the Linguistic Ideology Throaty Sounds Are Bad for Performers: The History of Negative Attitudes Towards Glottal Stops and Laryngealization in English

    This thesis analyzes explicit metadiscourse (Johnstone et al. 2006) on throaty sounds, primarily focused on glottal segments and non-modal constricted voice quality in English. Authors contributing to this metadiscourse are argued to be an offshoot of the speech chain network which valorized and circulated the English accent known as RP or Received Pronunciation, studied by Agha (2003). The evaluated texts center on English-speaking elocution, singing training, voice, speech, and voice care. The analysis shows that glottal and guttural articulations are framed negatively and often discouraged by appeals to both health and aesthetics. Many authors in this performance speech chain network assert a linguistic ideology in the form of a belief mediating between language use and social structure: throaty sounds are bad for performers. However, as glottal stops and other laryngeal sounds are basic and naturally occurring consonants in many of the world’s languages, there is counterevidence to the view of them as problematic, injurious, or aberrant. Instead, it is theorized here that the negative outlook on throaty sounds is more deeply tied to historical and current-day social evaluation of stigmatized speakers of English who use salient throaty sounds, notably via associations with class, gender, and racialization. Negative material effects stem from this linguistic ideology. This research raises questions about the cultural framing of vocal health and the iconicity of voice quality.

    Analysis of nonmodal glottal event patterns with application to automatic speaker recognition

    Thesis (Ph. D.)--Harvard-MIT Division of Health Sciences and Technology, 2008. Includes bibliographical references (p. 211-215). Regions of phonation exhibiting nonmodal characteristics are likely to contain information about speaker identity, language, dialect, and vocal-fold health. As a basis for testing such dependencies, we develop a representation of patterns in the relative timing and height of nonmodal glottal pulses. To extract the timing and height of candidate pulses, we investigate a variety of inverse-filtering schemes including maximum-entropy deconvolution that minimizes predictability of a signal and minimum-entropy deconvolution that maximizes pulse-likeness. Hybrid formulations of these methods are also considered. We then derive a theoretical framework for understanding frequency- and time-domain properties of a pulse sequence, a process that sheds light on the transformation of nonmodal pulse trains into useful parameters. In the frequency domain, we introduce the first comprehensive mathematical derivation of the effect of deterministic and stochastic source perturbation on the short-time spectrum. We also propose a pitch representation of nonmodality that provides an alternative viewpoint on the frequency content that does not rely on Fourier bases. In developing time-domain properties, we use projected low-dimensional histograms of feature vectors derived from pulse timing and height parameters. For these features, we have found clusters of distinct pulse patterns, reflecting a wide variety of glottal-pulse phenomena including near-modal phonation, shimmer and jitter, diplophonia and triplophonia, and aperiodicity. Using temporal relationships between successive feature vectors, an algorithm by which to separate these different classes of glottal-pulse characteristics has also been developed. We have used our glottal-pulse-pattern representation to automatically test for one signal dependency: speaker dependence of glottal-pulse sequences. This choice is motivated by differences observed between talkers in our separated feature space. Using an automatic speaker verification experiment, we investigate tradeoffs in speaker dependency for short-time pulse patterns, reflecting local irregularity, as well as long-time patterns related to higher-level cyclic variations. Results, using speakers with a broad array of modal and nonmodal behaviors, indicate a high accuracy in speaker recognition performance, complementary to the use of conventional mel-cepstral features. These results suggest that there is rich structure to the source excitation that provides information about a particular speaker's identity. By Nicolas Malyska. Ph.D.