Perception of pitch in glottalizations of varying duration by German listeners
Previous studies have shown that glottalization is not
necessarily perceived as lower pitch but that pitch
perception in glottalization can be influenced by the
differing sizes of the prosodic domains relevant in the native
language of the listener. Speakers of intonation
languages were influenced by the preceding pitch
context when judging the pitch of longer creaky
voice stretches, while speakers of pitch-accent or
tone languages were not.
The current study investigates pitch perception
by German listeners in glottalized stretches of
speech whose duration varied along a 10-step continuum.
We found that the duration of the glottalized
stretches affected the categorization of the stimuli,
and that the German listeners were not influenced
by the preceding pitch context, unlike in a previous
study on longer stretches of glottalization of constant
duration. Possibly, shorter stretches of glottalization
are interpreted as segmental word-boundary
phenomena rather than as intonation.
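The abstract says only that the glottalized stretches varied along a 10-step duration continuum. The sketch below shows one way such a continuum could be laid out. The endpoints (20 ms and 200 ms), the linear spacing and the function name `duration_continuum` are all hypothetical, since the study's actual range and step spacing are not stated here.

```python
def duration_continuum(shortest_ms: float, longest_ms: float, steps: int = 10) -> list[float]:
    """Return `steps` target durations in ms, evenly spaced from shortest_ms to longest_ms.

    Linear spacing is an assumption; a continuum could also be spaced logarithmically.
    """
    if steps < 2:
        raise ValueError("a continuum needs at least two steps")
    increment = (longest_ms - shortest_ms) / (steps - 1)
    return [shortest_ms + i * increment for i in range(steps)]


# Hypothetical endpoints: 20 ms to 200 ms of glottalization.
for step, dur in enumerate(duration_continuum(20.0, 200.0), start=1):
    print(f"step {step:2d}: {dur:6.1f} ms")
```

Each stimulus would then have its glottalized stretch shortened or lengthened to the target duration for its step.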
Perception of Glottalization in Varying Pitch Contexts in Mandarin Chinese
Although glottalization has often been associated with low
pitch, evidence from a number of sources supports the
assertion that this association is not obligatory, and is likely to
be language-specific. Following a previous study testing
perception of glottalization by German, English, and Swedish
listeners, the current research investigates the influence of
pitch context on the perception of glottalization by native
speakers of a tone language, Mandarin Chinese. Listeners
heard AXB sets in which they were asked to match glottalized
stimuli with pitch contours. We find that Mandarin listeners
tend not to be influenced by the pitch context when judging
the pitch of glottalized stretches of speech. These data lend
support to the idea that the perception of glottalization varies
in relation to language-specific prosodic structure.
Automatic detection of disfluencies in a corpus of university lectures
This dissertation focuses on the identification of disfluent sequences and their distinct structural
regions. The reported experiments are based on audio segmentation and prosodic features calculated
from a corpus of university lectures in European Portuguese containing about 32 hours of speech, with
a disfluency rate of about 7.7%.
The set of features automatically extracted from the force-aligned corpus proved to be discriminative
of the structural regions of a disfluency. The best results were obtained for detecting the interregnum,
followed by the interruption point. Several machine learning methods were applied, and Classification
and Regression Trees (CART) usually outperformed the other methods.
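Classification and Regression Trees (CART) grow a tree by greedily choosing, at each node, the feature threshold that most reduces class impurity. The sketch below shows that single step for one numeric feature, using Gini impurity in plain Python. The toy duration-ratio values and region labels are invented. This is the textbook criterion, not the dissertation's actual toolkit or settings.

```python
from collections import Counter


def gini(labels: list[str]) -> float:
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values: list[float], labels: list[str]) -> tuple[float, float]:
    """Find the threshold on one feature that minimizes weighted Gini impurity.

    Candidate thresholds are midpoints between consecutive distinct sorted values.
    Returns (threshold, weighted_impurity); samples with value <= threshold go left.
    If no split beats the unsplit node, the threshold is NaN.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (float("nan"), gini(labels))  # baseline: impurity of the unsplit node
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # identical values cannot be separated by a threshold
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if impurity < best[1]:
            best = (threshold, impurity)
    return best


# Toy data (hypothetical): one word-duration ratio per word, labeled by region.
ratios = [0.9, 1.0, 1.1, 1.8, 2.0, 2.4]
regions = ["fluent", "fluent", "fluent", "interregnum", "interregnum", "interregnum"]
print(best_split(ratios, regions))
```

A full tree applies this search to every feature at every node and recurses on the two halves until a stopping rule is met.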
The most informative features for cross-region identification include word duration ratios, word
confidence scores, silence ratios, and pitch and energy slopes. Features such as the number of phones
and syllables per word proved more useful for identifying the interregnum, whereas energy slopes were
best suited for identifying the interruption point.
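Several of the features listed above can be approximated from a forced alignment plus frame-level pitch and energy tracks. The sketch below computes a pitch or energy slope by ordinary least squares over the frames inside a word. It also computes a duration ratio between adjacent words. The `Word` record, the 10 ms frame rate and the use of the following word as the ratio's reference are hypothetical choices for illustration. The dissertation's exact feature definitions are not given here.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """A force-aligned word; this record layout is hypothetical."""
    text: str
    start: float  # seconds
    end: float    # seconds

    @property
    def duration(self) -> float:
        return self.end - self.start


def slope(times: list[float], values: list[float]) -> float:
    """Ordinary least-squares slope of values against times (units per second)."""
    n = len(times)
    if n < 2:
        return 0.0
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    cov = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    var = sum((t - mean_t) ** 2 for t in times)
    return cov / var if var else 0.0


def contour_slope(word: Word, frame_times: list[float], track: list[float]) -> float:
    """Slope of a pitch or energy track over the frames that fall inside one word.

    Unvoiced pitch frames are assumed to have been removed beforehand.
    """
    inside = [(t, v) for t, v in zip(frame_times, track) if word.start <= t < word.end]
    return slope([t for t, _ in inside], [v for _, v in inside])


def duration_ratio(current: Word, following: Word) -> float:
    """Duration of a word relative to the word after it (hypothetical definition)."""
    return current.duration / following.duration


# Usage with made-up 10 ms frames and a pitch contour falling 100 Hz per second.
times = [i * 0.01 for i in range(50)]
pitch = [200.0 - 100.0 * t for t in times]
um, the = Word("um", 0.10, 0.40), Word("the", 0.40, 0.50)
print(contour_slope(um, times, pitch))  # close to -100.0 (Hz per second)
print(duration_ratio(um, the))          # close to 3.0
```

Using a least-squares fit rather than the difference between the first and last frames makes the slope less sensitive to a single octave error or noisy frame at a word edge.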
We have also conducted initial experiments on automatically detecting filled pauses, the most frequent
disfluency type. For now, only force-aligned transcripts were used, since the ASR system is not well
adapted to this domain.
This study is a step towards the automatic detection of filled pauses in European Portuguese using
prosodic features. Future work will extend this study to fully automatic transcripts and will also tackle
other domains, exploring extended sets of linguistic features.
On the Appearance of the Comedy LP, 1957–1973
Many observers of contemporary comedy in the United States during the 1960s referred to musical aspects of extra-musical performances. Comedy LP records furnish important artifacts for the study of the musical appearances these observers produced for themselves. Where contemporaries described appearances characterized by printable words and polemics as “satirical,” the musical appearances discussed in this dissertation can instead be described as “comic”: instead of mocking persons or ideas, they show people and things becoming involved with one another in absurdly triumphant ways. These two different sorts of appearances correspond to two different uses for comedy in a class society, one consolidating a hegemonic middle-class “consensus” against ridiculous adversaries, the other exploring surprising potentials in even the most ridiculous circumstances. A history of antagonistic ways of listening to sixties comedy can be read as a history of the making of class relations in an advanced capitalist society.
This dissertation discusses four case studies selected with two complementary aims: to produce an appearance of the comedy LP as a densely varied form and to produce knowledge of the political stakes involved in historical conflicts over formal appearances. In each study, a musical appearance becomes involved in the making of class. The jazz critic Nat Hentoff insisted on musical appearances of the iconic sixties comedian Lenny Bruce over and against what he derided as “liberal” readings for printable messages. His chief artifacts were comedy LP records. Elaine May and Mike Nichols—television stars, dinner club sensations, and luminaries of the most popularly influential improvisatory theater in the United States—used a tangled musical texture associated with affluent social circles. By invoking descriptions of the self as she might have found them in her widely reported readings of Freud, May seems to undermine the ethical significance of the tangled texture as previously determined by Katharine Hepburn’s films. The “blue record” or “party record” produced by and for black Americans in the 1970s was advertised in middle-class periodicals as a genre characterized by “dirty words.” But Tramp Time Volume 1 (La Val LVP 901, 1967), a purportedly early example of the party record featuring an itinerant Midwestern performer named Jimmy “Mr. Motion” Lynch, instead seems characterized most importantly by features of blues music. The Firesign Theatre, a Californian comedy troupe popular with the “dormitory debauchee set,” performed a peculiar involvement in history using a quasi-musical style based upon the characteristics of radio as a broadcast medium. This radiophonic style places observers “inside” history after the perceived closures of 1968.
Art-critical, archival, and philological methods shape this dissertation’s argument. Formalistic descriptions based upon vocabularies critically adapted from modern and contemporary writings produce “abstract” appearances. Artifacts collected through archival research ground these abstract appearances as “historically possible appearances.” As a formalism, this historical method uses its thickening self-referential vocabulary to invent its own critical universe. As a historical method, this formalism produces knowledge of appearances which, because they are grounded in activities, leave no self-contained artifacts.
Examining the Linguistic Ideology “Throaty Sounds Are Bad for Performers”: The History of Negative Attitudes Towards Glottal Stops and Laryngealization in English
This thesis analyzes explicit metadiscourse (Johnstone et al. 2006) on throaty sounds, focusing primarily on glottal segments and non-modal constricted voice quality in English. The authors contributing to this metadiscourse are argued to form an offshoot of the speech chain network, studied by Agha (2003), that valorized and circulated the English accent known as Received Pronunciation (RP). The texts evaluated are English-language works on elocution, singing training, voice, speech, and voice care. The analysis shows that glottal and guttural articulations are framed negatively and are often discouraged through appeals to both health and aesthetics. Many authors in this performance speech chain network assert a linguistic ideology in the form of a belief mediating between language use and social structure: throaty sounds are bad for performers. However, because glottal stops and other laryngeal sounds are basic, naturally occurring consonants in many of the world’s languages, there is counterevidence to the view of them as problematic, injurious, or aberrant. Instead, it is theorized here that the negative outlook on throaty sounds is more deeply tied to the historical and present-day social evaluation of stigmatized speakers of English who use salient throaty sounds, notably through associations with class, gender, and racialization. This linguistic ideology has negative material effects. This research raises questions about the cultural framing of vocal health and the iconicity of voice quality.
Analysis of nonmodal glottal event patterns with application to automatic speaker recognition
Thesis (Ph.D.), Harvard-MIT Division of Health Sciences and Technology, 2008. Includes bibliographical references (pp. 211-215).
Regions of phonation exhibiting nonmodal characteristics are likely to contain information about speaker identity, language, dialect, and vocal-fold health. As a basis for testing such dependencies, we develop a representation of patterns in the relative timing and height of nonmodal glottal pulses. To extract the timing and height of candidate pulses, we investigate a variety of inverse-filtering schemes. These include maximum-entropy deconvolution, which minimizes the predictability of a signal, and minimum-entropy deconvolution, which maximizes pulse-likeness. Hybrid formulations of these methods are also considered. We then derive a theoretical framework for understanding the frequency- and time-domain properties of a pulse sequence, a process that sheds light on the transformation of nonmodal pulse trains into useful parameters. In the frequency domain, we introduce the first comprehensive mathematical derivation of the effect of deterministic and stochastic source perturbation on the short-time spectrum. We also propose a pitch representation of nonmodality that offers an alternative view of the frequency content without relying on Fourier bases. For the time-domain properties, we use projected low-dimensional histograms of feature vectors derived from pulse timing and height parameters. In this feature space, we have found clusters of distinct pulse patterns. They reflect a wide variety of glottal-pulse phenomena, including near-modal phonation, shimmer and jitter, diplophonia and triplophonia, and aperiodicity. Using the temporal relationships between successive feature vectors, we have also developed an algorithm that separates these different classes of glottal-pulse characteristics.
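Once pulse times and heights have been estimated, simple perturbation measures and pulse-pattern feature vectors follow directly. The sketch below uses the common "local" definitions of jitter and shimmer: the mean absolute difference between consecutive periods (or pulse heights), divided by their mean. It also builds two-dimensional vectors of successive period and height ratios. These definitions and the feature layout are illustrative assumptions, not the thesis's own representation. The pulse extraction step (inverse filtering, entropy-based deconvolution) is not shown.

```python
def periods(pulse_times: list[float]) -> list[float]:
    """Intervals between successive glottal pulses (seconds)."""
    return [b - a for a, b in zip(pulse_times, pulse_times[1:])]


def local_perturbation(series: list[float]) -> float:
    """Mean absolute difference between consecutive values, divided by the mean value.

    Applied to periods this gives local jitter; applied to pulse heights, local shimmer.
    """
    if len(series) < 2:
        return 0.0
    diffs = [abs(b - a) for a, b in zip(series, series[1:])]
    return (sum(diffs) / len(diffs)) / (sum(series) / len(series))


def pulse_pattern_features(pulse_times: list[float],
                           heights: list[float]) -> list[tuple[float, float]]:
    """Vectors of (ratio of successive periods, ratio of successive pulse heights).

    Near-modal phonation clusters around (1, 1); diplophonia alternates between
    two clusters on either side of 1.
    """
    if len(heights) != len(pulse_times):
        raise ValueError("need one height per pulse")
    p = periods(pulse_times)
    return [(p[k + 1] / p[k], heights[k + 1] / heights[k]) for k in range(len(p) - 1)]


# Hypothetical diplophonic train: periods alternate 8 ms / 12 ms, heights 1.0 / 0.6.
times = [0.0]
for k in range(9):
    times.append(times[-1] + (0.008 if k % 2 == 0 else 0.012))
heights = [1.0 if k % 2 == 0 else 0.6 for k in range(len(times))]
print(local_perturbation(periods(times)))  # about 0.41 (local jitter)
print(local_perturbation(heights))         # local shimmer
print(pulse_pattern_features(times, heights)[:2])
```

Histogramming such ratio vectors over a recording is one way the distinct clusters for modal, jittery and diplophonic stretches could become visible.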
We have used our glottal-pulse-pattern representation to test automatically for one signal dependency: the speaker dependence of glottal-pulse sequences. This choice is motivated by differences observed between talkers in our separated feature space. Using an automatic speaker verification experiment, we investigate trade-offs in speaker dependency between two kinds of pattern. Short-time pulse patterns reflect local irregularity, while long-time patterns relate to higher-level cyclic variations. Results from speakers with a broad array of modal and nonmodal behaviors indicate high speaker recognition accuracy, complementary to that of conventional mel-cepstral features. These results suggest that the source excitation has a rich structure that carries information about a particular speaker's identity.
by Nicolas Malyska.
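Speaker verification experiments like the one described above are often summarized by the equal error rate (EER). The EER is the operating point at which false acceptances and false rejections are equally frequent. The abstract does not say which metric the thesis reports, so the sketch below is only a generic illustration. It sweeps thresholds over made-up target and impostor scores. It also includes a linear score fusion, with a hypothetical weight, as one simple way to combine glottal-pulse scores with mel-cepstral scores.

```python
def error_rates(targets: list[float], impostors: list[float],
                threshold: float) -> tuple[float, float]:
    """False-rejection and false-acceptance rates when accepting scores >= threshold."""
    frr = sum(s < threshold for s in targets) / len(targets)
    far = sum(s >= threshold for s in impostors) / len(impostors)
    return frr, far


def equal_error_rate(targets: list[float], impostors: list[float]) -> float:
    """Approximate EER: among all observed scores used as thresholds, find where
    FRR and FAR are closest and return their average there."""
    best_gap, best_eer = float("inf"), 1.0
    for threshold in sorted(set(targets) | set(impostors)):
        frr, far = error_rates(targets, impostors, threshold)
        if abs(frr - far) < best_gap:
            best_gap, best_eer = abs(frr - far), (frr + far) / 2
    return best_eer


def fuse(pulse_score: float, cepstral_score: float, weight: float = 0.5) -> float:
    """Linear score fusion; the weight is a hypothetical tuning parameter."""
    return weight * pulse_score + (1.0 - weight) * cepstral_score


targets = [0.9, 0.8, 0.75, 0.4]    # same-speaker trial scores (made up)
impostors = [0.1, 0.2, 0.3, 0.85]  # different-speaker trial scores (made up)
print(equal_error_rate(targets, impostors))  # 0.25
```

With real systems, the fusion weight would typically be tuned on held-out trials, and the EER compared across the pulse-only, cepstral-only and fused scores.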