Universal and language-specific processing: the case of prosody
A key question in the science of language is how speech processing can be influenced by both language-universal and language-specific mechanisms (Cutler, Klein, & Levinson, 2005). My graduate research aimed to address this question by adopting a cross-language approach to compare languages with different phonological systems. Of all components of linguistic structure, prosody is often considered one of the most language-specific dimensions of speech. This has significant implications for our understanding of language use, because much of speech processing is specifically tailored to the structure and requirements of the native language. However, it is still unclear whether prosody may also play a universal role across languages, and very few comparative attempts have been made to explore this possibility. In this thesis, I examined both the production and perception of prosodic cues to prominence and phrasing in native speakers of English and Mandarin Chinese. In focus production, our research revealed that English and Mandarin speakers were alike in how they used prosody to encode prominence, but there were also systematic language-specific differences in the exact degree to which they enhanced the different prosodic cues (Chapter 2). This, however, was not the case in focus perception, where English and Mandarin listeners were alike in the degree to which they used prosody to predict upcoming prominence, even though the precise cues in the preceding prosody could differ (Chapter 3). Further experiments examining prosodic focus prediction in the speech of different talkers demonstrated functional cue equivalence in prosodic focus detection (Chapter 4). Likewise, our experiments also revealed both cross-language similarities and differences in the production and perception of juncture cues (Chapter 5). Overall, prosodic processing is the result of a complex but subtle interplay of universal and language-specific structure.
Analyzing Prosody with Legendre Polynomial Coefficients
This investigation demonstrates the effectiveness of Legendre polynomial coefficients for representing prosodic contours in the context of two different tasks: nativeness classification and sarcasm detection. By using accurate representations of prosodic contours to answer fundamental linguistic questions, we contribute to the body of research on analyzing prosody in linguistics as well as modeling prosody for machine learning tasks. Using Legendre polynomial coefficient representations of prosodic contours, we answer questions about differences in prosody between native English speakers and non-native English speakers whose first language is Mandarin. We also learn more about the prosodic qualities of sarcastic speech. We additionally perform machine learning classification for both tasks, achieving an accuracy of 72.3% for nativeness classification and 81.57% for sarcasm detection. We recommend that linguists looking to analyze prosodic contours make use of Legendre polynomial modeling; the accuracy and quality of the resulting prosodic contour representations make them highly interpretable for linguistic analysis.
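As an illustration of the contour representation the abstract describes, the sketch below fits a low-order Legendre expansion to a pitch (F0) contour. The contour values, the degree of 4, and the uniform time sampling are assumptions for illustration, not the paper's actual configuration.

```python
import numpy as np

# Hypothetical F0 contour (Hz), sampled at uniform intervals; a real contour
# would first be restricted to voiced frames and time-normalized.
f0 = np.array([180, 195, 210, 220, 215, 200, 185, 170, 160, 155], dtype=float)

# Map the time axis onto [-1, 1], the natural domain of Legendre polynomials.
t = np.linspace(-1.0, 1.0, len(f0))

# Fit a low-order Legendre expansion; the coefficients summarize the contour:
# degree 0 tracks mean pitch, degree 1 the overall slope, degree 2 curvature.
degree = 4
coeffs = np.polynomial.legendre.legfit(t, f0, degree)

# Reconstruct the smoothed contour from the coefficients.
f0_hat = np.polynomial.legendre.legval(t, coeffs)
```

The fixed-length coefficient vector `coeffs` can then serve directly as a feature vector for a classifier, which is what makes this representation convenient for both linguistic analysis and machine learning.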
Automatic Pronunciation Assessment -- A Review
Pronunciation assessment and its application in computer-aided pronunciation training (CAPT) have seen impressive progress in recent years. With the rapid growth in language processing and deep learning over the past few years, there is a need for an updated review. In this paper, we review methods employed in pronunciation assessment, covering both phonemic and prosodic aspects. We categorize the main challenges observed in prominent research trends and highlight existing limitations and available resources. This is followed by a discussion of the remaining challenges and possible directions for future work. (9 pages; accepted to Findings of EMNLP.)
Feature extraction based on bio-inspired model for robust emotion recognition
Emotional state identification is an important issue in building more natural speech interactive systems. Ideally, these systems should also be able to work in real environments, in which some kind of noise is generally present. Several bio-inspired representations have been applied to artificial systems for speech processing under noise conditions. In this work, an auditory signal representation is used to obtain a novel bio-inspired set of features for emotional speech signals. These characteristics, together with other spectral and prosodic features, are used for emotion recognition under noise conditions. Neural models were trained as classifiers, and results were compared to those obtained with the well-known mel-frequency cepstral coefficients. Results show that using the proposed representations, it is possible to significantly improve the robustness of an emotion recognition system. The results were also validated in a speaker-independent scheme and with two emotional speech corpora.
Authors: Enrique Marcelo Albornoz, Diego Humberto Milone, and Hugo Leonardo Rufiner. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional, CONICET / Universidad Nacional del Litoral, Facultad de Ingeniería y Ciencias Hídricas; Argentina.
Production of English Prominence by Native Mandarin Chinese Speakers
Native-like production of intonational prominence is important for spoken language competency. Non-native speakers may have trouble producing prosodic variation in a second language (L2) and thus may have problems being understood. By identifying common sources of production error, we will be able to aid in the instruction of L2 speakers. In this paper we present results of a production study designed to test the ability of Mandarin L1 speakers to produce prominence in English. Our results show that there are some consistent differences between the L1 and L2 speakers in the use of pitch to indicate prominence, as well as in the accenting of phrase-initial tokens. We also find that we can automatically detect prominence in Mandarin L1 English with an accuracy of 87.23% and an f-measure of 0.866 if we train a classifier with annotated Mandarin L1 English data. Models trained on native English speech can detect prominence in Mandarin L1 English with an accuracy of 74.77% and an f-measure of 0.824.
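The classification setup described above, training a prominence detector on acoustic features and scoring it with accuracy and f-measure, can be sketched as follows. The nearest-centroid classifier, the three toy features, and the synthetic data are illustrative assumptions, not the authors' actual model.

```python
import numpy as np

def nearest_centroid_fit(X, y):
    # One centroid per class (0 = non-prominent, 1 = prominent).
    return np.array([X[y == c].mean(axis=0) for c in (0, 1)])

def nearest_centroid_predict(centroids, X):
    # Assign each token to the class of its nearest centroid.
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return d.argmin(axis=1)

def f_measure(y_true, y_pred):
    # Harmonic mean of precision and recall for the prominent class.
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

rng = np.random.default_rng(0)
# Toy acoustic features per syllable: [pitch_excursion, duration, energy],
# with prominent tokens drawn from a higher-mean distribution.
X_prom = rng.normal([3.0, 1.5, 2.0], 1.0, size=(100, 3))
X_non = rng.normal([0.0, 0.0, 0.0], 1.0, size=(100, 3))
X = np.vstack([X_prom, X_non])
y = np.array([1] * 100 + [0] * 100)

centroids = nearest_centroid_fit(X, y)
y_hat = nearest_centroid_predict(centroids, X)
acc = (y_hat == y).mean()
f1 = f_measure(y, y_hat)
```

The cross-training comparison in the abstract corresponds to fitting the centroids on one population's data (e.g. native English speech) and evaluating on another (Mandarin L1 English), which is why the accuracy drops in the mismatched condition.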
Story Segmentation of Broadcast News in English, Mandarin and Arabic
In this paper, we present results from a Broadcast News story segmentation system developed for the SRI NIGHTINGALE system operating on English, Arabic and Mandarin news shows to provide input to subsequent question-answering processes. Using a rule-induction algorithm with automatically extracted acoustic and lexical features, we report success rates that are competitive with state-of-the-art systems on each input language. We further demonstrate that features useful for English and Mandarin are not discriminative for Arabic.
Intergroup Variability in Personality Recognition
Automatic identification of personality in conversational speech has many applications in natural language processing, such as leader identification in a meeting, adaptive dialogue systems, and dating websites. However, the widespread acceptance of automatic personality recognition through lexical and vocal characteristics is limited by the variability of the error rate of a general-purpose model across speakers from different demographic groups. While other work reports accuracy, we explored the error rates of the automatic personality recognition task using classification models for different genders and native language (L1) groups. We also present a statistical experiment showing the influence of gender and L1 on the relation between acoustic-prosodic features and NEO-FFI self-reported personality traits. Our results show that the impact of demographic differences on the error rate varies considerably when predicting "Big Five" personality traits from speakers' utterances. This impact can also be observed through differences in the statistical relationship of voice characteristics with each personality inventory. These findings can be used to calibrate existing personality recognition models or to develop new models that are robust to intergroup variability.
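The per-group error-rate analysis described above can be sketched as follows; the record format, group labels, and trait labels are hypothetical, not the study's actual data.

```python
from collections import defaultdict

# Hypothetical per-speaker records: (gender, L1, true_label, predicted_label).
records = [
    ("F", "English",  "high_extraversion", "high_extraversion"),
    ("F", "Mandarin", "high_extraversion", "low_extraversion"),
    ("M", "English",  "low_extraversion",  "low_extraversion"),
    ("M", "Mandarin", "low_extraversion",  "high_extraversion"),
    ("F", "English",  "low_extraversion",  "low_extraversion"),
    ("M", "Mandarin", "high_extraversion", "high_extraversion"),
]

def error_rates_by_group(records, key):
    # Count errors and totals per demographic group, then divide.
    errors, totals = defaultdict(int), defaultdict(int)
    for gender, l1, y_true, y_pred in records:
        group = key(gender, l1)
        totals[group] += 1
        errors[group] += int(y_true != y_pred)
    return {g: errors[g] / totals[g] for g in totals}

by_gender = error_rates_by_group(records, lambda g, l1: g)
by_l1 = error_rates_by_group(records, lambda g, l1: l1)
```

Reporting this breakdown, rather than a single aggregate accuracy, is what exposes the intergroup variability the abstract argues should drive model calibration.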