Prosodic Event Recognition using Convolutional Neural Networks with Context Information
This paper demonstrates the potential of convolutional neural networks (CNN)
for detecting and classifying prosodic events on words, specifically pitch
accents and phrase boundary tones, from frame-based acoustic features. Typical
approaches use not only feature representations of the word in question but
also its surrounding context. We show that adding position features indicating
the current word benefits the CNN. In addition, this paper discusses the
generalization from a speaker-dependent modelling approach to a
speaker-independent setup. The proposed method is simple and efficient and
yields strong results not only in speaker-dependent but also
speaker-independent cases. Comment: Interspeech 2017, 4 pages, 1 figure.
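The input construction described above — frame-based features for the word in question plus its context words, augmented with a position feature marking the current word — can be sketched roughly as follows. This is a hypothetical illustration; the function name, shapes, and feature values are invented, and the paper's actual feature extraction differs in detail.

```python
def build_input(word_frames, target_idx, context=1):
    """Stack per-frame features of the target word and its neighbours,
    appending a binary position feature that flags frames belonging to
    the word in question (an illustrative sketch, not the paper's code).

    word_frames: list of words, each a list of per-frame feature vectors.
    """
    lo = max(0, target_idx - context)
    hi = min(len(word_frames), target_idx + context + 1)
    rows = []
    for w in range(lo, hi):
        flag = 1.0 if w == target_idx else 0.0  # position feature
        for frame in word_frames[w]:
            rows.append(list(frame) + [flag])
    return rows

# Invented example: three words with 2, 3 and 2 frames of 2-dim features.
words = [[[0.1, 0.2], [0.1, 0.3]],
         [[0.5, 0.4], [0.6, 0.4], [0.5, 0.5]],
         [[0.2, 0.1], [0.3, 0.1]]]
x = build_input(words, target_idx=1, context=1)
# 2 + 3 + 2 = 7 frames, each now 3-dimensional (2 features + position flag).
```

A CNN would then convolve over this frame sequence; the position channel is what lets the network distinguish the current word from its context.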
Classification of ASR Word Hypotheses using prosodic information and resampling of training data
In this work, we propose a novel re-sampling method based on word-lattice information, and we use prosodic cues with support vector machines for classification. The idea is to treat word recognition as a two-class classification problem, labeling each word hypothesis in the lattice of a standard recognizer as either True or False on the basis of prosodic information. The technique developed in this paper was applied to a set of words extracted from a continuous speech database. Our experimental results show that the method achieves an average word-hypothesis recognition rate of 82%.
Fil: Albornoz, Enrique Marcelo. Universidad Nacional del Litoral. Facultad de Ingeniería y Ciencias Hídricas. Departamento de Informática. Laboratorio de Investigaciones en Señales e Inteligencia Computacional; Argentina.
Fil: Milone, Diego Humberto. Universidad Nacional del Litoral. Facultad de Ingeniería y Ciencias Hídricas. Departamento de Informática. Laboratorio de Investigaciones en Señales e Inteligencia Computacional; Argentina.
Fil: Rufiner, Hugo Leonardo. Universidad Nacional del Litoral. Facultad de Ingeniería y Ciencias Hídricas. Departamento de Informática. Laboratorio de Investigaciones en Señales e Inteligencia Computacional; Argentina.
Fil: López-Cózar, R. Escuela Técnica Superior en Ingeniería Informática y de Telecomunicación. Universidad de Granada; España.
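The re-sampling idea — balancing True and False word hypotheses before training the classifier — can be sketched as plain class balancing. This is a minimal illustration assuming simple duplication-based oversampling; the paper's scheme is derived from word-lattice information rather than duplication, and the function name is invented.

```python
import random

def oversample(examples, labels, seed=0):
    """Duplicate minority-class examples until all classes are equal in size.

    A generic oversampling sketch; the paper's re-sampling is based on
    lattice information, not plain duplication.
    """
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(examples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(v) for v in by_class.values())
    out = []
    for y, xs in sorted(by_class.items()):
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        out += [(x, y) for x in xs + extra]
    return out

# Invented example: 4 True hypotheses (1) vs 1 False (0) -> balanced 4/4.
data = oversample(["a", "b", "c", "d", "e"], [1, 1, 1, 1, 0])
```

Balancing the two classes this way keeps an SVM from simply predicting the majority label (True) for every hypothesis.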
Design of a Speech Corpus for Research on Cross-Lingual Prosody Transfer
Since the prosody of a spoken utterance carries information about its discourse function, salience, and speaker attitude, prosody models and prosody generation modules have played a crucial part in text-to-speech (TTS) synthesis systems from the beginning, especially those aimed not only at sounding natural, but also at conveying emotion or particular speaker intention. Prosody transfer within speech-to-speech translation is a recent research area of increasing importance, and one of its most important research topics is the detection and treatment of salient events, i.e. instances of prominence or focus which do not result from syntactic constraints, but are rather products of semantic- or pragmatic-level effects. This paper presents the design of, and the guidelines for the creation of, a multilingual speech corpus containing prosodically rich sentences, ultimately aimed at training statistical prosody models for multilingual prosody transfer in the context of expressive speech synthesis.
The SP2 SCOPES Project on Speech Prosody
This is an overview of a Joint Research Project within the Scientific co-operation between Eastern Europe and Switzerland (SCOPES) Program of the Swiss National Science Foundation (SNSF) and the Swiss Agency for Development and Cooperation (SDC). Within the SP2 SCOPES Project on Speech Prosody, over the course of the next two years, the four partners aim to collaborate on the subject of speech prosody and advance the extraction, processing, modeling and transfer of prosody for a large portfolio of European languages: French, German, Italian, English, Hungarian, Serbian, Croatian, Bosnian, Montenegrin, and Macedonian. Through the four intertwined research plans, synergies are foreseen to emerge that will build a foundation for submitting strong joint proposals for EU funding.
Predicting pragmatic functions of Chinese echo questions using prosody: evidence from acoustic analysis and data modeling
Echo questions serve two pragmatic functions (recapitulatory and explicatory) and are subdivided into two types (yes-no echo questions and wh-echo questions) in verbal communication. Yet to date, most relevant studies have been conducted in European languages such as English and Spanish, and it remains unknown whether the different functions of echo questions can be conveyed via prosody in spoken Chinese. Additionally, no comparison has been made among different algorithmic models for predicting these functions from the prosody of Chinese echo questions. This motivated us to use different acoustic cues to predict the different pragmatic functions of Chinese echo questions by means of an acoustic experiment and data modeling. The results showed that for yes-no echo questions, the explicatory function exhibited higher pitch and intensity patterns than the recapitulatory function, whereas for wh-echo questions, the recapitulatory function demonstrated higher pitch and intensity patterns than the explicatory function. With regard to data modeling, a Support Vector Machine (SVM) performed better than Random Forest (RF) and Logistic Regression (LR) when predicting the different functions from prosodic cues in both yes-no and wh-echo questions. This study adds evidence, from a computational perspective, to the cognition of echo questions' functions on a prosodic basis.
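The modeling step — predicting a pragmatic function from prosodic cues — can be illustrated with a toy classifier over two features, mean pitch and intensity. The feature values and separability below are invented for the example, and the study itself compared SVM, RF and LR; a from-scratch logistic regression (the simplest of the three) is sketched here only to show the shape of the task.

```python
import math

def train_logreg(X, y, lr=0.5, epochs=2000):
    """Plain stochastic-gradient-descent logistic regression on 2-D features."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = w[0] * xi[0] + w[1] * xi[1] + b
            p = 1.0 / (1.0 + math.exp(-z))   # sigmoid
            g = p - yi                        # gradient of the log loss
            w[0] -= lr * g * xi[0]
            w[1] -= lr * g * xi[1]
            b -= lr * g
    return w, b

def predict(w, b, xi):
    return 1 if w[0] * xi[0] + w[1] * xi[1] + b > 0 else 0

# Invented toy data: normalized (pitch, intensity) pairs.
# Label 1 = explicatory (higher pitch/intensity in yes-no echo questions),
# label 0 = recapitulatory, following the pattern reported above.
X = [[0.8, 0.7], [0.9, 0.8], [0.7, 0.9], [0.2, 0.3], [0.1, 0.2], [0.3, 0.1]]
y = [1, 1, 1, 0, 0, 0]
w, b = train_logreg(X, y)
acc = sum(predict(w, b, xi) == yi for xi, yi in zip(X, y)) / len(X)
```

On such cleanly separable toy data all three model families would reach the same accuracy; the study's contribution is the comparison on real, noisy prosodic measurements.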
Is the recognition of emotional prosody in the French language modulated by regional accents from France and Quebec?
Master's thesis submitted in view of obtaining the master's degree in psychology (M.Sc.)
Context: Speech prosody, i.e. the variations in the tone of voice while speaking, plays a key role in social interactions by providing, among other things, important information related to identity, emotional state, and geographical origin. Prosody is modified by a person's accent, in particular when speaking a foreign language. These accents have a major impact on how speech is recognized, with significant consequences for how the speaker is perceived socially, such as reduced empathy or lower trust. However, it is less clear whether this generally negative impact persists in the context of regional accents, which constitute more subtle variations of the speech signal. Objective and hypothesis: The aim of this thesis is to understand how French-speaking individuals from different regions (France, Quebec) express and recognize emotional sentences spoken by people from the same region or from a different one. Several studies suggest a group advantage, i.e. the idea that even if emotions can be recognized universally, we recognize the emotional productions of people from our own cultural group better than those of people outside it. Does this advantage persist in the case of regional accents, where two populations share the same language? This question remains very little studied and has never been examined for the French language. We aim to 1) create and validate a bank of emotional sentences spoken in French with accents from France and Quebec; and 2) characterize the acoustic profiles of these emotional productions. Based on data from the literature (e.g., Mauchand and Pell, 2020), we expect Quebecers (Qc) to show more expressive emotional prosody than the French (Fr). Method: We created short emotional sentences in 5 emotions (Joy, Sadness, Anger, Pride, Shame), spoken by Quebecois and French actors. This bank of stimuli was validated in an online study by French and Quebecois participants. Using a general mixed model, we analyzed the following vocal parameters: mean and standard deviation of the fundamental frequency, mean and standard deviation of intensity, mean shimmer, jitter, HNR, the Hammarberg index, spectral slope, and sentence duration. Results: Mean fundamental frequency (F0M), intensity, duration, spectral slope, and the Hammarberg index differed significantly between emotions and between origins, and we also observed an interaction between speaker sex and origin. Overall, across the five emotions considered, the Fr speak with a higher F0M, except for sadness, while the Qc speak with greater intensity and longer duration for all emotions. In the end, we can consider that the Qc express emotions more markedly than the Fr, except for anger. Regarding sex-related differences, we found that Qc men have stronger emotional prosody than Fr men, whereas differences between Qc and Fr women were only observed for shame and pride, emotions that are not among the six primary emotions and are rather more social, cultural emotions. Conclusion: We were able to characterize the emotional vocal expression of the Fr and the Qc who, despite their common language, express themselves in very distinct ways to convey their emotions. These results open up interesting perspectives on intercultural interactions within a single language across different regions, and confirm speech prosody, in particular emotional prosody, as a genuine marker of regional identity.
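The acoustic parameters analyzed in the thesis above (mean and standard deviation of F0, intensity, etc.) are summary statistics over a per-frame contour. A minimal sketch, assuming an F0 contour in which unvoiced frames are coded as 0 Hz (a common convention in pitch trackers; the function name and example values are invented):

```python
import statistics

def f0_stats(contour):
    """Mean and (population) standard deviation of F0 over voiced frames.

    Frames with F0 == 0 are treated as unvoiced and excluded, as is
    conventional; the thesis itself fed such per-sentence parameters
    into a general mixed model.
    """
    voiced = [f for f in contour if f > 0]
    return statistics.mean(voiced), statistics.pstdev(voiced)

# Invented example contour in Hz (0.0 = unvoiced frame).
mean_f0, sd_f0 = f0_stats([200.0, 210.0, 0.0, 190.0, 0.0, 220.0])
```

Jitter, shimmer, HNR, spectral slope and the Hammarberg index are likewise per-utterance measures, typically extracted with a tool such as Praat rather than computed by hand.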
Computational Intelligence and Human–Computer Interaction: Modern Methods and Applications
The present book contains all of the articles that were accepted and published in the Special Issue of MDPI's journal Mathematics titled "Computational Intelligence and Human–Computer Interaction: Modern Methods and Applications". This Special Issue covered a wide range of topics connected to the theory and application of different computational intelligence techniques to the domain of human–computer interaction, such as automatic speech recognition, speech processing and analysis, virtual reality, emotion-aware applications, digital storytelling, natural language processing, smart cars and devices, and online learning. We hope that this book will be interesting and useful for those working in various areas of artificial intelligence, human–computer interaction, and software engineering, as well as for those who are interested in how these domains are connected in real-life situations.