11 research outputs found

    Prosodic Event Recognition using Convolutional Neural Networks with Context Information

    This paper demonstrates the potential of convolutional neural networks (CNNs) for detecting and classifying prosodic events on words, specifically pitch accents and phrase boundary tones, from frame-based acoustic features. Typical approaches use not only feature representations of the word in question but also of its surrounding context. We show that adding position features indicating the current word benefits the CNN. In addition, this paper discusses the generalization from a speaker-dependent modelling approach to a speaker-independent setup. The proposed method is simple and efficient and yields strong results not only in speaker-dependent but also in speaker-independent cases. Comment: Interspeech 2017, 4 pages, 1 figure.
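    As a rough illustration of the position-feature idea in this abstract (dimensions, feature names, and the averaging kernel below are hypothetical, not taken from the paper), one can append a binary channel marking which frames belong to the current word before applying a convolution over the frame window:

    ```python
    # Toy sketch: frame-based features for a word plus its context, with a
    # binary "position" channel marking the frames of the word to classify.
    # All dimensions and the width-3 kernel are illustrative only.

    def add_position_channel(frames, word_start, word_end):
        """Append a 0/1 position feature to each frame's feature vector."""
        return [
            feats + [1.0 if word_start <= i < word_end else 0.0]
            for i, feats in enumerate(frames)
        ]

    def conv1d(frames, kernel):
        """Minimal valid-mode 1-D convolution over the frame axis."""
        k = len(kernel)
        out = []
        for i in range(len(frames) - k + 1):
            acc = 0.0
            for j in range(k):
                for f in frames[i + j]:
                    acc += kernel[j] * f
            out.append(acc)
        return out

    # 10 context frames, 2 acoustic features each (e.g. F0 and energy)
    frames = [[0.1 * i, 0.5] for i in range(10)]
    # frames 3..6 belong to the current word
    with_pos = add_position_channel(frames, 3, 7)

    feature_map = conv1d(with_pos, kernel=[0.25, 0.5, 0.25])
    print(len(feature_map))  # 8 positions after a width-3 valid convolution
    ```

    In a real system the convolution and pooling would come from a deep-learning framework; the point here is only that the position channel is just one extra feature per frame.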

    Classification of ASR Word Hypotheses using prosodic information and resampling of training data

    In this work, we propose a novel re-sampling method based on word-lattice information, and we use prosodic cues with support vector machines for classification. The idea is to treat word recognition as a two-class classification problem, labelling the word hypotheses in the lattice of a standard recognizer as either True or False on the basis of prosodic information. The technique developed in this paper was applied to a set of words extracted from a continuous speech database. Our experimental results show that the method achieves an average word-hypothesis recognition rate of 82%. Authors: Enrique Marcelo Albornoz, Diego Humberto Milone, and Hugo Leonardo Rufiner (Laboratorio de Investigaciones en Señales e Inteligencia Computacional, Departamento de Informática, Facultad de Ingeniería y Ciencias Hídricas, Universidad Nacional del Litoral, Argentina); R. López-Cózar (Escuela Técnica Superior en Ingeniería Informática y de Telecomunicación, Universidad de Granada, Spain).
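    The abstract does not spell out how the lattice-based re-sampling works, so the following is only a generic sketch of the balancing step it implies: oversampling the minority class among True/False word hypotheses before training a binary classifier (the data values are made up):

    ```python
    import random

    # Illustrative sketch only: the paper's re-sampling exploits word-lattice
    # information; here we show plain oversampling to balance the True/False
    # hypothesis classes before training a binary classifier such as an SVM.

    def balance_by_oversampling(samples, rng):
        """samples: list of (prosodic_features, is_correct) pairs."""
        true_set = [s for s in samples if s[1]]
        false_set = [s for s in samples if not s[1]]
        minority, majority = sorted([true_set, false_set], key=len)
        # Duplicate random minority examples until the classes are even.
        extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
        return majority + minority + extra

    rng = random.Random(0)
    # 70 correct hypotheses vs. 30 incorrect ones (hypothetical features)
    data = [([0.2, 1.1], True)] * 70 + [([0.9, 0.3], False)] * 30
    balanced = balance_by_oversampling(data, rng)
    print(len(balanced))  # 140: both classes now have 70 examples
    ```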

    Design of a Speech Corpus for Research on Cross-Lingual Prosody Transfer

    Since the prosody of a spoken utterance carries information about its discourse function, salience, and speaker attitude, prosody models and prosody generation modules have played a crucial part in text-to-speech (TTS) synthesis systems from the beginning, especially those set not only on sounding natural, but also on showing emotion or particular speaker intention. Prosody transfer within speech-to-speech translation is a recent research area of increasing importance, and one of its most important research topics is the detection and treatment of salient events, i.e. instances of prominence or focus which do not result from syntactic constraints but are rather products of semantic- or pragmatic-level effects. This paper presents the design of, and the guidelines for the creation of, a multilingual speech corpus containing prosodically rich sentences, ultimately aimed at training statistical prosody models for multilingual prosody transfer in the context of expressive speech synthesis.

    An empirical approach for comparing syntax and prosody driven prominence marking

    Get PDF

    The SP2 SCOPES Project on Speech Prosody

    This is an overview of a Joint Research Project within the Scientific Co-operation between Eastern Europe and Switzerland (SCOPES) Program of the Swiss National Science Foundation (SNSF) and the Swiss Agency for Development and Cooperation (SDC). Within the SP2 SCOPES Project on Speech Prosody, over the following two years, the four partners aim to collaborate on the subject of speech prosody and to advance the extraction, processing, modeling, and transfer of prosody for a large portfolio of European languages: French, German, Italian, English, Hungarian, Serbian, Croatian, Bosnian, Montenegrin, and Macedonian. Through the four intertwined research plans, synergies are expected to emerge that will build a foundation for submitting strong joint proposals for EU funding.

    Predicting pragmatic functions of Chinese echo questions using prosody: evidence from acoustic analysis and data modeling

    Echo questions serve two pragmatic functions (recapitulatory and explicatory) and are subdivided into two types (yes-no echo questions and wh-echo questions) in verbal communication. Yet to date, most relevant studies have been conducted in European languages such as English and Spanish. It remains unknown whether the different functions of echo questions can be conveyed via prosody in spoken Chinese. Additionally, no comparison has been made among diverse algorithmic models for predicting the functions of Chinese echo questions from their prosody, a novel linguistic cognition in nature. This motivated us to use different acoustic cues to predict the different pragmatic functions of Chinese echo questions by means of an acoustic experiment and data modeling. The results showed that for yes-no echo questions, the explicatory function exhibited higher pitch and intensity patterns than the recapitulatory function, whereas for wh-echo questions, the recapitulatory function demonstrated higher pitch and intensity patterns than the explicatory function. With regard to data modeling, the Support Vector Machine (SVM) algorithm outperformed Random Forest (RF) and Logistic Regression (LR) when predicting the different functions from prosodic cues in both yes-no and wh-echo questions. From a digitized perspective, this study adds evidence to the cognition of echo questions' functions on a prosodic basis.
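    To make the modeling setup concrete, here is a minimal from-scratch sketch of one of the three compared model families, logistic regression, trained on two illustrative, standardized prosodic cues (mean pitch and mean intensity) to separate two pragmatic functions. The cue values and labels are invented for illustration; they are not the study's data, and the study itself used SVM, RF, and LR from standard toolkits.

    ```python
    import math

    # Minimal sketch (not the study's actual models): logistic regression via
    # stochastic gradient descent on two hypothetical prosodic cues.

    def sigmoid(z):
        return 1.0 / (1.0 + math.exp(-z))

    def train_logreg(X, y, lr=0.5, steps=500):
        w = [0.0] * len(X[0])
        b = 0.0
        for _ in range(steps):
            for xi, yi in zip(X, y):
                p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
                err = p - yi          # gradient of the log loss
                w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
                b -= lr * err
        return w, b

    def predict(w, b, xi):
        return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5 else 0

    # Toy data: explicatory (label 1) with higher pitch/intensity than
    # recapitulatory (label 0), mirroring the pattern reported for yes-no
    # echo questions.
    X = [[1.0, 0.8], [0.9, 1.1], [1.2, 0.9], [-1.0, -0.9], [-0.8, -1.2], [-1.1, -1.0]]
    y = [1, 1, 1, 0, 0, 0]

    w, b = train_logreg(X, y)
    accuracy = sum(predict(w, b, xi) == yi for xi, yi in zip(X, y)) / len(y)
    print(accuracy)  # separable toy data -> 1.0 training accuracy
    ```

    The SVM and RF baselines would be trained and scored on the same feature matrix, which is what makes the head-to-head comparison in the study possible.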

    Is the recognition of emotional prosody in French modulated by regional accents from France and Quebec?

    Master's thesis submitted for the degree of M.Sc. in psychology. Context: Speech prosody, i.e. the variations in tone of voice when speaking, plays a key role in social interactions by conveying important information about identity, emotional state, and geographical origin. Prosody is modified by a person's accent, particularly when speaking a foreign language. Such accents strongly affect how speech is recognized, with significant consequences for how the speaker is perceived socially, such as reduced empathy or lower trust. It is less clear, however, whether this generally negative impact persists for regional accents, which are subtler variations of the speech signal. Objective and hypothesis: The aim of this thesis is to understand how French speakers from different regions (France, Quebec) express and recognize emotional sentences produced by people from the same region or not. Several studies suggest an in-group advantage: even if emotions may be recognized universally, we recognize the emotional productions of members of our own cultural group better than those of outsiders. Does this advantage persist for regional accents, where two populations share the same language? This question has been studied very little, and never in French. We aimed to 1) create and validate a bank of emotional sentences spoken in French with accents from France and Quebec, and 2) characterize the acoustic profiles of these emotional productions. Based on the literature (e.g., Mauchand and Pell, 2020), we expected Quebecers (Qc) to show more expressive emotional prosody than the French (Fr). Method: We created short emotional sentences in 5 emotions (joy, sadness, anger, pride, shame), spoken by Quebecois and French actors. This stimulus bank was validated in an online study by French and Quebec raters. Using a general mixed model, we analyzed the following vocal parameters: mean and standard deviation of the fundamental frequency, mean and standard deviation of intensity, mean shimmer, jitter, HNR, Hammarberg index, spectral slope, and sentence duration. Results: Mean fundamental frequency (F0M), intensity, duration, spectral slope, and Hammarberg index differed significantly across emotions and between origins, and we observed an interaction between speaker sex and origin. Overall, across the five emotions, the Fr spoke with a higher F0M except for sadness, while the Qc spoke with greater intensity and longer duration for all emotions. On the whole, the Qc expressed emotions more markedly than the Fr, except for anger. Regarding sex-related differences, Qc men showed stronger emotional prosody than Fr men, whereas differences between Qc and Fr women were observed only for shame and pride, which are more social emotions. Conclusion: We were able to characterize the emotional vocal expression of Fr and Qc speakers who, despite their common language, express their emotions in very distinct ways. These results open interesting perspectives on intercultural interactions within a single language across regions and confirm speech prosody, particularly emotional prosody, as a genuine marker of regional identity.

    Computational Intelligence and Human- Computer Interaction: Modern Methods and Applications

    The present book contains all of the articles that were accepted and published in the Special Issue of MDPI's journal Mathematics titled "Computational Intelligence and Human–Computer Interaction: Modern Methods and Applications". This Special Issue covered a wide range of topics connected to the theory and application of different computational intelligence techniques in the domain of human–computer interaction, such as automatic speech recognition, speech processing and analysis, virtual reality, emotion-aware applications, digital storytelling, natural language processing, smart cars and devices, and online learning. We hope that this book will be interesting and useful for those working in various areas of artificial intelligence, human–computer interaction, and software engineering, as well as for those who are interested in how these domains are connected in real-life situations.