Search CORE

13 research outputs found

Influence of expressive speech on ASR performances: application to elderly assistance in smart home

Author: AC Clarcke
B Schuller
JA Russell
K Peetoom
KR Scherer
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 31/05/2016
Field of study

International audienceSmart homes are discussed as a win-win solution for maintaining the Elderly at home as a better alternative to care homes for dependent elderly people. Such Smart homes are characterized by rich domestic commands devoted to elderly safety and comfort. The vocal command has been identified as an efficient , well accepted, interaction way, it can be directly addressed to the "habitat", or through a robotic interface. In daily use, the challenges of vocal commands recognition are the noisy environment but moreover the reformulation and the expressive change of the strictly authorized commands. This paper focuses (1) to show, on the base of elicited corpus, that expressive speech, in particular distress speech, strongly affects generic state of the art ASR systems (20 to 30%) (2) how interesting improvement thanks to ASR adaptation can regulate (15%) this degradation. We conclude on the necessary adaptation of ASR system to expressive speech when they are designed for person's assistance

Crossref

Hal - Université Grenoble Alpes

Détection de marqueurs affectifs et attentionnels de personnes âgées en interaction avec un robot

Author: Yang Fan
Publication venue: HAL CCSD
Publication date: 23/10/2015
Field of study

This thesis work focuses on audio-visual detection of emotional (laugh and smile) and attentional markers for elderly people in social interaction with a robot. To effectively understand and model the pattern of behavior of very old people in the presence of a robot, relevant data are needed. I participated in the collection of a corpus of elderly people in particular for recording visual data. The system used to control the robot is a Wizard of Oz, several daily conversation scenarios were used to encourage people to interact with the robot. These scenarios were developed as part of the ROMEO2 project with the Approche association. We described at first the corpus collected which contains 27 subjects of 85 years' old on average for a total of 9 hours, annotations and we discussed the results obtained from the analysis of annotations and two questionnaires.My research then focuses on the attention detection and the laughter and smile detection. The motivations for the attention detection are to detect when the subject is not addressing to the robot and adjust the robot's behavior to the situation. After considering the difficulties related to the elderly people and the analytical results obtained by the study of the corpus annotations, we focus on the rotation of the head at the visual index and energy and quality vote for the detection of the speech recipient. The laughter and smile detection can be used to study on the profile of the speaker and her emotions. My interests focus on laughter and smile detection in the visual modality and the fusion of audio-visual information to improve the performance of the automatic system. Spontaneous expressions are different from posed or acted expression in both appearance and timing. Designing a system that works on realistic data of the elderly is even more difficult because of several difficulties to consider such as the lack data for training the statistical model, the influence of the facial texture and the smiling pattern for visual detection, the influence of voice quality for auditory detection, the variety of reaction time, the level of listening comprehension, loss of sight for elderly people, etc. The systems of head-turning detection, attention detection and laughter and smile detection are evaluated on ROMEO2 corpus and partially evaluated (visual detections) on standard corpus Pointing04 and GENKI-4K to compare with the scores of the methods on the state of the art. We also found a negative correlation between laughter and smile detection performance and the number of laughter and smile events for the visual detection system and the audio-visual system. This phenomenon can be explained by the fact that elderly people who are more interested in experimentation laugh more often and therefore perform more various poses. The variety of poses and the lack of corresponding data bring difficulties for the laughter and smile recognition for our statistical systems. The experiments show that the head-turning can be effectively used to detect the loss of the subject's attention in the interaction with the robot. For the attention detection, the potential of a cascade method using both methods in a complementary manner is shown. This method gives better results than the audio system. For the laughter and smile detection, under the same leave-one-out protocol, the fusion of the two monomodal systems significantly improves the performance of the system at the segmental evaluation.Ces travaux de thèse portent sur la détection audio-visuelle de marqueurs affectifs (rire et sourire) et attentionnels de personnes âgées en interaction sociale avec un robot. Pour comprendre efficacement et modéliser le comportement des personnes très âgées en présence d'un robot, des données pertinentes sont nécessaires. J'ai participé à la collection d'un corpus de personnes âgées notamment pour l'enregistrement des données visuelles. Le système utilisé pour contrôler le robot est un magicien d'Oz, plusieurs scénarios de conversation au quotidien ont été utilisés pour encourager les gens à coopérer avec le robot. Ces scénarios ont été élaborés dans le cadre du projet ROMEO2 avec l'association Approche.Nous avons décrit tout d'abord le corpus recueilli qui contient 27 sujets de 85 ans en moyenne pour une durée totale de 9 heures, les annotations et nous avons discuté des résultats obtenus à partir de l'analyse des annotations et de deux questionnaires. Ma recherche se focalise ensuite sur la détection de l'attention et la détection de rire et de sourire. Les motivations pour la détection de l'attention consistent à détecter quand le sujet ne s'adresse pas au robot et à adapter le comportement du robot à la situation. Après avoir considéré les difficultés liées aux personnes âgées et les résultats d'analyse obtenus par l'étude des annotations du corpus, nous nous intéressons à la rotation de la tête au niveau de l'indice visuel et à l'énergie et la qualité de voix pour la détection du destinataire de la parole. La détection de rire et sourire peut être utilisée pour l'étude sur le profil du locuteur et de ses émotions. Mes intérêts se concentrent sur la détection de rire et sourire dans la modalité visuelle et la fusion des informations audio-visuelles afin d'améliorer la performance du système automatique. Les expressions sont différentes des expressions actées ou posés à la fois en apparence et en temps de réaction. La conception d'un système qui marche sur les données réalistes des personnes âgées est encore plus difficile à cause de plusieurs difficultés à envisager telles que le manque de données pour l'entrainement du modèle statistique, l'influence de la texture faciale et de la façon de sourire pour la détection visuelle, l'influence de la qualité vocale pour la détection auditive, la variété du temps de réaction, le niveau de compréhension auditive, la perte de la vue des personnes âgées, etc. Les systèmes de détection de la rotation de la tête, de la détection de l'attention et de la détection de rire et sourire sont évalués sur le corpus ROMEO2 et partiellement évalués (détections visuelles) sur les corpus standard Pointing04 et GENKI-4K pour comparer avec les scores des méthodes de l'état de l'art. Nous avons également trouvé une corrélation négative entre la performance de détection de rire et sourire et le nombre d'évènement de rire et sourire pour le système visuel et le système audio-visuel. Ce phénomène peut être expliqué par le fait que les personnes âgées qui sont plus intéressées par l'expérimentation rient plus souvent et sont plus à l'aise donc avec des poses variées. La variété des poses et le manque de données correspondantes amènent des difficultés pour la reconnaissance de rire et de sourire pour les systèmes statistiques.Les expérimentations montrent que la rotation de la tête peut être efficacement utilisée pour détecter la perte de l'attention du sujet dans l'interaction avec le robot. Au niveau de la détection de l'attention, le potentiel d'une méthode en cascade qui utilise les modalités d'une manière complémentaire est montré. Cette méthode donne de meilleurs résultats que le système auditif seul. Pour la détection de rire et sourire, en suivant le même protocole « Leave-one-out », la fusion des deux systèmes monomodaux améliore aussi significativement la performance par rapport à un système monomodal au niveau de l'évaluation segmentale

Thèses en Ligne

Modélisation du profil émotionnel de l'utilisateur dans les interactions parlées Humain-Machine

Author: DELABORDE Agnès
DEVILLERS Laurence
Publication venue
Publication date: 01/01/2013
Field of study

Les travaux de recherche de la thèse portent sur l'étude et la formalisation des interactions émotionnelles Humain-Machine. Au delà d une détection d'informations paralinguistiques (émotions, disfluences,...) ponctuelles, il s'agit de fournir au système un profil interactionnel et émotionnel de l'utilisateur dynamique, enrichi pendant l interaction. Ce profil permet d adapter les stratégies de réponses de la machine au locuteur, et il peut également servir pour mieux gérer des relations à long terme. Le profil est fondé sur une représentation multi-niveau du traitement des indices émotionnels et interactionnels extraits à partir de l'audio via les outils de détection des émotions du LIMSI. Ainsi, des indices bas niveau (variations de la F0, d'énergie, etc.), fournissent des informations sur le type d'émotion exprimée, la force de l'émotion, le degré de loquacité, etc. Ces éléments à moyen niveau sont exploités dans le système afin de déterminer, au fil des interactions, le profil émotionnel et interactionnel de l'utilisateur. Ce profil est composé de six dimensions : optimisme, extraversion, stabilité émotionnelle, confiance en soi, affinité et domination (basé sur le modèle de personnalité OCEAN et les théories de l interpersonal circumplex). Le comportement social du système est adapté en fonction de ce profil, de l'état de la tâche en cours, et du comportement courant du robot. Les règles de création et de mise à jour du profil émotionnel et interactionnel, ainsi que de sélection automatique du comportement du robot, ont été implémentées en logique floue à l'aide du moteur de décision développé par un partenaire du projet ROMEO. L implémentation du système a été réalisée sur le robot NAO. Afin d étudier les différents éléments de la boucle d interaction émotionnelle entre l utilisateur et le système, nous avons participé à la conception de plusieurs systèmes : système en Magicien d Oz pré-scripté, système semi-automatisé, et système d interaction émotionnelle autonome. Ces systèmes ont permis de recueillir des données en contrôlant plusieurs paramètres d élicitation des émotions au sein d une interaction ; nous présentons les résultats de ces expérimentations, et des protocoles d évaluation de l Interaction Humain-Robot via l utilisation de systèmes à différents degrés d autonomie.Analysing and formalising the emotional aspect of the Human-Machine Interaction is the key to a successful relation. Beyond and isolated paralinguistic detection (emotion, disfluences ), our aim consists in providing the system with a dynamic emotional and interactional profile of the user, which can evolve throughout the interaction. This profile allows for an adaptation of the machine s response strategy, and can deal with long term relationships. A multi-level processing of the emotional and interactional cues extracted from speech (LIMSI emotion detection tools) leads to the constitution of the profile. Low level cues ( F0, energy, etc.), are then interpreted in terms of expressed emotion, strength, or talkativeness of the speaker. These mid-level cues are processed in the system so as to determine, over the interaction sessions, the emotional and interactional profile of the user. The profile is made up of six dimensions: optimism, extroversion, emotional stability, self-confidence, affinity and dominance (based on the OCEAN personality model and the interpersonal circumplex theories). The information derived from this profile could allow for a measurement of the engagement of the speaker. The social behaviour of the system is adapted according to the profile, and the current task state and robot behaviour. Fuzzy logic rules drive the constitution of the profile and the automatic selection of the robotic behaviour. These determinist rules are implemented on a decision engine designed by a partner in the project ROMEO. We implemented the system on the humanoid robot NAO. The overriding issue dealt with in this thesis is the viable interpretation of the paralinguistic cues extracted from speech into a relevant emotional representation of the user. We deem it noteworthy to point out that multimodal cues could reinforce the profile s robustness. So as to analyse the different parts of the emotional interaction loop between the user and the system, we collaborated in the design of several systems with different autonomy degrees: a pre-scripted Wizard-of-Oz system, a semi-automated system, and a fully autonomous system. Using these systems allowed us to collect emotional data in robotic interaction contexts, by controlling several emotion elicitation parameters. This thesis presents the results of these data collections, and offers an evaluation protocol for Human-Robot Interaction through systems with various degrees of autonomy.PARIS11-SCD-Bib. électronique (914719901) / SudocSudocFranceF

OpenGrey Repository

Analyse acoustique de la voix émotionnelle de locuteurs lors d'une interaction humain-robot

Author: BARRAS Claude
DEVILLERS Laurence
TAHON Marie
Publication venue
Publication date: 01/01/2012
Field of study

Mes travaux de thèse s'intéressent à la voix émotionnelle dans un contexte d'interaction humain-robot. Dans une interaction réaliste, nous définissons au moins quatre grands types de variabilités : l'environnement (salle, microphone); le locuteur, ses caractéristiques physiques (genre, âge, type de voix) et sa personnalité; ses états émotionnels; et enfin le type d'interaction (jeu, situation d'urgence ou de vie quotidienne). A partir de signaux audio collectés dans différentes conditions, nous avons cherché, grâce à des descripteurs acoustiques, à imbriquer la caractérisation d'un locuteur et de son état émotionnel en prenant en compte ces variabilités.Déterminer quels descripteurs sont essentiels et quels sont ceux à éviter est un défi complexe puisqu'il nécessite de travailler sur un grand nombre de variabilités et donc d'avoir à sa disposition des corpus riches et variés. Les principaux résultats portent à la fois sur la collecte et l'annotation de corpus émotionnels réalistes avec des locuteurs variés (enfants, adultes, personnes âgées), dans plusieurs environnements, et sur la robustesse de descripteurs acoustiques suivant ces quatre variabilités. Deux résultats intéressants découlent de cette analyse acoustique: la caractérisation sonore d'un corpus et l'établissement d'une liste "noire" de descripteurs très variables. Les émotions ne sont qu'une partie des indices paralinguistiques supportés par le signal audio, la personnalité et le stress dans la voix ont également été étudiés. Nous avons également mis en oeuvre un module de reconnaissance automatique des émotions et de caractérisation du locuteur qui a été testé au cours d'interactions humain-robot réalistes. Une réflexion éthique a été menée sur ces travaux.This thesis deals with emotional voices during a human-robot interaction. In a natural interaction, we define at least, four kinds of variabilities: environment (room, microphone); speaker, its physic characteristics (gender, age, voice type) and personality; emotional states; and finally the kind of interaction (game scenario, emergency, everyday life). From audio signals collected in different conditions, we tried to find out, with acoustic features, to overlap speaker and his emotional state characterisation taking into account these variabilities.To find which features are essential and which are to avoid is hard challenge because it needs to work with a high number of variabilities and then to have riche and diverse data to our disposal. The main results are about the collection and the annotation of natural emotional corpora that have been recorded with different kinds of speakers (children, adults, elderly people) in various environments, and about how reliable are acoustic features across the four variabilities. This analysis led to two interesting aspects: the audio characterisation of a corpus and the drawing of a black list of features which vary a lot. Emotions are ust a part of paralinguistic features that are supported by the audio channel, other paralinguistic features have been studied such as personality and stress in the voice. We have also built automatic emotion recognition and speaker characterisation module that we have tested during realistic interactions. An ethic discussion have been driven on our work.PARIS11-SCD-Bib. électronique (914719901) / SudocSudocFranceF

OpenGrey Repository

Communication kinesthésique des émotions dans un contexte d'interaction homme-machine

Author: Gaffary Yoren
Publication venue: HAL CCSD
Publication date: 18/06/2015
Field of study

The communication of emotions use several modalities of expression, as facial expressions or touch. The affective computing field aims to integrate an emotional component in human-computer interactions. Even though touch is an effective vector of emotions, is remains little explored. This thesis aims to investigate the kinesthetic features of the expression and perception of emotions in a human-computer interaction setting. Firstly, this thesis considers the kinesthetic expression and perception of semantically close and acted emotions. Secondly, this thesis proposes a facial-kinesthetic combination of expressions of several close emotions. This aims to investigate the influence of the kinesthetic modality on the multimodal perception of emotions. Finally, this thesis goes beyond acted emotions by focusing on the expression and perception of a spontaneous state of stress. Those different experiments used various devices for kinesthetic interaction, as Geomagic Touch devices and a pressure rendering device developed for this thesis. Results have multiple applications. Firstly, a better integration of the kinesthetic modality in virtual settings, from human-human remote communications to immersion in video games. This thesis also paves the way for an automatic recognition of affective states expressed by the kinesthetic modality.La communication des émotions s'effectue naturellement par le biais de différentes modalités, comme par exemple les expressions faciales et le toucher. L'informatique affective cherche à intégrer la composante émotionnelle dans les interactions homme-machine. Cependant, le toucher, qui est un puissant vecteur d'émotions, reste peu exploité. L'objectif de cette thèse est d'étudier les paramètres qui influencent l'expression et la perception des émotions dans la modalité kinesthésique dans un contexte d'interaction homme-machine. Dans un premier temps, cette thèse considère l'expression kinesthésique liée à la perception physique de forces et de mouvements d'un ensemble d'émotions actées sémantiquement proches. Sur la base des résultats de cette première étude, un couplage des expressions kinesthésiques typiques de différentes émotions avec des expressions faciales exprimées par un avatar est proposé afin d'étudier l'influence de la modalité kinesthésique dans la perception d'une expression multimodale d’émotion. Enfin, cette thèse va au-delà de ces émotions actées en abordant dans une dernière étude le cas de l'expression et de la perception d'un état affectif de stress spontané. Ces différentes expérimentations ont considéré différents dispositifs matériels pour la communication kinesthésique : des dispositifs de type Geomagic Touch ainsi qu'un dispositif de rendu de pression développé spécifiquement dans le cadre de cette thèse. Les résultats de ces travaux ont de multiples applications pratiques. Premièrement, une meilleure intégration de la modalité kinesthésique en contexte virtuel, qu'il s'agisse de communication humain-humain à distance ou d'immersion dans les jeux vidéo. Cette thèse ouvre également la voie à la détection automatique d'états affectifs exprimés spontanément par la modalité kinesthésique

Thèses en Ligne

Actes de la 6e conférence conjointe Journées d'Études sur la Parole (JEP, 33e édition), Traitement Automatique des Langues Naturelles (TALN, 27e édition), Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RÉCITAL, 22e édition. Volume 2 : Traitement Automatique des Langues Naturelles

Author: Benzitoun Christophe
Braud Chloé
Huber Laurine
Langlois David
Ouni Slim
Pogodalla Sylvain
Schneider Stéphane
Publication venue: AFCP
Publication date: 01/01/2020
Field of study

@ 6ème conférence conjointe: JEP-TALN-RECITAL 2020no abstrac

INRIA a CCSD electronic archive server

Analysis of the interaction between elderly people and a simulated virtual coach

Author: Ben Letaifa Zouari Leila
Cordasco Gennaro
De Velasco Vázquez Mikel
Escalera Sergio
Esposito Anna
Fernández Ruanova Begoña
González Fraile Eduardo
Justo Blanco Raquel
Korsnes Maria
Palmero Cristina
Schlögl Stephan
Silva Micaela
Tenorio Laranga Joffre
Torp Johansen Anna
Torres Barañano María Inés
Vázquez Risco Alain
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/12/2020
Field of study

The EMPATHIC project develops and validates new interaction paradigms for personalized virtual coaches (VC) to promote healthy and independent aging. To this end, the work presented in this paper is aimed to analyze the interaction between the EMPATHIC-VC and the users. One of the goals of the project is to ensure an end-user driven design, involving senior users from the beginning and during each phase of the project. Thus, the paper focuses on some sessions where the seniors carried out interactions with a Wizard of Oz driven, simulated system. A coaching strategy based on the GROW model was used throughout these sessions so as to guide interactions and engage the elderly with the goals of the project. In this interaction framework, both the human and the system behavior were analyzed. The way the wizard implements the GROW coaching strategy is a key aspect of the system behavior during the interaction. The language used by the virtual agent as well as his or her physical aspect are also important cues that were analyzed. Regarding the user behavior, the vocal communication provides information about the speaker's emotional status, that is closely related to human behavior and which can be extracted from the speech and language analysis. In the same way, the analysis of the facial expression, gazes and gestures can provide information on the non verbal human communication even when the user is not talking. In addition, in order to engage senior users, their preferences and likes had to be considered. To this end, the effect of the VC on the users was gathered by means of direct questionnaires. These analyses have shown a positive and calm behavior of users when interacting with the simulated virtual coach as well as some difficulties of the system to develop the proposed coaching strategy.The research presented in this paper is conducted as part of the project EMPATHIC that has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement no 769872

Archivo Digital para la Docencia y la Investigación