166 research outputs found

    Semi-Supervised Speech Emotion Recognition with Ladder Networks

    Speech emotion recognition (SER) systems find applications in various fields such as healthcare, education, and security and defense. A major drawback of these systems is their lack of generalization across different conditions. One solution is to train models on large amounts of labeled data from the target domain, which is expensive and time-consuming. Another approach is to increase the generalization of the models. An effective way to achieve this goal is to regularize the models through multitask learning (MTL), where auxiliary tasks are learned along with the primary task. These methods often require labeled data for the auxiliary tasks (gender, speaker identity, age or other emotional descriptors), which is expensive to collect for emotion recognition. This study proposes the use of ladder networks for emotion recognition, which rely on an unsupervised auxiliary task. The primary task is a regression problem to predict emotional attributes. The auxiliary task is the reconstruction of intermediate feature representations using a denoising autoencoder. Because this auxiliary task does not require labels, the framework can be trained in a semi-supervised fashion with abundant unlabeled data from the target domain. This study shows that the proposed approach creates a powerful framework for SER, outperforming fully supervised single-task learning (STL) and MTL baselines. The approach is implemented with several acoustic features, showing that ladder networks generalize significantly better in cross-corpus settings. Compared to the STL baselines, the proposed approach achieves relative gains in concordance correlation coefficient (CCC) between 3.0% and 3.5% for within-corpus evaluations, and between 16.1% and 74.1% for cross-corpus evaluations, highlighting the power of the architecture.
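The results above are reported as relative gains in CCC. The following is a minimal sketch of how CCC and a relative gain might be computed, assuming the standard definition of Lin's concordance correlation coefficient with population variances. The function names `ccc` and `relative_gain` are illustrative and are not taken from the paper.

```python
from statistics import fmean


def ccc(y_true, y_pred):
    """Concordance correlation coefficient between two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    mu_t, mu_p = fmean(y_true), fmean(y_pred)
    # Population (divide-by-n) variances and covariance.
    var_t = sum((t - mu_t) ** 2 for t in y_true) / n
    var_p = sum((p - mu_p) ** 2 for p in y_pred) / n
    cov = sum((t - mu_t) * (p - mu_p) for t, p in zip(y_true, y_pred)) / n
    # Unlike Pearson correlation, the squared mean difference penalizes bias.
    denom = var_t + var_p + (mu_t - mu_p) ** 2
    if denom == 0:
        raise ValueError("CCC is undefined for two identical constant sequences")
    return 2 * cov / denom


def relative_gain(ccc_new, ccc_baseline):
    """Relative improvement of ccc_new over ccc_baseline, in percent."""
    return 100.0 * (ccc_new - ccc_baseline) / ccc_baseline
```

For example, `ccc([1, 2, 3], [2, 3, 4])` is 4/7 (about 0.571) even though the Pearson correlation of the two sequences is 1, because the prediction is offset by a constant. With hypothetical scores, `relative_gain(0.62, 0.60)` gives about 3.3%.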

    A multimodal emotion detection system during human-robot interaction

    In this paper, a multimodal user-emotion detection system for social robots is presented. This system is intended to be used during human-robot interaction, and it is integrated as part of the overall interaction system of the robot: the Robotics Dialog System (RDS). Two modalities are used to detect emotions: voice analysis and facial expression analysis. To analyze the user's voice, a new component has been developed: Gender and Emotion Voice Analysis (GEVA), which is written in the ChucK language. For emotion detection from facial expressions, a second component, Gender and Emotion Facial Analysis (GEFA), has also been developed. GEFA integrates two third-party solutions: Sophisticated High-speed Object Recognition Engine (SHORE) and Computer Expression Recognition Toolbox (CERT). Once GEVA and GEFA produce their results, a decision rule combines the information from both. The result of this rule, the detected emotion, is integrated into the dialog system through communicative acts. Each communicative act therefore passes, among other things, the user's detected emotion to the RDS, so that it can adapt its strategy and achieve greater user satisfaction during the human-robot dialog. GEVA and GEFA can also be used individually. Moreover, they are integrated with the robotic control platform ROS (Robot Operating System). Several experiments with real users were performed to determine the accuracy of each component and to set the final decision rule.
The results obtained from applying this decision rule in these experiments show a high success rate in automatic user emotion recognition, outperforming either information channel (audio or visual) used separately.
The authors gratefully acknowledge the funds provided by the Spanish MICINN (Ministry of Science and Innovation) through the project “Aplicaciones de los robots sociales”, DPI2011-26980 from the Spanish Ministry of Economy and Competitiveness. Moreover, the research leading to these results has received funding from the RoboCity2030-II-CM project (S2009/DPI-1559), funded by Programas de Actividades I+D en la Comunidad de Madrid and cofunded by Structural Funds of the EU.
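The abstract does not state the decision rule that combines GEVA and GEFA, only that it was tuned experimentally. The following is a hypothetical late-fusion sketch, not the authors' rule: each channel is assumed to report a label and a confidence, and the channel weights `voice_weight` and `face_weight` are invented placeholders that such experiments might tune.

```python
from dataclasses import dataclass


@dataclass
class ChannelResult:
    emotion: str       # label reported by one channel, e.g. "happy"
    confidence: float  # the channel's confidence in [0, 1]


def fuse(voice: ChannelResult, face: ChannelResult,
         voice_weight: float = 0.4, face_weight: float = 0.6) -> str:
    """Hypothetical rule combining a voice channel and a face channel."""
    # If both channels agree, there is nothing to resolve.
    if voice.emotion == face.emotion:
        return voice.emotion
    # Otherwise keep the label with the higher weighted confidence;
    # ties go to the face channel.
    voice_score = voice_weight * voice.confidence
    face_score = face_weight * face.confidence
    return voice.emotion if voice_score > face_score else face.emotion
```

With these placeholder weights, a confident voice reading ("sad", 0.9) overrides a hesitant face reading ("neutral", 0.5), because 0.36 > 0.30.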

    Muecas: a multi-sensor robotic head for affective human robot interaction and imitation

    This paper presents a multi-sensor humanoid robotic head for human-robot interaction.
The design of the robotic head, Muecas, is based on ongoing research on the mechanisms of perception and imitation of human expressions and emotions. These mechanisms allow direct interaction between the robot and its human companion through different natural communication modalities: speech, body language and facial expressions. The robotic head has 12 degrees of freedom in a human-like configuration, including eyes, eyebrows, mouth and neck, and was designed and built entirely by IADeX (Engineering, Automation and Design of Extremadura) and RoboLab. A detailed description of its kinematics is provided, along with the design of the most complex controllers. Muecas can be directly controlled by FACS (Facial Action Coding System), the de facto standard for facial expression recognition and synthesis. This feature facilitates its use by third-party platforms and encourages the development of imitation and goal-based systems. Imitation systems learn from the user, while goal-based systems use planning techniques to drive the user towards a desired final state. To show the flexibility and reliability of the robotic head, the paper presents a software architecture that is able to detect, recognize, classify and generate facial expressions in real time using FACS. This system has been implemented using the robotics framework RoboComp, which provides hardware-independent access to the sensors in the head. Finally, the paper presents experimental results showing the real-time functioning of the whole system, including recognition and imitation of human facial expressions.
Funded by: Ministerio de Ciencia e Innovación, project TIN2012-38079-C03-1; Gobierno de Extremadura, project GR10144. Peer reviewed.
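Because Muecas works in FACS terms, recognition can be framed as mapping detected action units (AUs) to an expression label. The abstract does not describe the classifier, so the following is a swapped-in nearest-prototype sketch using Jaccard similarity. The prototype table follows commonly cited EMFACS-style AU combinations for basic emotions, and the function names are illustrative.

```python
from typing import Dict, Set, Tuple

# EMFACS-style prototypes: basic emotion -> set of FACS action units (AUs).
PROTOTYPES: Dict[str, Set[int]] = {
    "happiness": {6, 12},
    "sadness": {1, 4, 15},
    "surprise": {1, 2, 5, 26},
    "fear": {1, 2, 4, 5, 7, 20, 26},
    "anger": {4, 5, 7, 23},
    "disgust": {9, 15, 16},
}


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Size of the intersection over size of the union (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def classify(active_aus: Set[int]) -> Tuple[str, float]:
    """Return the prototype closest to the detected AUs, or 'neutral' if none overlap."""
    best = max(PROTOTYPES, key=lambda label: jaccard(active_aus, PROTOTYPES[label]))
    score = jaccard(active_aus, PROTOTYPES[best])
    return ("neutral", 0.0) if score == 0.0 else (best, score)
```

For instance, detecting AUs {1, 2, 26} yields ("surprise", 0.75). For generation, the same table could be read in reverse by sending a chosen emotion's AU set to the head's FACS interface.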

    Interim research assessment 2003-2005 - Computer Science

    This report primarily serves as a source of information for the 2007 Interim Research Assessment Committee for Computer Science at the three technical universities in the Netherlands. The report also provides information for others interested in our research activities.

    Design and evaluation of adaptive multimodal systems

    PhD thesis in Informatics (Computer Engineering), presented to the Universidade de Lisboa through the Faculdade de Ciências, 2008. This thesis focuses on the design and evaluation of adaptive multimodal systems. The design of such systems is approached from an integrated perspective, with the goal of obtaining a solution that considers aspects of both adaptive and multimodal systems. The result is FAME, a model-based framework for the design and development of adaptive multimodal systems, in which adaptive capabilities directly affect the multimodal fusion and fission operations. FAME guides the design of systems capable of adapting to a diversified context, including variations in users, execution platform, and environment. FAME represents an evolution from previous frameworks by incorporating aspects specific to multimodal interfaces directly in the development of an adaptive platform. One of FAME's components is the Behavioral Matrix, a multipurpose instrument used during the design phase to represent the adaptation rules. The Behavioral Matrix is also the component responsible for bridging the gap between the design and evaluation stages. Starting from an analogy between transition networks, used to represent interaction with a system, and behavioral spaces, the Behavioral Matrix makes it possible to apply behavioral complexity metrics to general adaptive systems. Moreover, this evaluation is possible during the design stages, which reduces the resources required to evaluate adaptive systems. The Behavioral Matrix allows a designer to emulate the behavior of a non-adaptive version of the adaptive system and to compare the two versions, one of the most widely used approaches to evaluating adaptive systems. In addition, the designer may also emulate the behavior of different user profiles and compare their complexity measures.
The feasibility of FAME was demonstrated with the development of an adaptive multimodal Digital Book Player. Usability evaluations showed that the process was successful. Beyond these evaluations, behavioral complexity metrics, computed in accordance with the proposed methodology, were able to discern between adaptive and non-adaptive versions of the player. When applied to user profiles of different perceived complexity, the metrics were also able to detect the differences in interaction complexity.
Funding: FCT, IPSOM (POSI/PLP/34252/2000) and RiCoBA (POSC/EIA/61042/2004).
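The thesis applies behavioral complexity metrics to transition networks, but the abstract does not define them. As a stand-in, the following sketch uses McCabe's cyclomatic number, M = E - N + 2P, on a transition network stored as an adjacency list. The metric choice, the state names, and both example networks are hypothetical and only show how an adaptive and a non-adaptive version could be compared at design time.

```python
from typing import Dict, List

# A transition network: state -> list of states reachable in one step.
Network = Dict[str, List[str]]


def cyclomatic_complexity(net: Network, components: int = 1) -> int:
    """McCabe's M = E - N + 2P, with P the number of connected components."""
    nodes = set(net)
    for targets in net.values():
        nodes.update(targets)  # include states that only appear as targets
    edges = sum(len(targets) for targets in net.values())
    return edges - len(nodes) + 2 * components


# Hypothetical non-adaptive player: the user must pass through a settings screen.
non_adaptive: Network = {
    "menu": ["reading", "settings"],
    "reading": ["menu"],
    "settings": ["menu"],
}
# Hypothetical adaptive version for one user profile: settings are applied automatically.
adaptive: Network = {"menu": ["reading"], "reading": ["menu"]}
```

Here the non-adaptive network scores 3 and the adaptive one scores 2, illustrating the kind of difference such metrics are meant to expose.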

    AI in Learning: Designing the Future

    AI (Artificial Intelligence) is predicted to radically change teaching and learning in both schools and industry, causing radical disruption of work. AI can support well-being initiatives and lifelong learning, but educational institutions and companies need to take the changing technology into account. Moving towards AI supported by digital tools requires a dramatic shift in the concepts of learning and expertise, and in the businesses built on them. Based on the latest research on AI and how it is changing learning and education, this book focuses on the enormous opportunities to expand educational settings with AI for learning in and beyond the traditional classroom. This open access book also introduces ethical challenges related to learning and education, while connecting human learning and machine learning. This book will be of use to a variety of readers, including researchers, AI users, companies and policy makers.

    A Proactive Approach of Robotic Framework for Making Eye Contact with Humans

    Making eye contact is one of the most important prerequisites for humans to initiate a conversation with others. However, it is not an easy task for a robot to make eye contact with a human if they are not initially facing each other or if the human is intensely engaged in his/her task. If the robot wants to start communicating with a particular person, it should turn its gaze to that person and make eye contact with him/her. However, such a turning action alone is not always enough to establish eye contact. Therefore, in some situations the robot should perform stronger actions to attract the target person before meeting his/her gaze. In this paper, we propose a conceptual model of eye contact for social robots consisting of two phases: capturing attention and ensuring that attention has been captured. Evaluation experiments with human participants reveal the effectiveness of the proposed model in four viewing situations, namely, central field of view, near peripheral field of view, far peripheral field of view, and out of field of view.
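The two-phase model can be illustrated with a small control-loop sketch. It is not the paper's implementation. The angular boundaries between viewing situations, the action names, and the `person_looks_back` callback (standing in for a gaze detector) are all hypothetical. Phase 1 escalates through attention-capturing actions chosen for the target's viewing situation. Phase 2 checks after each action whether the person's gaze has turned to the robot.

```python
from enum import Enum
from typing import Callable, List, Tuple


class ViewField(Enum):
    CENTRAL = "central"
    NEAR_PERIPHERAL = "near peripheral"
    FAR_PERIPHERAL = "far peripheral"
    OUT_OF_VIEW = "out of view"


# Hypothetical upper bounds (degrees between the person's gaze and the robot).
BOUNDS = [(30.0, ViewField.CENTRAL),
          (60.0, ViewField.NEAR_PERIPHERAL),
          (100.0, ViewField.FAR_PERIPHERAL)]

# Hypothetical attention-capture actions per situation, weakest first.
ACTIONS = {
    ViewField.CENTRAL: ["turn_head"],
    ViewField.NEAR_PERIPHERAL: ["turn_head", "nod_head"],
    ViewField.FAR_PERIPHERAL: ["turn_head", "nod_head", "wave_arm"],
    ViewField.OUT_OF_VIEW: ["turn_head", "make_sound"],
}


def classify_view(angle_deg: float) -> ViewField:
    """Map the angular offset of the robot from the person's gaze to a situation."""
    offset = abs(angle_deg)
    for bound, field in BOUNDS:
        if offset <= bound:
            return field
    return ViewField.OUT_OF_VIEW


def make_eye_contact(angle_deg: float,
                     person_looks_back: Callable[[str], bool]) -> Tuple[bool, List[str]]:
    """Escalate actions (phase 1) until a gaze check succeeds (phase 2)."""
    tried: List[str] = []
    for action in ACTIONS[classify_view(angle_deg)]:
        tried.append(action)  # a real robot would execute the action here
        if person_looks_back(action):  # phase 2: confirm attention was captured
            return True, tried
    return False, tried
```

For a person at 45 degrees who only responds to a nod, `make_eye_contact(45, lambda a: a == "nod_head")` returns `(True, ["turn_head", "nod_head"])`.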
