123,328 research outputs found

    A Contextualized Real-Time Multimodal Emotion Recognition for Conversational Agents using Graph Convolutional Networks in Reinforcement Learning

    Full text link
    Owing to recent developments in Generative Artificial Intelligence (GenAI) and Large Language Models (LLMs), conversational agents are becoming increasingly popular and accepted. They provide a human touch by interacting in ways familiar to us and by offering support as virtual companions. It is therefore important to understand the user's emotions in order to respond considerately. Compared to the standard emotion recognition problem, conversational agents face the additional constraint that recognition must happen in real time. Studies on model architectures using audio, visual, and textual modalities have mainly focused on emotion classification over full video sequences, which do not provide online features. In this work, we present a novel paradigm for contextualized Emotion Recognition using Graph Convolutional Networks with Reinforcement Learning (conER-GRL). Conversations are partitioned into smaller groups of utterances for effective extraction of contextual information. The system uses Gated Recurrent Units (GRUs) to extract multimodal features from these groups of utterances. More importantly, Graph Convolutional Networks (GCNs) and Reinforcement Learning (RL) agents are cascade-trained to capture the complex dependencies among emotion features in interactive scenarios. Comparing the conER-GRL model against other state-of-the-art models on the benchmark IEMOCAP dataset demonstrates its advantages in recognizing emotions in real time from multimodal conversational signals.
    Comment: 5 pages (4 main + 1 reference), 2 figures. Submitted to IEEE FG202
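    The pipeline the abstract describes (GRU feature extraction over utterance groups, then a GCN-plus-RL cascade) could be sketched roughly as below. This is a minimal illustration, not the authors' implementation: the class names, feature dimensions, graph construction, and the REINFORCE-style head are all assumptions.

```python
import torch
import torch.nn as nn

class UtteranceEncoder(nn.Module):
    """GRU over concatenated audio/visual/text features of an utterance group."""
    def __init__(self, in_dim=712, hid_dim=128):
        super().__init__()
        self.gru = nn.GRU(in_dim, hid_dim, batch_first=True)

    def forward(self, x):          # x: (batch, n_utterances, in_dim)
        out, _ = self.gru(x)       # contextualized per-utterance states
        return out                 # (batch, n_utterances, hid_dim)

class GCNLayer(nn.Module):
    """One graph-convolution step over a row-normalized utterance adjacency."""
    def __init__(self, dim):
        super().__init__()
        self.lin = nn.Linear(dim, dim)

    def forward(self, h, adj):     # h: (n_utt, dim), adj: (n_utt, n_utt)
        return torch.relu(self.lin(adj @ h))

class EmotionPolicy(nn.Module):
    """RL head: treats emotion labels as actions, so a reward (e.g. +1 for a
    correct prediction) can drive a policy-gradient update."""
    def __init__(self, dim, n_emotions=6):
        super().__init__()
        self.fc = nn.Linear(dim, n_emotions)

    def forward(self, h):
        dist = torch.distributions.Categorical(logits=self.fc(h))
        action = dist.sample()
        return action, dist.log_prob(action)
```

    A cascade pass would feed GRU outputs through the GCN, sample labels from the policy, and update by weighting the log-probabilities with the reward.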

    A Dynamic Approach to Pose Invariant Face Identification Using Cellular Simultaneous Recurrent Networks

    Get PDF
    Face recognition is a widely studied and desirable research field that has produced many techniques and approaches, most of which have severe limitations under pose variation or face rotation. The immediate goal of this thesis is to handle pose variations by implementing a face recognition system using a Cellular Simultaneous Recurrent Network (CSRN). The CSRN is a novel bio-inspired recurrent neural network that mimics reinforcement learning in the brain. The recognition task is defined as an identification problem on image sequences: the goal is to correctly match a set of unknown, pose-distorted probe face sequences against a set of known gallery sequences. The system comprises a pre-processing stage for face and feature extraction and a recognition stage that performs the identification. The face detection algorithm is based on the scale-space method combined with structural knowledge of the face. These steps include extraction of key landmark points and motion-unit vectors that describe the movement of face sequences. The identification process applies Eigenface/PCA analysis and reduces each image to a pattern vector used as input to the CSRN. In the training phase the CSRN learns the temporal information contained in image sequences; in the testing phase the network predicts the output pattern and measures its similarity to a test input pattern, indicating a match or mismatch.
    Previous applications of a CSRN system to face recognition have shown promise. The first objective of this research is to evaluate those prior implementations of CSRN-based pose-invariant face recognition on video images with large-scale databases; the publicly available VidTIMIT Audio-Video face dataset provides all the sequences needed for this study. The second objective is to modify a few well-known standard face recognition algorithms to handle pose-invariant face recognition for appropriate benchmarking against the CSRN. The final objective is to further improve CSRN face recognition by introducing motion units, which can capture the direction and intensity of movement of feature points in a rotating face.
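    The Eigenface/PCA pattern-vector stage and the sequence-level gallery matching described above can be sketched as follows. The CSRN itself is beyond a short example, so a plain cosine-similarity match stands in for its prediction step here; all function names and dimensions are illustrative assumptions.

```python
import numpy as np

def fit_eigenfaces(train_faces, k=50):
    """train_faces: (n_images, n_pixels). Returns the mean face and top-k eigenfaces."""
    mean = train_faces.mean(axis=0)
    centered = train_faces - mean
    # SVD of the centered data yields the principal components (eigenfaces).
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:k]                       # (n_pixels,), (k, n_pixels)

def to_pattern_vector(face, mean, eigenfaces):
    """Project one face image onto the eigenface basis -> compact pattern vector."""
    return eigenfaces @ (face - mean)         # (k,)

def identify(probe_seq, gallery_seqs, mean, eigenfaces):
    """Match a probe sequence against gallery sequences by cosine similarity
    of their averaged pattern vectors (a stand-in for the CSRN's prediction)."""
    def seq_embedding(seq):
        vecs = np.stack([to_pattern_vector(f, mean, eigenfaces) for f in seq])
        v = vecs.mean(axis=0)
        return v / np.linalg.norm(v)
    p = seq_embedding(probe_seq)
    scores = [float(seq_embedding(g) @ p) for g in gallery_seqs]
    return int(np.argmax(scores))             # index of the best-matching identity
```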

    Dynamic Face Video Segmentation via Reinforcement Learning

    Full text link
    For real-time semantic video segmentation, most recent works utilised a dynamic framework with a key scheduler to make online key/non-key decisions. Some works used a fixed key scheduling policy, while others proposed adaptive key scheduling methods based on heuristic strategies; both may lead to suboptimal global performance. To overcome this limitation, we model the online key decision process in dynamic video segmentation as a deep reinforcement learning problem and learn an efficient and effective scheduling policy from expert information about decision history and from the process of maximising global return. Moreover, we study the application of dynamic video segmentation to face videos, a field that has not been investigated before. Evaluating on the 300VW dataset, we show that our reinforcement key scheduler outperforms various baselines in terms of both effective key selections and running speed. Further results on the Cityscapes dataset demonstrate that our proposed method also generalises to other scenarios. To the best of our knowledge, this is the first work to use reinforcement learning for online key-frame decisions in dynamic video segmentation, and the first work on its application to face videos.
    Comment: CVPR 2020. 300VW with segmentation labels is available at: https://github.com/mapleandfire/300VW-Mas
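    A minimal sketch of the online key/non-key decision cast as an RL policy, in the spirit of the abstract. The state features (here assumed to summarize decision history, e.g. the deviation from the last key frame), network sizes, and the reward trade-off weight are assumptions rather than the paper's actual design.

```python
import torch
import torch.nn as nn

class KeyScheduler(nn.Module):
    """Binary policy: action 1 = treat the frame as key (run the full
    segmentation network), action 0 = propagate features from the last key."""
    def __init__(self, state_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(), nn.Linear(128, 2))

    def act(self, state):          # state: (state_dim,) decision-history summary
        dist = torch.distributions.Categorical(logits=self.net(state))
        action = dist.sample()
        return action, dist.log_prob(action)

def step_reward(seg_quality, is_key, cost_weight=0.1):
    """Per-frame reward: segmentation quality minus a cost for running the
    expensive network, so maximising global return balances accuracy and speed."""
    return seg_quality - cost_weight * float(is_key)
```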

    Attention-Aware Face Hallucination via Deep Reinforcement Learning

    Full text link
    Face hallucination is a domain-specific super-resolution problem whose goal is to generate high-resolution (HR) faces from low-resolution (LR) input images. In contrast to existing methods that learn a single patch-to-patch mapping from LR to HR images while disregarding the contextual interdependency between patches, we propose a novel Attention-aware Face Hallucination (Attention-FH) framework that resorts to deep reinforcement learning to sequentially discover attended patches and then perform facial part enhancement by fully exploiting the global interdependency of the image. Specifically, at each time step, a recurrent policy network dynamically specifies a new attended region by incorporating what happened in the past. The state (i.e., the face hallucination result for the whole image) can thus be exploited and updated by the local enhancement network on the selected region. Attention-FH jointly learns the recurrent policy network and the local enhancement network by maximizing a long-term reward that reflects hallucination performance over the whole image. Therefore, our proposed Attention-FH can adaptively personalize an optimal search path for each face image according to its own characteristics. Extensive experiments show our approach significantly surpasses state-of-the-art methods on in-the-wild faces with large pose and illumination variations.
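    The sequential patch-selection loop can be sketched as a recurrent policy over a fixed patch grid. The grid size, encoder, and hidden-state dimensions below are illustrative assumptions, and the local enhancement network is elided.

```python
import torch
import torch.nn as nn

class PatchPolicy(nn.Module):
    """Recurrent policy: given the current HR estimate and a hidden state that
    summarizes past steps, choose the next patch to enhance."""
    def __init__(self, feat_dim=256, n_patches=16):
        super().__init__()
        self.encoder = nn.Sequential(nn.Flatten(), nn.LazyLinear(feat_dim), nn.ReLU())
        self.rnn = nn.GRUCell(feat_dim, feat_dim)
        self.head = nn.Linear(feat_dim, n_patches)

    def forward(self, hr_estimate, h):       # hr_estimate: (batch, C, H, W)
        h = self.rnn(self.encoder(hr_estimate), h)
        dist = torch.distributions.Categorical(logits=self.head(h))
        patch = dist.sample()                 # index into an assumed 4x4 patch grid
        return patch, dist.log_prob(patch), h
```

    Each sampled patch would be enhanced by the local network, the HR estimate updated in place, and the accumulated log-probabilities weighted by a whole-image reward (e.g., reconstruction quality against ground truth during training).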