4 research outputs found

    Emotion Recognition by Video: A review

    Video emotion recognition is an important branch of affective computing, and its solutions can be applied in fields such as human-computer interaction (HCI) and intelligent medical treatment. Although the number of papers published on emotion recognition is increasing, there are few comprehensive literature reviews covering research on video emotion recognition. This paper therefore surveys articles published from 2015 to 2023 to systematize existing trends in video emotion recognition. We first introduce two typical emotion models, then describe the databases frequently used for video emotion recognition, including unimodal and multimodal databases. Next, we examine and classify the structure and performance of modern unimodal and multimodal video emotion recognition methods, discuss the benefits and drawbacks of each, and compare them in detail in tables. We then summarize the primary difficulties currently faced by video emotion recognition and point out some of the most promising future directions, such as establishing an open benchmark database and developing better multimodal fusion strategies. The main objective of this paper is to help academic and industrial researchers stay up to date with the latest advances and new developments in this fast-moving, high-impact field of video emotion recognition.

    MM DialogueGAT- A Fusion Graph Attention Network for Emotion Recognition using Multi-model System

    Emotion recognition is an important part of human-computer interaction, and human communication is multimodal. Despite advances in emotion recognition models, certain challenges persist. The first is that existing research focuses on mining the interaction information between modalities and the context information in the dialogue, but neglects the role information between multimodal states and context information during the dialogue. The second is that, within the dialogue context, information is not completely transmitted through a temporal structure. Aiming at these two problems, we propose a multimodal fusion dialogue graph attention network (MM DialogueGAT). To address the first problem, a bidirectional GRU is used to extract information from each modality; for multimodal fusion, different modality configurations and combinations are fused through a cross-modal multi-head attention layer, with text, video, and audio serving as the main and auxiliary modalities. To address the second problem, a GAT graph structure is used to capture the context information within each modality for temporal context extraction. The results show that our model achieves good performance on the IEMOCAP dataset.
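    The fusion pipeline this abstract outlines (per-modality bidirectional GRUs, cross-modal multi-head attention with text as the main modality, and graph attention over the dialogue context) could be sketched roughly as below. This is a minimal illustrative sketch, not the authors' implementation: all class names and feature dimensions are assumptions, and plain multi-head attention over the utterances of a dialogue stands in for the full GAT layer.

```python
# Minimal sketch (not the authors' code) of the described pipeline:
# bidirectional GRUs per modality, cross-modal multi-head attention with
# text as the main modality, and attention over the dialogue context as a
# simplified stand-in for the GAT layer. Dimensions are illustrative.
import torch
import torch.nn as nn


class MultimodalDialogueEncoder(nn.Module):
    def __init__(self, d_text=768, d_audio=128, d_video=512,
                 d_model=256, n_heads=4, n_classes=6):
        super().__init__()
        # Bidirectional GRUs extract utterance-level features per modality.
        self.gru_t = nn.GRU(d_text, d_model // 2, batch_first=True, bidirectional=True)
        self.gru_a = nn.GRU(d_audio, d_model // 2, batch_first=True, bidirectional=True)
        self.gru_v = nn.GRU(d_video, d_model // 2, batch_first=True, bidirectional=True)
        # Cross-modal multi-head attention: text queries attend to audio/video.
        self.attn_ta = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.attn_tv = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Context attention over utterances in the dialogue (GAT stand-in).
        self.ctx_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.classifier = nn.Linear(d_model, n_classes)

    def forward(self, text, audio, video):
        # Each input: (batch, n_utterances, feature_dim) utterance features.
        h_t, _ = self.gru_t(text)
        h_a, _ = self.gru_a(audio)
        h_v, _ = self.gru_v(video)
        # Fuse the auxiliary modalities into the main (text) stream.
        fused_a, _ = self.attn_ta(h_t, h_a, h_a)
        fused_v, _ = self.attn_tv(h_t, h_v, h_v)
        fused = h_t + fused_a + fused_v
        # Let every utterance attend to the whole dialogue context.
        ctx, _ = self.ctx_attn(fused, fused, fused)
        return self.classifier(ctx)  # per-utterance emotion logits


model = MultimodalDialogueEncoder()
logits = model(torch.randn(2, 10, 768), torch.randn(2, 10, 128), torch.randn(2, 10, 512))
print(logits.shape)  # torch.Size([2, 10, 6])
```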

    Review and perspectives on driver digital twin and its enabling technologies for intelligent vehicles

    Digital Twin (DT) is an emerging technology that has been introduced into intelligent driving and transportation systems to digitize and synergize connected automated vehicles. However, existing studies focus on the design of the automated vehicle, whereas the digitization of the human driver, who plays an important role in driving, is largely ignored. Furthermore, previous driver-related tasks are limited to specific scenarios and have limited applicability. Thus, a novel concept of a driver digital twin (DDT) is proposed in this study to bridge the gap between existing automated driving systems and fully digitized ones and to aid the development of a complete driving human cyber-physical system (H-CPS). This concept is essential for constructing a harmonious human-centric intelligent driving system that considers the proactivity and sensitivity of the human driver. The primary characteristics of the DDT include multimodal state fusion, personalized modeling, and time variance. Compared with the original DT, the proposed DDT emphasizes internal personality and capability with respect to the external physiological-level state. This study systematically illustrates the DDT and outlines its key enabling aspects. The related technologies are comprehensively reviewed and discussed with a view to improving them by leveraging the DDT. In addition, the potential applications and unsettled challenges are considered. This study aims to provide fundamental theoretical support to researchers in determining the future scope of the DDT system.