
    InferEM: Inferring the Speaker's Intention for Empathetic Dialogue Generation

    Current approaches to empathetic response generation typically encode the entire dialogue history directly and feed it into a decoder to generate friendly feedback. These methods focus on modelling contextual information but neglect capturing the speaker's direct intention. We argue that the last utterance in the dialogue empirically conveys the intention of the speaker. Consequently, we propose a novel model named InferEM for empathetic response generation. We encode the last utterance separately and fuse it with the entire dialogue through a multi-head attention-based intention fusion module to capture the speaker's intention. In addition, we use the previous utterances to predict the last utterance, which simulates the human tendency to guess in advance what the interlocutor may say. To balance the optimization rates of utterance prediction and response generation, a multi-task learning strategy is designed for InferEM. Experimental results demonstrate the plausibility and validity of InferEM in improving empathetic expression. Comment: 5 pages, 4 figures
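    As a rough illustration of the fusion idea described in this abstract (not the authors' code), the sketch below fuses a separately encoded last utterance into the dialogue representation with multi-head attention and balances the two training objectives with a weighted multi-task loss; the module names, dimensions, and the 0.3 loss weight are assumptions.

# Minimal sketch under assumed names and sizes; the 0.3 weight is a placeholder, not a paper value.
import torch
import torch.nn as nn

class IntentionFusion(nn.Module):
    """Fuse a separately encoded last utterance (speaker intention) into the dialogue context."""
    def __init__(self, d_model=256, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, dialogue_h, last_utt_h):
        # dialogue_h: (B, T, d) encoded dialogue history; last_utt_h: (B, L, d) encoded last utterance
        fused, _ = self.attn(query=dialogue_h, key=last_utt_h, value=last_utt_h)
        return self.norm(dialogue_h + fused)  # residual fusion

def multi_task_loss(response_loss, utterance_pred_loss, alpha=0.3):
    # Weighted sum balancing response generation against last-utterance prediction.
    return response_loss + alpha * utterance_pred_loss

# Toy usage: batch of 2 dialogues, 12 context tokens, 6 last-utterance tokens.
fusion = IntentionFusion()
out = fusion(torch.randn(2, 12, 256), torch.randn(2, 6, 256))
print(out.shape)  # torch.Size([2, 12, 256])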

    GA2MIF: Graph and Attention Based Two-Stage Multi-Source Information Fusion for Conversational Emotion Detection

    Multimodal Emotion Recognition in Conversation (ERC) plays an influential role in the field of human-computer interaction and conversational robotics, since it can motivate machines to provide empathetic services. Multimodal data modeling, inspired by the human capability to integrate multiple senses, has become an active research area in recent years. Several graph-based approaches claim to capture interactive information between modalities, but the heterogeneity of multimodal data prevents these methods from reaching optimal solutions. In this work, we introduce a multimodal fusion approach named Graph and Attention based Two-stage Multi-source Information Fusion (GA2MIF) for emotion detection in conversation. Our proposed method avoids taking a heterogeneous graph as input to the model and eliminates complex redundant connections during graph construction. GA2MIF focuses on contextual modeling and cross-modal modeling by leveraging Multi-head Directed Graph ATtention networks (MDGATs) and Multi-head Pairwise Cross-modal ATtention networks (MPCATs), respectively. Extensive experiments on two public datasets (i.e., IEMOCAP and MELD) demonstrate that the proposed GA2MIF validly captures intra-modal long-range contextual information and inter-modal complementary information, and outperforms the prevalent state-of-the-art (SOTA) models by a remarkable margin. Comment: 14 pages
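    The following minimal PyTorch sketch illustrates the pairwise cross-modal attention stage named in the abstract, under assumed dimensions and module names; it is not the GA2MIF implementation, and the directed-graph contextual stage (MDGAT) is omitted.

import torch
import torch.nn as nn

class PairwiseCrossModalAttention(nn.Module):
    """One modality attends to another (e.g., text queries audio)."""
    def __init__(self, d=128, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d, heads, batch_first=True)

    def forward(self, query_mod, key_mod):
        out, _ = self.attn(query_mod, key_mod, key_mod)
        return query_mod + out  # residual keeps the intra-modal signal

# Toy usage: a conversation of 10 utterances with 128-dim text/audio/visual features.
B, T, d = 2, 10, 128
text, audio, vision = torch.randn(B, T, d), torch.randn(B, T, d), torch.randn(B, T, d)
xattn = PairwiseCrossModalAttention(d)
text_fused = 0.5 * (xattn(text, audio) + xattn(text, vision))  # averaging the two pairwise passes is an assumption
print(text_fused.shape)  # torch.Size([2, 10, 128])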

    GraphMFT: A Graph Network based Multimodal Fusion Technique for Emotion Recognition in Conversation

    Multimodal machine learning is an emerging area of research that has received a great deal of scholarly attention in recent years. To date, there are few studies on multimodal Emotion Recognition in Conversation (ERC). Since Graph Neural Networks (GNNs) possess a powerful capacity for relational modeling, they have an inherent advantage in the field of multimodal learning. GNNs leverage a graph constructed from multimodal data to perform intra- and inter-modal information interaction, which effectively facilitates the integration and complementation of multimodal data. In this work, we propose a novel Graph network based Multimodal Fusion Technique (GraphMFT) for emotion recognition in conversation. Multimodal data can be modeled as a graph, where each data object is regarded as a node and both intra- and inter-modal dependencies between data objects are regarded as edges. GraphMFT utilizes multiple improved graph attention networks to capture intra-modal contextual information and inter-modal complementary information. In addition, the proposed GraphMFT attempts to address the challenges of existing graph-based multimodal conversational emotion recognition models such as MMGCN. Empirical results on two public multimodal datasets reveal that our model outperforms state-of-the-art (SOTA) approaches with accuracies of 67.90% and 61.30%. Comment: Accepted by Neurocomputing
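    A minimal sketch of the graph construction this abstract describes: one node per (utterance, modality), intra-modal edges within a past context window, and inter-modal edges linking the modalities of the same utterance. The window size and edge directions here are assumptions, not values from the paper.

def build_multimodal_graph(num_utts, modalities=("text", "audio", "visual"), window=2):
    # Assign one node id per (utterance, modality) pair.
    node_id = {(i, m): i * len(modalities) + k
               for i in range(num_utts) for k, m in enumerate(modalities)}
    edges = []
    for i in range(num_utts):
        for m in modalities:
            # Intra-modal edges: connect to past utterances of the same modality within the window.
            for j in range(max(0, i - window), i):
                edges.append((node_id[(j, m)], node_id[(i, m)]))
            # Inter-modal edges: connect the other modalities of the same utterance.
            for m2 in modalities:
                if m2 != m:
                    edges.append((node_id[(i, m2)], node_id[(i, m)]))
    return node_id, edges

node_id, edges = build_multimodal_graph(num_utts=4)
print(len(node_id), "nodes,", len(edges), "edges")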

    GraphCFC: A Directed Graph Based Cross-Modal Feature Complementation Approach for Multimodal Conversational Emotion Recognition

    Emotion Recognition in Conversation (ERC) plays a significant part in Human-Computer Interaction (HCI) systems, since it can provide empathetic services. Multimodal ERC can mitigate the drawbacks of uni-modal approaches. Recently, Graph Neural Networks (GNNs) have been widely used in a variety of fields due to their superior performance in relation modeling. In multimodal ERC, GNNs are capable of extracting both long-distance contextual information and inter-modal interactive information. Unfortunately, because existing methods such as MMGCN fuse multiple modalities directly, redundant information may be generated and diverse information may be lost. In this work, we present a directed Graph based Cross-modal Feature Complementation (GraphCFC) module that can efficiently model contextual and interactive information. GraphCFC alleviates the heterogeneity-gap problem in multimodal fusion by utilizing multiple subspace extractors and a Pair-wise Cross-modal Complementary (PairCC) strategy. We extract various types of edges from the constructed graph for encoding, thus enabling GNNs to extract crucial contextual and interactive information more accurately during message passing. Furthermore, we design a GNN structure called GAT-MLP, which provides a new unified network framework for multimodal learning. Experimental results on two benchmark datasets show that GraphCFC outperforms state-of-the-art (SOTA) approaches. Comment: 13 pages
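    The sketch below shows one plausible reading of a GAT-MLP-style block: attention restricted to graph edges, followed by a feed-forward MLP, each with a residual connection. The dimensions, adjacency-mask format, and layer layout are assumptions; the actual GraphCFC design may differ.

import torch
import torch.nn as nn

class GATMLPBlock(nn.Module):
    """Graph attention over an adjacency mask, then an MLP, both with residuals."""
    def __init__(self, d=128, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d, heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(d, 4 * d), nn.GELU(), nn.Linear(4 * d, d))
        self.n1, self.n2 = nn.LayerNorm(d), nn.LayerNorm(d)

    def forward(self, x, adj):
        # x: (B, N, d) node features; adj: (B, N, N) boolean adjacency (True = edge).
        mask = (~adj).repeat_interleave(self.attn.num_heads, dim=0)  # True = blocked position
        h, _ = self.attn(x, x, x, attn_mask=mask)
        x = self.n1(x + h)
        return self.n2(x + self.mlp(x))

# Toy usage: 6 nodes with random edges; self-loops avoid fully masked attention rows.
B, N, d = 1, 6, 128
adj = (torch.rand(B, N, N) > 0.5) | torch.eye(N, dtype=torch.bool)
block = GATMLPBlock(d)
print(block(torch.randn(B, N, d), adj).shape)  # torch.Size([1, 6, 128])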

    SELM: Speech Enhancement Using Discrete Tokens and Language Models

    Language models (LMs) have recently shown superior performance in various speech generation tasks, demonstrating their powerful ability for semantic context modeling. Given the intrinsic similarity between speech generation and speech enhancement, harnessing semantic information holds potential advantages for speech enhancement tasks. In light of this, we propose SELM, a novel paradigm for speech enhancement that integrates discrete tokens and leverages language models. SELM comprises three stages: encoding, modeling, and decoding. We transform continuous waveform signals into discrete tokens using pre-trained self-supervised learning (SSL) models and a k-means tokenizer. Language models then capture comprehensive contextual information within these tokens. Finally, a detokenizer and HiFi-GAN restore the tokens to enhanced speech. Experimental results demonstrate that SELM achieves comparable performance on objective metrics alongside superior results in subjective perception. Our demos are available at https://honee-w.github.io/SELM/. Comment: Accepted by ICASSP 202
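    The snippet below sketches only the tokenization stage of the encode-model-decode pipeline described above: continuous SSL features are mapped to discrete tokens with k-means. The feature dimension, number of clusters, and frame counts are placeholders, and the SSL encoder, language model, and HiFi-GAN detokenizer are not included.

import numpy as np
from sklearn.cluster import KMeans

# Stand-in for frame-level features from a pre-trained SSL encoder (dimension assumed to be 768).
ssl_features = np.random.randn(2000, 768).astype(np.float32)

# Fit a k-means codebook; in practice this is trained offline on a large corpus.
kmeans = KMeans(n_clusters=300, n_init=4, random_state=0).fit(ssl_features)

# Tokenize an utterance: each frame becomes the index of its nearest centroid.
utterance_feats = np.random.randn(120, 768).astype(np.float32)
tokens = kmeans.predict(utterance_feats)  # shape (120,), integer token ids in [0, 300)
print(tokens[:10])
# A language model would then operate on `tokens`; a detokenizer plus HiFi-GAN would
# map the LM output back to an enhanced waveform.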

    Baseline Demographic and Clinical Characteristics of Patients with Adrenal Incidentaloma from a Single Center in China: A Survey

    Aim. To investigate the clinical and endocrinological characteristics of patients with adrenal incidentaloma (AI). Materials and Methods. This retrospective study enrolled 1941 AI patients hospitalized at the Department of Endocrinology, Chinese PLA General Hospital, Beijing, China, between January 1997 and December 2016. Patient gender, age at visit, imaging features, functional status, and histological results were analyzed. Results. Of the 1941 patients, 984 (50.70%) were men. The median age was 52 years (interquartile range: 44–69 years). A total of 140 cases had bilateral AI. Endocrine evaluation showed that 1411 (72.69%) patients had nonfunctional tumors, 152 (7.83%) had subclinical Cushing syndrome (SCS), and 82 (4.33%) had primary hyperaldosteronism. A total of 925 patients underwent surgery, with removal of 496 cortical adenomas (53.62%), 15 adrenal cortical carcinomas (1.62%), and 172 pheochromocytomas (18.59%). The bilateral group had a higher proportion of SCS (18.57% versus 7.10%, P<0.001, P=0.006). A mass size of 46 mm was of great value in distinguishing malignant from benign tumors, with a sensitivity of 88.2% and a specificity of 95.5%. Conclusions. We reported the baseline demographic and clinical characteristics of patients with AI in a large series from a single center in China.

    Neutron generation enhanced by a femtosecond laser irradiating on multi-channel target

    A novel scheme is proposed to enhance neutron yields, in which a multi-channel target consisting of a row of parallel micro-wires and a plane substrate is irradiated by a relativistic femtosecond laser. Two-dimensional particle-in-cell simulations show that the multi-channel target can significantly enhance the neutron yield, which is about four orders of magnitude greater than that of a plane target. Unlike the case of a nanowire target, we find that when the laser penetrates into the channel, the excited transverse sheath electric field effectively accelerates the D+ ions in the transverse direction. When these energetic D+ ions move towards the nearby wire, they collide with the bulk D+ ions, triggering D-D fusion reactions and producing neutrons; this is much more effective than the plane-target case. Due to the unique trajectory of the incident D+ ions, the angular distribution of the produced neutrons is modulated from isotropic to two peaks around ±90°. This enhancement and modulation are further verified over a wide range of target parameters.