
    Improving Speaker-independent Speech Emotion Recognition Using Dynamic Joint Distribution Adaptation

    In speaker-independent speech emotion recognition, the training and testing samples are collected from different speakers, which leads to a multi-domain shift across the feature distributions of data from different speakers. Consequently, when the trained model is confronted with data from new speakers, its performance tends to degrade. To address this issue, we propose a Dynamic Joint Distribution Adaptation (DJDA) method under the framework of multi-source domain adaptation. DJDA first uses joint distribution adaptation (JDA), involving marginal distribution adaptation (MDA) and conditional distribution adaptation (CDA), to more precisely measure the multi-domain distribution shifts caused by different speakers. This helps eliminate speaker bias in emotion features, allowing discriminative and speaker-invariant speech emotion features to be learned from coarse level to fine level. Furthermore, we quantify the adaptation contributions of MDA and CDA within JDA using a dynamic balance factor based on the $\mathcal{A}$-distance, which helps handle the unknown distributions encountered in data from new speakers. Experimental results demonstrate the superior performance of DJDA compared to other state-of-the-art (SOTA) methods. Comment: Accepted by ICASSP 202
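As a rough illustration of how a balance factor could be derived from the $\mathcal{A}$-distance, the sketch below uses the commonly used proxy form d_A = 2(1 - 2·err), where err is the error of a classifier trained to separate two domains. The function names, the averaging over classes, and the exact weighting rule are assumptions made for illustration; the abstract does not give DJDA's actual formula.

```python
def proxy_a_distance(domain_classifier_error: float) -> float:
    """Proxy A-distance d_A = 2 * (1 - 2 * err).

    err is the error of a binary classifier trained to tell two domains
    apart; it is clipped to [0, 0.5], since 0.5 means the domains are
    indistinguishable.
    """
    err = min(max(domain_classifier_error, 0.0), 0.5)
    return 2.0 * (1.0 - 2.0 * err)


def dynamic_balance_factor(marginal_err: float, conditional_errs: list) -> float:
    """Hypothetical balance factor mu in [0, 1] (assumed rule, not DJDA's).

    mu is the weight given to the conditional (class-wise) term, so a
    combined loss could be written as (1 - mu) * MDA + mu * CDA.
    """
    d_marginal = proxy_a_distance(marginal_err)
    # Assumption: average the class-wise distances into one conditional distance.
    d_conditional = sum(proxy_a_distance(e) for e in conditional_errs) / len(conditional_errs)
    total = d_marginal + d_conditional
    if total == 0.0:
        return 0.5  # no measurable shift: weight both terms equally
    return d_conditional / total
```

Under this assumed rule, the term with the larger measured shift receives the larger weight, and the weights can be recomputed at each training stage as the domain classifiers are re-estimated.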

    Adversarial Multimodal Representation Learning for Click-Through Rate Prediction

    For better user experience and business effectiveness, Click-Through Rate (CTR) prediction has been one of the most important tasks in E-commerce. Although many CTR prediction models have been proposed, learning good item representations from multimodal features remains under-investigated, even though an item in E-commerce usually contains multiple heterogeneous modalities. Previous works either concatenate the features of the multiple modalities, which is equivalent to giving a fixed importance weight to each modality, or learn dynamic weights of different modalities for different items through techniques such as attention mechanisms. However, common redundant information usually exists across multiple modalities, and dynamic modality weights computed from this redundant information may not correctly reflect the importance of each modality. To address this, we explore the complementarity and redundancy of modalities by treating modality-specific and modality-invariant features differently. We propose a novel Multimodal Adversarial Representation Network (MARN) for the CTR prediction task. A multimodal attention network first calculates the weights of multiple modalities for each item according to its modality-specific features. Then a multimodal adversarial network learns modality-invariant representations, where a double-discriminator strategy is introduced. Finally, we obtain the multimodal item representations by combining the modality-specific and modality-invariant representations. We conduct extensive experiments on both public and industrial datasets, and the proposed method consistently achieves remarkable improvements over state-of-the-art methods. Moreover, the approach has been deployed in an operational E-commerce system, and online A/B testing further demonstrates its effectiveness. Comment: Accepted to WWW 2020, 10 page
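The difference between fixed and dynamic modality weights can be made concrete with a small sketch. It computes per-item attention weights with a softmax over modality scores and uses them to form a weighted sum of modality feature vectors. This is generic attention-style fusion, not MARN itself; `fuse_modalities` and its inputs are hypothetical names, and the scores are assumed to come from some learned scoring function that is not shown.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_modalities(features, scores):
    """Attention-style fusion of one item's modality features (illustrative).

    features: one equal-length feature vector per modality.
    scores:   one unnormalized importance score per modality (assumed to be
              produced by a learned scorer, not shown here).
    Returns the per-modality weights and the weighted-sum representation.
    """
    weights = softmax(scores)
    dim = len(features[0])
    fused = [sum(w * f[i] for w, f in zip(weights, features)) for i in range(dim)]
    return weights, fused
```

With equal scores every modality gets the same weight, which matches the fixed-importance behaviour attributed to plain concatenation above; item-dependent scores make the weights vary per item.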

    Curriculum CycleGAN for Textual Sentiment Domain Adaptation with Multiple Sources

    Sentiment analysis of user-generated reviews or comments on products and services in social networks can help enterprises analyze customer feedback and take corresponding actions for improvement. To reduce the need for large-scale annotation of the target domain, domain adaptation (DA) provides an alternative solution by learning a transferable model from other labeled source domains. Existing multi-source domain adaptation (MDA) methods either fail to extract some discriminative features in the target domain that are related to sentiment, neglect the correlations of different sources and the distribution difference among different sub-domains even within the same source, or cannot reflect the varying optimal weighting during different training stages. In this paper, we propose a novel instance-level MDA framework, named curriculum cycle-consistent generative adversarial network (C-CycleGAN), to address the above issues. Specifically, C-CycleGAN consists of three components: (1) a pre-trained text encoder that encodes textual input from different domains into a continuous representation space, (2) an intermediate domain generator with curriculum instance-level adaptation that bridges the gap across source and target domains, and (3) a task classifier trained on the intermediate domain for final sentiment classification. C-CycleGAN transfers source samples at the instance level to an intermediate domain that is closer to the target domain, with sentiment semantics preserved and without losing discriminative features. Further, our dynamic instance-level weighting mechanisms can assign the optimal weights to different source samples in each training stage. We conduct extensive experiments on three benchmark datasets and achieve substantial gains over state-of-the-art DA approaches. Our source code is released at: https://github.com/WArushrush/Curriculum-CycleGAN. Comment: Accepted by WWW 202

    Fault Diagnosis of Transfer Learning Equipment Based on Cloud Edge Collaboration + Confrontation Network

    With the continuous improvement of product quality, production efficiency, and complexity, higher requirements are placed on the reliability and stability of equipment, and real-time diagnosis of faults and functional failures is becoming more difficult. Traditional fault diagnosis methods based on signal processing and convolutional neural networks cannot meet the requirements of on-site, online, real-time fault diagnosis of equipment. First, vibration signals at industrial sites are superimposed on each other, nonlinear, and non-stationary, and traditional feature extraction methods take a long time and produce unstable results. Second, massive data and fault diagnosis algorithms need rich computing and storage resources, so the traditional convolutional neural network approach conflicts with the real-time response requirements of fault diagnosis. At the same time, fault diagnosis models generalize poorly across different equipment models, so diagnostic accuracy is low or diagnosis is not possible at all. To solve these problems, this paper proposes a fault diagnosis method based on an industrial Internet platform that combines equipment cloud-edge collaboration with adaptive adversarial network transfer learning. On the edge side, the vibration signals collected from key components of the equipment are processed using ensemble empirical mode decomposition (EEMD) to address signal nonlinearity and non-stationarity. In the cloud, the EEMD signals of different equipment models are divided into a source domain and a target domain for adversarial training and used as the input of an improved domain adversarial network model, DANN (Domain Adversarial Neural Networks), so as to improve the accuracy of fault diagnosis across different equipment models by using cloud computing power and the improved adversarial network transfer learning algorithm.
    Through analysis of experimental data, this paper verifies that the model obtained with adversarial network transfer learning is more accurate than the traditional fault diagnosis method. By coordinating computing resources with real-time requirements, real-time bearing fault diagnosis with cloud-edge collaboration is realized.

    Domain Adaptation for Autonomous Driving

    Metric-based methods are a promising approach to domain adaptation; they aim to align the marginal distributions of different domains that share a similar conditional distribution. The thesis explores applying domain adaptation to autonomous driving and proposes domain adaptation methods for 2D image semantic segmentation and 3D point cloud object detection. For domain adaptation in 2D image semantic segmentation, traditional approaches manually design a metric function to measure the distance across domains, and adversarial methods can be considered an automatic way of learning the metric function. Instead of depending on the quality of the metric function, this thesis outlines a generalized framework for domain randomization, which first introduces moderate perturbation as randomness and then combines the advantages of metric-based domain adaptation and domain randomization. A simple-to-implement training pipeline for this framework is then proposed, and the proposed model is shown to achieve performance comparable to metric-based methods while generalizing better. The proposed Metric Guided Domain Randomization approach improves mean intersection-over-union on the target domain from 16.9 to 27.2 without using any target domain data or annotations. 3D point cloud domain adaptation is an area to which researchers have paid little attention. A method based on global feature alignment is proposed, and experiments show that it performs better than fine-tuning directly on target domain data when only a limited number of target data frames are available.
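For readers unfamiliar with the reported metric, the sketch below shows one common way to compute mean intersection-over-union from flattened per-pixel labels. It is a generic definition, not the thesis's evaluation code; the function name is hypothetical, and skipping classes absent from both prediction and ground truth is an assumed convention. The function returns a fraction in [0, 1], whereas the figures 16.9 and 27.2 above appear to be percentages.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over classes (generic definition).

    pred, target: equal-length sequences of integer class labels, e.g. the
    flattened pixels of a predicted and a ground-truth segmentation map.
    Classes that appear in neither sequence are skipped (assumed convention).
    """
    ious = []
    for c in range(num_classes):
        intersection = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(intersection / union)
    return sum(ious) / len(ious) if ious else 0.0
```

For example, with `pred = [0, 0, 1, 1]` and `target = [0, 1, 1, 1]`, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12.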