Improving Speaker-independent Speech Emotion Recognition Using Dynamic Joint Distribution Adaptation
In speaker-independent speech emotion recognition, the training and testing
samples are collected from diverse speakers, leading to a multi-domain shift
challenge across the feature distributions of data from different speakers.
Consequently, when the trained model is confronted with data from new speakers,
its performance tends to degrade. To address the issue, we propose a Dynamic
Joint Distribution Adaptation (DJDA) method under the framework of multi-source
domain adaptation. DJDA first uses joint distribution adaptation (JDA),
involving marginal distribution adaptation (MDA) and conditional distribution
adaptation (CDA), to more precisely measure the multi-domain distribution
shifts caused by different speakers. This helps eliminate speaker bias in
emotion features, allowing for learning discriminative and speaker-invariant
speech emotion features from coarse-level to fine-level. Furthermore, we
quantify the adaptation contributions of MDA and CDA within JDA by using a
dynamic balance factor based on $\mathcal{A}$-Distance, helping the model
effectively handle the unknown distributions encountered in data from new
speakers. Experimental results demonstrate the superior performance of our DJDA
as compared to other state-of-the-art (SOTA) methods.
Comment: Accepted by ICASSP 202
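The dynamic balance factor described in the abstract above can be illustrated with a minimal sketch. It assumes the common proxy form of the A-distance, `2 * (1 - 2 * err)`, where `err` is the error of a binary classifier trained to tell two domains apart. The weighting rule in `balance_factor` and all function names are illustrative assumptions, not the formula or code of the DJDA paper.

```python
from typing import Sequence


def proxy_a_distance(domain_classifier_error: float) -> float:
    """Proxy A-distance from the error of a source-vs-target classifier.

    An error of 0.5 (domains indistinguishable) gives 0; an error of 0 gives 2.
    """
    return 2.0 * (1.0 - 2.0 * domain_classifier_error)


def balance_factor(marginal_error: float, conditional_errors: Sequence[float]) -> float:
    """Assumed rule: weight on the marginal (MDA) term, in [0, 1].

    marginal_error: domain-classifier error on all samples.
    conditional_errors: per-emotion-class domain-classifier errors.
    """
    d_marginal = max(proxy_a_distance(marginal_error), 0.0)
    d_classes = [max(proxy_a_distance(e), 0.0) for e in conditional_errors]
    d_conditional = sum(d_classes) / len(d_classes)
    total = d_marginal + d_conditional
    if total == 0.0:
        return 0.5  # no measurable shift: weight both terms equally
    return d_marginal / total


def joint_adaptation_loss(mda_loss: float, cda_loss: float, mu: float) -> float:
    """Combine marginal and conditional adaptation losses with weight mu."""
    return mu * mda_loss + (1.0 - mu) * cda_loss


# Hypothetical classifier errors, chosen only for illustration.
mu = balance_factor(0.25, [0.25, 0.25])
loss = joint_adaptation_loss(2.0, 4.0, mu)
```

The factor is recomputed as training proceeds, so the larger of the two measured shifts receives the larger weight at each stage.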
Adversarial Multimodal Representation Learning for Click-Through Rate Prediction
For better user experience and business effectiveness, Click-Through Rate
(CTR) prediction has been one of the most important tasks in E-commerce.
Although extensive CTR prediction models have been proposed, learning good
representation of items from multimodal features is still less investigated,
considering an item in E-commerce usually contains multiple heterogeneous
modalities. Previous works either concatenate the multiple modality features,
which is equivalent to giving a fixed importance weight to each modality, or
learn dynamic weights of different modalities for different items through
techniques such as the attention mechanism. However, a problem is that there usually
exists common redundant information across multiple modalities. The dynamic
weights of different modalities computed by using the redundant information may
not correctly reflect the different importance of each modality. To address
this, we explore the complementarity and redundancy of modalities by
considering modality-specific and modality-invariant features differently. We
propose a novel Multimodal Adversarial Representation Network (MARN) for the
CTR prediction task. A multimodal attention network first calculates the
weights of multiple modalities for each item according to its modality-specific
features. Then a multimodal adversarial network learns modality-invariant
representations, for which a double-discriminator strategy is introduced. Finally,
we obtain the multimodal item representations by combining both
modality-specific and modality-invariant representations. We conduct extensive
experiments on both public and industrial datasets, and the proposed method
consistently achieves remarkable improvements over the state-of-the-art methods.
Moreover, the approach has been deployed in an operational E-commerce system
and online A/B testing further demonstrates the effectiveness.
Comment: Accepted to WWW 2020, 10 page
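The fusion step described in the abstract above can be sketched as follows: per-item attention weights are computed from the modality-specific features, and the weighted result is combined with a modality-invariant vector. The dot-product scoring, the concatenation used for the final combination, and every name here are assumptions made for illustration; the abstract does not specify these details, and the adversarial training itself is not shown.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_item(
    specific: Sequence[Sequence[float]],
    invariant: Sequence[float],
    score_vector: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Return (modality weights, fused item representation).

    specific: one modality-specific feature vector per modality.
    invariant: the item's modality-invariant feature vector.
    score_vector: hypothetical learned parameters that score each modality.
    """
    scores = [sum(w * x for w, x in zip(score_vector, feat)) for feat in specific]
    weights = softmax(scores)
    dim = len(specific[0])
    weighted = [
        sum(weights[m] * specific[m][d] for m in range(len(specific)))
        for d in range(dim)
    ]
    # Assumed combination: concatenate the weighted specific part and the invariant part.
    return weights, weighted + list(invariant)


# Toy item with two modalities (for example image and title), values invented.
weights, fused = fuse_item([[1.0, 0.0], [0.0, 1.0]], [0.5], [2.0, 0.0])
```

Here the first modality scores higher under the toy `score_vector`, so it receives the larger weight in the fused vector.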
Curriculum CycleGAN for Textual Sentiment Domain Adaptation with Multiple Sources
Sentiment analysis of user-generated reviews or comments on products and
services in social networks can help enterprises to analyze the feedback from
customers and take corresponding actions for improvement. To reduce the need for
large-scale annotations on the target domain, domain adaptation (DA) provides
an alternative solution by learning a transferable model from other labeled
source domains. Existing multi-source domain adaptation (MDA) methods either
fail to extract some discriminative features in the target domain that are
related to sentiment, neglect the correlations of different sources and the
distribution difference among different sub-domains even in the same source, or
cannot reflect the varying optimal weighting during different training stages.
In this paper, we propose a novel instance-level MDA framework, named
curriculum cycle-consistent generative adversarial network (C-CycleGAN), to
address the above issues. Specifically, C-CycleGAN consists of three
components: (1) a pre-trained text encoder that encodes textual input from
different domains into a continuous representation space, (2) an intermediate
domain generator with curriculum instance-level adaptation that bridges the
gap across source and target domains, and (3) a task classifier trained on the
intermediate domain for final sentiment classification. C-CycleGAN transfers
source samples at instance-level to an intermediate domain that is closer to
the target domain with sentiment semantics preserved and without losing
discriminative features. Further, our dynamic instance-level weighting
mechanisms can assign the optimal weights to different source samples in each
training stage. We conduct extensive experiments on three benchmark datasets
and achieve substantial gains over state-of-the-art DA approaches. Our source
code is released at: https://github.com/WArushrush/Curriculum-CycleGAN.
Comment: Accepted by WWW 202
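The cycle-consistency idea named in the abstract above can be illustrated with a generic sketch: an encoded source sample is mapped to the intermediate domain and back, and the reconstruction error is penalized. This is the general CycleGAN-style loss under an assumed L1 distance, not the released C-CycleGAN code; the two toy mapping functions are hypothetical stand-ins for learned generators.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute difference between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cycle_consistency_loss(
    batch: Sequence[Vector],
    to_intermediate: Callable[[Vector], Vector],
    back_to_source: Callable[[Vector], Vector],
) -> float:
    """Average L1 error after mapping source -> intermediate -> source."""
    total = 0.0
    for encoded in batch:
        reconstructed = back_to_source(to_intermediate(encoded))
        total += l1_distance(encoded, reconstructed)
    return total / len(batch)


# Hypothetical stand-ins for the two learned generators.
def shift_up(v: Vector) -> Vector:
    return [x + 1.0 for x in v]


def shift_down(v: Vector) -> Vector:
    return [x - 1.0 for x in v]


encoded_batch = [[1.0, 2.0], [3.0, 4.0]]
perfect_cycle = cycle_consistency_loss(encoded_batch, shift_up, shift_down)
```

A loss of zero means the round trip preserves the encoded sample exactly. In an instance-weighted variant, each per-sample term would be multiplied by that sample's weight before averaging.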
Fault Diagnosis of Transfer Learning Equipment Based on Cloud Edge Collaboration + Confrontation Network
With the continuous improvement of product quality, production efficiency, and complexity, higher requirements are placed on the reliability and stability of equipment, and real-time diagnosis of faults and functional failures is becoming more difficult. Traditional fault diagnosis methods based on signal processing and convolutional neural networks cannot meet the requirements of on-site, online, real-time fault diagnosis of equipment. First, vibration signals at industrial sites are superimposed on each other, nonlinear, and non-stationary, and traditional feature extraction methods take a long time and produce unstable extraction results. Second, massive data and fault diagnosis algorithms need rich computing and storage resources, so the traditional convolutional neural network approach conflicts with the real-time response requirements of fault diagnosis. At the same time, fault diagnosis models generalize poorly across different equipment models, so diagnostic accuracy is low or diagnosis is even impossible. To solve these problems, this paper proposes a fault diagnosis method based on an industrial Internet platform that combines equipment cloud-edge collaboration with adaptive adversarial network transfer learning. On the edge side, the vibration signals collected from key components of the equipment are processed using ensemble empirical mode decomposition (EEMD) to address signal nonlinearity and non-stationarity. In the cloud, the EEMD-decomposed signals of different equipment models are divided into a source domain and a target domain for adversarial training and used as the input of the improved domain adversarial network model DANN (Domain Adversarial Neural Networks), so as to improve the accuracy of fault diagnosis across different equipment models by using cloud computing power and the improved adversarial network transfer learning algorithm.
Through the analysis of experimental data, this paper verifies that the model trained with adversarial network transfer learning is more accurate than the traditional fault diagnosis method. By coordinating computing resources with real-time requirements, real-time cloud-edge collaborative diagnosis of bearing faults is realized.
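The adversarial training mentioned above relies, in the standard DANN formulation, on a gradient reversal layer between the feature extractor and the domain classifier. The sketch below shows that mechanism and the commonly used warm-up schedule for the reversal strength. It describes plain DANN, not the paper's improved variant, whose details the abstract does not give; the class and function names are illustrative.

```python
import math
from typing import List, Sequence


class GradientReversal:
    """Identity in the forward pass; scaled, sign-flipped gradient in the backward pass."""

    def __init__(self, strength: float) -> None:
        self.strength = strength

    def forward(self, features: Sequence[float]) -> List[float]:
        # Features pass to the domain classifier unchanged.
        return list(features)

    def backward(self, grad_from_domain_classifier: Sequence[float]) -> List[float]:
        # The feature extractor receives the negated gradient, so it is pushed
        # toward features that the domain classifier cannot separate.
        return [-self.strength * g for g in grad_from_domain_classifier]


def reversal_strength(progress: float, gamma: float = 10.0) -> float:
    """Warm-up schedule: 0 at progress=0, approaching 1 as progress goes to 1."""
    return 2.0 / (1.0 + math.exp(-gamma * progress)) - 1.0


layer = GradientReversal(strength=reversal_strength(0.5))
passed = layer.forward([0.2, -0.7])
reversed_grad = layer.backward([1.0, -2.0])
```

Starting with a small strength keeps noisy early domain-classifier gradients from disturbing the fault classifier while features are still forming.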
Domain Adaptation for Autonomous Driving
Metric-based methods are a promising approach to domain adaptation, which aims to align the marginal distributions of different domains with a similar conditional distribution. The thesis explores applying domain adaptation to autonomous driving and proposes domain adaptation methods for 2D image semantic segmentation and 3D point cloud object detection. For 2D image semantic segmentation, traditional approaches manually design a metric function to measure the distance across domains, and adversarial methods can be regarded as learning the metric function automatically. Instead of depending on the quality of the metric function, this thesis outlines a generalized framework for domain randomization that first introduces moderate perturbation as randomness and then combines the advantages of metric-based domain adaptation and domain randomization. A simple-to-implement training pipeline for this framework is then proposed, which shows that the proposed model achieves performance comparable to metric-based methods while generalizing better. The proposed Metric Guided Domain Randomization approach improves mean intersection-over-union on the target domain from 16.9 to 27.2 without using any target domain data or annotations. 3D point cloud domain adaptation has received little attention from researchers. A method based on global feature alignment is proposed, and experiments show that it performs better than fine-tuning directly on target domain data when only a limited number of target data frames is available.
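The mean intersection-over-union (mIoU) metric quoted above can be computed as in the following minimal sketch, which works on flattened per-pixel label lists. Skipping classes that appear in neither prediction nor ground truth is one common convention and is an assumption here; the thesis may use a different one. The figures 16.9 and 27.2 are presumably mIoU expressed in percent, whereas this function returns a fraction.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Mean IoU over classes present in the prediction or the ground truth."""
    ious = []
    for cls in range(num_classes):
        intersection = sum(1 for p, t in zip(pred, target) if p == cls and t == cls)
        union = sum(1 for p, t in zip(pred, target) if p == cls or t == cls)
        if union > 0:  # skip classes absent from both label maps
            ious.append(intersection / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy 4-pixel example with two classes, values invented for illustration.
predicted = [0, 0, 1, 1]
ground_truth = [0, 1, 1, 1]
score = mean_iou(predicted, ground_truth, num_classes=2)
```

In the toy example, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12.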