Cross-Domain Aspect Extraction using Adversarial Domain Adaptation
Aspect extraction, the task of identifying and categorizing aspects or features in text, plays a crucial role in sentiment analysis. However, aspect extraction models often struggle to generalize across domains due to domain-specific language patterns and variations. To tackle this challenge, we propose an approach called "Cross-Domain Aspect Extraction using Adversarial-Based Domain Adaptation". Our model combines pre-trained language models, such as BERT, with adversarial training to enable effective aspect extraction in diverse domains. By incorporating a domain discriminator, the model learns to extract domain-invariant aspects, making it adaptable to different domains. We evaluate our model on datasets from multiple domains and demonstrate its effectiveness for cross-domain aspect extraction. Our experiments show that the model outperforms baseline techniques, yielding significant gains in aspect extraction across domains. Our approach opens new possibilities for domain adaptation in aspect extraction tasks, providing valuable insights for sentiment analysis in diverse domains.
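As a rough illustration of the setup this abstract describes, the sketch below pairs a BERT encoder with a token-level aspect tagger and a domain discriminator trained through a gradient-reversal layer, so the encoder is pushed toward domain-invariant features. It is a minimal sketch under our own assumptions; the class names, tagging scheme, and the lambda coefficient are illustrative and not taken from the paper.

```python
import torch
import torch.nn as nn
from transformers import AutoModel


class GradientReversal(torch.autograd.Function):
    """Identity on the forward pass; flips and scales gradients on the backward pass."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.clone()

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None


class AdversarialAspectExtractor(nn.Module):
    """Illustrative BERT encoder + token-level aspect tagger + domain discriminator."""

    def __init__(self, num_tags: int, num_domains: int, lambd: float = 0.1):
        super().__init__()
        self.encoder = AutoModel.from_pretrained("bert-base-uncased")
        hidden = self.encoder.config.hidden_size
        self.aspect_tagger = nn.Linear(hidden, num_tags)          # e.g. BIO tags per token
        self.domain_classifier = nn.Linear(hidden, num_domains)   # one domain label per sentence
        self.lambd = lambd

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        token_states = out.last_hidden_state                      # (batch, seq, hidden)
        tag_logits = self.aspect_tagger(token_states)
        # Reverse gradients so the encoder is trained to confuse the domain classifier.
        pooled = token_states[:, 0]                                # [CLS] representation
        domain_logits = self.domain_classifier(GradientReversal.apply(pooled, self.lambd))
        return tag_logits, domain_logits
```

In training, the tagging loss would be computed on labelled source-domain batches, while the (gradient-reversed) domain loss can use batches from every available domain.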
CausaLM: Causal Model Explanation Through Counterfactual Language Models
Understanding predictions made by deep neural networks is notoriously
difficult, but also crucial to their dissemination. Like all ML-based methods,
they are only as good as their training data, and can also capture unwanted biases.
While there are tools that can help understand whether such biases exist, they
do not distinguish between correlation and causation, and might be ill-suited
for text-based models and for reasoning about high-level language concepts. A
key problem of estimating the causal effect of a concept of interest on a given
model is that this estimation requires the generation of counterfactual
examples, which is challenging with existing generation technology. To bridge
that gap, we propose CausaLM, a framework for producing causal model
explanations using counterfactual language representation models. Our approach
is based on fine-tuning of deep contextualized embedding models with auxiliary
adversarial tasks derived from the causal graph of the problem. Concretely, we
show that by carefully choosing auxiliary adversarial pre-training tasks,
language representation models such as BERT can effectively learn a
counterfactual representation for a given concept of interest, and be used to
estimate its true causal effect on model performance. A byproduct of our method
is a language representation model that is unaffected by the tested concept,
which can be useful in mitigating unwanted bias ingrained in the data.
Comment: Our code and data are available at https://amirfeder.github.io/CausaLM/. Under review for the Computational Linguistics journal.
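One way to read the estimation step this abstract sketches: train a task classifier on top of the original representation model and another on top of the counterfactually pre-trained one, then measure how much the predictions move when the concept of interest is removed from the representation. The snippet below is a hedged sketch of that comparison, not CausaLM's actual estimator; the classifier interfaces and the probability-shift measure are our assumptions.

```python
import torch


@torch.no_grad()
def estimated_concept_effect(original_clf, counterfactual_clf, dataloader, device="cpu"):
    """Illustrative estimate of a concept's effect on a classifier: the average shift in
    predicted class probabilities when the original representation model is swapped for
    the counterfactually pre-trained one. Both classifiers are assumed to be callables
    that map (input_ids, attention_mask) to logits over the same label set."""
    diffs = []
    for batch in dataloader:
        input_ids = batch["input_ids"].to(device)
        attention_mask = batch["attention_mask"].to(device)
        p_orig = torch.softmax(original_clf(input_ids, attention_mask), dim=-1)
        p_cf = torch.softmax(counterfactual_clf(input_ids, attention_mask), dim=-1)
        diffs.append((p_orig - p_cf).abs().sum(dim=-1))  # per-example probability shift
    return torch.cat(diffs).mean().item()
```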
Transformer Based Multi-Source Domain Adaptation
In practical machine learning settings, the data on which a model must make
predictions often come from a different distribution than the data it was
trained on. Here, we investigate the problem of unsupervised multi-source
domain adaptation, where a model is trained on labelled data from multiple
source domains and must make predictions on a domain for which no labelled data
has been seen. Prior work with CNNs and RNNs has demonstrated the benefit of
mixture of experts, where the predictions of multiple domain expert classifiers
are combined; as well as domain adversarial training, to induce a domain
agnostic representation space. Inspired by this, we investigate how such
methods can be effectively applied to large pretrained transformer models. We
find that domain adversarial training has an effect on the learned
representations of these models while having little effect on their
performance, suggesting that large transformer-based models are already
relatively robust across domains. Additionally, we show that mixture of experts
leads to significant performance improvements by comparing several variants of
mixing functions, including one novel mixture based on attention. Finally, we
demonstrate that the predictions of large pretrained transformer-based domain
experts are highly homogeneous, making it challenging to learn effective
functions for mixing their predictions.
Comment: 12 pages, 3 figures, 5 tables
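To make the attention-based mixing concrete, here is a minimal sketch of one plausible mixing function: a pooled representation of the input attends over learned per-expert keys, and the resulting weights combine the domain experts' predicted distributions. It assumes the per-expert logits have already been computed; the module and parameter names are illustrative, not taken from the paper.

```python
import torch
import torch.nn as nn


class AttentionMixer(nn.Module):
    """Illustrative attention-based mixture of domain-expert predictions."""

    def __init__(self, hidden_size: int, num_experts: int):
        super().__init__()
        self.expert_keys = nn.Parameter(torch.randn(num_experts, hidden_size))
        self.query_proj = nn.Linear(hidden_size, hidden_size)

    def forward(self, pooled_repr, expert_logits):
        # pooled_repr: (batch, hidden), e.g. the [CLS] vector of a shared transformer
        # expert_logits: (batch, num_experts, num_classes), one prediction per source-domain expert
        query = self.query_proj(pooled_repr)                            # (batch, hidden)
        scores = query @ self.expert_keys.t() / self.expert_keys.size(-1) ** 0.5
        weights = torch.softmax(scores, dim=-1)                         # (batch, num_experts)
        expert_probs = torch.softmax(expert_logits, dim=-1)
        mixed = (weights.unsqueeze(-1) * expert_probs).sum(dim=1)       # (batch, num_classes)
        return mixed, weights
```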
Reducing Spurious Correlations for Aspect-Based Sentiment Analysis with Variational Information Bottleneck and Contrastive Learning
Deep learning techniques have dominated the literature on aspect-based
sentiment analysis (ABSA), yielding state-of-the-art results. However, these
deep models generally suffer from spurious correlation problems between input
features and output labels, which creates significant barriers to robustness
and generalization capability. In this paper, we propose a novel Contrastive
Variational Information Bottleneck framework (called CVIB) to reduce spurious
correlations for ABSA. The proposed CVIB framework is composed of an original
network and a self-pruned network, and these two networks are optimized
simultaneously via contrastive learning. Concretely, we employ the Variational
Information Bottleneck (VIB) principle to learn an informative and compressed
network (self-pruned network) from the original network, which discards the
superfluous patterns or spurious correlations between input features and
prediction labels. Then, self-pruning contrastive learning is devised to pull
together semantically similar positive pairs and push away dissimilar pairs,
where the representations of the anchor learned by the original and self-pruned
networks respectively are regarded as a positive pair while the representations
of two different sentences within a mini-batch are treated as a negative pair.
To verify the effectiveness of our CVIB method, we conduct extensive
experiments on five benchmark ABSA datasets and the experimental results show
that our approach outperforms strong competitors in terms of overall
prediction performance, robustness, and generalization.
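The self-pruning contrastive objective described above can be sketched as an in-batch InfoNCE-style loss, where the original network's representation of a sentence and the self-pruned network's representation of the same sentence form the positive pair and other sentences in the mini-batch supply negatives. This is a hedged approximation of the idea, not the paper's exact loss; the temperature and normalization choices are our assumptions.

```python
import torch
import torch.nn.functional as F


def self_pruning_contrastive_loss(z_original, z_pruned, temperature: float = 0.1):
    """Illustrative in-batch contrastive loss between the original and self-pruned networks.
    z_original, z_pruned: (batch, hidden) sentence representations of the same mini-batch."""
    z_original = F.normalize(z_original, dim=-1)
    z_pruned = F.normalize(z_pruned, dim=-1)
    logits = z_original @ z_pruned.t() / temperature        # pairwise similarities within the batch
    targets = torch.arange(z_original.size(0), device=z_original.device)
    # Symmetric InfoNCE: each view should retrieve its own counterpart among the batch.
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))
```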
A Two-Stage Framework with Self-Supervised Distillation For Cross-Domain Text Classification
Cross-domain text classification aims to adapt models to a target domain that
lacks labeled data. It leverages or reuses rich labeled data from the different
but related source domain(s) and unlabeled data from the target domain. To this
end, previous work focuses on either extracting domain-invariant features or
task-agnostic features, ignoring domain-aware features that may be present in
the target domain and could be useful for the downstream task. In this paper,
we propose a two-stage framework for cross-domain text classification. In the
first stage, we fine-tune the model with masked language modeling (MLM) and
labeled data from the source domain. In the second stage, we further fine-tune
the model with self-supervised distillation (SSD) and unlabeled data from the
target domain. We evaluate its performance on a public cross-domain text
classification benchmark, and the experimental results show that our method
achieves new state-of-the-art results for both single-source domain adaptation
(94.17% ± 1.03%) and multi-source domain adaptation (95.09% ± 1.34%).
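A minimal sketch of what the second stage could look like, assuming the stage-one model (already fine-tuned with MLM and labeled source data) is copied as a frozen teacher and the student is trained to match its softened predictions on unlabeled target-domain batches. The KL-based distillation objective and temperature are our assumptions, not the paper's reported configuration; the model is assumed to map (input_ids, attention_mask) to classification logits.

```python
import copy
import torch
import torch.nn.functional as F


def stage_two_self_distillation(model, target_unlabeled_loader, optimizer,
                                temperature: float = 2.0, device: str = "cpu"):
    """Illustrative second-stage pass: a frozen copy of the stage-one model acts as teacher,
    and the model is fine-tuned on unlabeled target-domain batches to match the teacher's
    softened predictions (KL-divergence distillation)."""
    teacher = copy.deepcopy(model).eval()
    model.train()
    for batch in target_unlabeled_loader:
        input_ids = batch["input_ids"].to(device)
        attention_mask = batch["attention_mask"].to(device)
        with torch.no_grad():
            teacher_logits = teacher(input_ids, attention_mask)
        student_logits = model(input_ids, attention_mask)
        loss = F.kl_div(
            F.log_softmax(student_logits / temperature, dim=-1),
            F.softmax(teacher_logits / temperature, dim=-1),
            reduction="batchmean",
        ) * temperature ** 2
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```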