87 research outputs found
SLUA: A Super Lightweight Unsupervised Word Alignment Model via Cross-Lingual Contrastive Learning
Word alignment is essential for downstream cross-lingual language
understanding and generation tasks. Recently, the performance of neural
word alignment models has exceeded that of statistical models. However, they
heavily rely on sophisticated translation models. In this study, we propose a
super lightweight unsupervised word alignment (SLUA) model, which introduces
bidirectional symmetric attention trained with a contrastive learning objective
and employs an agreement loss to bind the two attention maps, so that the
alignments follow a mirror-like symmetry hypothesis. Experimental
results on several public benchmarks demonstrate that our model achieves
competitive, if not better, performance compared to the state of the art in
word alignment while significantly reducing the training and decoding time on
average. Further ablation analysis and case studies show the superiority of our
proposed SLUA. Notably, our model is a pioneering attempt to unify
bilingual word embeddings and word alignment. Encouragingly, our approach
achieves a 16.4x speedup over GIZA++ and 50x parameter compression compared
with the Transformer-based alignment methods. We will release our code to
facilitate the community.
Comment: Work in progress
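The abstract does not include code, but the two ingredients it names (bidirectional symmetric attention maps bound by an agreement loss, plus a contrastive objective over bilingual word embeddings) can be sketched roughly as below. This is a minimal illustration under our own assumptions; the function name slua_losses, the temperature, and the diagonal-positive pairing are hypothetical and not taken from the paper.

```python
# Minimal sketch (not the authors' implementation) of the two ingredients the
# abstract describes: symmetric source->target / target->source attention maps
# bound by an agreement loss, plus an InfoNCE-style contrastive objective.
import torch
import torch.nn.functional as F

def slua_losses(src_emb, tgt_emb, temperature=0.1):
    """src_emb: (S, d) source word embeddings, tgt_emb: (T, d) target embeddings.
    All names and hyper-parameters here are illustrative assumptions."""
    # Similarity matrix between every source and target token.
    sim = src_emb @ tgt_emb.t()                      # (S, T)

    # Bidirectional attention maps (soft alignments in both directions).
    a_s2t = F.softmax(sim / temperature, dim=1)      # source -> target
    a_t2s = F.softmax(sim.t() / temperature, dim=1)  # target -> source

    # Agreement loss: encourage the two maps to be transposes of each other
    # (mirror-like symmetry), here measured with a simple MSE.
    agreement = F.mse_loss(a_s2t, a_t2s.t())

    # Contrastive (InfoNCE-style) loss treating the diagonal as positive pairs,
    # assuming roughly monotonically aligned parallel token sequences.
    n = min(src_emb.size(0), tgt_emb.size(0))
    labels = torch.arange(n)
    contrastive = F.cross_entropy(sim[:n, :n] / temperature, labels)

    return contrastive, agreement

# Toy usage with random embeddings.
src = torch.randn(5, 64)
tgt = torch.randn(5, 64)
c_loss, a_loss = slua_losses(src, tgt)
print(c_loss.item(), a_loss.item())
```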
A Contrastive Cross-Channel Data Augmentation Framework for Aspect-based Sentiment Analysis
Aspect-Based Sentiment Analysis is a fine-grained sentiment analysis task,
which focuses on detecting the sentiment polarity towards the aspect in a
sentence. However, it is always sensitive to the multi-aspect challenge, where
features of multiple aspects in a sentence will affect each other. To mitigate
this issue, we design a novel training framework, called Contrastive
Cross-Channel Data Augmentation (C3DA). A source sentence is fed into a
domain-specific generator to obtain synthetic sentences, and it is then
concatenated with these generated sentences for supervised training and the
proposed contrastive training. Specifically, given the limited labeled ABSA
data, we also introduce parameter-efficient approaches to complete the sentence
generation. This generation method consists of an Aspect
Augmentation Channel (AAC) to generate aspect-specific sentences and a Polarity
Augmentation Channel (PAC) to generate polarity-inverted sentences. According to our
extensive experiments, our C3DA framework can outperform those baselines
without any augmentation by about 1\% in accuracy and Macro-F1.
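As a rough illustration of the contrastive part of such a framework, the sketch below contrasts each original sentence representation with its generated augmentation (positive) against the other sentences in the batch (negatives). The encoder outputs, the function name, and the temperature are assumptions for illustration, not the authors' implementation.

```python
# Illustrative sketch of a batch-wise contrastive loss over (original, augmented)
# sentence-representation pairs; hyper-parameters and shapes are assumptions.
import torch
import torch.nn.functional as F

def contrastive_augmentation_loss(orig_repr, aug_repr, temperature=0.07):
    """orig_repr, aug_repr: (B, d) pooled sentence representations, where
    aug_repr[i] encodes a generated augmentation of sentence i."""
    orig = F.normalize(orig_repr, dim=-1)
    aug = F.normalize(aug_repr, dim=-1)
    logits = orig @ aug.t() / temperature        # (B, B) similarity matrix
    targets = torch.arange(orig.size(0))         # positives on the diagonal
    # Symmetric InfoNCE: original -> augmented and augmented -> original.
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))

# Toy usage with random "encoded" sentences.
b, d = 8, 256
loss = contrastive_augmentation_loss(torch.randn(b, d), torch.randn(b, d))
print(loss.item())
```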
FedSpeed: Larger Local Interval, Less Communication Round, and Higher Generalization Accuracy
Federated learning is an emerging distributed machine learning framework
which jointly trains a global model via a large number of local devices with
data privacy protections. Its performance suffers from the non-vanishing biases
introduced by inconsistent local optima and the rugged client drift caused by
the local over-fitting. In this paper, we propose a novel and practical method,
FedSpeed, to alleviate the negative impacts posed by these problems.
Concretely, FedSpeed applies the prox-correction term on the current local
updates to efficiently reduce the biases introduced by the prox-term, a
necessary regularizer to maintain strong local consistency. Furthermore,
FedSpeed merges the vanilla stochastic gradient with a perturbation computed
from an extra gradient ascent step in the neighborhood, thereby alleviating the
issue of local over-fitting. Our theoretical analysis indicates that the
convergence rate is related to both the communication rounds and local
intervals, with an upper bound achievable if a proper local interval is set.
Moreover, we conduct extensive experiments on real-world datasets to
demonstrate the efficiency of our proposed FedSpeed, which runs significantly
faster than several baselines and achieves state-of-the-art (SOTA) performance
in general FL experimental settings. Our code is
available at \url{https://github.com/woodenchild95/FL-Simulator.git}.
Comment: ICLR 2023
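The exact FedSpeed update is not given in the abstract, so the sketch below only illustrates the two ingredients it mentions: a prox-term pulling the local model toward the global one, and a perturbation obtained from an extra gradient ascent step in the neighborhood (SAM-style). The function local_step and all hyper-parameters are our own simplified assumptions, not the paper's update rule.

```python
# Rough sketch (our own simplification, not the paper's exact algorithm) of a
# local step combining (i) a prox-term toward the global model and (ii) a
# perturbation computed from an extra gradient ascent step in the neighborhood.
import torch

def local_step(model_params, global_params, grad_fn,
               lr=0.1, rho=0.05, prox_mu=0.1, alpha=0.5):
    """model_params / global_params: lists of tensors with matching shapes.
    grad_fn(params) -> list of gradient tensors for the local objective.
    lr, rho, prox_mu, alpha are illustrative hyper-parameters (assumptions)."""
    # Vanilla stochastic gradient at the current point.
    g = grad_fn(model_params)

    # Extra gradient ascent step in the neighborhood (SAM-style perturbation).
    g_norm = torch.sqrt(sum((gi ** 2).sum() for gi in g)) + 1e-12
    perturbed = [p + rho * gi / g_norm for p, gi in zip(model_params, g)]
    g_perturbed = grad_fn(perturbed)

    new_params = []
    for p, gp, gi, gpert in zip(model_params, global_params, g, g_perturbed):
        # Merge the vanilla gradient with the perturbed one, then add the
        # prox-term pulling the local model toward the global model.
        merged = (1 - alpha) * gi + alpha * gpert
        prox = prox_mu * (p - gp)
        new_params.append(p - lr * (merged + prox))
    return new_params

# Toy usage on a quadratic objective f(w) = 0.5 * ||w - 1||^2.
w = [torch.zeros(3)]
w_global = [torch.zeros(3)]
quad_grad = lambda params: [params[0] - torch.ones(3)]
for _ in range(20):
    w = local_step(w, w_global, quad_grad)
print(w[0])
```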
Self-Evolution Learning for Discriminative Language Model Pretraining
Masked language modeling, widely used in discriminative language model (e.g.,
BERT) pretraining, commonly adopts a random masking strategy. However, random
masking does not account for the importance of different words to the sentence
meaning, although some of them are more worth predicting. Therefore, various
masking strategies (e.g., entity-level masking) are proposed, but most of them
require expensive prior knowledge and generally train from scratch without
reusing existing model weights. In this paper, we present Self-Evolution
learning (SE), a simple and effective token masking and learning method to
fully and wisely exploit the knowledge from data. SE focuses on learning the
informative yet under-explored tokens and adaptively regularizes the training
by introducing a novel Token-specific Label Smoothing approach. Experiments on
10 tasks show that our SE brings consistent and significant improvements
(+1.43~2.12 average scores) upon different PLMs. In-depth analyses demonstrate
that SE improves linguistic knowledge learning and generalization.
Comment: Accepted to Findings of ACL 2023
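One plausible reading of "Token-specific Label Smoothing" is to mix each masked token's one-hot label with a token-specific reference distribution, e.g. the model's own prediction; the sketch below implements that reading. The function name, the epsilon value, and the choice of reference distribution are assumptions, not necessarily the authors' formulation.

```python
# Hedged sketch of a token-specific label smoothing loss: instead of a uniform
# smoothing distribution, each masked token mixes its one-hot label with a
# token-specific distribution (here, the model's own detached prediction).
import torch
import torch.nn.functional as F

def token_specific_label_smoothing_loss(logits, targets, ref_probs, epsilon=0.1):
    """logits: (N, V) predictions for N masked tokens over a vocab of size V.
    targets: (N,) gold token ids.  ref_probs: (N, V) token-specific smoothing
    distributions (e.g., detached model predictions).  epsilon is assumed."""
    log_probs = F.log_softmax(logits, dim=-1)
    one_hot = F.one_hot(targets, num_classes=logits.size(-1)).float()
    # Token-specific soft target: mostly the gold label, partly the reference.
    soft_target = (1.0 - epsilon) * one_hot + epsilon * ref_probs
    return -(soft_target * log_probs).sum(dim=-1).mean()

# Toy usage: the reference distribution is the model's own (detached) prediction.
n, vocab = 4, 100
logits = torch.randn(n, vocab, requires_grad=True)
targets = torch.randint(0, vocab, (n,))
ref = F.softmax(logits.detach(), dim=-1)
loss = token_specific_label_smoothing_loss(logits, targets, ref)
loss.backward()
print(loss.item())
```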
Bridging Cross-Lingual Gaps During Leveraging the Multilingual Sequence-to-Sequence Pretraining for Text Generation
For multilingual sequence-to-sequence pretrained language models
(multilingual Seq2Seq PLMs), e.g. mBART, the self-supervised pretraining task
is trained on a wide range of monolingual languages, e.g. 25 languages from
CommonCrawl, while the downstream cross-lingual tasks generally work on a
bilingual language subset, e.g. English-German. This gives rise to a
cross-lingual data discrepancy, namely the \textit{domain discrepancy}, and a
cross-lingual learning objective discrepancy, namely the \textit{task
discrepancy}, between the pretraining and fine-tuning stages. To bridge these
cross-lingual domain and task gaps, we extend the vanilla pretrain-finetune
pipeline with an extra code-switching restore task. Specifically, the first
stage employs the
self-supervised code-switching restore task as a pretext task, allowing the
multilingual Seq2Seq PLM to acquire some in-domain alignment information. In
the second stage, we fine-tune the model on labeled data as usual. Experiments
on a variety of cross-lingual NLG tasks, including 12
bilingual translation tasks, 36 zero-shot translation tasks, and cross-lingual
summarization tasks show that our model consistently outperforms the strong
mBART baseline. Comprehensive analyses indicate that our approach can narrow the
cross-lingual sentence representation distance and improve low-frequency word
translation with trivial computational cost.
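As an illustration of what a code-switching restore pretext task can look like in practice, the sketch below corrupts a source sentence by swapping some words for dictionary translations and pairs the corrupted input with the original sentence as the restoration target. The tokenization, dictionary, and swap ratio are illustrative assumptions rather than the paper's recipe.

```python
# Hedged sketch of constructing (corrupted, target) pairs for a code-switching
# restore pretext task: some source words are replaced with translations, and
# the model is trained to restore the original sentence. The dictionary, swap
# ratio, and whitespace tokenization are illustrative assumptions.
import random

def make_code_switch_restore_pair(sentence, bilingual_dict, swap_ratio=0.3,
                                  seed=None):
    """sentence: a source-language string; bilingual_dict: source word ->
    target-language translation. Returns (code_switched_input, restore_target)."""
    rng = random.Random(seed)
    tokens = sentence.split()
    switched = [
        bilingual_dict[tok] if tok in bilingual_dict and rng.random() < swap_ratio
        else tok
        for tok in tokens
    ]
    return " ".join(switched), sentence

# Toy usage with a tiny English->German dictionary (hypothetical example data).
tiny_dict = {"house": "Haus", "green": "grün", "the": "das"}
src = "the green house is old"
corrupted, target = make_code_switch_restore_pair(src, tiny_dict, seed=0)
print(corrupted)   # e.g. "das green Haus is old" (depends on the random swaps)
print(target)      # "the green house is old"
```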
- …