Structure-Level Knowledge Distillation For Multilingual Sequence Labeling
Multilingual sequence labeling is a task of predicting label sequences using
a single unified model for multiple languages. Compared with relying on
multiple monolingual models, a multilingual model offers a smaller model
size, easier online serving, and better generalizability to
low-resource languages. However, current multilingual models still underperform
individual monolingual models significantly due to model capacity limitations.
In this paper, we propose to reduce the gap between monolingual models and the
unified multilingual model by distilling the structural knowledge of several
monolingual models (teachers) to the unified multilingual model (student). We
propose two novel knowledge distillation (KD) methods based on
structure-level information: (1) the first approximately minimizes the
distance between the student's and the teachers' structure-level
probability distributions; (2) the second aggregates the structure-level
knowledge into local distributions and minimizes the distance between the
two local probability distributions. Our experiments on 4 multilingual
tasks with 25 datasets show that our approaches outperform several strong
baselines and have stronger zero-shot generalizability than both the
baseline model and the teacher models.
Comment: Accepted to ACL 2020, camera-ready. 14 pages.
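To make the second distillation method concrete, here is a minimal PyTorch sketch, not the paper's implementation: it assumes the teacher's structure-level knowledge has already been aggregated into per-token marginal label distributions (e.g., by CRF forward-backward), and the function name and tensor shapes are illustrative.

```python
import torch
import torch.nn.functional as F

def local_kd_loss(student_logits, teacher_marginals, mask):
    """Match the student's local (token-level) label distributions to the
    teacher's aggregated marginals.

    student_logits:    (batch, seq_len, num_labels) raw student scores
    teacher_marginals: (batch, seq_len, num_labels) teacher probabilities
    mask:              (batch, seq_len) 1 for real tokens, 0 for padding
    """
    log_q = F.log_softmax(student_logits, dim=-1)
    # Cross-entropy between the teacher's marginal distribution and the
    # student's distribution at each token; minimizing it is equivalent,
    # up to the teacher's (constant) entropy, to minimizing their KL
    # divergence.
    token_loss = -(teacher_marginals * log_q).sum(dim=-1)
    return (token_loss * mask).sum() / mask.sum()
```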
CalibreNet: Calibration Networks for Multilingual Sequence Labeling
Lack of training data in low-resource languages presents huge challenges to
sequence labeling tasks such as named entity recognition (NER) and machine
reading comprehension (MRC). One major obstacle is errors on the boundaries
of predicted answers. To tackle this problem, we propose CalibreNet, which
predicts answers in two steps. In the first step, any existing sequence
labeling method can be adopted as a base model to generate an initial answer.
In the second step, CalibreNet refines the boundary of the initial answer. To
address the scarcity of training data in low-resource languages, we develop a
dedicated, novel unsupervised phrase boundary recovery pre-training task that
enhances the multilingual boundary detection capability of CalibreNet.
Experiments on two cross-lingual benchmark datasets show that the proposed
approach achieves SOTA results on zero-shot cross-lingual NER and MRC tasks.
Comment: Long paper in WSDM 2021.
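As a rough illustration of the two-step prediction scheme, here is a hedged Python sketch; `base_model`, `calibrator`, and the search window are hypothetical placeholders for whatever base labeler and boundary scorer one plugs in, not the paper's actual interfaces.

```python
def two_step_predict(base_model, calibrator, tokens, window=3):
    # Step 1: any existing sequence labeler proposes an initial answer span.
    start, end = base_model.predict_span(tokens)
    # Step 2: the calibration network rescores candidate boundaries within a
    # small window around the initial span and keeps the best refinement.
    best_span, best_score = (start, end), float("-inf")
    for s in range(max(0, start - window), start + window + 1):
        for e in range(max(s, end - window), end + window + 1):
            score = calibrator.boundary_score(tokens, s, e)
            if score > best_score:
                best_span, best_score = (s, e), score
    return best_span
```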
Automated Concatenation of Embeddings for Structured Prediction
Pretrained contextualized embeddings are powerful word representations for
structured prediction tasks. Recent work found that better word representations
can be obtained by concatenating different types of embeddings. However, the
selection of embeddings to form the best concatenated representation usually
varies depending on the task and the collection of candidate embeddings, and
the ever-increasing number of embedding types makes the problem even more
difficult. In this paper, we propose Automated Concatenation of Embeddings (ACE)
to automate the process of finding better concatenations of embeddings for
structured prediction tasks, based on a formulation inspired by recent progress
on neural architecture search. Specifically, a controller alternately samples a
concatenation of embeddings according to its current belief about the
effectiveness of the individual embedding types for the task at hand, and
updates that belief based on a reward. We follow strategies in reinforcement
learning to optimize the parameters of the controller and compute the reward
based on the accuracy of a task model, which is fed with the sampled
concatenation as input and trained on a task dataset. Empirical results on 6
tasks and 21 datasets show that our approach outperforms strong baselines and
achieves state-of-the-art performance with fine-tuned embeddings in all the
evaluations.
Comment: Accepted to Proceedings of ACL-IJCNLP 2021. 17 pages.
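The sampling-and-update loop can be pictured with a small REINFORCE-style controller; this is a speculative sketch in the spirit of the description above, with the reward plumbing and all names invented for illustration rather than taken from ACE's released code.

```python
import torch

class EmbeddingController:
    """Keeps one Bernoulli logit per candidate embedding type, encoding the
    controller's current belief about that embedding's usefulness."""

    def __init__(self, num_embedding_types, lr=0.1):
        self.logits = torch.zeros(num_embedding_types, requires_grad=True)
        self.opt = torch.optim.Adam([self.logits], lr=lr)

    def sample(self):
        # Draw a binary mask: which embeddings to concatenate this round.
        dist = torch.distributions.Bernoulli(logits=self.logits)
        mask = dist.sample()
        return mask, dist.log_prob(mask).sum()

    def update(self, log_prob, reward, baseline):
        # REINFORCE step: raise the probability of selections whose task
        # accuracy (the reward) beats a baseline, e.g. a running average
        # of past rewards.
        loss = -(reward - baseline) * log_prob
        self.opt.zero_grad()
        loss.backward()
        self.opt.step()
```

Each round, one would train the task model on the concatenation selected by `mask`, use its development-set accuracy as `reward`, and then call `update`.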