Structure-Level Knowledge Distillation For Multilingual Sequence Labeling
Multilingual sequence labeling is a task of predicting label sequences using
a single unified model for multiple languages. Compared with relying on
multiple monolingual models, a multilingual model offers a smaller model
size, easier online serving, and better generalizability to
low-resource languages. However, current multilingual models still underperform
individual monolingual models significantly due to model capacity limitations.
In this paper, we propose to reduce the gap between monolingual models and the
unified multilingual model by distilling the structural knowledge of several
monolingual models (teachers) to the unified multilingual model (student). We
propose two novel knowledge distillation (KD) methods based on
structure-level information: (1) the first approximately minimizes the
distance between the student's and the teachers' structure-level
probability distributions; (2) the second aggregates the structure-level
knowledge into local distributions and minimizes the distance between the
two local probability distributions. Our experiments on 4 multilingual
tasks with 25 datasets show that our approaches outperform several strong
baselines and have stronger zero-shot generalizability than both the
baseline model and the teacher models.
Comment: Accepted to ACL 2020, camera-ready. 14 pages.
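To make the second distillation method concrete, here is a minimal PyTorch sketch, not the paper's implementation: it assumes the teacher's structure-level knowledge has already been aggregated into per-token marginal label distributions (e.g., by CRF forward-backward), and the function name and tensor shapes are illustrative.

```python
import torch
import torch.nn.functional as F

def local_kd_loss(student_logits, teacher_marginals, mask):
    """Match the student's local (token-level) label distributions to the
    teacher's aggregated marginals.

    student_logits:    (batch, seq_len, num_labels) raw student scores
    teacher_marginals: (batch, seq_len, num_labels) teacher probabilities
    mask:              (batch, seq_len) 1 for real tokens, 0 for padding
    """
    log_q = F.log_softmax(student_logits, dim=-1)
    # Cross-entropy between the teacher's marginal distribution and the
    # student's distribution at each token; minimizing it is equivalent,
    # up to the teacher's (constant) entropy, to minimizing their KL
    # divergence.
    token_loss = -(teacher_marginals * log_q).sum(dim=-1)
    return (token_loss * mask).sum() / mask.sum()
```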
CalibreNet: Calibration Networks for Multilingual Sequence Labeling
Lack of training data in low-resource languages presents huge challenges to
sequence labeling tasks such as named entity recognition (NER) and machine
reading comprehension (MRC). One major obstacle is errors on the boundaries
of predicted answers. To tackle this problem, we propose CalibreNet, which
predicts answers in two steps. In the first step, any existing sequence
labeling method can be adopted as a base model to generate an initial answer.
In the second step, CalibreNet refines the boundary of the initial answer. To
address the scarcity of training data in low-resource languages, we develop a
dedicated, novel unsupervised phrase boundary recovery pre-training task that
enhances the multilingual boundary detection capability of CalibreNet.
Experiments on two cross-lingual benchmark datasets show that the proposed
approach achieves SOTA results on zero-shot cross-lingual NER and MRC tasks.
Comment: Long paper in WSDM 2021.
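As a rough illustration of the two-step prediction scheme, here is a hedged Python sketch; `base_model`, `calibrator`, and the search window are hypothetical placeholders for whatever base labeler and boundary scorer one plugs in, not the paper's actual interfaces.

```python
def two_step_predict(base_model, calibrator, tokens, window=3):
    # Step 1: any existing sequence labeler proposes an initial answer span.
    start, end = base_model.predict_span(tokens)
    # Step 2: the calibration network rescores candidate boundaries within a
    # small window around the initial span and keeps the best refinement.
    best_span, best_score = (start, end), float("-inf")
    for s in range(max(0, start - window), start + window + 1):
        for e in range(max(s, end - window), end + window + 1):
            score = calibrator.boundary_score(tokens, s, e)
            if score > best_score:
                best_span, best_score = (s, e), score
    return best_span
```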
Automated Concatenation of Embeddings for Structured Prediction
Pretrained contextualized embeddings are powerful word representations for
structured prediction tasks. Recent work found that better word representations
can be obtained by concatenating different types of embeddings. However, the
selection of embeddings to form the best concatenated representation usually
varies depending on the task and the collection of candidate embeddings, and
the ever-increasing number of embedding types makes the problem even more
difficult. In this paper, we propose Automated Concatenation of Embeddings (ACE)
to automate the process of finding better concatenations of embeddings for
structured prediction tasks, based on a formulation inspired by recent progress
on neural architecture search. Specifically, a controller alternately samples a
concatenation of embeddings according to its current belief about the
effectiveness of the individual embedding types for the task at hand, and
updates that belief based on a reward. We follow strategies in reinforcement
learning to optimize the parameters of the controller and compute the reward
based on the accuracy of a task model, which is fed with the sampled
concatenation as input and trained on a task dataset. Empirical results on 6
tasks and 21 datasets show that our approach outperforms strong baselines and
achieves state-of-the-art performance with fine-tuned embeddings in all the
evaluations.
Comment: Accepted to Proceedings of ACL-IJCNLP 2021. 17 pages.
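The sampling-and-update loop can be pictured with a small REINFORCE-style controller; this is a speculative sketch in the spirit of the description above, with the reward plumbing and all names invented for illustration rather than taken from ACE's released code.

```python
import torch

class EmbeddingController:
    """Keeps one Bernoulli logit per candidate embedding type, encoding the
    controller's current belief about that embedding's usefulness."""

    def __init__(self, num_embedding_types, lr=0.1):
        self.logits = torch.zeros(num_embedding_types, requires_grad=True)
        self.opt = torch.optim.Adam([self.logits], lr=lr)

    def sample(self):
        # Draw a binary mask: which embeddings to concatenate this round.
        dist = torch.distributions.Bernoulli(logits=self.logits)
        mask = dist.sample()
        return mask, dist.log_prob(mask).sum()

    def update(self, log_prob, reward, baseline):
        # REINFORCE step: raise the probability of selections whose task
        # accuracy (the reward) beats a baseline, e.g. a running average
        # of past rewards.
        loss = -(reward - baseline) * log_prob
        self.opt.zero_grad()
        loss.backward()
        self.opt.step()
```

Each round, one would train the task model on the concatenation selected by `mask`, use its development-set accuracy as `reward`, and then call `update`.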