The Inception Team at NSURL-2019 Task 8: Semantic Question Similarity in Arabic
This paper describes our method for the task of Semantic Question Similarity
in Arabic in the workshop on NLP Solutions for Under-Resourced Languages
(NSURL). The aim is to build a model able to detect semantically similar
questions in Arabic for the provided dataset. Different methods of
determining question similarity are explored in this work. The proposed models
achieved high F1-scores, ranging from 88% to 96%. Our official best result
comes from an ensemble of a pre-trained multilingual BERT model fine-tuned
with different random seeds, which achieves a 95.924% F1-score and ranks first
among the nine participating teams.
Comment: 6 pages, 2 figures, 5 tables
Recall and Learn: Fine-tuning Deep Pretrained Language Models with Less Forgetting
Deep pretrained language models have achieved great success by pretraining
first and then fine-tuning. However, this sequential transfer learning
paradigm often suffers from catastrophic forgetting and yields sub-optimal
performance. To fine-tune with less forgetting, we propose a recall
and learn mechanism, which adopts the idea of multi-task learning and jointly
learns pretraining tasks and downstream tasks. Specifically, we propose a
Pretraining Simulation mechanism to recall the knowledge from pretraining tasks
without data, and an Objective Shifting mechanism to focus the learning on
downstream tasks gradually. Experiments show that our method achieves
state-of-the-art performance on the GLUE benchmark. Our method also enables
BERT-base to achieve better performance than directly fine-tuning of
BERT-large. Further, we provide the open-source RecAdam optimizer, which
integrates the proposed mechanisms into the Adam optimizer, to facilitate
its adoption by the NLP community.
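The two mechanisms the abstract names can be sketched in plain Python. This is a hypothetical reconstruction from the abstract alone: the quadratic-penalty form of Pretraining Simulation, the sigmoid annealing schedule, and the hyperparameters gamma, k, and t0 are illustrative assumptions, and plain lists stand in for parameter tensors.

```python
import math

def objective_shift(t, k=0.1, t0=100):
    """Objective Shifting: sigmoid annealing coefficient that is near 0
    early in fine-tuning (recall pretraining) and near 1 late
    (focus on the downstream task)."""
    return 1.0 / (1.0 + math.exp(-k * (t - t0)))

def recall_and_learn_loss(task_loss, theta, theta_pretrained, t,
                          gamma=0.01, k=0.1, t0=100):
    # Pretraining Simulation: a quadratic penalty pulling the current
    # weights toward the pretrained weights, standing in for the
    # (unavailable) pretraining data.
    recall = 0.5 * gamma * sum((w - w0) ** 2
                               for w, w0 in zip(theta, theta_pretrained))
    lam = objective_shift(t, k, t0)
    # Jointly optimize both objectives, shifting gradually from
    # recalling pretraining knowledge to learning the downstream task.
    return lam * task_loss + (1.0 - lam) * recall
```

At step t = t0 the two objectives are weighted equally; for large t the loss reduces to the downstream task loss alone.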
On Identifiability in Transformers
In this paper we delve into the Transformer architecture by investigating
two of its core components: self-attention and contextual embeddings. In
particular, we study the identifiability of attention weights and token
embeddings, and the aggregation of context into hidden tokens. We show that,
for sequences longer than the attention head dimension, attention weights are
not identifiable. We propose effective attention as a complementary tool for
improving explanatory interpretations based on attention. Furthermore, we show
that input tokens retain to a large degree their identity across the model. We
also find evidence suggesting that identity information is mainly encoded in
the angle of the embeddings and gradually decreases with depth. Finally, we
demonstrate strong mixing of input information in the generation of contextual
embeddings by means of a novel quantification method based on gradient
attribution. Overall, we show that self-attention distributions are not
directly interpretable and present tools to better understand and further
investigate Transformer models.
Comment: Published as a conference paper at ICLR 202
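The non-identifiability claim and the effective-attention remedy can be sketched numerically. This is a hypothetical reconstruction from the abstract, not the authors' code: when the sequence length T exceeds the head dimension d, part of each attention row lies in the left null space of the value matrix V and cannot affect the head's output A @ V.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 8, 4                          # sequence length T > head dimension d
A = rng.random((T, T))
A /= A.sum(axis=1, keepdims=True)    # row-stochastic attention weights
V = rng.standard_normal((T, d))      # value matrix of one attention head

# Orthonormal basis of the left null space of V: directions n with n @ V = 0.
U, s, _ = np.linalg.svd(V, full_matrices=True)
rank = np.linalg.matrix_rank(V)
null_basis = U[:, rank:]             # T x (T - rank), nonempty since T > d

# Effective attention: subtract each attention row's null-space component.
# The head output is unchanged, so the raw weights are not identifiable.
A_eff = A - A @ null_basis @ null_basis.T
assert np.allclose(A_eff @ V, A @ V)   # same output
assert not np.allclose(A_eff, A)       # different weights
```

Because distinct weight matrices A and A_eff produce identical outputs, interpretations should be read off A_eff, the component that actually influences the model.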
On the Robustness of Language Encoders against Grammatical Errors
We conduct a thorough study to diagnose the behaviors of pre-trained language
encoders (ELMo, BERT, and RoBERTa) when confronted with natural grammatical
errors. Specifically, we collect real grammatical errors from non-native
speakers and conduct adversarial attacks to simulate these errors on clean text
data. We use this approach to facilitate debugging models on downstream
applications. Results confirm that the performance of all tested models is
affected, but the degree of impact varies. To interpret model behaviors, we
further design a linguistic acceptability task to reveal their abilities in
identifying ungrammatical sentences and the position of errors. We find that
fixed contextual encoders with a simple classifier trained on the prediction of
sentence correctness are able to locate error positions. We also design a cloze
test for BERT and discover that BERT captures the interaction between errors
and specific tokens in context. Our results shed light on the robustness and
behaviors of language encoders against grammatical errors.
Comment: ACL 202
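The error-simulation step, injecting realistic grammatical errors into clean text, can be illustrated with a toy rule table. The rules below are hypothetical examples for illustration only, not the non-native-speaker errors collected in the paper.

```python
import re

# Toy stand-in for the paper's error simulation: rewrite clean text to
# contain a common non-native agreement error. The rule table is
# hypothetical, not the authors' collected error set.
ERROR_RULES = [
    (r"\bhas\b", "have"),    # "she has" -> "she have"
    (r"\bdoes\b", "do"),     # "she does" -> "she do"
]

def inject_errors(sentence):
    """Apply each error rule to the sentence, simulating ungrammatical input."""
    for pattern, replacement in ERROR_RULES:
        sentence = re.sub(pattern, replacement, sentence)
    return sentence

print(inject_errors("She has a book and does her homework."))
# She have a book and do her homework.
```

Running an encoder on both the clean and the perturbed sentence then exposes how much its predictions shift under such errors.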
Coreferential Reasoning Learning for Language Representation
Language representation models such as BERT can effectively capture
contextual semantic information from plain text, and have been shown to
achieve promising results on many downstream NLP tasks with appropriate
fine-tuning. However, most existing language representation models cannot
explicitly handle coreference, which is essential to the coherent understanding
of the whole discourse. To address this issue, we present CorefBERT, a novel
language representation model that can capture the coreferential relations in
context. The experimental results show that, compared with existing baseline
models, CorefBERT can achieve significant improvements consistently on various
downstream NLP tasks that require coreferential reasoning, while maintaining
comparable performance to previous models on other common NLP tasks. The source
code and experiment details of this paper can be obtained from
https://github.com/thunlp/CorefBERT.
Comment: Accepted by EMNLP202