Common Sense or World Knowledge? Investigating Adapter-Based Knowledge Injection into Pretrained Transformers
Following the major success of neural language models (LMs) such as BERT or
GPT-2 on a variety of language understanding tasks, recent work has focused on
injecting (structured) knowledge from external resources into these models.
While on the one hand, joint pretraining (i.e., training from scratch, adding
objectives based on external knowledge to the primary LM objective) may be
prohibitively computationally expensive, post-hoc fine-tuning on external
knowledge, on the other hand, may lead to catastrophic forgetting of
distributional knowledge. In this work, we investigate models for complementing
the distributional knowledge of BERT with conceptual knowledge from ConceptNet
and its corresponding Open Mind Common Sense (OMCS) corpus, respectively, using
adapter training. While overall results on the GLUE benchmark paint an
inconclusive picture, a deeper analysis reveals that our adapter-based models
substantially outperform BERT (up to 15-20 performance points) on inference
tasks that require the type of conceptual knowledge explicitly present in
ConceptNet and OMCS.
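As a concrete illustration of the adapter training mentioned above, the sketch below shows a standard bottleneck adapter of the kind commonly inserted into frozen pretrained transformers; the dimensions, module layout, and insertion strategy are illustrative assumptions, not the authors' exact configuration.

import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: down-project, nonlinearity, up-project, residual."""
    def __init__(self, hidden_dim=768, bottleneck_dim=64):  # sizes are assumptions
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)
        self.up = nn.Linear(bottleneck_dim, hidden_dim)
        self.act = nn.GELU()

    def forward(self, hidden_states):
        # The residual keeps the pretrained distributional knowledge intact;
        # only the small adapter is trained on the ConceptNet/OMCS signal,
        # which is how this avoids both full retraining and catastrophic
        # forgetting.
        return hidden_states + self.up(self.act(self.down(hidden_states)))

An adapter of this form would typically be placed after each transformer sub-layer while all pretrained BERT weights stay frozen.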
Conceptualized Representation Learning for Chinese Biomedical Text Mining
Biomedical text mining is becoming increasingly important as the volume of biomedical documents and web data grows rapidly. Recently, word representation models such as BERT have gained popularity among researchers. However, it is difficult to estimate their performance on datasets of biomedical text, as the word distributions of general and biomedical corpora are quite different. Moreover, the medical domain has long-tail concepts and terminologies that are difficult for language models to learn. Chinese biomedical text is even more challenging due to its complex structure and the variety of phrase combinations. In this paper, we investigate how the
recently introduced pre-trained language model BERT can be adapted for Chinese
biomedical corpora and propose a novel conceptualized representation learning
approach. We also release a new Chinese Biomedical Language Understanding
Evaluation benchmark (ChineseBLUE). We examine the effectiveness of
Chinese pre-trained models: BERT, BERT-wwm, RoBERTa, and our approach.
Experimental results on the benchmark show that our approach brings significant gains. We release the pre-trained model on GitHub: https://github.com/alibaba-research/ChineseBLUE
Comment: WSDM2020 Health Da
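Since the paper contrasts its approach with character-level masking (cf. BERT-wwm), a toy sketch of phrase-level masking may help clarify the general idea; the function name, lexicon, and greedy matching strategy below are hypothetical simplifications, not the paper's actual method.

import random

def mask_whole_phrases(tokens, phrase_lexicon, mask_token="[MASK]", prob=0.15):
    """Mask every character of a matched medical phrase as one unit, so the
    model must predict the whole concept rather than isolated characters."""
    out, i = [], 0
    while i < len(tokens):
        matched = None
        # Greedily match the longest lexicon phrase starting at position i.
        for length in range(min(8, len(tokens) - i), 0, -1):
            if "".join(tokens[i:i + length]) in phrase_lexicon:
                matched = length
                break
        if matched and random.random() < prob:
            out.extend([mask_token] * matched)  # mask the whole concept span
            i += matched
        else:
            out.append(tokens[i])
            i += 1
    return out

# Example: the three characters of "糖尿病" ("diabetes") are masked together.
print(mask_whole_phrases(list("患者有糖尿病史"), {"糖尿病"}, prob=1.0))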
KSAT: Knowledge-infused Self Attention Transformer -- Integrating Multiple Domain-Specific Contexts
Domain-specific language understanding requires integrating multiple pieces
of relevant contextual information. For example, we see both suicide and
depression-related behavior (multiple contexts) in the text ``I have a gun and
feel pretty bad about my life, and it wouldn't be the worst thing if I didn't
wake up tomorrow''. Domain specificity in self-attention architectures is
handled by fine-tuning on excerpts from relevant domain-specific resources (datasets and external knowledge, e.g., medical textbook chapters on mental health diagnosis related to suicide and depression). We propose a modified self-attention architecture, the Knowledge-infused Self Attention Transformer (KSAT), which integrates multiple domain-specific contexts through the use of external knowledge sources. KSAT introduces knowledge-guided biases in
dedicated self-attention layers for each knowledge source to accomplish this.
In addition, KSAT provides a mechanism for controlling the trade-off between
learning from data and learning from knowledge. Our quantitative and
qualitative evaluations show that (1) the KSAT architecture provides novel
human-understandable ways to precisely measure and visualize the contributions
of the infused domain contexts, and (2) KSAT performs competitively with other
knowledge-infused baselines and significantly outperforms baselines that use
fine-tuning for domain-specific tasks.
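A minimal sketch of how a knowledge-guided bias can enter a self-attention layer, in the spirit of the description above; the bias construction and the scalar trade-off weight lam are assumptions for illustration, not the published KSAT architecture.

import torch.nn as nn
import torch.nn.functional as F

class KnowledgeBiasedAttention(nn.Module):
    def __init__(self, hidden_dim=768, lam=0.5):
        super().__init__()
        self.q = nn.Linear(hidden_dim, hidden_dim)
        self.k = nn.Linear(hidden_dim, hidden_dim)
        self.v = nn.Linear(hidden_dim, hidden_dim)
        self.lam = lam  # trade-off: learning from data vs. from knowledge
        self.scale = hidden_dim ** 0.5

    def forward(self, x, knowledge_bias):
        # x: (batch, seq, hidden); knowledge_bias: (batch, seq, seq), e.g.,
        # larger for token pairs linked in a domain knowledge source.
        scores = self.q(x) @ self.k(x).transpose(-2, -1) / self.scale
        biased = (1 - self.lam) * scores + self.lam * knowledge_bias
        return F.softmax(biased, dim=-1) @ self.v(x)

Dedicating one such layer to each knowledge source is what would make the per-source attention maps separately inspectable, matching the paper's claim about measuring and visualizing each infused context's contribution.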
An Improved Baseline for Sentence-level Relation Extraction
Sentence-level relation extraction (RE) aims at identifying the relationship
between two entities in a sentence. Many efforts have been devoted to this problem, yet the best-performing methods are still far from perfect. In this
paper, we revisit two problems that affect the performance of existing RE
models, namely entity representation and noisy or ill-defined labels. Our
improved baseline model, which incorporates entity representations with typed markers, achieves an F1 of 74.6% on TACRED, significantly outperforming previous state-of-the-art (SOTA) methods. Furthermore, the new baseline achieves an F1 of 91.1%
on the refined Re-TACRED dataset, demonstrating that the pre-trained language
models achieve unexpectedly high performance on this task. We release our code
to the community for future research.
Comment: Code available at https://github.com/wzhouad/RE_improved_baselin
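The typed entity markers referenced above amount to a simple input transformation; the sketch below shows a punctuation-style marker variant ("@ * type *" around the subject, "# ^ type ^" around the object), with the downstream classifier described only in comments as a simplified assumption.

def add_typed_markers(tokens, subj_span, subj_type, obj_span, obj_type):
    """Wrap subject/object tokens with markers that also encode entity types."""
    s0, s1 = subj_span  # inclusive token indices of the subject
    o0, o1 = obj_span   # inclusive token indices of the object
    out = []
    for i, tok in enumerate(tokens):
        if i == s0: out += ["@", "*", subj_type.lower(), "*"]
        if i == o0: out += ["#", "^", obj_type.lower(), "^"]
        out.append(tok)
        if i == s1: out.append("@")
        if i == o1: out.append("#")
    return out

tokens = "Bill Gates founded Microsoft".split()
print(" ".join(add_typed_markers(tokens, (0, 1), "PERSON", (3, 3), "ORGANIZATION")))
# -> @ * person * Bill Gates @ founded # ^ organization ^ Microsoft #
# A relation classifier can then read the encoder's hidden states at the
# marker positions and feed them to a softmax over relation labels.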