BERT efficacy on scientific and medical datasets: a systematic literature review
Bidirectional Encoder Representations from Transformers (BERT) [Devlin et al., 2018] has been shown to be effective at modeling a multitude of datasets across a wide variety of Natural Language Processing (NLP) tasks; however, little research has examined BERT’s effectiveness at modeling domain-specific datasets. Scientific and medical datasets present a particularly difficult challenge in NLP, as these corpora are often rife with technical jargon that is largely absent from the canonical corpora on which BERT and other transfer learning models were originally trained. This thesis is a Systematic Literature Review (SLR) of twenty-seven studies selected to address the various methods of implementation when applying BERT to scientific and medical datasets. These studies show that despite the datasets’ esoteric subject matter, BERT can be effective at a wide range of tasks when applied to domain-specific datasets. Furthermore, they show that domain-specific pretraining, whether through additional pretraining on in-domain text or through domain-specific BERT derivatives such as BioBERT [Lee et al., 2019], can further augment BERT’s performance on scientific and medical texts.
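The jargon problem the review describes shows up concretely at the tokenizer level: a WordPiece vocabulary built from general-domain text shatters clinical terms into many subword pieces. The sketch below is illustrative only (the toy vocabulary is hypothetical, not BERT's actual vocabulary), using the standard greedy longest-match-first algorithm:

```python
# Toy WordPiece-style tokenizer. The vocabulary below is a made-up,
# general-domain fragment: common words are whole entries, while
# biomedical terms must be assembled from "##"-prefixed continuations.
GENERAL_VOCAB = {
    "the", "patient", "has", "a", "heart", "disease",
    "card", "##i", "##omy", "##o", "##path", "##y", "##in", "##s",
}

def wordpiece(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first subword tokenization (WordPiece-style)."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        piece = None
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation-piece prefix
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk]  # no piece matches: the whole word is unknown
        pieces.append(piece)
        start = end
    return pieces

# A common word stays whole; a clinical term shatters into six pieces.
print(wordpiece("heart", GENERAL_VOCAB))           # ['heart']
print(wordpiece("cardiomyopathy", GENERAL_VOCAB))  # ['card', '##i', '##omy', '##o', '##path', '##y']
```

Continued in-domain pretraining (or a derivative like BioBERT) mitigates this by learning representations over exactly such fragmented jargon, or by rebuilding the vocabulary from domain text.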
CBEAF-Adapting: Enhanced Continual Pretraining for Building Chinese Biomedical Language Model
Continual pretraining is a standard way of building a domain-specific
pretrained language model from a general-domain language model. However,
sequential task training may cause catastrophic forgetting, which affects the
model performance in downstream tasks. In this paper, we propose a continual
pretraining method for the BERT-based model, named CBEAF-Adapting (Chinese
Biomedical Enhanced Attention-FFN Adapting). Its main idea is to introduce a
small number of attention heads and hidden units inside each self-attention
layer and feed-forward network. Using the Chinese biomedical domain as a
running example, we trained a domain-specific language model named
CBEAF-RoBERTa. We conduct experiments by applying models to downstream tasks.
The results demonstrate that with only about 3% of model parameters trained,
our method achieves average performance gains of about 0.5% and 2% over the
best-performing baseline model and the domain-specific model PCL-MedBERT,
respectively. We also examine the forgetting problem of different pretraining
methods; our method alleviates the problem by about 13% compared to
fine-tuning.
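The parameter-efficiency claim can be sanity-checked with simple arithmetic. The sketch below is hypothetical (the layer simplification and adapter sizes are illustrative, not the paper's exact configuration): freeze a BERT-base-sized layer and train only a small number of newly added attention and feed-forward parameters, in the spirit of CBEAF-Adapting:

```python
def layer_params(hidden, ffn):
    """Parameter count of one simplified transformer layer (weights only,
    no biases): Q/K/V/output projections plus a two-matrix feed-forward."""
    attention = 4 * hidden * hidden
    feed_forward = 2 * hidden * ffn
    return attention + feed_forward

# Frozen base layer with BERT-base-like sizes.
base = layer_params(hidden=768, ffn=3072)

# Newly added (trainable) pieces: one extra attention head of width 64
# (its own Q/K/V/output slices) and 128 extra FFN hidden units (new
# columns/rows in the two FFN matrices). Sizes are assumptions.
extra_attention = 4 * 768 * 64
extra_ffn = 2 * 768 * 128
trainable = extra_attention + extra_ffn

fraction = trainable / (base + trainable)
print(f"trainable fraction: {fraction:.1%}")
```

With these illustrative sizes the trainable share lands at roughly 5% per layer; the paper's reported ~3% presumably reflects its actual adapter dimensions.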
Generalizing through Forgetting -- Domain Generalization for Symptom Event Extraction in Clinical Notes
Symptom information is primarily documented in free-text clinical notes and
is not directly accessible for downstream applications. To address this
challenge, information extraction approaches that can handle clinical language
variation across different institutions and specialties are needed. In this
paper, we present domain generalization for symptom extraction using
pretraining and fine-tuning data that differs from the target domain in terms
of institution and/or specialty and patient population. We extract symptom
events using a transformer-based joint entity and relation extraction method.
To reduce reliance on domain-specific features, we propose a domain
generalization method that dynamically masks frequent symptom words in the
source domain. Additionally, we pretrain the transformer language model (LM) on
task-related unlabeled texts for better representation. Our experiments
indicate that masking and adaptive pretraining methods can significantly
improve performance when the source domain is more distant from the target
domain.
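The masking idea above can be sketched in a few lines. This is a minimal illustration under stated assumptions (the function name, the count threshold, and the toy sentences are hypothetical, not the paper's implementation): symptom words that occur frequently in the source domain are replaced with a mask token, so the model cannot over-rely on domain-specific triggers.

```python
from collections import Counter

def mask_frequent_symptoms(sentences, symptom_words, min_count=2, mask="[MASK]"):
    """Mask symptom words whose source-domain frequency reaches min_count."""
    counts = Counter(
        tok for sent in sentences for tok in sent if tok in symptom_words
    )
    frequent = {w for w, c in counts.items() if c >= min_count}
    return [
        [mask if tok in frequent else tok for tok in sent]
        for sent in sentences
    ]

# Toy source-domain notes (tokenized) and a hypothetical symptom lexicon.
source = [
    ["patient", "reports", "pain", "and", "nausea"],
    ["pain", "improved", "overnight"],
]
symptoms = {"pain", "nausea", "dyspnea"}

masked = mask_frequent_symptoms(source, symptoms)
# "pain" occurs twice and is masked; "nausea" occurs once and is kept.
print(masked)
```

A dynamic variant would recompute the frequency counts (or sample the masking) each epoch rather than fixing the masked set up front.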
Towards Realistic Low-resource Relation Extraction: A Benchmark with Empirical Baseline Study
This paper presents an empirical study to build relation extraction systems
in low-resource settings. Based upon recent pre-trained language models, we
comprehensively investigate three schemes to evaluate the performance in
low-resource settings: (i) different types of prompt-based methods with
few-shot labeled data; (ii) diverse balancing methods to address the
long-tailed distribution issue; (iii) data augmentation technologies and
self-training to generate more labeled in-domain data. We create a benchmark
with 8 relation extraction (RE) datasets covering different languages, domains
and contexts and perform extensive comparisons over the proposed schemes with
combinations. Our experiments illustrate: (i) Though prompt-based tuning is
beneficial in low-resource RE, there is still much potential for improvement,
especially in extracting relations from cross-sentence contexts with multiple
relational triples; (ii) Balancing methods are not always helpful for RE with
long-tailed distribution; (iii) Data augmentation complements existing
baselines and can bring substantial performance gains, while self-training
does not consistently help in low-resource RE. Code and datasets are
available at https://github.com/zjunlp/LREBench.
Comment: Accepted to EMNLP 2022 (Findings); project website:
https://zjunlp.github.io/project/LREBench
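Prompt-based tuning, the first scheme above, recasts relation classification as a cloze task so a masked language model's pretraining can be reused with few labels. The sketch below is illustrative only (the template wording, the relation labels, and the verbalizer mapping are assumptions, not the benchmark's own):

```python
# Hypothetical verbalizer: each relation label maps to one answer word
# the language model could predict in the [MASK] slot.
VERBALIZER = {
    "founder_of": "founded",
    "employee_of": "joined",
}

def build_prompt(sentence, head, tail, mask="[MASK]"):
    """Wrap a sentence in a cloze template relating the two entities."""
    return f"{sentence} In this sentence, {head} {mask} {tail}."

def label_from_filled(word):
    """Map a predicted answer word back to a relation label."""
    for label, answer in VERBALIZER.items():
        if word == answer:
            return label
    return "no_relation"

prompt = build_prompt("Steve Jobs started Apple in 1976.", "Steve Jobs", "Apple")
print(prompt)
print(label_from_filled("founded"))  # founder_of
```

In a full pipeline the masked language model would score candidate answer words for the `[MASK]` position; the verbalizer then converts the top word into a relation label, which is what makes the approach usable with only a handful of labeled examples.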
Comparative Analysis of Contextual Relation Extraction based on Deep Learning Models
Contextual Relation Extraction (CRE) is mainly used for constructing a
knowledge graph with the help of an ontology. It performs tasks such as
semantic search, query answering, and textual entailment. Relation extraction
identifies the entities from raw texts and the relations among them. An
efficient and accurate CRE system is essential for creating domain knowledge in
the biomedical industry. Existing Machine Learning and Natural Language
Processing (NLP) techniques are not well suited to efficiently predicting
complex relations from sentences that contain more than two relations and
unspecified entities. In this work, deep learning techniques are used to identify
the appropriate semantic relation based on the context from multiple sentences.
Although various machine learning models have been used for relation
extraction, they perform well only on binary relations, i.e., relations
holding between exactly two entities in a sentence, and they handle
complex sentences containing words with multiple meanings poorly. To address
these issues, hybrid deep learning models have been used to extract relations
from complex sentences effectively. This paper presents an analysis of the
various deep learning models
that are used for relation extraction.
Comment: Presented at the International Conference on FOSS Approaches
towards Computational Intelligence and Language Technology, February 2023,
Thiruvananthapuram