BioBERT: a pre-trained biomedical language representation model for biomedical text mining
Biomedical text mining is becoming increasingly important as the number of
biomedical documents rapidly grows. With the progress in natural language
processing (NLP), extracting valuable information from biomedical literature
has gained popularity among researchers, and deep learning has boosted the
development of effective biomedical text mining models. However, directly
applying the advancements in NLP to biomedical text mining often yields
unsatisfactory results due to a word distribution shift from general domain
corpora to biomedical corpora. In this article, we investigate how the recently
introduced pre-trained language model BERT can be adapted for biomedical
corpora. We introduce BioBERT (Bidirectional Encoder Representations from
Transformers for Biomedical Text Mining), which is a domain-specific language
representation model pre-trained on large-scale biomedical corpora. With almost
the same architecture across tasks, BioBERT largely outperforms BERT and
previous state-of-the-art models in a variety of biomedical text mining tasks
when pre-trained on biomedical corpora. While BERT obtains performance
comparable to that of previous state-of-the-art models, BioBERT significantly
outperforms them on the following three representative biomedical text mining
tasks: biomedical named entity recognition (0.62% F1 score improvement),
biomedical relation extraction (2.80% F1 score improvement) and biomedical
question answering (12.24% MRR improvement). Our analysis results show that
pre-training BERT on biomedical corpora helps it to understand complex
biomedical texts. We make the pre-trained weights of BioBERT freely available
at https://github.com/naver/biobert-pretrained, and the source code for
fine-tuning BioBERT available at https://github.com/dmis-lab/biobert. Comment: Bioinformatics
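A minimal sketch of the fine-tuning setup the abstract describes, using the HuggingFace transformers library for biomedical NER. The checkpoint name dmis-lab/biobert-v1.1 and the toy disease tag set are assumptions for illustration, not details stated in the paper.

import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

labels = ["O", "B-Disease", "I-Disease"]      # hypothetical tag set
model_name = "dmis-lab/biobert-v1.1"          # assumed hub mirror of the released weights

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name, num_labels=len(labels))

# One toy training step: token-classification loss over a single sentence.
enc = tokenizer("BRCA1 mutations increase breast cancer risk.",
                return_tensors="pt", truncation=True)
gold = torch.zeros_like(enc["input_ids"])     # all-"O" labels, for illustration only
loss = model(**enc, labels=gold).loss
loss.backward()                               # a real run wraps this in an optimizer loop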
Robust Layout-aware IE for Visually Rich Documents with Pre-trained Language Models
Many business documents processed in modern NLP and IR pipelines are visually
rich: in addition to text, their semantics can also be captured by visual
traits such as layout, format, and fonts. We study the problem of information
extraction from visually rich documents (VRDs) and present a model that
combines the power of large pre-trained language models and graph neural
networks to efficiently encode both textual and visual information in business
documents. We further introduce new fine-tuning objectives to improve in-domain
unsupervised fine-tuning to better utilize large amounts of unlabeled in-domain
data. We experiment on real-world invoice and resume datasets and show that
the proposed method outperforms strong text-based RoBERTa baselines by 6.3%
absolute F1 on invoices and 4.7% absolute F1 on resumes. When evaluated in a
few-shot setting, our method requires up to 30x less annotation data than the
baseline to achieve the same level of performance at ~90% F1. Comment: 10 pages, to appear in SIGIR 2020 Industry Track
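A highly simplified sketch of the encoding idea: each text segment of a document is embedded with a pre-trained LM, concatenated with its normalized bounding-box coordinates, and passed through one message-passing step over a spatial-neighbour graph. The segments, boxes, adjacency, and head sizes below are toy assumptions; the paper's actual graph construction, visual features, and fine-tuning objectives are more elaborate.

import torch
from transformers import AutoTokenizer, AutoModel

tok = AutoTokenizer.from_pretrained("roberta-base")
lm = AutoModel.from_pretrained("roberta-base")

segments = ["Invoice No. 1234", "Total: $98.00"]       # toy VRD text segments
boxes = torch.tensor([[0.10, 0.05, 0.50, 0.10],        # toy normalized x0, y0, x1, y1
                      [0.10, 0.80, 0.40, 0.85]])

enc = tok(segments, return_tensors="pt", padding=True)
text_feats = lm(**enc).last_hidden_state[:, 0]         # one vector per segment
node_feats = torch.cat([text_feats, boxes], dim=-1)    # textual + layout features

adj = torch.tensor([[1.0, 1.0], [1.0, 1.0]])           # toy spatial-neighbour graph
adj = adj / adj.sum(dim=1, keepdim=True)
gnn = torch.nn.Linear(node_feats.size(-1), 256)
node_repr = torch.relu(gnn(adj @ node_feats))          # one graph-convolution step
# node_repr would feed a field-tagging head (e.g., BIO labels over segments)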
ERNIE: Enhanced Language Representation with Informative Entities
Neural language representation models such as BERT pre-trained on large-scale
corpora can well capture rich semantic patterns from plain text, and be
fine-tuned to consistently improve the performance of various NLP tasks.
However, the existing pre-trained language models rarely consider incorporating
knowledge graphs (KGs), which can provide rich structured knowledge facts for
better language understanding. We argue that informative entities in KGs can
enhance language representation with external knowledge. In this paper, we
utilize both large-scale textual corpora and KGs to train an enhanced language
representation model (ERNIE), which can take full advantage of lexical,
syntactic, and knowledge information simultaneously. The experimental results
have demonstrated that ERNIE achieves significant improvements on various
knowledge-driven tasks, and meanwhile is comparable with the state-of-the-art
model BERT on other common NLP tasks. The source code of this paper can be
obtained from https://github.com/thunlp/ERNIE. Comment: Accepted by ACL 2019
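A much simplified sketch of the fusion idea: token representations from a text encoder are combined with pre-trained KG entity embeddings (e.g., from TransE) at the positions where an entity mention is aligned. The dimensions, the alignment, and the single linear fusion layer are illustrative assumptions; the actual model uses a dedicated multi-head aggregator.

import torch
import torch.nn as nn

hidden, ent_dim = 768, 100
token_repr = torch.randn(1, 6, hidden)        # stand-in for BERT-style token outputs
entity_emb = nn.Embedding(5000, ent_dim)      # stand-in for TransE entity embeddings

# alignment: token 2 is linked to KG entity 42; -1 means "no aligned entity"
alignment = torch.tensor([[-1, -1, 42, -1, -1, -1]])

fuse = nn.Linear(hidden + ent_dim, hidden)
ent_vecs = torch.zeros(1, 6, ent_dim)
mask = alignment >= 0
ent_vecs[mask] = entity_emb(alignment[mask])
fused = torch.tanh(fuse(torch.cat([token_repr, ent_vecs], dim=-1)))
# `fused` replaces the plain token representations in downstream layers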
Transfer Learning for Scientific Data Chain Extraction in Small Chemical Corpus with BERT-CRF Model
Computational chemistry has developed rapidly in recent years, driven by the
rapid growth and breakthroughs of AI. Owing to progress in natural language
processing, researchers can extract more fine-grained knowledge from
publications to stimulate development in computational chemistry. Since
existing work and corpora for chemical entity extraction have been restricted
to the biomedical or life science fields rather than the chemistry field, we
build a new corpus in the chemical bond field annotated with 7 types of
entities: compound, solvent, method, bond, reaction, pKa and pKa value. This
paper presents a novel BERT-CRF model that builds scientific chemical data
chains by extracting these 7 chemical entities and their relations from
publications, and we propose a joint model to extract the entities and
relations simultaneously. Experimental results on our Chemical Special Corpus
demonstrate that we achieve state-of-the-art and competitive NER performance.
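A minimal sketch of the BERT-CRF tagging architecture named above: BERT emits per-token tag scores and a CRF layer models transitions between tags. The entity types come from the abstract; the choice of the pytorch-crf package and all hyper-parameters are assumptions.

import torch.nn as nn
from transformers import AutoModel
from torchcrf import CRF                      # pip install pytorch-crf

ENTITIES = ["compound", "solvent", "method", "bond", "reaction", "pKa", "pKa_value"]
TAGS = ["O"] + [f"{p}-{e}" for e in ENTITIES for p in ("B", "I")]

class BertCrfTagger(nn.Module):
    def __init__(self, encoder_name="bert-base-cased", num_tags=len(TAGS)):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        self.emit = nn.Linear(self.encoder.config.hidden_size, num_tags)
        self.crf = CRF(num_tags, batch_first=True)

    def forward(self, input_ids, attention_mask, tags=None):
        h = self.encoder(input_ids, attention_mask=attention_mask).last_hidden_state
        emissions = self.emit(h)
        mask = attention_mask.bool()
        if tags is not None:                  # training: CRF negative log-likelihood
            return -self.crf(emissions, tags, mask=mask)
        return self.crf.decode(emissions, mask=mask)   # inference: best tag sequences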
Exploring Contextualized Neural Language Models for Temporal Dependency Parsing
Extracting temporal relations between events and time expressions has many
applications such as constructing event timelines and time-related question
answering. It is a challenging problem which requires syntactic and semantic
information at sentence or discourse levels, which may be captured by deep
contextualized language models (LMs) such as BERT (Devlin et al., 2019). In
this paper, we develop several variants of a BERT-based temporal dependency
parser and show that BERT significantly improves temporal dependency parsing
(Zhang and Xue, 2018a). We also present a detailed analysis on why deep
contextualized neural LMs help and where they may fall short. Source code and
resources are made available at https://github.com/bnmin/tdp_ranking.
An Empirical Study of Multi-Task Learning on BERT for Biomedical Text Mining
Multi-task learning (MTL) has achieved remarkable success in natural language
processing applications. In this work, we study a multi-task learning model
with multiple decoders on varieties of biomedical and clinical natural language
processing tasks such as text similarity, relation extraction, named entity
recognition, and text inference. Our empirical results demonstrate that the MTL
fine-tuned models outperform state-of-the-art transformer models (e.g., BERT
and its variants) by 2.0% and 1.3% in biomedical and clinical domains,
respectively. Pairwise MTL further reveals in more detail which tasks can
improve or hurt others. This is particularly helpful when researchers face the
challenge of choosing a suitable model for new problems. The code and models
are publicly available at https://github.com/ncbi-nlp/bluebert. Comment:
Accepted by BioNLP 2020
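A minimal sketch of the multi-task setup: one shared transformer encoder with a separate lightweight decoder per task. The task names mirror the abstract, but the label counts and single-layer heads are illustrative assumptions; the released code linked above is the reference implementation.

import torch.nn as nn
from transformers import AutoModel

class MultiTaskBert(nn.Module):
    def __init__(self, encoder_name="bert-base-uncased"):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        h = self.encoder.config.hidden_size
        self.heads = nn.ModuleDict({
            "similarity": nn.Linear(h, 1),    # sentence-pair similarity regression
            "relation":   nn.Linear(h, 5),    # relation extraction (toy label count)
            "ner":        nn.Linear(h, 9),    # per-token tagging (toy tag count)
            "inference":  nn.Linear(h, 3),    # entailment / neutral / contradiction
        })

    def forward(self, task, input_ids, attention_mask):
        out = self.encoder(input_ids, attention_mask=attention_mask)
        if task == "ner":                     # token-level decoder
            return self.heads[task](out.last_hidden_state)
        return self.heads[task](out.last_hidden_state[:, 0])   # [CLS]-level decoders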
DisSent: Sentence Representation Learning from Explicit Discourse Relations
Learning effective representations of sentences is one of the core missions
of natural language understanding. Existing models either train on a vast
amount of text, or require costly, manually curated sentence relation datasets.
We show that with dependency parsing and rule-based rubrics, we can curate a
high quality sentence relation task by leveraging explicit discourse relations.
We show that our curated dataset provides an excellent signal for learning
vector representations of sentence meaning, representing relations that can
only be determined when the meanings of two sentences are combined. We
demonstrate that the automatically curated corpus allows a bidirectional LSTM
sentence encoder to yield high quality sentence embeddings and can serve as a
supervised fine-tuning dataset for larger models such as BERT. Our fixed
sentence embeddings achieve high performance on a variety of transfer tasks,
including SentEval, and we achieve state-of-the-art results on Penn Discourse
Treebank's implicit relation prediction task. Comment: 13 pages, 4 figures. ACL 2019
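A minimal sketch of the curation idea: sentences containing an explicit discourse marker are split into a (first clause, second clause, marker) example, and predicting the removed marker supervises the sentence encoder. The marker list and the naive string-based split are illustrative assumptions; the paper uses dependency parses and rule-based rubrics instead.

import re

MARKERS = ["because", "but", "although", "so", "when", "while", "before", "after"]

def curate(sentence):
    """Turn 'S1 because S2.' style sentences into a (s1, s2, marker) example."""
    for m in MARKERS:
        match = re.search(rf",?\s+\b{m}\b\s+", sentence, flags=re.IGNORECASE)
        if match:
            s1 = sentence[:match.start()].strip()
            s2 = sentence[match.end():].strip().rstrip(".")
            if s1 and s2:
                return s1, s2, m
    return None

print(curate("She stayed home because the roads were icy."))
# -> ('She stayed home', 'the roads were icy', 'because')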
Text-based Question Answering from Information Retrieval and Deep Neural Network Perspectives: A Survey
Text-based Question Answering (QA) is a challenging task that aims at
finding short, concrete answers to users' questions. This line of research has
been widely studied with information retrieval techniques and has received
increasing attention in recent years by considering deep neural network
approaches. Deep learning approaches, which are the main focus of this paper,
provide a powerful technique to learn multiple layers of representations and
interaction between questions and texts. In this paper, we provide a
comprehensive overview of different models proposed for the QA task, including
both traditional information retrieval perspective, and more recent deep neural
network perspective. We also introduce well-known datasets for the task and
present results available in the literature to enable a comparison between the
different techniques.
QuASE: Question-Answer Driven Sentence Encoding
Question-answering (QA) data often encodes essential information in many
facets. This paper studies a natural question: Can we get supervision from QA
data for other tasks (typically, non-QA ones)? For example, {\em can we use
QAMR (Michael et al., 2017) to improve named entity recognition?} We suggest
that simply further pre-training BERT is often not the best option, and propose
the {\em question-answer driven sentence encoding (QuASE)} framework. QuASE
learns representations from QA data, using BERT or other state-of-the-art
contextual language models. In particular, we observe the need to distinguish
between two types of sentence encodings, depending on whether the target task
takes a single- or multi-sentence input; in both cases, the resulting encoding
is shown to be an easy-to-use plugin for many downstream tasks. This work may
point out an alternative way to supervise NLP tasks.
Enriching Pre-trained Language Model with Entity Information for Relation Classification
Relation classification is an important NLP task to extract relations between
entities. The state-of-the-art methods for relation classification are
primarily based on Convolutional or Recurrent Neural Networks. Recently, the
pre-trained BERT model has achieved very successful results in many NLP
classification and sequence labeling tasks. Relation classification differs from
those tasks in that it relies on information of both the sentence and the two
target entities. In this paper, we propose a model that both leverages the
pre-trained BERT language model and incorporates information from the target
entities to tackle the relation classification task. We locate the target
entities, propagate the information through the pre-trained architecture, and
incorporate the corresponding encodings of the two entities. We achieve a
significant improvement over the state-of-the-art method on the SemEval-2010
Task 8 relation dataset. Comment: 6 pages
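A minimal sketch of the entity-aware classification idea: the two target entities are marked in the input, their token representations are average-pooled, and the pooled entity vectors are concatenated with the [CLS] vector before the classifier. The span-mask pooling and the 19-way label count (the SemEval-2010 Task 8 setting) are illustrative assumptions, not an exact reproduction of the paper's architecture.

import torch
import torch.nn as nn
from transformers import AutoModel

class EntityAwareRelationClassifier(nn.Module):
    def __init__(self, encoder_name="bert-base-uncased", num_relations=19):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        h = self.encoder.config.hidden_size
        self.classifier = nn.Linear(3 * h, num_relations)   # [CLS] + entity 1 + entity 2

    def forward(self, input_ids, attention_mask, e1_mask, e2_mask):
        # e1_mask / e2_mask are 1.0 over the tokens of each marked entity span
        hid = self.encoder(input_ids, attention_mask=attention_mask).last_hidden_state
        cls = hid[:, 0]
        e1 = (hid * e1_mask.unsqueeze(-1)).sum(1) / e1_mask.sum(1, keepdim=True)
        e2 = (hid * e2_mask.unsqueeze(-1)).sum(1) / e2_mask.sum(1, keepdim=True)
        return self.classifier(torch.cat([cls, e1, e2], dim=-1))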