1 research outputs found
LitMC-BERT: transformer-based multi-label classification of biomedical literature with an application on COVID-19 literature curation
The rapid growth of biomedical literature poses a significant challenge for
curation and interpretation. This has become more evident during the COVID-19
pandemic. LitCovid, a literature database of COVID-19 related papers in PubMed,
has accumulated over 180,000 articles with millions of accesses. Approximately
10,000 new articles are added to LitCovid every month. A main curation task in
LitCovid is topic annotation where an article is assigned with up to eight
topics, e.g., Treatment and Diagnosis. The annotated topics have been widely
used both in LitCovid (e.g., accounting for ~18% of total uses) and downstream
studies such as network generation. However, it has been a primary curation
bottleneck due to the nature of the task and the rapid literature growth. This
study proposes LITMC-BERT, a transformer-based multi-label classification
method in biomedical literature. It uses a shared transformer backbone for all
the labels while also captures label-specific features and the correlations
between label pairs. We compare LITMC-BERT with three baseline models on two
datasets. Its micro-F1 and instance-based F1 are 5% and 4% higher than the
current best results, respectively, and only requires ~18% of the inference
time than the Binary BERT baseline. The related datasets and models are
available via https://github.com/ncbi/ml-transformer