Timely and accurate extraction of Adverse Drug Events (ADE) from biomedical
literature is paramount for public safety, but involves slow and costly manual
labor. We set out to improve drug safety monitoring (pharmacovigilance, PV)
through the use of Natural Language Processing (NLP). We introduce BioDEX, a
large-scale resource for Biomedical adverse Drug Event Extraction, rooted in
the historical output of drug safety reporting in the U.S. BioDEX consists of
65k abstracts and 19k full-text biomedical papers with 256k associated
document-level safety reports created by medical experts. The core features of
these reports include the reported weight, age, and biological sex of a
patient, a set of drugs taken by the patient, the drug dosages, the reactions
experienced, and whether the reaction was life threatening. In this work, we
consider the task of predicting the core information of the report given its
originating paper. We estimate human performance to be 72.0% F1, whereas our
best model achieves 62.3% F1, indicating significant headroom on this task. We
also begin to explore ways in which these models could help professional PV
reviewers. Our code and data are available: https://github.com/KarelDO/BioDEX.Comment: 28 page