The volume of literature on the COVID-19 pandemic is beyond what a single
expert can inspect manually. Developing systems capable of automatically
processing tens of thousands of scientific publications, with the aim of
enriching existing empirical evidence with literature-based associations, is
both challenging and relevant. We propose a system for contextualization of
empirical expression data by approximating relations between entities, for
which representations were learned from one of the largest COVID-19-related
literature corpora. To exploit a larger scientific context through transfer
learning, we propose a novel embedding generation technique that leverages the
SciBERT language model, pretrained on a large multi-domain corpus of scientific
publications and fine-tuned for domain adaptation on the CORD-19 dataset. A
manual evaluation by a medical expert and a quantitative evaluation based on
therapy targets identified in related work suggest that the proposed method
can be successfully employed for COVID-19 therapy target discovery and that it
outperforms the baseline FastText method by a large
margin.

Comment: Accepted to the 23rd International Conference on Discovery Science
(DS 2020