Biomedical entity normalization unifies the language across biomedical
experiments and studies, and further enables us to obtain a holistic view of
life sciences. Current approaches mainly study the normalization of relatively
standardized entities such as diseases and drugs, while disregarding more
ambiguous but crucial entities such as pathways, functions, and cell types,
which limits their real-world applicability. To achieve biomedical entity
normalization of these under-explored entities, we first introduce OBO-syn, an
expert-curated dataset encompassing 70 different types of entities and
2 million curated entity-synonym pairs. To utilize the unique graph structure
in this dataset, we propose GraphPrompt, a prompt-based learning approach that
creates prompt templates according to the graphs. GraphPrompt obtained 41.0%
and 29.9% improvements in the zero-shot and few-shot settings, respectively,
indicating the effectiveness of these graph-based prompt templates. We envision
that our GraphPrompt method and the OBO-syn dataset can be broadly applied to
graph-based NLP tasks and serve as the basis for analyzing diverse and
accumulating biomedical data.

Comment: 12 pages