Entity linking faces significant challenges, such as prolific name variations and
prevalent ambiguities, especially in high-value domains with myriad entities.
Standard classification approaches suffer from the annotation bottleneck and
cannot effectively handle unseen entities. Zero-shot entity linking has emerged
as a promising direction for generalizing to new entities, but it still
requires gold entity-mention examples during training and canonical
descriptions for all entities, both of which are rarely available outside of
Wikipedia. In this paper, we explore Knowledge-RIch Self-Supervision (KRISS) for biomedical entity linking, by leveraging readily available domain
knowledge. During training, KRISS generates self-supervised mention examples on unlabeled text using a domain ontology.
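To make the generation step concrete, below is a minimal sketch under the assumption of a toy ontology standing in for UMLS; the entity IDs, names, and the `generate_mention_examples` helper are illustrative, not the paper's implementation.

```python
import re

# Hypothetical toy ontology (stand-in for UMLS): entity ID -> surface names.
ONTOLOGY = {
    "C0004238": ["atrial fibrillation", "AF"],
    "C0011849": ["diabetes mellitus", "diabetes"],
}

def generate_mention_examples(text):
    """Scan unlabeled text for ontology names and emit self-supervised
    (mention, entity ID) examples with their surrounding context."""
    examples = []
    for entity_id, names in ONTOLOGY.items():
        for name in names:
            # Word-boundary match; a real system would prefer longest matches
            # and handle morphological variants.
            pattern = r"\b" + re.escape(name) + r"\b"
            for m in re.finditer(pattern, text, re.IGNORECASE):
                examples.append({
                    "entity_id": entity_id,
                    "mention": m.group(0),
                    "span": (m.start(), m.end()),
                    "context": text,  # here, the whole sentence
                })
    return examples

sentence = "The patient with diabetes mellitus developed atrial fibrillation."
for ex in generate_mention_examples(sentence):
    print(ex["entity_id"], ex["mention"], ex["span"])
```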
It then trains a contextual mention encoder with contrastive learning.
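A minimal sketch of one such contrastive objective follows, assuming in-batch negatives and precomputed mention embeddings; the encoder itself (e.g., a BERT-style transformer) and KRISS's exact positive/negative sampling are abstracted away.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(anchor_emb, positive_emb, temperature=0.05):
    """InfoNCE loss: each anchor's positive is another mention of the same
    entity; the other in-batch positives serve as negatives."""
    a = F.normalize(anchor_emb, dim=-1)
    p = F.normalize(positive_emb, dim=-1)
    logits = a @ p.T / temperature       # (B, B) cosine-similarity matrix
    labels = torch.arange(a.size(0))     # diagonal entries are the true pairs
    return F.cross_entropy(logits, labels)

# Toy usage with random tensors standing in for encoder outputs.
anchors = torch.randn(8, 768, requires_grad=True)
positives = torch.randn(8, 768)
loss = contrastive_loss(anchors, positives)
loss.backward()
print(float(loss))
```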
At inference time, it samples self-supervised mentions as prototypes for each entity and performs linking by mapping each test mention to the most similar prototype.
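A minimal sketch of this prototype-based linking step, assuming each entity's prototypes are precomputed mention embeddings; the `link` helper and max-over-prototypes cosine similarity are illustrative assumptions.

```python
import numpy as np

def link(test_emb, prototypes):
    """Map a test mention embedding to the entity whose prototypes are most
    similar; prototypes: entity ID -> (K, D) array of mention embeddings."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    best_entity, best_sim = None, -1.0
    for entity_id, protos in prototypes.items():
        sim = max(cos(test_emb, p) for p in protos)
        if sim > best_sim:
            best_entity, best_sim = entity_id, sim
    return best_entity, best_sim

# Toy usage: three random prototypes per entity, plus a noisy test mention.
rng = np.random.default_rng(0)
prototypes = {"C0004238": rng.normal(size=(3, 16)),
              "C0011849": rng.normal(size=(3, 16))}
test = prototypes["C0011849"][0] + 0.1 * rng.normal(size=16)
print(link(test, prototypes))
```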
Our approach can easily incorporate entity descriptions and gold mention labels if available. We conducted extensive
experiments on seven standard datasets spanning biomedical literature and
clinical notes. Without using any labeled information, our method produces KRISSBERT, a universal entity linker for four million UMLS entities that
attains a new state of the art, outperforming prior self-supervised methods by as
much as 20 absolute points in accuracy.