Knowledge-Enhanced Pre-trained Language Models (KEPLMs) improve the
performance of various downstream NLP tasks by injecting knowledge facts from
large-scale Knowledge Graphs (KGs). However, existing methods for pre-training
KEPLMs with relational triples are difficult to adapt to closed domains due
to the lack of sufficient domain graph semantics. In this paper, we propose a
Knowledge-enhanced lANGuAge Representation learning framework for various
clOsed dOmains (KANGAROO) by capturing the implicit graph structure among the
entities. Specifically, since the entity coverage rates of closed-domain KGs
can be relatively low and may exhibit a global sparsity phenomenon for
knowledge injection, we consider not only the shallow relational
representations of triples but also the hyperbolic embeddings of deep
hierarchical entity-class structures for effective knowledge fusion.
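To make the hyperbolic component concrete, below is a minimal sketch of the standard Poincaré-ball distance commonly used to embed tree-like entity-class hierarchies, where points near the origin behave like high-level classes and points near the boundary like deep subclasses. The function name and toy taxonomy are our own illustrative assumptions, not the paper's released implementation.

```python
import torch

def poincare_distance(u: torch.Tensor, v: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """Geodesic distance in the Poincare ball (standard formula):
    d(u, v) = arcosh(1 + 2 * ||u - v||^2 / ((1 - ||u||^2) * (1 - ||v||^2)))
    """
    sq_u = torch.clamp(torch.sum(u * u, dim=-1), 0, 1 - eps)
    sq_v = torch.clamp(torch.sum(v * v, dim=-1), 0, 1 - eps)
    sq_diff = torch.sum((u - v) ** 2, dim=-1)
    return torch.acosh(1 + 2 * sq_diff / ((1 - sq_u) * (1 - sq_v)))

# Hypothetical usage: embed classes of a closed-domain taxonomy and pull
# child classes toward their parents under the hyperbolic metric.
parent = torch.tensor([0.1, 0.0])  # near the origin: high in the hierarchy
child = torch.tensor([0.6, 0.3])   # near the boundary: deep in the hierarchy
print(poincare_distance(parent, child))
```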
Moreover, as two closed-domain entities under the same entity-class often have
locally dense neighbor subgraphs, measured by the max point biconnected
component, we further propose a data augmentation strategy based on
contrastive learning over subgraphs to construct hard negative samples of
higher quality. This enables the underlying KEPLMs to better distinguish the
semantics of these neighboring entities, further mitigating the global
semantic sparsity.
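As an illustration of the hard-negative idea, the sketch below scores an anchor entity against one positive view and against neighboring entities drawn from the anchor's dense subgraph, under an InfoNCE-style contrastive loss. All names and the sampling shortcut here are hypothetical assumptions; the paper's actual strategy over max point biconnected components is more elaborate.

```python
import torch
import torch.nn.functional as F

def info_nce_with_hard_negatives(anchor, positive, hard_negatives, temperature=0.07):
    """InfoNCE-style loss: one positive view vs. hard negatives taken from
    the anchor entity's dense neighbor subgraph (same entity-class).

    anchor:         (d,) embedding of the anchor entity
    positive:       (d,) embedding of an augmented view of the anchor
    hard_negatives: (k, d) embeddings of neighboring entities that are
                    easy to confuse with the anchor
    """
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    hard_negatives = F.normalize(hard_negatives, dim=-1)

    pos_sim = anchor @ positive / temperature            # scalar similarity
    neg_sim = hard_negatives @ anchor / temperature      # (k,) similarities
    logits = torch.cat([pos_sim.unsqueeze(0), neg_sim])  # positive at index 0
    # The loss pushes the positive similarity above all hard negatives.
    return F.cross_entropy(logits.unsqueeze(0), torch.zeros(1, dtype=torch.long))

# Hypothetical usage with random embeddings:
d, k = 128, 8
loss = info_nce_with_hard_negatives(torch.randn(d), torch.randn(d), torch.randn(k, d))
print(loss.item())
```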
In the experiments, we evaluate KANGAROO over various knowledge-aware and
general NLP tasks in both full and few-shot learning settings, where it
significantly outperforms various KEPLM training paradigms in closed domains.