Injecting Knowledge into Biomedical Pre-trained Models via Polymorphism and Synonymous Substitution
Pre-trained language models (PLMs) are considered able to store
relational knowledge present in the training data. However, some relational
knowledge seems to be discarded unsafely in PLMs due to report bias:
low-frequency relational knowledge might be underexpressed compared to
high-frequency relational knowledge. This suggests that relational knowledge
might not be redundant to the knowledge already stored in PLMs, but rather
complementary to it. To inject additional relational knowledge into PLMs, we
propose a simple yet effective approach
inspired by three observations (namely, polymorphism, synonymous
substitution, and association). In particular, we switch entities in the
training corpus to related entities (either hypernyms/hyponyms/synonyms, or
arbitrarily-related concepts). Experimental results show that the proposed
approach could not only better capture relational knowledge, but also improve
performance on various biomedical downstream tasks. Our model is available
at https://github.com/StevenZHB/BioPLM_InjectingKnowledge.
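The entity-switching idea described in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the relation table `RELATED`, the function `substitute_entities`, the `prob` parameter, and the plain string matching are all hypothetical choices made for illustration. A real pipeline would presumably draw related entities from a biomedical knowledge base and use a proper entity linker.

```python
import random

# Hypothetical relation table: entity -> {relation type: [related entities]}.
# In practice this would come from a knowledge base, not a hand-written dict.
RELATED = {
    "aspirin": {
        "synonym": ["acetylsalicylic acid"],
        "hypernym": ["NSAID"],
    },
    "myocardial infarction": {
        "synonym": ["heart attack"],
        "hypernym": ["heart disease"],
    },
}


def substitute_entities(sentence, related, prob=0.5, rng=None):
    """Replace known entity mentions with a randomly chosen related entity.

    Each entity found in the sentence is replaced with probability `prob`.
    Matching is naive substring matching (an assumption of this sketch).
    """
    rng = rng or random.Random()
    out = sentence
    # Visit longer entities first so multi-word mentions are matched
    # before any shorter entity that might be a substring of them.
    for entity in sorted(related, key=len, reverse=True):
        if entity in out and rng.random() < prob:
            relation = rng.choice(sorted(related[entity]))
            replacement = rng.choice(related[entity][relation])
            out = out.replace(entity, replacement)
    return out


if __name__ == "__main__":
    text = "aspirin reduces the risk of myocardial infarction"
    print(substitute_entities(text, RELATED, prob=1.0, rng=random.Random(0)))
```

Applying such a substitution to a training corpus yields sentences in which a frequent entity is replaced by a rarer related one. This is one plausible reading of how low-frequency relational knowledge could be surfaced during pre-training.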
Injecting Inductive Biases into Distributed Representations of Text
Distributed real-valued vector representations of text (a.k.a. embeddings), learned by neural networks, encode various kinds of (linguistic) knowledge. The common approach to encoding this knowledge into the embeddings is to train a large neural network on large corpora. There is, however, growing concern about the sustainability and rationality of pursuing this approach further. We depart from this mainstream trend and instead use inductive biases to incorporate the desired properties into embeddings.
First, we use Knowledge Graphs (KGs) as a data-based inductive bias to derive semantic representations of words and sentences. The explicit semantics encoded in the structure of a KG allows us to acquire these representations without requiring a large amount of text. We use graph embedding techniques to learn the semantic representations of words and a sequence-to-sequence model to learn the semantic representations of sentences. We demonstrate the efficacy of this inductive bias for learning embeddings of rare words, and the ability of the sentence embeddings to encode topological dependencies that exist between entities of a KG.
Then, we explore the amount of information and sparsity as two key (data-agnostic) inductive biases to regulate the utilisation of the representation space. We impose these properties with Variational Autoencoders (VAEs). First, we regulate the amount of information encoded in a sentence embedding via constrained optimisation of a VAE objective function. We show that increasing the amount of information allows sentences to be discriminated better. Afterwards, to impose distributed sparsity, we design a state-of-the-art Hierarchical Sparse VAE with a flexible posterior that captures the statistical characteristics of text effectively. While sparsity in general has desirable computational and statistical representational properties, it is known to compromise task performance. We illustrate that with distributed sparsity, task performance can be maintained or even improved.
The findings of the thesis advocate further development of inductive biases that could mitigate the dependence of representation learning quality on large data and model sizes.