
    Why PairDiff works? -- A Mathematical Analysis of Bilinear Relational Compositional Operators for Analogy Detection

    Representing the semantic relations that exist between two given words (or entities) is an important first step in a wide range of NLP applications such as analogical reasoning, knowledge base completion and relational information retrieval. A simple, yet surprisingly accurate, method for representing a relation between two words is to compute the vector offset (PairDiff) between their corresponding word embeddings. Despite this empirical success, it remains unclear whether PairDiff is the best operator for obtaining a relational representation from word embeddings. We conduct a theoretical analysis of generalised bilinear operators that can be used to measure the ℓ2 relational distance between two word-pairs. We show that, if the word embeddings are standardised and uncorrelated, such an operator is independent of its bilinear terms and can be simplified to a linear form, of which PairDiff is a special case. For numerous word embedding types, we empirically verify the uncorrelatedness assumption, demonstrating the general applicability of our theoretical result. Moreover, we experimentally recover PairDiff from the bilinear relation composition operator on several benchmark analogy datasets.
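
    As a concrete illustration, the following minimal Python sketch (the toy embedding values are hypothetical; in practice they would come from a pre-trained model) computes the PairDiff representation of a word-pair and the ℓ2 relational distance between two pairs, i.e. the linear special case to which the generalised bilinear operator reduces:

        import numpy as np

        def pair_diff(w_a: np.ndarray, w_b: np.ndarray) -> np.ndarray:
            # PairDiff: the relation between words a and b is represented
            # as the offset between their embeddings.
            return w_b - w_a

        def relational_distance(pair1, pair2) -> float:
            # l2 distance between the PairDiff representations of two
            # word-pairs (the linear special case of the bilinear operator).
            return float(np.linalg.norm(pair_diff(*pair1) - pair_diff(*pair2)))

        # Hypothetical toy embeddings; real ones would come from a
        # pre-trained model such as word2vec or GloVe.
        w_man, w_king = np.array([0.1, 0.9]), np.array([0.8, 1.0])
        w_woman, w_queen = np.array([0.1, 0.2]), np.array([0.8, 0.3])
        print(relational_distance((w_man, w_king), (w_woman, w_queen)))  # ~0: analogous pairs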

    A Study on Learning Representations for Relations Between Words

    Reasoning about relations between words or entities plays an important role in human cognition. It is thus essential for a computational system that processes human languages to understand the semantics of relations in order to simulate human intelligence. Automatic relation learning provides valuable information for many natural language processing tasks, including ontology creation, question answering and machine translation, to name a few. This need brings us to the topic of this thesis, whose main goal is to explore multiple resources and methodologies to effectively represent relations between words.

    How to effectively represent semantic relations between words remains an underexplored problem. One line of research makes use of relational patterns, the linguistic contexts in which two words co-occur in a corpus, to infer a relation between them (e.g., X leads to Y). This approach suffers from data sparseness because not every related word-pair co-occurs, even in a large corpus. In contrast, prior work on learning word embeddings has found that certain relations between words can be captured by applying linear arithmetic operators to the corresponding pre-trained word embeddings. Specifically, it has been shown that the vector offset (expressed as PairDiff) from one word to the other in a pair encodes the relation that holds between them, if any. Such a compositional method addresses data sparseness by inferring a relation from the constituent words of a word-pair, and obviates the need for relational patterns.

    This thesis investigates the best way to compose word embeddings to represent relational instances. A systematic comparison is carried out for unsupervised operators, which in general reveals the superiority of the PairDiff operator across multiple word embedding models and benchmark datasets. Despite this empirical success, no theoretical analysis had so far been conducted explaining why and under what conditions PairDiff is optimal. To this end, a theoretical analysis is conducted for the generalised bilinear operators that can be used to measure the relational distance between two word-pairs. The main conclusion is that, under certain assumptions, the bilinear operator can be simplified to a linear form, of which the widely used PairDiff operator is a special case.

    Multiple recent works have raised concerns about existing unsupervised operators for inferring relations from pre-trained word embeddings. Thus, this thesis addresses the question of whether it is possible to learn better parametrised relational compositional operators. A supervised relation representation operator is proposed using a non-linear neural network that performs relation prediction. Evaluation on two benchmark datasets reveals that the penultimate layer of the trained neural relation predictor acts as a good representation of the relations between words. Because both relational patterns and word embeddings provide complementary information for learning relations, a self-supervised context-guided relation embedding method trained on the two sources of information is also proposed. Experimentally, incorporating relational contexts improves the performance of a compositional operator for representing unseen word-pairs.

    Besides unstructured text corpora, knowledge graphs provide another source of relational facts, in the form of nodes (i.e., entities) connected by edges (i.e., relations). Knowledge graphs are employed widely in natural language processing applications such as question answering and dialogue systems. Embedding the entities and relations of a graph has shown impressive results for inferring previously unseen relations between entities. This thesis contributes a theoretical model relating the connections in the graph to the embeddings of entities and relations. Learning graph embeddings that satisfy the proven theorem demonstrates efficient performance compared to existing, heuristically derived graph embedding methods. As graph embedding methods generate representations only for existing relation types, a relation composition task is proposed in the thesis to tackle this limitation.
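
    The supervised operator described above can be sketched as follows. This is a hypothetical PyTorch architecture (the layer sizes, activation and concatenation of the pair embeddings are assumptions, not the thesis's exact specification), shown only to illustrate the key idea that the penultimate layer of a trained relation predictor doubles as a relation representation:

        import torch
        import torch.nn as nn

        class RelationPredictor(nn.Module):
            # Hypothetical non-linear relational operator: trained to predict
            # a relation label for a word-pair; its penultimate layer is then
            # reused as the relation embedding.
            def __init__(self, emb_dim: int, hidden_dim: int, n_relations: int):
                super().__init__()
                self.encoder = nn.Sequential(
                    nn.Linear(2 * emb_dim, hidden_dim),  # input: concatenated pair embeddings
                    nn.Tanh(),
                )
                self.classifier = nn.Linear(hidden_dim, n_relations)

            def forward(self, w_a: torch.Tensor, w_b: torch.Tensor):
                rel = self.encoder(torch.cat([w_a, w_b], dim=-1))  # relation embedding
                return self.classifier(rel), rel                   # (logits, representation)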

    Context-Guided Self-supervised Relation Embeddings

    A semantic relation between two given words a and b can be represented using two complementary sources of information: (a) the semantic representations of a and b (expressed as word embeddings) and (b) the contextual information obtained from the co-occurrence contexts of the two words (expressed in the form of lexico-syntactic patterns). The pattern-based approach suffers from sparsity, while methods that rely only on the word embeddings of the related pairs lack relational information. Prior work on relation embeddings has predominantly focused on one of these two resources exclusively, with a few notable exceptions. In this paper, we propose a self-supervised Context-Guided Relation Embedding method (CGRE) that uses both sources of information. We evaluate the ability of the learnt method to create relation representations for word-pairs that do not co-occur. Experimental results on the SemEval-2012 Task 2 dataset show that the proposed operator outperforms other methods in representing relations for unobserved word-pairs.
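
    The intuition of combining the two sources can be illustrated with the sketch below. This is not the CGRE architecture itself (CGRE is trained in a self-supervised fashion); it is merely a hypothetical fusion of a compositional PairDiff representation with the centroid of the embeddings of the patterns in which the pair co-occurs:

        import numpy as np

        def context_guided_relation(w_a, w_b, pattern_vecs):
            # Illustrative fusion of the two information sources:
            # (a) compositional: the PairDiff of the word embeddings;
            # (b) contextual: the centroid of pattern embeddings
            #     (e.g. for patterns such as "X leads to Y").
            compositional = w_b - w_a
            contextual = (np.mean(pattern_vecs, axis=0) if len(pattern_vecs)
                          else np.zeros_like(w_a))  # pair never co-occurs
            return np.concatenate([compositional, contextual])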

    Towards a theoretical understanding of word and relation representation

    Representing words by vectors of numbers, known as word embeddings, enables computational reasoning over words and is foundational to automating tasks involving natural language. For example, by crafting word embeddings so that similar words have similar-valued embeddings, often thought of as nearby points in a semantic space, word similarity can be readily assessed using a variety of metrics. In contrast, judging whether two words are similar from more common representations, such as their English spelling, is often impossible (e.g. cat/feline); and to predetermine and store all similarities between all words is prohibitively time-consuming, memory-intensive and subjective. As a succinct means of representing words, or perhaps the concepts that words themselves represent, word embeddings also relate to information theory and cognitive science.

    Numerous algorithms have been proposed to learn word embeddings from different data sources, such as large text corpora, document collections and “knowledge graphs”: compilations of facts in the form ⟨subject entity, relation, object entity⟩, e.g. ⟨Edinburgh, capital of, Scotland⟩. The broad aim of these algorithms is to capture information from the data in the components of each word embedding that is useful for a certain task or suite of tasks, such as detecting sentiment in text, identifying the topic of a document, or predicting whether a given fact is true or false. In this thesis, we focus on word embeddings learned from text corpora and knowledge graphs.

    Several well-known algorithms learn word embeddings from text on an unsupervised (or, more recently, self-supervised) basis by learning to predict context words that occur around each word, e.g. word2vec (Mikolov et al., 2013a,b) and GloVe (Pennington et al., 2014). The parameters of word embeddings learned in this way are known to reflect word co-occurrence statistics, but how they capture semantic meaning has been largely unclear. Knowledge graph representation models learn representations both of entities, which include words, people, places, etc., and of binary relations between them. Representations are typically learned by training the model to predict known true facts of the knowledge graph in a supervised manner. Despite steady improvements in the accuracy with which knowledge graph representation models are able to predict facts, both seen and unseen during training, little is understood of the latent structure that allows them to do so. This limited understanding of how latent semantic structure is encoded in the geometry of word embeddings and knowledge graph representations leaves unclear a principled direction for improving their performance, reliability or interpretability. To address this:

    1. we theoretically justify the empirical observation that particular geometric relationships between word embeddings learned by algorithms such as word2vec and GloVe correspond to semantic relations between words; and

    2. we extend this correspondence between semantics and geometry to the entities and relations of knowledge graphs, providing a model for the latent structure of knowledge graph representation linked to that of word embeddings.

    We first give a probabilistic explanation for why word embeddings of analogies, phrases of the form “man is to king as woman is to queen”, often appear to approximate a parallelogram. This “analogy phenomenon” has generated much intrigue, since word embeddings are not trained to achieve it, yet it allows many analogies to be “solved” simply by adding and subtracting their embeddings, e.g. w_queen ≈ w_king − w_man + w_woman. Similar probabilistic rationale is given to explain how semantic relations such as similarity and paraphrase are encoded in the relative geometry of word embeddings. Lastly, we extend this correspondence between semantics and embedding geometry to the specific relations of knowledge graphs. We derive a hierarchical categorisation of relation types and, for each type, identify the notional geometric relationship between the word embeddings of related entities. This gives a theoretical basis for relation representation against which we can contrast a range of knowledge graph representation models. By analysing properties of their representations and their relation-by-relation performance, we show that the closer the agreement between how a model represents a relation and our theoretically inspired basis, the better the model performs. Indeed, a knowledge graph representation model inspired by this research achieved state-of-the-art performance (Balažević et al., 2019b).
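
    The parallelogram prediction w_queen ≈ w_king − w_man + w_woman can be made concrete with the short sketch below, a minimal version of the standard vector-offset method for solving analogies (`vocab` is an assumed word-to-embedding mapping):

        import numpy as np

        def solve_analogy(w_a, w_b, w_c, vocab, exclude=()):
            # "a is to b as c is to ?": return the vocabulary word whose
            # embedding is most cosine-similar to w_b - w_a + w_c.
            target = w_b - w_a + w_c
            target = target / np.linalg.norm(target)
            best, best_sim = None, -np.inf
            for word, vec in vocab.items():
                if word in exclude:  # conventionally exclude the query words
                    continue
                sim = float(vec @ target) / np.linalg.norm(vec)
                if sim > best_sim:
                    best, best_sim = word, sim
            return best

        # e.g. solve_analogy(w_man, w_king, w_woman, vocab,
        #                    exclude={"man", "king", "woman"}) is expected
        # to return "queen" for well-trained embeddings.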