    Leveraging structure for learning representations of words, sentences and knowledge bases

    This thesis presents work on learning representations of text and Knowledge Bases by taking their respective structures into account. The methods are developed for and evaluated on four tasks: short-text classification, Word Sense Induction and Disambiguation, Knowledge Base Completion with linked text corpora, and large-scale Knowledge Base Question Answering. An introductory chapter states the aims and scope of the thesis, followed by a chapter on technical background and definitions. Chapter 3 investigates the impact of dependency syntax on word representation learning in the context of short-text classification. A new definition of context in dependency graphs is proposed, which generalizes and extends previous definitions used in word representation learning. The resulting word and dependency feature embeddings are used together to represent dependency graph substructures in text classifiers. Chapter 4 presents a probabilistic latent variable model for Word Sense Induction and Disambiguation. The model estimates sense clusters using pretrained continuous feature vectors of multiple context types (syntactic, local lexical and global lexical), while the number of sense clusters is determined by the Integrated Complete Likelihood criterion. Chapter 5 presents a model for Knowledge Base Completion with linked text corpora. The proposed model represents potential facts by merging subgraphs of the knowledge base with text through linked entities. It learns to embed the merged graphs into a lower-dimensional space and to score the plausibility of each fact with a Multilayer Perceptron. Chapter 6 presents a system for Question Answering on Knowledge Bases. The system learns to decompose questions into entity and relation mentions and to score their compatibility with queries on the knowledge base expressed as subgraphs. It consists of several jointly trained components that match parts of the question with parts of a potential query by embedding their corresponding structures in lower-dimensional spaces.