Morphological Priors for Probabilistic Neural Word Embeddings
Word embeddings allow natural language processing systems to share
statistical information across related words. These embeddings are typically
based on distributional statistics, making it difficult for them to generalize
to rare or unseen words. We propose to improve word embeddings by incorporating
morphological information, capturing shared sub-word features. Unlike previous
work that constructs word embeddings directly from morphemes, we combine
morphological and distributional information in a unified probabilistic
framework, in which the word embedding is a latent variable. The morphological
information provides a prior distribution on the latent word embeddings, which
in turn condition a likelihood function over an observed corpus. This approach
yields improvements on intrinsic word similarity evaluations, and also in the
downstream task of part-of-speech tagging.
Comment: Appeared at the Conference on Empirical Methods in Natural Language Processing (EMNLP 2016), Austin, Texas.
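A minimal sketch of the latent-variable setup, in our own notation (the paper's exact prior and likelihood may differ): each word embedding e_w is latent, with a prior built from the word's morphemes M_w and a likelihood tying it to the corpus:

p(e_w \mid M_w) = \mathcal{N}\big(e_w;\ \textstyle\sum_{m \in M_w} \mu_m,\ \sigma^2 I\big), \qquad \mathcal{L} = \log p(\text{corpus} \mid \{e_w\}) + \textstyle\sum_w \log p(e_w \mid M_w)

For a rare word the prior dominates, so its embedding falls back on shared morpheme vectors; for a frequent word the corpus likelihood dominates and distributional evidence wins.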
Lifted rule injection for relation embeddings
Methods based on representation learning currently hold the state of the art in many natural language processing and knowledge base inference tasks. Yet, a major challenge is how to efficiently incorporate commonsense knowledge into such models. A recent approach regularizes relation and entity representations by propositionalization of first-order logic rules. However, propositionalization does not scale beyond domains with only a few entities and rules. In this paper we present a highly efficient method for incorporating implication rules into distributed representations for automated knowledge base construction. We map entity-tuple embeddings into an approximately Boolean space and encourage a partial ordering over relation embeddings based on implication rules mined from WordNet. Surprisingly, we find that the strong restriction of the entity-tuple embedding space does not hurt the expressiveness of the model and even acts as a regularizer that improves generalization. By incorporating a few commonsense rules, we achieve an increase of 2 percentage points in mean average precision over a matrix factorization baseline, while observing a negligible increase in runtime.
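To make the lifted construction concrete, here is a minimal sketch (names and shapes are our assumptions, not the paper's exact model): entity-tuple embeddings are squashed into [0, 1]^d, scores are dot products, and an implication rule r_premise => r_conclusion is enforced by penalizing any dimension where the premise relation's embedding exceeds the conclusion relation's.

import numpy as np

# Sketch of lifted implication injection. Tuple embeddings live in the
# approximately Boolean space [0, 1]^d, so a component-wise ordering
# r_premise <= r_conclusion guarantees score(t, r_premise) <=
# score(t, r_conclusion) for every tuple t at once -- no grounding
# over individual entities is needed.

def score(tuple_emb_raw, relation_emb):
    tuple_emb = 1.0 / (1.0 + np.exp(-np.asarray(tuple_emb_raw)))  # into [0, 1]^d
    return tuple_emb @ np.asarray(relation_emb)

def implication_loss(premise_rel, conclusion_rel):
    # Zero once premise_rel <= conclusion_rel holds in every dimension.
    return np.sum(np.maximum(0.0, np.asarray(premise_rel) - np.asarray(conclusion_rel)))

Because the penalty touches only the two relation vectors, its cost is independent of the number of entity tuples; that lifted form is what lets the method scale where propositionalization does not.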
Distributional Inclusion Vector Embedding for Unsupervised Hypernymy Detection
Modeling hypernymy, such as poodle is-a dog, is an important generalization
aid to many NLP tasks, such as entailment, coreference, relation extraction,
and question answering. Supervised learning from labeled hypernym sources, such
as WordNet, limits the coverage of these models, which can be addressed by
learning hypernyms from unlabeled text. Existing unsupervised methods either do
not scale to large vocabularies or yield unacceptably poor accuracy. This paper
introduces distributional inclusion vector embedding (DIVE), a
simple-to-implement unsupervised method of hypernym discovery via per-word
non-negative vector embeddings which preserve the inclusion property of word
contexts in a low-dimensional and interpretable space. In experimental evaluations more comprehensive than any previous literature of which we are aware (covering 11 datasets and using multiple existing as well as newly proposed scoring functions), we find that our method provides up to double the precision of previous unsupervised embeddings and the highest average performance, while using a much more compact word representation and yielding many new state-of-the-art results.
Comment: NAACL 2018.
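One concrete way to score hypernymy from such non-negative vectors, echoing the distributional inclusion idea (an illustration, not necessarily the scoring function the paper settles on):

import numpy as np

# Inclusion-based hypernymy score over non-negative embeddings: if the
# hyponym's mass is contained dimension-by-dimension in the hypernym's,
# the score approaches 1; the measure is deliberately asymmetric.

def inclusion_score(hypo, hyper):
    hypo = np.asarray(hypo, dtype=float)
    hyper = np.asarray(hyper, dtype=float)
    return np.minimum(hypo, hyper).sum() / hypo.sum()

print(inclusion_score([1, 2, 0], [2, 3, 1]))  # 1.0: 'poodle' contexts included in 'dog'
print(inclusion_score([2, 3, 1], [1, 2, 0]))  # 0.5: not included in reverse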
Hypernym Detection Using Strict Partial Order Networks
This paper introduces Strict Partial Order Networks (SPON), a novel neural
network architecture designed to enforce asymmetry and transitivity as
soft constraints. We apply it to induce hypernymy relations by training with
is-a pairs. We also present an augmented variant of SPON that can generalize
type information learned for in-vocabulary terms to previously unseen ones. An
extensive evaluation over eleven benchmarks across different tasks shows that
SPON consistently either outperforms or attains the state of the art on all but
one of these benchmarks.
Comment: 8 pages.
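A hedged simplification of the underlying idea (ours, not the exact SPON architecture): model "x is-a y" as a soft component-wise ordering between embeddings. The strict part of <= on vectors is asymmetric and transitive by construction, so penalizing violations, rather than hard-constraining them, yields soft constraints of the kind the abstract describes.

import numpy as np

# "x is-a y" scored via a soft component-wise ordering x_emb <= y_emb.
# If x <= y and y <= z hold component-wise, then x <= z (transitivity);
# x <= y and y <= x force x == y, so two distinct embeddings can never
# satisfy the relation in both directions (asymmetry).

def order_violation(x_emb, y_emb):
    # Zero iff x_emb <= y_emb in every dimension.
    return np.sum(np.maximum(0.0, np.asarray(x_emb) - np.asarray(y_emb)))

def is_a_score(x_emb, y_emb):
    return -order_violation(x_emb, y_emb)  # higher means more hypernym-like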
Ideal Words: A Vector-Based Formalisation of Semantic Competence
Funder: Università degli Studi di Trento
Abstract: In this theoretical paper, we consider the notion of semantic competence and its relation to general language understanding, one of the most sought-after goals of Artificial Intelligence. We come back to three main accounts of competence involving (a) lexical knowledge; (b) truth-theoretic reference; and (c) causal chains in language use. We argue that all three are needed to reach a notion of meaning in artificial agents and suggest that they can be combined in a single formalisation, where competence develops from exposure to observable performance data. We introduce a theoretical framework which translates set theory into vector-space semantics by applying distributional techniques to a corpus of utterances associated with truth values. The resulting meaning space naturally satisfies the requirements of a causal theory of competence, but it can also be regarded as some 'ideal' model of the world, allowing for extensions and standard lexical relations to be retrieved.
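As a toy illustration of how set theory translates into vector terms (our example, not the authors' full construction): over a finite domain D of situations, a predicate p can be represented by the characteristic vector of its extension, after which Boolean structure and entailment become component-wise:

\vec{p} \in \{0,1\}^{|D|}, \qquad (\vec{p} \wedge \vec{q})_i = \min(p_i, q_i), \qquad p \models q \iff \vec{p} \le \vec{q}\ \text{component-wise}

Replacing exact truth values with corpus-derived, probabilistic ones is what moves such a model from an 'ideal' picture of the world toward one learnable from observable performance data.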
Combining Representation Learning with Logic for Language Processing
The current state of the art in many natural language processing and
automated knowledge base completion tasks is held by representation learning
methods which learn distributed vector representations of symbols via
gradient-based optimization. They require little or no hand-crafted features,
thus avoiding the need for most preprocessing steps and task-specific
assumptions. However, in many cases representation learning requires a large
amount of annotated training data to generalize well to unseen data. Such
labeled training data is provided by human annotators who often use formal
logic as the language for specifying annotations. This thesis investigates
different combinations of representation learning methods with logic for
reducing the need for annotated training data, and for improving
generalization.
Comment: PhD Thesis, University College London. Submitted and accepted in 2017.