Chasing Hypernyms in Vector Spaces with Entropy
In this paper, we introduce SLQS, a new entropy-based measure for the unsupervised identification of hypernymy and its directionality in Distributional Semantic Models (DSMs). SLQS is assessed through two tasks: (i) identifying the hypernym in hyponym-hypernym pairs, and (ii) discriminating hypernymy among various semantic relations. In both tasks, SLQS outperforms other state-of-the-art measures.
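To make the idea of an entropy-based directionality score concrete, here is a minimal Python sketch. The abstract does not state the formula, so the sketch assumes a score of the form 1 - E_x / E_y, where E_w is the median Shannon entropy of the contexts most associated with word w. The toy counts, the function names, and the hand-picked "top contexts" are all hypothetical.

```python
import math
from statistics import median

def shannon_entropy(counts):
    """Shannon entropy (in bits) of a frequency distribution."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def median_context_entropy(top_contexts, context_cooccurrences):
    """Median entropy over the contexts most associated with a word."""
    return median(shannon_entropy(context_cooccurrences[c]) for c in top_contexts)

def slqs_like(e_x, e_y):
    """Assumed score: positive when y's contexts have higher entropy than x's."""
    return 1.0 - e_x / e_y

# Hypothetical toy data: each context maps to its co-occurrence counts.
context_cooccurrences = {
    "bark": [50, 1, 1],    # informative context, low entropy
    "leash": [30, 2, 1],
    "eat": [10, 10, 10],   # general context, high entropy
    "live": [8, 9, 10],
}
e_dog = median_context_entropy(["bark", "leash"], context_cooccurrences)
e_animal = median_context_entropy(["eat", "live"], context_cooccurrences)
score = slqs_like(e_dog, e_animal)
```

Under these assumptions, a positive `score` would suggest that the second word ("animal") is the more general term, and swapping the arguments flips the sign.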
L'antonymie observée avec des méthodes de TAL : une relation à la fois syntagmatique et paradigmatique ?
Short paper. In this paper, we use NLP methods to test the hypothesis, suggested by several linguistic studies, that antonymy is not only a paradigmatic but also a syntagmatic relation: antonym pairs, which have classically been described by their ability to be substituted for each other, also tend to co-occur frequently in texts. We use two methods: distributional analysis on the paradigmatic level and lexico-syntactic pattern recognition on the syntagmatic level. Results show that antonym detection is not significantly improved by combining the two methods: some of the antonyms identified do not satisfy the test for substitutability, which tends to confirm the predominance of the syntagmatic level for studying and identifying antonymy.
SemEval-2016 Task 13: Taxonomy Extraction Evaluation (TExEval-2)
This paper describes the second edition of the shared task on Taxonomy Extraction Evaluation organised as part of SemEval 2016. This task aims to extract hypernym-hyponym relations between a given list of domain-specific terms and then to construct a domain taxonomy based on them. TExEval-2 introduced a multilingual setting for this task, covering four languages (English, Dutch, Italian and French) and domains as diverse as environment, food and science. A total of 62 runs submitted by 5 different teams were evaluated using structural measures, by comparison with gold standard taxonomies and by manual quality assessment of novel relations.
Science Foundation Ireland (SFI) under Grant Number SFI/12/RC/2289 (INSIGHT
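As an illustration of what comparison with a gold standard taxonomy can look like, the following Python sketch scores predicted hypernym edges against gold edges with precision, recall and F1. The abstract does not specify the task's exact metrics, so this edge-overlap scoring is an assumption, and the function name and food-domain pairs are hypothetical.

```python
def edge_scores(predicted, gold):
    """Precision, recall and F1 of predicted (hyponym, hypernym) edges against gold edges."""
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

# Hypothetical food-domain example.
gold = {("apple", "fruit"), ("pear", "fruit"), ("fruit", "food")}
predicted = {("apple", "fruit"), ("fruit", "food"), ("apple", "food")}
precision, recall, f1 = edge_scores(predicted, gold)
```

In this toy case the edge ("apple", "food") counts as an error under strict edge matching even though it is a plausible novel relation, which is one reason the task also relies on manual quality assessment.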
Distributional Sentence Entailment Using Density Matrices
The categorical compositional distributional model of Coecke et al. (2010)
suggests a way to combine grammatical composition of the formal, type logical
models with the corpus based, empirical word representations of distributional
semantics. This paper contributes to the project by expanding the model to also
capture entailment relations. This is achieved by extending the representations
of words from points in meaning space to density operators, which are
probability distributions on the subspaces of the space. A symmetric measure of
similarity and an asymmetric measure of entailment are defined, where lexical
entailment is measured using von Neumann entropy, the quantum variant of
Kullback-Leibler divergence. Lexical entailment, combined with the composition
map on word representations, provides a method to obtain entailment relations
on the level of sentences. Truth theoretic and corpus-based examples are
provided.
Comment: 11 pages
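For reference, the standard textbook definitions of the quantities named in this abstract are given below. They are general quantum-information definitions, not formulas quoted from the paper, and the symbols \rho and \sigma are notation chosen here. Note that the usual quantum analogue of Kullback-Leibler divergence is the quantum relative entropy, which is built from the same ingredients as the von Neumann entropy.

```latex
\begin{align}
  % A density operator is positive semidefinite with unit trace.
  \rho &\succeq 0, \qquad \operatorname{Tr}(\rho) = 1 \\
  % von Neumann entropy
  S(\rho) &= -\operatorname{Tr}(\rho \log \rho) \\
  % Quantum relative entropy (quantum analogue of KL divergence)
  S(\rho \,\|\, \sigma) &= \operatorname{Tr}\bigl(\rho\,(\log \rho - \log \sigma)\bigr)
\end{align}
```

The relative entropy is asymmetric in its two arguments, which is what makes it a natural candidate for a directional notion such as entailment.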
Don't Blame Distributional Semantics if it can't do Entailment
Distributional semantics has had enormous empirical success in Computational
Linguistics and Cognitive Science in modeling various semantic phenomena, such
as semantic similarity, and distributional models are widely used in
state-of-the-art Natural Language Processing systems. However, the theoretical
status of distributional semantics within a broader theory of language and
cognition is still unclear: What does distributional semantics model? Can it
be, on its own, a fully adequate model of the meanings of linguistic
expressions? The standard answer is that distributional semantics is not fully
adequate in this regard, because it falls short on some of the central aspects
of formal semantic approaches: truth conditions, entailment, reference, and
certain aspects of compositionality. We argue that this standard answer rests
on a misconception: These aspects do not belong in a theory of expression
meaning, they are instead aspects of speaker meaning, i.e., communicative
intentions in a particular context. In a slogan: words do not refer, speakers
do. Clearing this up enables us to argue that distributional semantics on its
own is an adequate model of expression meaning. Our proposal sheds light on the
role of distributional semantics in a broader theory of language and cognition,
its relationship to formal semantics, and its place in computational models.
Comment: To appear in Proceedings of the 13th International Conference on
Computational Semantics (IWCS 2019), Gothenburg, Sweden
Nine Features in a Random Forest to Learn Taxonomical Semantic Relations
ROOT9 is a supervised system for the classification of hypernyms, co-hyponyms
and random words that is derived from the already introduced ROOT13 (Santus et
al., 2016). It relies on a Random Forest algorithm and nine unsupervised
corpus-based features. We evaluate it with a 10-fold cross validation on 9,600
pairs, equally distributed among the three classes and involving several
Parts-Of-Speech (i.e. adjectives, nouns and verbs). When all the classes are
present, ROOT9 achieves an F1 score of 90.7%, against a baseline of 57.2%
(vector cosine). When the classification is binary, ROOT9 achieves the
following results against the baseline: hypernyms-co-hyponyms 95.7% vs. 69.8%,
hypernyms-random 91.8% vs. 64.1% and co-hyponyms-random 97.8% vs. 79.4%. In
order to compare the performance with the state-of-the-art, we have also
evaluated ROOT9 in subsets of the Weeds et al. (2014) datasets, proving that it
is in fact competitive. Finally, we investigated whether the system learns the
semantic relation or it simply learns the prototypical hypernyms, as claimed by
Levy et al. (2015). The second possibility seems to be the more likely, even
though ROOT9 can be trained on negative examples (i.e., switched hypernyms) to
drastically reduce this bias.
Comment: in LREC 201
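The idea of training on switched hypernyms can be sketched in a few lines of Python: each true (hyponym, hypernym) pair is also added in reversed order as a negative example, so a classifier cannot succeed merely by recognising prototypical hypernyms. This is an illustrative sketch only; the function name, label strings and toy pairs are hypothetical, and the feature extraction and Random Forest training are not shown.

```python
def add_switched_negatives(hypernym_pairs):
    """Return (x, y, label) triples: each true pair plus its reversed copy as a negative."""
    examples = []
    for hyponym, hypernym in hypernym_pairs:
        examples.append((hyponym, hypernym, "hypernym"))
        # Same two words in the opposite order: the relation no longer holds.
        examples.append((hypernym, hyponym, "not-hypernym"))
    return examples

pairs = [("dog", "animal"), ("rose", "flower")]  # hypothetical toy pairs
training_examples = add_switched_negatives(pairs)
```

Because both orders contain the same words, a model that only learned that "animal" is a typical hypernym would mislabel the reversed pair.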
A Generalised Quantifier Theory of Natural Language in Categorical Compositional Distributional Semantics with Bialgebras
Categorical compositional distributional semantics is a model of natural
language; it combines the statistical vector space models of words with the
compositional models of grammar. We formalise in this model the generalised
quantifier theory of natural language, due to Barwise and Cooper. The
underlying setting is a compact closed category with bialgebras. We start from
a generative grammar formalisation and develop an abstract categorical
compositional semantics for it, then instantiate the abstract setting to sets
and relations and to finite dimensional vector spaces and linear maps. We prove
the equivalence of the relational instantiation to the truth theoretic
semantics of generalised quantifiers. The vector space instantiation formalises
the statistical usages of words and enables us to, for the first time, reason
about quantified phrases and sentences compositionally in distributional
semantics.
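The truth-theoretic semantics of generalised quantifiers mentioned in the abstract treats a quantifier as a relation between two sets. The Python sketch below shows that classical set-based reading only. It does not implement the paper's bialgebra construction, and the function names, the threshold chosen for "most", and the toy sets are hypothetical.

```python
def every(restrictor, scope):
    """'Every A is B' holds when A is a subset of B."""
    return restrictor <= scope

def some(restrictor, scope):
    """'Some A is B' holds when A and B share at least one element."""
    return bool(restrictor & scope)

def no(restrictor, scope):
    """'No A is B' holds when A and B are disjoint."""
    return not (restrictor & scope)

def most(restrictor, scope):
    """'Most A are B', read here as: more A's are inside B than outside it."""
    return len(restrictor & scope) > len(restrictor - scope)

# Hypothetical toy model.
dogs = {"rex", "fido", "lassie"}
barkers = {"rex", "fido", "tom"}
sleepers = {"rex", "fido", "lassie", "tom"}
```

For example, `every(dogs, sleepers)` holds in this model while `every(dogs, barkers)` does not, and `most(dogs, barkers)` holds because two of the three dogs bark.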