14 research outputs found
Taxonomy Induction using Hypernym Subsequences
We propose a novel, semi-supervised approach towards domain taxonomy
induction from an input vocabulary of seed terms. Unlike all previous
approaches, which typically extract direct hypernym edges for terms, our
approach utilizes a novel probabilistic framework to extract hypernym
subsequences. Taxonomy induction from extracted subsequences is cast as an
instance of the minimum-cost flow problem on a carefully designed directed
graph. Through experiments, we demonstrate that our approach outperforms
state-of-the-art taxonomy induction approaches across four languages.
Importantly, we also show that our approach is robust to the presence of noise
in the input vocabulary. To the best of our knowledge, no previous approaches
have been empirically proven to manifest noise-robustness in the input
vocabulary.
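As a rough illustration of what casting a selection problem as minimum-cost flow can look like, here is a minimal sketch. The abstract does not describe the actual graph construction, so everything below is an assumption made for illustration: the toy terms, the candidate parents, the integer costs, the capacities, and the `MinCostFlow` helper class (a plain successive-shortest-paths solver, not the authors' implementation).

```python
from collections import defaultdict


class MinCostFlow:
    """Successive shortest paths with Bellman-Ford (hypothetical helper)."""

    def __init__(self):
        self.graph = defaultdict(list)  # node -> indices into self.edges
        self.edges = []  # [target, residual capacity, cost]; i and i ^ 1 are paired

    def add_edge(self, u, v, capacity, cost):
        self.graph[u].append(len(self.edges))
        self.edges.append([v, capacity, cost])
        self.graph[v].append(len(self.edges))
        self.edges.append([u, 0, -cost])  # residual (reverse) edge

    def solve(self, source, sink, max_flow):
        flow, total_cost = 0, 0
        nodes = list(self.graph)
        while flow < max_flow:
            # Bellman-Ford: cheapest augmenting path in the residual graph.
            dist, prev_edge = {source: 0}, {}
            for _ in range(len(nodes)):
                updated = False
                for u in nodes:
                    if u not in dist:
                        continue
                    for ei in self.graph[u]:
                        v, cap, cost = self.edges[ei]
                        if cap > 0 and dist[u] + cost < dist.get(v, float("inf")):
                            dist[v] = dist[u] + cost
                            prev_edge[v] = ei
                            updated = True
                if not updated:
                    break
            if sink not in dist:
                break  # no augmenting path left
            # Find the bottleneck capacity along the path, then push flow.
            push, v = max_flow - flow, sink
            while v != source:
                ei = prev_edge[v]
                push = min(push, self.edges[ei][1])
                v = self.edges[ei ^ 1][0]
            v = sink
            while v != source:
                ei = prev_edge[v]
                self.edges[ei][1] -= push
                self.edges[ei ^ 1][1] += push
                v = self.edges[ei ^ 1][0]
            flow += push
            total_cost += push * dist[sink]
        return flow, total_cost

    def used_edges(self):
        # Forward edges (even index) carry flow equal to the paired residual capacity.
        for i in range(0, len(self.edges), 2):
            if self.edges[i ^ 1][1] > 0:
                yield self.edges[i ^ 1][0], self.edges[i][0]


# Toy instance: each term must pick one parent; each parent accepts one term.
mcf = MinCostFlow()
for term in ("apple", "banana"):
    mcf.add_edge("S", term, 1, 0)
mcf.add_edge("apple", "fruit", 1, 1)
mcf.add_edge("apple", "food", 1, 3)
mcf.add_edge("banana", "fruit", 1, 2)
mcf.add_edge("banana", "food", 1, 3)
for parent in ("fruit", "food"):
    mcf.add_edge(parent, "T", 1, 0)

flow, cost = mcf.solve("S", "T", 2)
parents = {u: v for u, v in mcf.used_edges() if u != "S" and v != "T"}
print(flow, cost, parents)
```

In this toy graph the solver has to trade off the two terms against each other because both prefer the same parent; the flow formulation resolves that conflict globally rather than edge by edge.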
SemEval-2016 Task 13: Taxonomy Extraction Evaluation (TExEval-2)
This paper describes the second edition of the shared task on Taxonomy Extraction Evaluation organised as part of SemEval 2016. This task aims to extract hypernym-hyponym relations between a given list of domain-specific terms and then to construct a domain taxonomy based on them. TExEval-2 introduced a multilingual setting for this task, covering four different languages including English, Dutch, Italian and French from domains as diverse as environment, food and science. A total of
62 runs submitted by 5 different teams were evaluated using structural measures, by comparison with gold standard taxonomies and by manual quality assessment of novel relations.
Science Foundation Ireland (SFI) under Grant Number SFI/12/RC/2289 (INSIGHT
Improving Hypernymy Extraction with Distributional Semantic Classes
In this paper, we show how distributionally-induced semantic classes can be
helpful for extracting hypernyms. We present methods for inducing sense-aware
semantic classes using distributional semantics and using these induced
semantic classes for filtering noisy hypernymy relations. Denoising of
hypernyms is performed by labeling each semantic class with its hypernyms. On
the one hand, this allows us to filter out wrong extractions using the global
structure of distributionally similar senses. On the other hand, we infer
missing hypernyms via label propagation to cluster terms. We conduct a
large-scale crowdsourcing study showing that processing of automatically
extracted hypernyms using our approach improves the quality of the hypernymy
extraction in terms of both precision and recall. Furthermore, we show the
utility of our method in the domain taxonomy induction task, achieving
state-of-the-art results on a SemEval'16 task on taxonomy induction.
Comment: In Proceedings of the 11th Conference on Language Resources and
Evaluation (LREC 2018). Miyazaki, Japan
A supervised approach to taxonomy extraction using word embeddings
Large collections of texts are commonly generated by large organizations, and making sense of these collections is a significant challenge. One method for handling this is to organize the concepts into a hierarchical structure such that similar concepts can be discovered and easily browsed. This approach was the subject of a recent evaluation campaign, TExEval; however, the results of this task showed that none of the systems consistently outperformed a relatively simple baseline. In order to solve this issue, we propose a new method that uses supervised learning to combine multiple features, including the baseline features, with a support vector machine classifier. We show that this outperforms the baseline and thus provides a stronger method for identifying taxonomic relations than previous methods.
Learning Word Subsumption Projections for the Russian Language
The semantic relations of hypernymy and hyponymy are widely used in various natural language processing tasks for modelling subsumption in common sense reasoning. Since the popularisation of distributional semantics, significant attention has been paid to applying word embeddings for inducing relations between words. In this paper, we show our preliminary results on adopting the projection learning technique for computing hypernyms from hyponyms using word embeddings. We also conduct a series of experiments on the Russian language and release open source software for learning hyponym-hypernym projections using both CPUs and GPUs, implemented with the TensorFlow machine learning framework.
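The general idea of projection learning can be sketched as fitting a matrix that maps a hyponym embedding onto its hypernym embedding. The example below is a minimal sketch under stated assumptions, not the released software: it uses plain Python instead of TensorFlow, a single linear projection trained by gradient descent on squared error, and made-up two-dimensional vectors. The names `project` and `train_projection` are hypothetical.

```python
def project(phi, x):
    """Apply the projection matrix phi (list of rows) to the vector x."""
    return [sum(p * xi for p, xi in zip(row, x)) for row in phi]


def train_projection(pairs, dim, lr=0.1, steps=500):
    """Fit phi so that phi @ hyponym is close to hypernym (mean squared error)."""
    phi = [[0.0] * dim for _ in range(dim)]
    n = len(pairs)
    for _ in range(steps):
        grad = [[0.0] * dim for _ in range(dim)]
        for x, y in pairs:
            pred = project(phi, x)
            for i in range(dim):
                err = pred[i] - y[i]
                for j in range(dim):
                    grad[i][j] += 2.0 * err * x[j] / n
        for i in range(dim):
            for j in range(dim):
                phi[i][j] -= lr * grad[i][j]
    return phi


# Toy (hyponym vector, hypernym vector) pairs; the numbers are invented and
# happen to be consistent with a projection that swaps the two coordinates.
pairs = [
    ([1.0, 0.0], [0.0, 1.0]),
    ([0.0, 1.0], [1.0, 0.0]),
    ([1.0, 1.0], [1.0, 1.0]),
]
phi = train_projection(pairs, dim=2)

# Predict a hypernym vector for an unseen hyponym vector.
predicted = project(phi, [2.0, 0.0])
print(phi, predicted)
```

With real embeddings, the predicted vector would be compared against the vocabulary (for example by cosine similarity) to rank candidate hypernyms; that retrieval step is omitted here.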
KIND: Un proyecto de inducción automática de taxonomías léxicas
This paper presents a description of the Kind Project, an algorithm for automatic induction of lexical taxonomies from corpora. Taxonomy induction consists of the discovery of hypernymy relations between pairs of single-word or multiword nouns, and the integration of these pairs into larger structures. The proposed methodology is fundamentally statistical and requires minimal linguistic resources, a characteristic that facilitates the reproduction of experiments in different languages. The languages for which results have been obtained so far are Spanish, English and French. The implementation of the algorithm and an online demo are available as open source on the project’s website.