Unsupervised, Knowledge-Free, and Interpretable Word Sense Disambiguation
Interpretability of a predictive model is a powerful feature that gains the
trust of users in the correctness of the predictions. In word sense
disambiguation (WSD), knowledge-based systems tend to be much more
interpretable than knowledge-free counterparts as they rely on the wealth of
manually-encoded elements representing word senses, such as hypernyms, usage
examples, and images. We present a WSD system that bridges the gap between
these two so far disconnected groups of methods. Namely, our system, providing
access to several state-of-the-art WSD models, aims to be interpretable as a
knowledge-based system while it remains completely unsupervised and
knowledge-free. The presented tool features a Web interface for all-word
disambiguation of texts that makes the sense predictions human readable by
providing interpretable word sense inventories, sense representations, and
disambiguation results. We provide a public API, enabling seamless integration.
Comment: In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2017). 2017. Copenhagen, Denmark. Association for Computational Linguistics.
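The underlying idea can be sketched in a few lines (a toy illustration with hypothetical data, not the system's actual API): each induced sense is represented by a cluster of related words, so a prediction can be explained by showing which cluster members appear in the context.

```python
# Minimal sketch of knowledge-free, interpretable WSD (hypothetical data,
# not the paper's actual API): each induced sense is a cluster of related
# words, so predictions can be explained by the overlapping cluster members.

# Hypothetical induced sense inventory for the ambiguous word "python".
SENSE_INVENTORY = {
    "python#1": {"snake", "reptile", "boa", "cobra", "anaconda"},
    "python#2": {"programming", "code", "java", "perl", "software"},
}

def disambiguate(context_words):
    """Pick the sense whose cluster overlaps the context most."""
    context = set(context_words)
    scores = {
        sense: len(cluster & context)
        for sense, cluster in SENSE_INVENTORY.items()
    }
    best = max(scores, key=scores.get)
    # The overlap itself serves as a human-readable explanation.
    return best, SENSE_INVENTORY[best] & context

sense, evidence = disambiguate(["she", "wrote", "the", "code", "in", "python"])
print(sense, evidence)  # python#2 {'code'}
```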
AutoSense Model for Word Sense Induction
Word sense induction (WSI), or the task of automatically discovering multiple
senses or meanings of a word, has three main challenges: domain adaptability,
novel sense detection, and sense granularity flexibility. While current latent
variable models are known to solve the first two challenges, they are not flexible with respect to word sense granularity, which varies widely across words, from aardvark with one sense to play with over 50 senses. Current
models either require hyperparameter tuning or nonparametric induction of the
number of senses, which we find both to be ineffective. Thus, we aim to
eliminate these requirements and solve the sense granularity problem by
proposing AutoSense, a latent variable model based on two observations: (1)
senses are represented as a distribution over topics, and (2) senses generate
pairings between the target word and its neighboring word. These observations
alleviate the problem by (a) discarding garbage senses and (b) additionally inducing fine-grained word senses. Results show substantial improvements over the
state-of-the-art models on popular WSI datasets. We also show that AutoSense is
able to learn the appropriate sense granularity of a word. Finally, we apply
AutoSense to the unsupervised author name disambiguation task, where the sense granularity problem is more evident, and show that AutoSense clearly outperforms competing models. We share our data and code here: https://github.com/rktamplayo/AutoSense.
Comment: AAAI 2019.
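The granularity mechanism can be illustrated with a toy sketch (ours, not the released AutoSense code at the linked repository): start with a deliberately large number of candidate senses, soft-assign contexts to them, and discard senses that attract almost no probability mass, so the surviving sense count adapts per word.

```python
import numpy as np

# Toy illustration (not the released AutoSense code): start with a generous
# number of candidate senses and drop "garbage" senses that attract almost
# no probability mass, so granularity adapts to the word.
rng = np.random.default_rng(0)
n_contexts, n_candidate_senses = 200, 10

# Stand-in for posterior p(sense | context) from a latent-variable model.
posteriors = rng.dirichlet(alpha=[0.1] * n_candidate_senses, size=n_contexts)

# Average mass each candidate sense receives across all contexts.
sense_mass = posteriors.sum(axis=0) / n_contexts

# Keep only senses above a small mass threshold; the rest are garbage.
kept = np.flatnonzero(sense_mass > 0.05)
print(f"kept {kept.size} of {n_candidate_senses} candidate senses:", kept)
```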
Improving Hypernymy Extraction with Distributional Semantic Classes
In this paper, we show how distributionally-induced semantic classes can be
helpful for extracting hypernyms. We present methods for inducing sense-aware
semantic classes using distributional semantics and using these induced
semantic classes for filtering noisy hypernymy relations. Denoising of
hypernyms is performed by labeling each semantic class with its hypernyms. On
the one hand, this allows us to filter out wrong extractions using the global
structure of distributionally similar senses. On the other hand, we infer
missing hypernyms via label propagation to cluster terms. We conduct a
large-scale crowdsourcing study showing that processing of automatically
extracted hypernyms using our approach improves the quality of the hypernymy
extraction in terms of both precision and recall. Furthermore, we show the
utility of our method in the domain taxonomy induction task, achieving the
state-of-the-art results on a SemEval'16 task on taxonomy induction.
Comment: In Proceedings of the 11th Conference on Language Resources and Evaluation (LREC 2018). Miyazaki, Japan.
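A small sketch of the denoising step under assumed toy data (not the authors' implementation): each semantic class is labeled with the hypernyms shared by a majority of its members; candidates outside the label are filtered out (precision), and the label is propagated to members that lacked it (recall).

```python
from collections import Counter

# Toy sketch of the denoising step (hypothetical data): label each
# semantic class with the hypernyms shared by its members, filter out
# extractions that disagree with the label, and propagate the label
# to members missing it.
semantic_class = ["apple", "mango", "pear", "banana"]
noisy_hypernyms = {
    "apple":  ["fruit", "company"],   # "company" is a wrong extraction here
    "mango":  ["fruit", "tree"],
    "pear":   ["fruit"],
    "banana": [],                     # missing hypernym
}

# Label the class with hypernyms supported by a majority of members.
counts = Counter(h for hs in noisy_hypernyms.values() for h in hs)
label = {h for h, c in counts.items() if c >= len(semantic_class) / 2}

# Filtering: keep only hypernyms consistent with the class label.
filtered = {w: [h for h in hs if h in label] for w, hs in noisy_hypernyms.items()}
# Propagation: members missing a labeled hypernym inherit it from the class.
denoised = {w: sorted(set(hs) | label) for w, hs in filtered.items()}
print(label)     # {'fruit'}
print(denoised)  # every member now has 'fruit'; 'company' and 'tree' dropped
```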
What a Nerd! Beating Students and Vector Cosine in the ESL and TOEFL Datasets
In this paper, we claim that Vector Cosine, which is generally considered one
of the most efficient unsupervised measures for identifying word similarity in
Vector Space Models, can be outperformed by a completely unsupervised measure
that evaluates the extent of the intersection among the most associated
contexts of two target words, weighting such intersection according to the rank
of the shared contexts in the dependency ranked lists. This claim comes from
the hypothesis that similar words do not simply occur in similar contexts, but
they share a larger portion of their most relevant contexts compared to other
related words. To prove this, we describe and evaluate APSyn, a variant of Average Precision that, independently of the adopted parameters, outperforms Vector Cosine and co-occurrence baselines on the ESL and TOEFL test sets. In the best setting, APSyn reaches 0.73 accuracy on the ESL dataset and 0.70 accuracy on the TOEFL dataset, therefore beating the non-English US college applicants (whose average, as reported in the literature, is 64.50%) and several state-of-the-art approaches.
Comment: In LREC 2016.
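Following the description above, a minimal sketch of the measure (the toy rank lists here are hypothetical; in the paper they come from dependency-based association scores): two words are scored by the overlap of their top-N most associated contexts, weighting each shared context by the average of its ranks in the two lists.

```python
# Sketch of APSyn as described in the abstract: score two words by the
# overlap of their top-N most associated contexts, weighting each shared
# context by the average of its ranks in the two lists.

def apsyn(ranked_contexts_a, ranked_contexts_b, n=100):
    """ranked_contexts_*: contexts sorted from most to least associated."""
    rank_a = {c: i + 1 for i, c in enumerate(ranked_contexts_a[:n])}
    rank_b = {c: i + 1 for i, c in enumerate(ranked_contexts_b[:n])}
    shared = rank_a.keys() & rank_b.keys()
    # Shared contexts that rank high in both lists contribute the most.
    return sum(1.0 / ((rank_a[c] + rank_b[c]) / 2.0) for c in shared)

dog = ["bark", "leash", "pet", "tail", "vet"]
cat = ["purr", "pet", "tail", "vet", "leash"]
print(apsyn(dog, cat))  # higher score = more shared salient contexts
```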
Capturing lexical variation in MT evaluation using automatically built sense-cluster inventories
The strict character of most of the existing Machine Translation (MT) evaluation metrics does not permit them to capture lexical variation in translation. However, a central
issue in MT evaluation is the high correlation that the metrics should have with human judgments of translation quality. In order to achieve a higher correlation, the identification of sense correspondences between the compared translations becomes crucial. Given that most metrics look for exact correspondences, the evaluation results are often misleading concerning translation quality. Moreover, existing metrics do not permit a conclusive estimation of the impact of Word Sense Disambiguation techniques on MT systems. In this paper, we show how information acquired by an unsupervised semantic analysis method can be used to render MT evaluation more sensitive to lexical semantics. The sense inventories built by this data-driven method are incorporated into METEOR: they replace WordNet for evaluation in English and render METEOR's synonymy module operable in French. The evaluation results demonstrate that the use of these inventories gives rise to an increase in the number of matches and in the correlation with human judgments of translation quality, compared to precision-based metrics.
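The core change can be sketched as follows (a simplification with hypothetical clusters, not METEOR's real alignment code): beyond exact matches, two words are also allowed to match when they fall in the same automatically induced sense cluster, which is what replaces the WordNet synonymy lookup.

```python
# Simplified sketch (not METEOR's real implementation): allow a match
# between hypothesis and reference words when they share an induced
# sense cluster, instead of requiring exact identity or WordNet synonymy.

# Hypothetical data-driven sense-cluster inventory.
CLUSTERS = [{"buy", "purchase", "acquire"}, {"car", "automobile"}]

def same_cluster(w1, w2):
    return any(w1 in c and w2 in c for c in CLUSTERS)

def matches(hypothesis, reference):
    """Count unigram matches: exact first, then cluster-based.
    Real METEOR aligns words non-positionally; zip is a simplification."""
    count = 0
    for h, r in zip(hypothesis, reference):
        if h == r or same_cluster(h, r):
            count += 1
    return count

hyp = ["they", "purchase", "a", "car"]
ref = ["they", "buy", "an", "automobile"]
print(matches(hyp, ref))  # 3: "they", "purchase/buy", "car/automobile"
```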
Cross-Lingual Induction and Transfer of Verb Classes Based on Word Vector Space Specialisation
Existing approaches to automatic VerbNet-style verb classification are
heavily dependent on feature engineering and therefore limited to languages
with mature NLP pipelines. In this work, we propose a novel cross-lingual
transfer method for inducing VerbNets for multiple languages. To the best of
our knowledge, this is the first study which demonstrates how the architectures
for learning word embeddings can be applied to this challenging
syntactic-semantic task. Our method uses cross-lingual translation pairs to tie
each of the six target languages into a bilingual vector space with English,
jointly specialising the representations to encode the relational information
from English VerbNet. A standard clustering algorithm is then run on top of the
VerbNet-specialised representations, using vector dimensions as features for
learning verb classes. Our results show that the proposed cross-lingual
transfer approach sets new state-of-the-art verb classification performance
across all six target languages explored in this work.
Comment: EMNLP 2017 (long paper).
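The final step is a standard clustering over the specialised vectors; a minimal sketch with scikit-learn follows, where random toy vectors stand in for the specialised cross-lingual embeddings (the specialisation itself, injecting VerbNet constraints into the bilingual space, is the part omitted here).

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy sketch of the final step only: once verb vectors have been
# specialised to encode VerbNet-style relational information, a standard
# clustering over them yields verb classes. Vectors here are random
# stand-ins for the specialised cross-lingual embeddings.
rng = np.random.default_rng(42)
verbs = ["give", "send", "donate", "run", "walk", "sprint"]
vectors = rng.normal(size=(len(verbs), 50))

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(vectors)
for verb, cls in zip(verbs, kmeans.labels_):
    print(verb, "-> class", cls)
```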
Russian word sense induction by clustering averaged word embeddings
The paper reports our participation in the shared task on word sense
induction and disambiguation for the Russian language (RUSSE-2018). Our team
was ranked 2nd for the wiki-wiki dataset (containing mostly homonyms) and 5th
for the bts-rnc and active-dict datasets (containing mostly polysemous words)
among all 19 participants.
The method we employed was extremely naive. It involved representing contexts of ambiguous words as averaged word embedding vectors, using off-the-shelf pre-trained distributional models. Then, these vector representations were
clustered with mainstream clustering techniques, thus producing the groups
corresponding to the ambiguous word senses. As a side result, we show that word
embedding models trained on small but balanced corpora can be superior to those
trained on large but noisy data - not only in intrinsic evaluation, but also in
downstream tasks like word sense induction.
Comment: Proceedings of the 24th International Conference on Computational Linguistics and Intellectual Technologies (Dialogue-2018).
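The method is simple enough to sketch in a few lines (a toy matrix stands in for the pre-trained distributional model): each context of an ambiguous word becomes the average of its words' embeddings, and the resulting vectors are clustered into senses.

```python
import numpy as np
from sklearn.cluster import KMeans

# Sketch of the method described above: a context of an ambiguous word is
# the average of its words' embeddings, and contexts are then clustered
# into senses. A random matrix stands in for a pre-trained model.
rng = np.random.default_rng(1)
vocab = {w: i for i, w in enumerate(["bank", "river", "water", "loan", "money"])}
embeddings = rng.normal(size=(len(vocab), 100))

def context_vector(words):
    """Average the embeddings of the in-vocabulary context words."""
    idx = [vocab[w] for w in words if w in vocab]
    return embeddings[idx].mean(axis=0)

contexts = [["river", "water"], ["loan", "money"], ["water", "river"], ["money", "loan"]]
X = np.stack([context_vector(c) for c in contexts])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)  # contexts about the same sense fall in the same cluster
```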
From Word to Sense Embeddings: A Survey on Vector Representations of Meaning
Over the past years, distributed semantic representations have proved to be
effective and flexible keepers of prior knowledge to be integrated into
downstream applications. This survey focuses on the representation of meaning.
We start from the theoretical background behind word vector space models and
highlight one of their major limitations: the meaning conflation deficiency,
which arises from representing a word with all its possible meanings as a
single vector. Then, we explain how this deficiency can be addressed through a
transition from the word level to the more fine-grained level of word senses
(in its broader acceptation) as a method for modelling unambiguous lexical
meaning. We present a comprehensive overview of the wide range of techniques in
the two main branches of sense representation, i.e., unsupervised and
knowledge-based. Finally, this survey covers the main evaluation procedures and
applications for this type of representation, and provides an analysis of four
of its important aspects: interpretability, sense granularity, adaptability to
different domains and compositionality.
Comment: 46 pages, 8 figures. Published in Journal of Artificial Intelligence Research.
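The meaning conflation deficiency the survey highlights can be illustrated with a toy computation (hypothetical vectors, for illustration only): a single vector for an ambiguous word sits between its sense vectors, so it matches each sense's neighbourhood less well than a dedicated sense vector would.

```python
import numpy as np

# Toy illustration of the meaning conflation deficiency: one vector per
# word averages its senses, so it sits between them and is a worse match
# for either sense's neighbourhood than a dedicated sense vector.
def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(7)
bank_finance = rng.normal(size=50)   # hypothetical sense vector
bank_river = rng.normal(size=50)     # hypothetical sense vector
bank_conflated = (bank_finance + bank_river) / 2  # single word-level vector

print(cos(bank_finance, bank_finance))    # 1.00: sense vector matches itself
print(cos(bank_conflated, bank_finance))  # ~0.7: conflated vector is pulled away
```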