Learning Graph Embeddings from WordNet-based Similarity Measures
We present path2vec, a new approach for learning graph embeddings that relies
on structural measures of pairwise node similarities. The model learns
representations for nodes in a dense space that approximate a given
user-defined graph distance measure, such as the shortest path distance or
distance measures that take information beyond the graph structure into
account. Evaluation of the proposed model on semantic similarity and word sense
disambiguation tasks, using various WordNet-based similarity measures, shows
that our approach yields competitive results, outperforming strong graph
embedding baselines. The model is computationally efficient, being orders of
magnitude faster than the direct computation of graph-based distances.
Comment: Accepted to StarSem 201
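The idea in this abstract, learning dense node vectors that approximate a graph-based measure, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy graph, the similarity 1 / (1 + shortest path length), the use of a dot product to compare vectors, and the plain stochastic gradient descent loop with its hyperparameters.

```python
import random
from collections import deque

# Hypothetical toy taxonomy (undirected adjacency lists), not from the paper.
TOY_GRAPH = {
    "entity": ["animal", "plant"],
    "animal": ["entity", "dog", "cat"],
    "plant": ["entity", "tree"],
    "dog": ["animal"],
    "cat": ["animal"],
    "tree": ["plant"],
}


def shortest_path_lengths(adj, source):
    """Breadth-first search: hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def pairwise_targets(adj):
    """Assumed similarity 1 / (1 + path length) for each unordered node pair."""
    nodes = sorted(adj)
    targets = {}
    for u in nodes:
        dist = shortest_path_lengths(adj, u)
        for v in nodes:
            if u < v:  # the graph is assumed to be connected
                targets[(u, v)] = 1.0 / (1.0 + dist[v])
    return targets


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mean_squared_error(emb, targets):
    errors = [(dot(emb[u], emb[v]) - s) ** 2 for (u, v), s in targets.items()]
    return sum(errors) / len(errors)


def train_embeddings(adj, dim=8, epochs=1000, lr=0.05, seed=0):
    """Fit vectors so that dot(emb[u], emb[v]) approximates the target similarity."""
    rng = random.Random(seed)
    targets = pairwise_targets(adj)
    emb = {n: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for n in adj}
    pairs = list(targets.items())
    for _ in range(epochs):
        rng.shuffle(pairs)
        for (u, v), s in pairs:
            eu, ev = emb[u], emb[v]
            err = dot(eu, ev) - s
            for i in range(dim):
                # Gradients of 0.5 * err**2, both computed before either update.
                grad_u, grad_v = err * ev[i], err * eu[i]
                eu[i] -= lr * grad_u
                ev[i] -= lr * grad_v
    return emb, targets


if __name__ == "__main__":
    emb, targets = train_embeddings(TOY_GRAPH)
    print("mean squared error:", mean_squared_error(emb, targets))
    print("sim(dog, cat) ~", dot(emb["dog"], emb["cat"]))
```

Once trained, comparing two nodes costs a single dot product instead of a graph traversal, which is the kind of saving the abstract refers to.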
Retrieving with good sense
Although always present in text, word sense ambiguity has only recently come to be regarded as a potentially solvable
problem for information retrieval. The growth of interest in word senses resulted from new directions taken in
disambiguation research. This paper first outlines this research and surveys the resulting efforts in information
retrieval. Although the majority of attempts to improve retrieval effectiveness were unsuccessful, much was learnt
from the research, most notably a notion of the circumstances under which disambiguation may prove of use to retrieval.
From Word to Sense Embeddings: A Survey on Vector Representations of Meaning
Over the past years, distributed semantic representations have proved to be
effective and flexible keepers of prior knowledge to be integrated into
downstream applications. This survey focuses on the representation of meaning.
We start from the theoretical background behind word vector space models and
highlight one of their major limitations: the meaning conflation deficiency,
which arises from representing a word with all its possible meanings as a
single vector. Then, we explain how this deficiency can be addressed through a
transition from the word level to the more fine-grained level of word senses
(in its broader acceptation) as a method for modelling unambiguous lexical
meaning. We present a comprehensive overview of the wide range of techniques in
the two main branches of sense representation, i.e., unsupervised and
knowledge-based. Finally, this survey covers the main evaluation procedures and
applications for this type of representation, and provides an analysis of four
of its important aspects: interpretability, sense granularity, adaptability to
different domains and compositionality.
Comment: 46 pages, 8 figures. Published in Journal of Artificial Intelligence Research
Capturing lexical variation in MT evaluation using automatically built sense-cluster inventories
The strict character of most of the existing Machine Translation (MT) evaluation metrics does not permit them to capture lexical variation in translation. However, a central
issue in MT evaluation is the high correlation that the metrics should have with human judgments of translation quality. In order to achieve a higher correlation, the identification of sense correspondences between the compared translations becomes really important. Given
that most metrics are looking for exact correspondences, the evaluation results are often misleading concerning translation quality. Apart from that, existing metrics do not permit one to make a conclusive estimation of the impact of Word Sense Disambiguation techniques on
MT systems. In this paper, we show how information acquired by an unsupervised semantic analysis method can be used to render MT evaluation more sensitive to lexical semantics. The sense inventories built by this data-driven method are incorporated into METEOR: they replace WordNet for evaluation in English and render METEOR’s synonymy module operable in French. The evaluation results demonstrate that the use of these inventories gives rise to an increase in the number of matches and the correlation with human judgments of translation quality, compared to precision-based metrics.
Combining Knowledge- and Corpus-based Word-Sense-Disambiguation Methods
In this paper we concentrate on the resolution of the lexical ambiguity that
arises when a given word has several different meanings. This specific task is
commonly referred to as word sense disambiguation (WSD). The task of WSD
consists of assigning the correct sense to words using an electronic dictionary
as the source of word definitions. We present two WSD methods based on two main
methodological approaches in this research area: a knowledge-based method and a
corpus-based method. Our hypothesis is that word-sense disambiguation requires
several knowledge sources in order to solve the semantic ambiguity of the
words. These sources can be of different kinds, for example syntagmatic,
paradigmatic or statistical information. Our approach combines various sources
of knowledge, through combinations of the two WSD methods mentioned above.
Mainly, the paper concentrates on how to combine these methods and sources of
information in order to achieve good results in the disambiguation. Finally,
this paper presents a comprehensive study and experimental work on evaluation
of the methods and their combinations.
Cross-domain polarity classification using a knowledge-enhanced meta-classifier
Current approaches to single and cross-domain polarity classification usually use bag of words, n-grams
or lexical resource-based classifiers. In this paper, we propose the use of meta-learning to combine and
enrich those approaches by also adding other knowledge-based features. In addition to the aforementioned
classical approaches, our system uses the BabelNet multilingual semantic network to generate features
derived from word sense disambiguation and vocabulary expansion. Experimental results show
state-of-the-art performance on single and cross-domain polarity classification. Contrary to other
approaches, ours is generic. These results were obtained without any domain adaptation technique.
Moreover, the use of meta-learning allows our approach to obtain the most stable results across domains.
Finally, our empirical analysis provides interesting insights on the use of semantic network-based
features.
European Commission WIQ-EI IRSES (No. 269180); Ministerio de Economía y Competitividad TIN2012-38603-C02-01; Ministerio de Economía y Competitividad TIN2012-38536-C03-02; Junta de Andalucía P11-TIC-7684 M