59,210 research outputs found
A Quantitative Evaluation of Global Word Sense Induction
International audienceWord sense induction (WSI) is the task aimed at automatically identifying the senses of words in texts, without the need for handcrafted resources or annotated data. Up till now, most WSI algorithms extract the different senses of a word 'locally' on a per-word basis, i.e. the different senses for each word are determined separately. In this paper, we compare the performance of such algorithms to an algorithm that uses a 'global' approach, i.e. the different senses of a particular word are determined by comparing them to, and demarcating them from, the senses of other words in a full-blown word space model. We adopt the evaluation framework proposed in the SemEval-2010 Word Sense Induction \& Disambiguation task. All systems that participated in this task use a local scheme for determining the different senses of a word. We compare their results to the ones obtained by the global approach, and discuss the advantages and weaknesses of both approaches
Breaking Sticks and Ambiguities with Adaptive Skip-gram
Recently proposed Skip-gram model is a powerful method for learning
high-dimensional word representations that capture rich semantic relationships
between words. However, Skip-gram as well as most prior work on learning word
representations does not take into account word ambiguity and maintain only
single representation per word. Although a number of Skip-gram modifications
were proposed to overcome this limitation and learn multi-prototype word
representations, they either require a known number of word meanings or learn
them using greedy heuristic approaches. In this paper we propose the Adaptive
Skip-gram model which is a nonparametric Bayesian extension of Skip-gram
capable to automatically learn the required number of representations for all
words at desired semantic resolution. We derive efficient online variational
learning algorithm for the model and empirically demonstrate its efficiency on
word-sense induction task
An algorithm for cross-lingual sense-clustering tested in a MT evaluation setting
Unsupervised sense induction methods offer a solution to the
problem of scarcity of semantic resources. These methods
automatically extract semantic information from textual data
and create resources adapted to specific applications and domains of interest. In this paper, we present a clustering algorithm for cross-lingual sense induction which generates
bilingual semantic inventories from parallel corpora. We describe the clustering procedure and the obtained resources. We then proceed to a large-scale evaluation by integrating the resources into a Machine Translation (MT) metric (METEOR). We show that the use of the data-driven sense-cluster inventories leads to better correlation with human judgments of translation quality, compared to precision-based metrics, and to improvements similar to those obtained when a handcrafted semantic resource is used
- …