Embeddings for word sense disambiguation: an evaluation study
Recent years have seen a dramatic growth in the popularity of word embeddings, mainly owing to their ability to capture semantic information from massive amounts of textual content. As a result, many tasks in Natural Language Processing have tried to take advantage of the potential of these distributional models. In this work, we study how word embeddings can be used in Word Sense Disambiguation, one of the oldest tasks in Natural Language Processing and Artificial Intelligence. We propose different methods through which word embeddings can be leveraged in a state-of-the-art supervised WSD system architecture, and perform an in-depth analysis of how different parameters affect performance. We show how a WSD system that makes use of word embeddings alone, if designed properly, can provide a significant performance improvement over a state-of-the-art WSD system that incorporates several standard WSD features.
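One simple way to leverage word embeddings for WSD, sketched below under toy assumptions (a hypothetical hand-built embedding table and gloss lists; this is an illustrative baseline, not the paper's exact method), is to compare the averaged embedding of the context against an averaged embedding of each candidate sense's gloss words:

```python
import numpy as np

# Toy embeddings (hypothetical, 3-dimensional for readability).
EMB = {
    "money":   np.array([1.0, 0.1, 0.0]),
    "deposit": np.array([0.9, 0.2, 0.1]),
    "river":   np.array([0.0, 1.0, 0.1]),
    "water":   np.array([0.1, 0.9, 0.2]),
    "slope":   np.array([0.1, 0.8, 0.3]),
}

def centroid(words):
    """Average the embeddings of the known words."""
    vecs = [EMB[w] for w in words if w in EMB]
    return np.mean(vecs, axis=0)

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def disambiguate(context_words, sense_glosses):
    """Pick the sense whose gloss centroid is closest to the context centroid."""
    ctx = centroid(context_words)
    scores = {s: cosine(ctx, centroid(g)) for s, g in sense_glosses.items()}
    return max(scores, key=scores.get)

senses = {
    "bank.financial": ["money", "deposit"],
    "bank.river":     ["river", "water", "slope"],
}
print(disambiguate(["deposit", "money"], senses))  # bank.financial
```

In practice the embeddings would come from a pretrained model and the glosses from a sense inventory such as WordNet; the scoring idea stays the same.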
ShotgunWSD: An unsupervised algorithm for global word sense disambiguation inspired by DNA sequencing
In this paper, we present a novel unsupervised algorithm for word sense
disambiguation (WSD) at the document level. Our algorithm is inspired by a
widely-used approach in the field of genetics for whole genome sequencing,
known as the Shotgun sequencing technique. The proposed WSD algorithm is based
on three main steps. First, a brute-force WSD algorithm is applied to short
context windows (up to 10 words) selected from the document in order to
generate a short list of likely sense configurations for each window. In the
second step, these local sense configurations are assembled into longer
composite configurations based on suffix and prefix matching. The resulting
configurations are ranked by their length, and the sense of each word is chosen
based on a voting scheme that considers only the top k configurations in which
the word appears. We compare our algorithm with other state-of-the-art
unsupervised WSD algorithms and demonstrate better performance, sometimes by a
very large margin. We also show that our algorithm can yield better performance
than the Most Common Sense (MCS) baseline on one data set. Moreover, our
algorithm has a very small number of parameters, is robust to parameter tuning,
and, unlike other bio-inspired methods, it gives a deterministic solution (it
does not involve random choices). Comment: In Proceedings of EACL 201
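The assembly and voting steps described above can be sketched as follows (assumed simplifications: local sense configurations are given as `(start_position, sense_sequence)` pairs, two configurations merge when their overlapping positions carry identical senses, and the brute-force window search of the first step is omitted):

```python
from collections import Counter

def try_merge(a, b):
    """Merge configuration a = (start, senses) with b when b's prefix
    matches a's suffix over their overlapping positions."""
    sa, xa = a
    sb, xb = b
    overlap = sa + len(xa) - sb
    if 0 < overlap <= len(xa) and xa[-overlap:] == xb[:overlap]:
        return (sa, xa + xb[overlap:])
    return None

def vote(configs, position, k=3):
    """Choose a word's sense by majority vote over the k longest
    assembled configurations that cover the given position."""
    covering = [c for c in configs if c[0] <= position < c[0] + len(c[1])]
    top = sorted(covering, key=lambda c: len(c[1]), reverse=True)[:k]
    counts = Counter(c[1][position - c[0]] for c in top)
    return counts.most_common(1)[0][0]

# Two local configurations over positions 0-3 and 2-5 that agree on 2-3:
merged = try_merge((0, ("s1", "s2", "s3", "s4")), (2, ("s3", "s4", "s5", "s6")))
print(merged)          # (0, ('s1', 's2', 's3', 's4', 's5', 's6'))
print(vote([merged], 4))  # s5
```

This mirrors the shotgun-sequencing analogy: short overlapping fragments are assembled into longer contigs, and each position's final value is decided by the fragments that cover it.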
Huge automatically extracted training sets for multilingual Word Sense Disambiguation
We release to the community six large-scale sense-annotated datasets in multiple languages to pave the way for supervised multilingual Word Sense Disambiguation. Our datasets cover all the nouns in the English WordNet and their translations in other languages, for a total of millions of sense-tagged sentences. Experiments prove that these corpora can be effectively used as training sets for supervised WSD systems, surpassing the state of the art for low-resourced languages and providing competitive results for English, where manually annotated training sets are accessible. The data is available at trainomatic.org.
Word Sense Disambiguation by Web Mining for Word Co-occurrence Probabilities
This paper describes the National Research Council (NRC)
Word Sense Disambiguation (WSD) system, as applied to the
English Lexical Sample (ELS) task in Senseval-3. The NRC system
approaches WSD as a classical supervised machine learning problem,
using familiar tools such as the Weka machine learning software
and Brill's rule-based part-of-speech tagger. Head words are
represented as feature vectors with several hundred features.
Approximately half of the features are syntactic and the other
half are semantic. The main novelty in the system is the method for
generating the semantic features, based on word co-occurrence
probabilities. The probabilities are estimated using
the Waterloo MultiText System with a corpus of about one terabyte of
unlabeled text, collected by a web crawler.
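A common way to turn corpus co-occurrence counts into semantic features is pointwise mutual information (PMI), sketched below over a toy list of co-occurrence windows; the NRC system's actual scoring over the terabyte-scale Waterloo MultiText corpus is not reproduced here:

```python
import math

def pmi(corpus_windows, w1, w2):
    """PMI(w1, w2) estimated from a list of co-occurrence windows
    (each window is a list of words)."""
    n = len(corpus_windows)
    c1 = sum(w1 in win for win in corpus_windows)
    c2 = sum(w2 in win for win in corpus_windows)
    c12 = sum(w1 in win and w2 in win for win in corpus_windows)
    if c12 == 0:
        return float("-inf")  # never observed together
    return math.log((c12 / n) / ((c1 / n) * (c2 / n)))

# Toy corpus of context windows (hypothetical):
windows = [
    ["bank", "money", "loan"],
    ["bank", "river", "water"],
    ["money", "loan", "interest"],
    ["river", "water", "fish"],
]
print(round(pmi(windows, "money", "loan"), 3))  # 0.693
```

Words that co-occur more often than chance get positive scores, which can then serve as semantic feature values for the head word's feature vector.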
Word Sense Disambiguation using a Bidirectional LSTM
In this paper we present a clean, yet effective, model for word sense
disambiguation. Our approach leverages a bidirectional long short-term memory
network which is shared between all words. This enables the model to share
statistical strength and to scale well with vocabulary size. The model is
trained end-to-end, directly from the raw text to sense labels, and makes
effective use of word order. We evaluate our approach on two standard datasets,
using identical hyperparameter settings, which are in turn tuned on a third set
of held out data. We employ no external resources (e.g. knowledge graphs or
part-of-speech tags), language-specific features, or hand-crafted rules, yet
still achieve results statistically equivalent to those of the best
state-of-the-art systems, which operate under no such limitations.
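The shape of such an architecture can be illustrated with a toy forward pass (assumptions: a plain tanh RNN stands in for the LSTM cells, weights are random, and dimensions are tiny; the point is the shared bidirectional encoder feeding a per-lemma sense softmax, not a faithful reimplementation):

```python
import numpy as np

rng = np.random.default_rng(0)
V, D, H = 6, 4, 5                 # vocab size, embedding dim, hidden dim
E = rng.normal(size=(V, D))       # embedding table, shared by every word
Wf = rng.normal(size=(H, D + H))  # forward recurrent weights (shared)
Wb = rng.normal(size=(H, D + H))  # backward recurrent weights (shared)

def rnn(tokens, W):
    """Run a simple tanh RNN over the token ids, returning all hidden states."""
    h = np.zeros(H)
    states = []
    for t in tokens:
        h = np.tanh(W @ np.concatenate([E[t], h]))
        states.append(h)
    return states

def sense_logits(tokens, target_pos, W_sense):
    """Concatenate the forward and backward states at the target position
    and project onto that lemma's sense inventory."""
    fwd = rnn(tokens, Wf)
    bwd = rnn(tokens[::-1], Wb)[::-1]
    ctx = np.concatenate([fwd[target_pos], bwd[target_pos]])
    return W_sense @ ctx

W_bank = rng.normal(size=(2, 2 * H))  # 2 senses for a hypothetical lemma
logits = sense_logits([0, 3, 2, 5], target_pos=1, W_sense=W_bank)
print(logits.shape)  # (2,)
```

Because the encoder weights are shared across all words and only the final projection is lemma-specific, every training example improves the representation used by every target word, which is what lets the model scale with vocabulary size.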
Two knowledge-based methods for High-Performance Sense Distribution Learning
Knowing the correct distribution of senses within a corpus can potentially boost the performance of Word Sense Disambiguation (WSD) systems by many points. We present two fully automatic and language-independent methods for computing the distribution of senses given a raw corpus of sentences. Intrinsic and extrinsic evaluations show that our methods outperform the current state of the art in sense distribution learning and the strongest baselines for the most frequent sense in multiple languages and on domain-specific test sets. Our sense distributions are available at http://trainomatic.org.
Word sense disambiguation criteria: a systematic study
This article describes the results of a systematic in-depth study of the
criteria used for word sense disambiguation. Our study is based on 60 target
words: 20 nouns, 20 adjectives and 20 verbs. Our results are not always in line
with some practices in the field. For example, we show that omitting
non-content words decreases performance and that bigrams yield better results
than unigrams.
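The unigram-versus-bigram contrast the study reports can be made concrete with a minimal sketch (hypothetical feature extractors; the paper's actual feature set is richer):

```python
def unigram_features(tokens):
    """Context words as an unordered bag."""
    return set(tokens)

def bigram_features(tokens):
    """Adjacent word pairs, which preserve some local word order."""
    return set(zip(tokens, tokens[1:]))

ctx = ["the", "interest", "rate", "at", "the", "bank"]
print(("interest", "rate") in bigram_features(ctx))  # True
```

Bigrams such as `("interest", "rate")` capture collocational cues that a bag of unigrams loses, which is one plausible reason they disambiguate better.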