
443 research outputs found

Embeddings for word sense disambiguation: an evaluation study

Author: Iacobacci Ignacio Javier
Navigli Roberto
Pilehvar Mohammed Taher
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2016

Recent years have seen a dramatic growth in the popularity of word embeddings, mainly owing to their ability to capture semantic information from massive amounts of textual content. As a result, many tasks in Natural Language Processing have tried to take advantage of the potential of these distributional models. In this work, we study how word embeddings can be used in Word Sense Disambiguation, one of the oldest tasks in Natural Language Processing and Artificial Intelligence. We propose different methods through which word embeddings can be leveraged in a state-of-the-art supervised WSD system architecture, and perform a deep analysis of how different parameters affect performance. We show how a WSD system that makes use of word embeddings alone, if designed properly, can provide significant performance improvement over a state-of-the-art WSD system that incorporates several standard WSD features.

Archivio della ricerca- Università di Roma La Sapienza

ShotgunWSD: An unsupervised algorithm for global word sense disambiguation inspired by DNA sequencing

Author: Butnaru Andrei M.
Hristea Florentina
Ionescu Radu Tudor
Publication date: 01/01/2017

In this paper, we present a novel unsupervised algorithm for word sense disambiguation (WSD) at the document level. Our algorithm is inspired by a widely used approach in the field of genetics for whole genome sequencing, known as the Shotgun sequencing technique. The proposed WSD algorithm is based on three main steps. First, a brute-force WSD algorithm is applied to short context windows (up to 10 words) selected from the document in order to generate a short list of likely sense configurations for each window. In the second step, these local sense configurations are assembled into longer composite configurations based on suffix and prefix matching. Finally, the resulting configurations are ranked by their length, and the sense of each word is chosen based on a voting scheme that considers only the top k configurations in which the word appears. We compare our algorithm with other state-of-the-art unsupervised WSD algorithms and demonstrate better performance, sometimes by a very large margin. We also show that our algorithm can yield better performance than the Most Common Sense (MCS) baseline on one data set. Moreover, our algorithm has a very small number of parameters, is robust to parameter tuning, and, unlike other bio-inspired methods, it gives a deterministic solution (it does not involve random choices). Comment: In Proceedings of EACL 201
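The three steps described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the sense inventory `SENSES`, the domain tags in `DOMAIN`, the `relatedness` function, the window size of 2, and the use of a pairwise relatedness score to break ties between equally long configurations are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product

# Hypothetical toy sense inventory: word -> candidate senses.
SENSES = {
    "bank": ["bank.finance", "bank.river"],
    "deposit": ["deposit.money", "deposit.sediment"],
    "interest": ["interest.money", "interest.curiosity"],
    "loan": ["loan.money"],
}
# Hypothetical domain tag per sense, used as a stand-in relatedness measure.
DOMAIN = {
    "bank.finance": "money", "bank.river": "nature",
    "deposit.money": "money", "deposit.sediment": "nature",
    "interest.money": "money", "interest.curiosity": "mind",
    "loan.money": "money",
}

def relatedness(a, b):
    return 1.0 if DOMAIN[a] == DOMAIN[b] else 0.0

def score(senses):
    """Sum of pairwise relatedness over one sense configuration."""
    return sum(relatedness(a, b) for i, a in enumerate(senses) for b in senses[i + 1:])

def local_configs(words, start, size, keep):
    """Step 1: brute-force all sense assignments in a window, keep the best few."""
    window = words[start:start + size]
    combos = sorted(product(*(SENSES[w] for w in window)), key=score, reverse=True)
    return [(start, combo) for combo in combos[:keep]]

def merge(left, right):
    """Step 2: join two configurations when a suffix of one equals a prefix of the other."""
    (a, sa), (b, sb) = left, right
    overlap = a + len(sa) - b
    if not (a < b and 0 < overlap < len(sb)):
        return None
    if sa[-overlap:] != sb[:overlap]:
        return None
    return (a, sa + sb[overlap:])

def assemble(configs):
    """Repeat merging until no new composite configuration appears."""
    pool = set(configs)
    grew = True
    while grew:
        grew = False
        for left in list(pool):
            for right in list(pool):
                merged = merge(left, right)
                if merged is not None and merged not in pool:
                    pool.add(merged)
                    grew = True
    return pool

def vote(pool, n_words, k):
    """Step 3: rank by length (ties by score, an assumption), vote over the top k."""
    ranked = sorted(pool, key=lambda c: (-len(c[1]), -score(c[1]), c[0], c[1]))
    result = []
    for pos in range(n_words):
        covering = [c for c in ranked if c[0] <= pos < c[0] + len(c[1])][:k]
        votes = Counter(senses[pos - start] for start, senses in covering)
        result.append(votes.most_common(1)[0][0])
    return result

def shotgun_wsd(words, size=2, keep=2, k=3):
    configs = []
    for start in range(len(words) - size + 1):
        configs.extend(local_configs(words, start, size, keep))
    return vote(assemble(configs), len(words), k)

print(shotgun_wsd(["bank", "deposit", "interest", "loan"]))
```

Because every step uses a deterministic ordering, repeated runs on the same input give the same answer, which mirrors the determinism the abstract emphasizes.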

arXiv.org e-Print Archive

Crossref

Huge automatically extracted training sets for multilingual Word Sense Disambiguation

Author: Carr A.J.
Collins G.S.
Cook J.
Feakins B.G.
Gerry S.
Judge A.
Wartolowska K.A.
Publication date: 01/01/2017

We release to the community six large-scale sense-annotated datasets in multiple languages to pave the way for supervised multilingual Word Sense Disambiguation. Our datasets cover all the nouns in the English WordNet and their translations in other languages for a total of millions of sense-tagged sentences. Experiments prove that these corpora can be effectively used as training sets for supervised WSD systems, surpassing the state of the art for low-resourced languages and providing competitive results for English, where manually annotated training sets are accessible. The data is available at trainomatic.org

Directory of Open Access Journals

Oxford University Research Archive

Archivio della ricerca- Università di Roma La Sapienza

Explore Bristol Research

Word Sense Disambiguation by Web Mining for Word Co-occurrence Probabilities

Author: Turney Peter D.
Publication date: 01/01/2004

This paper describes the National Research Council (NRC) Word Sense Disambiguation (WSD) system, as applied to the English Lexical Sample (ELS) task in Senseval-3. The NRC system approaches WSD as a classical supervised machine learning problem, using familiar tools such as the Weka machine learning software and Brill's rule-based part-of-speech tagger. Head words are represented as feature vectors with several hundred features. Approximately half of the features are syntactic and the other half are semantic. The main novelty in the system is the method for generating the semantic features, based on word co-occurrence probabilities. The probabilities are estimated using the Waterloo MultiText System with a corpus of about one terabyte of unlabeled text, collected by a web crawler.
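As a rough illustration of how a semantic feature could be derived from word co-occurrence probabilities, the following Python sketch estimates the probabilities from raw counts and combines them into pointwise mutual information (PMI). The abstract does not state the exact formula, so the PMI scoring, the window size, and the tiny in-memory corpus are assumptions; the function names are hypothetical and unrelated to the Waterloo MultiText System.

```python
import math
from collections import Counter

def cooccurrence_stats(sentences, window=2):
    """Count single words and unordered word pairs that occur within `window` tokens."""
    word_counts = Counter()
    pair_counts = Counter()
    for tokens in sentences:
        word_counts.update(tokens)
        for i, w in enumerate(tokens):
            for v in tokens[i + 1:i + 1 + window]:
                pair_counts[frozenset((w, v))] += 1
    return word_counts, pair_counts

def pmi(w, v, word_counts, pair_counts):
    """Pointwise mutual information from estimated probabilities (an assumed scoring)."""
    n_words = sum(word_counts.values())
    n_pairs = sum(pair_counts.values())
    joint = pair_counts[frozenset((w, v))] / n_pairs
    if joint == 0:
        return float("-inf")  # the two words never co-occur in the corpus
    return math.log2(joint / ((word_counts[w] / n_words) * (word_counts[v] / n_words)))

# Tiny stand-in corpus; the system described above used about a terabyte of text.
corpus = [
    ["river", "bank", "water"],
    ["bank", "money", "loan"],
    ["money", "bank", "loan"],
]
words, pairs = cooccurrence_stats(corpus)
print(pmi("bank", "money", words, pairs))
print(pmi("river", "money", words, pairs))
```

A feature vector for a head word could then hold one such score per cue word associated with each candidate sense, alongside the syntactic features.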

arXiv.org e-Print Archive

CiteSeerX

CogPrints Cognitive Sciences Eprint Archive

Word Sense Disambiguation using a Bidirectional LSTM

Author: Kågebäck Mikael
Salomonsson Hans
Publication date: 01/01/2016

In this paper we present a clean, yet effective, model for word sense disambiguation. Our approach leverages a bidirectional long short-term memory network which is shared between all words. This enables the model to share statistical strength and to scale well with vocabulary size. The model is trained end-to-end, directly from the raw text to sense labels, and makes effective use of word order. We evaluate our approach on two standard datasets, using identical hyperparameter settings, which are in turn tuned on a third set of held-out data. We employ no external resources (e.g. knowledge graphs, part-of-speech tagging, etc.), language-specific features, or hand-crafted rules, but still achieve statistically equivalent results to the best state-of-the-art systems that employ no such limitations.

arXiv.org e-Print Archive

Chalmers Research

Chalmers Publication Library

Two knowledge-based methods for High-Performance Sense Distribution Learning

Author: Navigli Roberto
Pasini Tommaso
Publication date: 01/01/2018

Knowing the correct distribution of senses within a corpus can potentially boost the performance of Word Sense Disambiguation (WSD) systems by many points. We present two fully automatic and language-independent methods for computing the distribution of senses given a raw corpus of sentences. Intrinsic and extrinsic evaluations show that our methods outperform the current state of the art in sense distribution learning and the strongest baselines for the most frequent sense in multiple languages and on domain-specific test sets. Our sense distributions are available at http://trainomatic.org

Archivio della ricerca- Università di Roma La Sapienza

Word sense disambiguation criteria: a systematic study

Author: Audibert Laurent
Publication date: 01/01/2004

This article describes the results of a systematic in-depth study of the criteria used for word sense disambiguation. Our study is based on 60 target words: 20 nouns, 20 adjectives and 20 verbs. Our results are not always in line with some practices in the field. For example, we show that omitting non-content words decreases performance and that bigrams yield better results than unigrams.
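The two criteria named in this abstract, unigrams versus bigrams and keeping versus omitting non-content words, can be made concrete with a small Python sketch of context feature extraction. The function name, the window size, and the stopword list are hypothetical and are not taken from the study.

```python
def context_features(tokens, target_index, window=2, n=1, stopwords=frozenset()):
    """Return n-gram features from the tokens around a target word.

    `window` is the number of tokens kept on each side of the target,
    `n` is the n-gram size (1 = unigrams, 2 = bigrams), and any token
    in `stopwords` is dropped before the n-grams are built.
    """
    lo = max(0, target_index - window)
    hi = min(len(tokens), target_index + window + 1)
    span = [t for t in tokens[lo:hi] if t not in stopwords]
    return [" ".join(span[i:i + n]) for i in range(len(span) - n + 1)]

tokens = "he sat on the bank of the river".split()
target = tokens.index("bank")

print(context_features(tokens, target, n=1))
print(context_features(tokens, target, n=2))
# Omitting non-content words leaves very little context in this window.
print(context_features(tokens, target, n=1, stopwords=frozenset({"on", "the", "of"})))
```

In this example the bigram features preserve local word order ("the bank", "bank of"), and removing function words strips out most of the window, which gives some intuition for why both choices could affect performance.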

arXiv.org e-Print Archive

CiteSeerX

HAL AMU