6,885 research outputs found
Combining Knowledge- and Corpus-based Word-Sense-Disambiguation Methods
In this paper we concentrate on the resolution of the lexical ambiguity that
arises when a given word has several different meanings. This specific task is
commonly referred to as word sense disambiguation (WSD). The task of WSD
consists of assigning the correct sense to words using an electronic dictionary
as the source of word definitions. We present two WSD methods based on two main
methodological approaches in this research area: a knowledge-based method and a
corpus-based method. Our hypothesis is that word-sense disambiguation requires
several knowledge sources in order to solve the semantic ambiguity of the
words. These sources can be of different kinds--- for example, syntagmatic,
paradigmatic or statistical information. Our approach combines various sources
of knowledge, through combinations of the two WSD methods mentioned above.
Mainly, the paper concentrates on how to combine these methods and sources of
information in order to achieve good results in the disambiguation. Finally,
this paper presents a comprehensive study and experimental work on evaluation
of the methods and their combinations
The interaction of knowledge sources in word sense disambiguation
Word sense disambiguation (WSD) is a computational linguistics task likely to benefit from the tradition of combining different knowledge sources in artificial in telligence research. An important step in the exploration of this hypothesis is to determine which linguistic knowledge sources are most useful and whether their combination leads to improved results.
We present a sense tagger which uses several knowledge sources. Tested accuracy exceeds 94% on our evaluation corpus.Our system attempts to disambiguate all content words in running text rather than limiting itself to treating a restricted vocabulary of words. It is argued that this approach is more likely to assist the creation of practical systems
A Word Sense-Oriented User Interface for Interactive Multilingual Text Retrieval
In this paper we present an interface for supporting a user in an interactive cross-language search process using semantic classes. In order to enable users to access multilingual information, different problems have to be solved: disambiguating and translating the query words, as well as categorizing and presenting the results appropriately. Therefore, we first give a brief introduction to word sense disambiguation, cross-language text retrieval and document categorization and finally describe recent achievements of our research towards an interactive multilingual retrieval system. We focus especially on the problem of browsing and navigation of the different word senses in one source and possibly several target languages. In the last part of the paper, we discuss the developed user interface and its functionalities in more detail
Embeddings for word sense disambiguation: an evaluation study
Recent years have seen a dramatic growth in the popularity of word embeddings mainly owing to their ability to capture semantic information from massive amounts of textual content. As a result, many tasks in Natural Language Processing have tried to take advantage of the potential of these distributional models. In this work, we study how word embeddings can be used in Word Sense Disambiguation, one of the oldest tasks in Natural Language Processing and Artificial Intelligence. We propose different methods through which word embeddings can be leveraged in a state-of-the-art supervised WSD system architecture, and perform a deep analysis of how different parameters affect performance. We show how a WSD system that makes use of word embeddings alone, if designed properly, can provide significant performance improvement over a state-of-the-art WSD system that incorporates several standard WSD features
Using Distributed Representations to Disambiguate Biomedical and Clinical Concepts
In this paper, we report a knowledge-based method for Word Sense
Disambiguation in the domains of biomedical and clinical text. We combine word
representations created on large corpora with a small number of definitions
from the UMLS to create concept representations, which we then compare to
representations of the context of ambiguous terms. Using no relational
information, we obtain comparable performance to previous approaches on the
MSH-WSD dataset, which is a well-known dataset in the biomedical domain.
Additionally, our method is fast and easy to set up and extend to other
domains. Supplementary materials, including source code, can be found at https:
//github.com/clips/yarnComment: 6 pages, 1 figure, presented at the 15th Workshop on Biomedical
Natural Language Processing, Berlin 201
- …