779 research outputs found
Annotating Words Using WordNet Semantic Glosses
An approach to word sense disambiguation (WSD) relying on
WordNet synsets is proposed. The method uses semantically tagged glosses
to perform a process similar to spreading activation in a semantic network,
creating a ranking of the most probable meanings for word annotation. A preliminary
evaluation shows promising results. Comparison with state-of-the-art
WSD methods indicates that the use of WordNet relations and semantically
tagged glosses should enhance the accuracy of word disambiguation methods.
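As a rough illustration of spreading activation for sense ranking, the sketch below seeds the senses of context words with activation 1.0, lets activation flow along weighted edges for a fixed number of steps with a decay factor, and ranks a target word's candidate senses by the activation they collect. The graph format, the `decay` and `steps` parameters, the edge weights, and the synset labels are all hypothetical illustration choices, not the authors' method.

```python
from collections import defaultdict


def spread_activation(graph, seeds, decay=0.5, steps=2):
    """Propagate activation from seed senses through a weighted graph.

    graph: hypothetical format, node -> list of (neighbour, weight) edges,
           e.g. links between synsets derived from their tagged glosses.
    """
    # Every seed (a sense suggested by the context) starts fully active.
    activation = defaultdict(float)
    for node in seeds:
        activation[node] = 1.0
    frontier = dict(activation)
    for _ in range(steps):
        incoming = defaultdict(float)
        # Each active node passes a decayed, weighted share to its neighbours.
        for node, act in frontier.items():
            for neighbour, weight in graph.get(node, []):
                incoming[neighbour] += act * weight * decay
        for node, act in incoming.items():
            activation[node] += act
        frontier = incoming
    return activation


def rank_senses(graph, candidate_senses, context_senses, **kwargs):
    """Order the candidate senses of a target word by received activation."""
    act = spread_activation(graph, context_senses, **kwargs)
    return sorted(candidate_senses, key=lambda s: act[s], reverse=True)


# Toy network with illustrative synset labels and made-up weights.
graph = {
    "money.n.01": [("bank.n.02", 1.0)],
    "deposit.n.01": [("bank.n.02", 0.8)],
    "river.n.01": [("bank.n.01", 1.0)],
}
print(rank_senses(graph, ["bank.n.01", "bank.n.02"],
                  ["money.n.01", "deposit.n.01"]))
```

Here the financial sense wins because both context senses route activation to it, while nothing in the context reaches the riverbank sense.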
ShotgunWSD: An unsupervised algorithm for global word sense disambiguation inspired by DNA sequencing
In this paper, we present a novel unsupervised algorithm for word sense
disambiguation (WSD) at the document level. Our algorithm is inspired by a
widely-used approach in the field of genetics for whole genome sequencing,
known as the Shotgun sequencing technique. The proposed WSD algorithm is based
on three main steps. First, a brute-force WSD algorithm is applied to short
context windows (up to 10 words) selected from the document in order to
generate a short list of likely sense configurations for each window. In the
second step, these local sense configurations are assembled into longer
composite configurations based on suffix and prefix matching. The resulting
configurations are ranked by their length, and the sense of each word is chosen
based on a voting scheme that considers only the top k configurations in which
the word appears. We compare our algorithm with other state-of-the-art
unsupervised WSD algorithms and demonstrate better performance, sometimes by a
very large margin. We also show that our algorithm can yield better performance
than the Most Common Sense (MCS) baseline on one data set. Moreover, our
algorithm has a very small number of parameters, is robust to parameter tuning,
and, unlike other bio-inspired methods, it gives a deterministic solution (it
does not involve random choices).
Comment: In Proceedings of EACL 201
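The three steps described above can be sketched as follows, under simplifying assumptions: the window size, the number of configurations kept per window (`keep`), the minimum overlap for merging, and the caller-supplied `relatedness` function are hypothetical stand-ins, and the final step uses a plain majority vote over the top-k longest configurations covering each word. This is an illustration of the idea, not the authors' implementation.

```python
from collections import Counter
from itertools import product


def window_configs(senses, start, size, relatedness, keep):
    """Step 1: score every sense combination of one window by brute force."""
    window = senses[start:start + size]
    scored = []
    for combo in product(*window):
        score = sum(relatedness(a, b)
                    for i, a in enumerate(combo) for b in combo[i + 1:])
        scored.append((score, combo))
    scored.sort(key=lambda item: -item[0])  # stable: ties keep product order
    return [(start, combo) for _, combo in scored[:keep]]


def assemble(configs, min_overlap=1):
    """Step 2: merge a config with a later one whose prefix matches its suffix."""
    pool = set(configs)
    changed = True
    while changed:  # repeat until no new composite configuration appears
        changed = False
        for (a, sa), (b, sb) in product(list(pool), repeat=2):
            overlap = a + len(sa) - b  # number of shared word positions
            if (a < b and overlap >= min_overlap
                    and b + len(sb) > a + len(sa)
                    and sa[-overlap:] == sb[:overlap]):
                merged = (a, sa + sb[overlap:])
                if merged not in pool:
                    pool.add(merged)
                    changed = True
    return pool


def vote(pool, n_words, top_k):
    """Step 3: rank configs by length, then vote over the top k per word."""
    ranked = sorted(pool, key=lambda c: (-len(c[1]), c[0], c[1]))
    chosen = []
    for pos in range(n_words):
        covering = [c for c in ranked if c[0] <= pos < c[0] + len(c[1])][:top_k]
        votes = Counter(c[1][pos - c[0]] for c in covering)
        chosen.append(votes.most_common(1)[0][0] if votes else None)
    return chosen


def shotgun_wsd(senses, relatedness, size=3, keep=3, top_k=3):
    """senses: one list of candidate senses per word in the document."""
    configs = []
    for start in range(max(1, len(senses) - size + 1)):
        configs += window_configs(senses, start, size, relatedness, keep)
    return vote(assemble(configs), len(senses), top_k)


def same_domain(s1, s2):
    """Hypothetical relatedness: 1 if two senses share a domain label."""
    return 1 if s1.split("/")[1] == s2.split("/")[1] else 0


doc = [["bank/fin", "bank/geo"], ["deposit/fin", "deposit/geo"],
       ["interest/fin", "interest/hobby"], ["rate/fin", "rate/speed"]]
print(shotgun_wsd(doc, same_domain))
```

Every step is deterministic: the brute-force ranking uses a stable sort, the merge loop computes a closure that does not depend on iteration order, and the vote breaks ties by a fixed ordering of configurations.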
Extending, trimming and fusing WordNet for technical documents
This paper describes a tool for the automatic extension and trimming of a
multilingual WordNet database for cross-lingual retrieval and multilingual
ontology building in intranets and domain-specific document collections.
Hierarchies built from automatically extracted terms and combined with the
WordNet relations are trimmed with a disambiguation method based on the
document salience of the words in the glosses. The disambiguation is tested in
a cross-lingual retrieval task, showing considerable improvement (7%-11%). The
condensed hierarchies can be used as browsing interfaces to the documents,
complementary to retrieval.
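To make the idea of salience-based disambiguation concrete, the sketch below assumes that a word's document salience is its collection term frequency multiplied by its inverse document frequency, and keeps the WordNet sense whose gloss words are most salient in the collection. Both the salience formula and the sense keys are hypothetical; the abstract does not specify how salience is computed.

```python
import math
from collections import Counter


def salience(documents):
    """Assumed salience measure: collection term frequency times idf."""
    tf, df = Counter(), Counter()
    for doc in documents:
        tokens = doc.lower().split()
        tf.update(tokens)
        df.update(set(tokens))
    n = len(documents)
    # Words that occur in every document get idf log(1) = 0, i.e. no salience.
    return {w: tf[w] * math.log(n / df[w]) for w in tf}


def pick_sense(glosses, sal):
    """Keep the sense whose gloss words are most salient in the collection."""
    def score(sense):
        return sum(sal.get(w, 0.0) for w in glosses[sense].lower().split())
    return max(glosses, key=score)


# Tiny technical collection and two illustrative glosses for "server".
docs = ["the server crashed", "restart the server process", "the network cable"]
glosses = {
    "server%host": "a computer that provides network services",
    "server%waiter": "a person who serves food",
}
print(pick_sense(glosses, salience(docs)))
```

In a technical collection, words such as "network" are salient while "food" never occurs, so the computing sense survives and the restaurant sense is the one a trimming step would prune.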
Grouping Synonyms by Definitions
We present a method for grouping the synonyms of a lemma according to its
dictionary senses. The senses are defined by a large machine-readable
dictionary for French, the TLFi (Trésor de la langue française
informatisé), and the synonyms are given by 5 synonym dictionaries (also for
French). To evaluate the proposed method, we manually constructed a gold
standard in which, for each (word, definition) pair and given the set of
synonyms defined for that word by the 5 synonym dictionaries, 4 lexicographers
specified the set of synonyms they judged adequate. While inter-annotator
agreement on that task ranges from 67% to at best 88%, depending on the
annotator pair and on the synonym dictionary being considered, the automatic
procedure we propose achieves a precision of 67% and a recall of 71%. The
proposed method is compared with related work, namely word sense
disambiguation, synonym lexicon acquisition, and WordNet construction.
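The evaluation described above compares predicted (definition, synonym) pairs against the lexicographers' gold standard. The sketch below shows that precision/recall computation. To produce predictions it uses a simple Lesk-style word-overlap grouping, which is a stand-in chosen for illustration and not the authors' procedure. The stopword list, definition identifiers, definitions, and gold pairs are all invented examples.

```python
STOP = {"de", "la", "le", "les", "pour", "un", "une", "où"}


def words(text):
    """Lower-cased words of a definition, minus a tiny stopword list."""
    return {w for w in text.lower().split() if w not in STOP}


def group_by_overlap(lemma_defs, synonym_defs, min_overlap=1):
    """Pair each lemma definition with every synonym that has a definition
    sharing at least min_overlap non-stopword words with it."""
    pairs = set()
    for def_id, text in lemma_defs.items():
        def_words = words(text)
        for syn, syn_texts in synonym_defs.items():
            if any(len(def_words & words(t)) >= min_overlap for t in syn_texts):
                pairs.add((def_id, syn))
    return pairs


def precision_recall(predicted, gold):
    """Precision and recall of predicted (definition, synonym) pairs."""
    hits = len(predicted & gold)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


lemma_defs = {"bureau#1": "meuble pour écrire", "bureau#2": "pièce de travail"}
synonym_defs = {
    "pupitre": ["meuble incliné pour écrire"],
    "cabinet": ["petite pièce de travail"],
}
gold = {("bureau#1", "pupitre"), ("bureau#1", "secrétaire"),
        ("bureau#2", "cabinet")}
predicted = group_by_overlap(lemma_defs, synonym_defs)
print(precision_recall(predicted, gold))
```

Both predicted pairs are in the gold standard, but "secrétaire" has no definition in this toy input and is never assigned, so precision is perfect while recall is only two out of three.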