Co-occurrence Vectors from Corpora vs. Distance Vectors from Dictionaries
A comparison was made of vectors derived by using ordinary co-occurrence
statistics from large text corpora and of vectors derived by measuring the
inter-word distances in dictionary definitions. The precision of word sense
disambiguation by using co-occurrence vectors from the 1987 Wall Street Journal
(20M total words) was higher than that by using distance vectors from the
Collins English Dictionary (60K head words + 1.6M definition words). However,
other experimental results suggest that distance vectors contain semantic
information not captured by co-occurrence vectors.
Comment: 6 pages, appeared in the Proc. of COLING94 (pp. 304-309)
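The core idea behind co-occurrence vectors — represent each word by counts of its neighbours, then compare words by vector similarity — can be sketched as follows. This is a minimal illustration, not the paper's 20M-word setup; the window size and toy corpus are invented:

```python
from collections import Counter

def cooccurrence_vectors(tokens, window=2):
    """Count, for every word type, how often each other word appears
    within +/- `window` positions of it."""
    vectors = {}
    for i, word in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        vectors.setdefault(word, Counter()).update(context)
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v.get(k, 0) for k in u)
    norm_u = sum(x * x for x in u.values()) ** 0.5
    norm_v = sum(x * x for x in v.values()) ** 0.5
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

tokens = "the bank approved the loan near the river bank".split()
vectors = cooccurrence_vectors(tokens)
```

A dictionary-distance vector would be built analogously, but from inter-word distances in definition text rather than corpus windows.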
Distinguishing Word Senses in Untagged Text
This paper describes an experimental comparison of three unsupervised
learning algorithms that distinguish the sense of an ambiguous word in untagged
text. The methods described in this paper, McQuitty's similarity analysis,
Ward's minimum-variance method, and the EM algorithm, assign each instance of
an ambiguous word to a known sense definition based solely on the values of
automatically identifiable features in text. These methods and feature sets are
found to be more successful in disambiguating nouns than adjectives or
verbs. Overall, the most accurate of these procedures is McQuitty's similarity
analysis in combination with a high-dimensional feature set.
Comment: 11 pages, latex, uses aclap.sty
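McQuitty's similarity analysis is a form of average-linkage agglomerative clustering (WPGMA). A toy sketch over numeric feature vectors follows; the feature extraction from text is omitted and the data is invented:

```python
def mcquitty(points, k):
    """Toy McQuitty (WPGMA) agglomerative clustering: repeatedly merge
    the two closest clusters; the merged cluster's distance to any other
    cluster is the simple average of its two parents' distances."""
    euclid = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    clusters = {i: [i] for i in range(len(points))}        # id -> member indices
    dist = {frozenset((i, j)): euclid(points[i], points[j])
            for i in clusters for j in clusters if i < j}
    next_id = len(points)
    while len(clusters) > k:
        i, j = min(dist, key=dist.get)                     # closest pair
        members = clusters.pop(i) + clusters.pop(j)
        for m in clusters:
            # McQuitty rule: unweighted average of the parent distances
            dist[frozenset((next_id, m))] = 0.5 * (
                dist.pop(frozenset((i, m))) + dist.pop(frozenset((j, m))))
        dist.pop(frozenset((i, j)))                        # drop the stale pair
        clusters[next_id] = members
        next_id += 1
    return list(clusters.values())
```

For word sense discrimination, each point would be an instance of the ambiguous word encoded as a feature vector, and the resulting clusters would be matched to known sense definitions.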
Retrieving with good sense
Although always present in text, word sense ambiguity has only recently come to be regarded as a problem for information
retrieval that is potentially solvable. The growth of interest in word senses resulted from new directions taken in
disambiguation research. This paper first outlines this research and then surveys the resulting efforts in information
retrieval. Although the majority of attempts to improve retrieval effectiveness were unsuccessful, much was learnt
from the research, most notably a notion of the circumstances under which disambiguation may prove useful to retrieval.
Applying a Naive Bayes Similarity Measure to Word Sense Disambiguation
We replace the overlap mechanism of the Lesk algorithm with a simple, general-purpose Naive Bayes model that measures many-to-many association between two sets of random variables. Even with simple probability estimates such as maximum likelihood, the model gains significant improvement over the Lesk algorithm on word sense disambiguation tasks. With additional lexical knowledge from WordNet, performance is further improved to surpass state-of-the-art results.
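The contrast with Lesk's hard overlap count can be sketched with a smoothed unigram model: instead of counting shared words, score how well each sense's gloss "generates" the context. This is a simplified stand-in for the paper's many-to-many association measure, with invented toy glosses:

```python
from collections import Counter
import math

def nb_score(context, gloss, vocab_size, alpha=1.0):
    """Log-probability of the context words under the gloss's unigram
    model, with add-alpha smoothing so unseen words score low rather
    than zeroing out the whole sense."""
    counts = Counter(gloss)
    total = len(gloss)
    return sum(math.log((counts[w] + alpha) / (total + alpha * vocab_size))
               for w in context)

def disambiguate(context, sense_glosses):
    """Pick the sense whose gloss best explains the context."""
    vocab = set(context).union(*sense_glosses.values())
    return max(sense_glosses,
               key=lambda s: nb_score(context, sense_glosses[s], len(vocab)))
```

Unlike exact overlap, this graded score still distinguishes senses when the context shares only partial or noisy vocabulary with the glosses.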
Mapping Persian Words to WordNet Synsets
Lexical ontologies are one of the main resources
for developing natural language processing and semantic web
applications. Mapping lexical ontologies of different languages
is very important for inter-lingual tasks. On the other hand,
mapping approaches can be applied to build lexical ontologies
for a new language based on pre-existing resources of other
languages. In this paper we propose a semantic approach for
mapping Persian words to Princeton WordNet synsets. As
there is no lexical ontology for Persian, our approach helps not
only in building one for this language but also enables semantic
web applications on Persian documents. To do the mapping, we
calculate the similarity of Persian words and English synsets
using features such as super-classes, subclasses,
domains and related words. Our approach is an improvement of
an existing one, applied in a new domain, which increases
recall noticeably.
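One simple way to realise feature-based word-to-synset similarity is set overlap (Jaccard) over the features each side exposes. This is a hedged sketch, not the paper's actual measure; the threshold, feature sets, and synset names are illustrative, and Persian/English features are assumed already translated into a shared space:

```python
def jaccard(a, b):
    """Set intersection normalised by set union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def map_to_synset(word_features, synsets, threshold=0.2):
    """Map a word to the candidate synset whose feature set
    (super-classes, subclasses, domain, related words) overlaps the
    word's features most, provided the overlap clears a threshold."""
    best = max(synsets, key=lambda s: jaccard(word_features, synsets[s]))
    return best if jaccard(word_features, synsets[best]) >= threshold else None
```

Returning None when no candidate clears the threshold is what trades precision for recall: lowering the threshold maps more words at the risk of wrong assignments.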
Sense and preference
Semantic networks have shown considerable utility as a knowledge representation for Natural Language Processing (NLP). This paper describes a system for automatically deriving network structures from machine-readable dictionary text. This strategy helps to solve the problem of vocabulary acquisition for large-scale parsing systems, but also introduces an extra level of difficulty in terms of word-sense ambiguity. A Preference Semantics parsing system that operates over this network is discussed, in particular as regards its mechanism for using the network for lexical selection.
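The lexical-selection idea in Preference Semantics is that a predicate prefers arguments of certain semantic types, and the sense that best satisfies those preferences wins. A toy sketch, with invented types and scores rather than anything from the paper's network:

```python
def select_sense(verb_prefs, candidate_senses):
    """Pick the noun sense whose semantic types best satisfy the verb's
    preferred argument types (both mappings are illustrative)."""
    def score(sense_types):
        return sum(verb_prefs.get(t, 0) for t in sense_types)
    return max(candidate_senses, key=lambda s: score(candidate_senses[s]))

# 'drink' prefers a liquid object; two senses of 'port' (wine vs harbour)
prefs = {"liquid": 2, "physical-object": 1}
senses = {"port/wine": ["liquid", "physical-object"],
          "port/harbour": ["location", "physical-object"]}
```

In the full system the types and preferences would come from the network derived from dictionary text rather than from hand-written tables.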
The interaction of knowledge sources in word sense disambiguation
Word sense disambiguation (WSD) is a computational linguistics task likely to benefit from the tradition of combining different knowledge sources in artificial intelligence research. An important step in the exploration of this hypothesis is to determine which linguistic knowledge sources are most useful and whether their combination leads to improved results.
We present a sense tagger which uses several knowledge sources. Tested accuracy exceeds 94% on our evaluation corpus. Our system attempts to disambiguate all content words in running text rather than limiting itself to treating a restricted vocabulary of words. It is argued that this approach is more likely to assist the creation of practical systems.
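The simplest way to combine heterogeneous knowledge sources is a weighted sum of each source's per-sense scores. This is a generic sketch of the combination step only, not the paper's tagger; the source names and weights are invented:

```python
def combine(sources, weights=None):
    """Combine per-sense scores from several knowledge sources
    (e.g. part of speech, collocations, dictionary definitions)
    by weighted summation and return the top-scoring sense."""
    weights = weights or {name: 1.0 for name in sources}
    totals = {}
    for name, scores in sources.items():
        for sense, s in scores.items():
            totals[sense] = totals.get(sense, 0.0) + weights[name] * s
    return max(totals, key=totals.get)
```

Studying which weights help most is one concrete way to ask the abstract's question of which knowledge sources are most useful and whether combining them improves results.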