
    Co-occurrence Vectors from Corpora vs. Distance Vectors from Dictionaries

    A comparison was made between vectors derived from ordinary co-occurrence statistics over large text corpora and vectors derived by measuring inter-word distances in dictionary definitions. Word sense disambiguation was more precise with co-occurrence vectors from the 1987 Wall Street Journal (20M total words) than with distance vectors from the Collins English Dictionary (60K head words + 1.6M definition words). However, other experimental results suggest that distance vectors carry some semantic information that co-occurrence vectors do not. Comment: 6 pages, appeared in the Proc. of COLING94 (pp. 304-309)
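    The co-occurrence approach this abstract describes can be sketched as follows. This is a toy illustration only: the window size, cosine comparison, and sample corpus are assumptions for the sketch, not the paper's actual setup.

```python
from collections import Counter
from math import sqrt

def cooccurrence_vectors(tokens, window=2):
    """Map each word to a Counter of the words seen within +/- window positions."""
    vecs = {}
    for i, w in enumerate(tokens):
        ctx = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        vecs.setdefault(w, Counter()).update(ctx)
    return vecs

def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Tiny invented corpus; real experiments would use millions of words.
corpus = "the bank lends money the bank holds money the river bank floods".split()
vecs = cooccurrence_vectors(corpus)
```

    Words whose vectors point in similar directions occur in similar contexts, which is the property the disambiguation experiments exploit.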

    Converting Language Computation into Mathematical Operations


    Distinguishing Word Senses in Untagged Text

    This paper describes an experimental comparison of three unsupervised learning algorithms that distinguish the sense of an ambiguous word in untagged text. The methods described in this paper, McQuitty's similarity analysis, Ward's minimum-variance method, and the EM algorithm, assign each instance of an ambiguous word to a known sense definition based solely on the values of automatically identifiable features in text. These methods and feature sets are found to be more successful at disambiguating nouns than adjectives or verbs. Overall, the most accurate of these procedures is McQuitty's similarity analysis in combination with a high-dimensional feature set. Comment: 11 pages, latex, uses aclap.st
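    McQuitty's similarity analysis is an agglomerative scheme (also known as WPGMA): the two most similar clusters are merged, and the merged cluster's similarity to each remaining cluster is the simple average of its parents' similarities. A minimal sketch under that reading, with an invented toy similarity matrix (the paper's features and data are not reproduced here):

```python
def mcquitty_cluster(item_sim, n, k):
    """Agglomerative clustering with McQuitty's (WPGMA) update rule.

    item_sim: {(i, j): similarity} for 0 <= i < j < n.
    Returns k clusters, each a frozenset of original item indices.
    """
    clusters = [frozenset([i]) for i in range(n)]
    sim = {frozenset([clusters[i], clusters[j]]): item_sim[(i, j)]
           for i in range(n) for j in range(i + 1, n)}
    while len(clusters) > k:
        # Merge the most similar pair of live clusters.
        a, b = max(((x, y) for i, x in enumerate(clusters)
                    for y in clusters[i + 1:]),
                   key=lambda p: sim[frozenset(p)])
        merged = a | b
        clusters = [c for c in clusters if c not in (a, b)]
        for c in clusters:
            # WPGMA step: unweighted average of the parents' similarities.
            sim[frozenset([merged, c])] = (sim[frozenset([a, c])] +
                                           sim[frozenset([b, c])]) / 2
        clusters.append(merged)
    return clusters
```

    In the word-sense setting, each item would be one occurrence of the ambiguous word represented by its context features, and each final cluster would be matched to a known sense definition.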

    Retrieving with good sense

    Although word sense ambiguity has always been present in text, it only recently came to be regarded as a problem for information retrieval that might be solvable. The growth of interest in word senses resulted from new directions taken in disambiguation research. This paper first outlines that research and then surveys the resulting efforts in information retrieval. Although the majority of attempts to improve retrieval effectiveness were unsuccessful, much was learnt from the research, most notably a notion of the circumstances under which disambiguation may prove useful to retrieval.

    Applying a Naive Bayes Similarity Measure to Word Sense Disambiguation

    We replace the overlap mechanism of the Lesk algorithm with a simple, general-purpose Naive Bayes model that measures many-to-many association between two sets of random variables. Even with simple probability estimates such as maximum likelihood, the model gains significant improvement over the Lesk algorithm on word sense disambiguation tasks. With additional lexical knowledge from WordNet, performance is further improved to surpass state-of-the-art results.
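    The idea can be sketched as follows: instead of counting literal word overlaps between a context and a sense gloss, score each sense by the likelihood that a unigram model estimated from its gloss assigns to the context. The glosses and vocabulary size below are invented for illustration; the paper's actual model measures many-to-many association more generally than this simple smoothed-MLE sketch.

```python
from collections import Counter
from math import log

def nb_score(context, gloss, vocab_size, alpha=1.0):
    """Log-likelihood of the context under an add-alpha-smoothed
    unigram model estimated from the sense's gloss words."""
    counts, total = Counter(gloss), len(gloss)
    return sum(log((counts[w] + alpha) / (total + alpha * vocab_size))
               for w in context)

def disambiguate(context, sense_glosses, vocab_size):
    """Return the sense whose gloss model best explains the context."""
    return max(sense_glosses,
               key=lambda s: nb_score(context, sense_glosses[s], vocab_size))

glosses = {  # toy glosses, not WordNet's actual definitions
    "bank/finance": "institution money deposit loan credit".split(),
    "bank/river": "sloping land beside water river".split(),
}
```

    Unlike raw Lesk overlap, the probabilistic score still discriminates between senses when the context and gloss share few or no exact words, since smoothing keeps every context word's contribution finite.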

    Mapping Persian Words to WordNet Synsets

    Lexical ontologies are one of the main resources for developing natural language processing and semantic web applications. Mapping lexical ontologies across languages is very important for inter-lingual tasks. Conversely, mapping approaches can be applied to build a lexical ontology for a new language from pre-existing resources of other languages. In this paper we propose a semantic approach for mapping Persian words to Princeton WordNet synsets. As there is no lexical ontology for Persian, our approach not only helps in building one for this language but also enables semantic web applications on Persian documents. To do the mapping, we calculate the similarity of Persian words and English synsets using features such as super-classes and sub-classes, domain, and related words. Our approach improves on an existing one by applying it in a new domain, which increases recall noticeably.
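    The feature-based matching can be sketched as below. The abstract does not specify the exact similarity measure, so Jaccard overlap is used here as one plausible stand-in, and the feature sets are invented for illustration.

```python
def jaccard(a, b):
    """Set-overlap similarity; defined as 0.0 when both sets are empty."""
    return len(a & b) / len(a | b) if a | b else 0.0

def map_to_synset(word_features, synset_features):
    """word_features: features of the Persian word (super-classes,
    sub-classes, domain, related words) translated into the synset
    feature space; synset_features: synset id -> its feature set.
    Returns the candidate synset id with the highest overlap."""
    return max(synset_features,
               key=lambda s: jaccard(word_features, synset_features[s]))

# Hypothetical candidates for a Persian word meaning "dog".
candidates = {"dog.n.01": {"animal", "canine", "pet"},
              "pawl.n.01": {"device", "machine-part"}}
```

    A production mapper would also need translation of the Persian features into English and a threshold below which no mapping is proposed; both are omitted from this sketch.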

    Sense and preference

    Semantic networks have shown considerable utility as a knowledge representation for Natural Language Processing (NLP). This paper describes a system for automatically deriving network structures from machine-readable dictionary text. This strategy helps to solve the problem of vocabulary acquisition for large-scale parsing systems, but also introduces an extra level of difficulty in terms of word-sense ambiguity. A Preference Semantics parsing system that operates over this network is discussed, in particular as regards its mechanism for using the network for lexical selection.

    The interaction of knowledge sources in word sense disambiguation

    Word sense disambiguation (WSD) is a computational linguistics task likely to benefit from the tradition of combining different knowledge sources in artificial intelligence research. An important step in the exploration of this hypothesis is to determine which linguistic knowledge sources are most useful and whether their combination leads to improved results. We present a sense tagger which uses several knowledge sources. Tested accuracy exceeds 94% on our evaluation corpus. Our system attempts to disambiguate all content words in running text rather than limiting itself to a restricted vocabulary of words. It is argued that this approach is more likely to assist the creation of practical systems.
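    One simple way knowledge sources can be combined, sketched here with invented sources and weights (the paper's tagger and its actual combination method are not reproduced), is to let each source score every candidate sense and pick the sense with the highest weighted total:

```python
def combine_sources(senses, sources, weights=None):
    """senses: candidate sense labels; sources: functions mapping a sense
    to a score in [0, 1]; weights: optional per-source weights (1.0 each
    by default). Returns the sense with the highest weighted total."""
    weights = weights or [1.0] * len(sources)
    return max(senses,
               key=lambda s: sum(w * f(s) for w, f in zip(weights, sources)))

# Hypothetical knowledge sources for one ambiguous word occurrence.
pos_source = lambda s: 1.0 if s == "A" else 0.3      # part-of-speech cue
colloc_source = lambda s: 0.4 if s == "A" else 0.5   # collocation cue
```

    The interesting empirical question the abstract raises is precisely which such sources help, and whether the combined score beats any single source alone.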