
    Selective Sampling for Example-based Word Sense Disambiguation

    This paper proposes an efficient example sampling method for example-based word sense disambiguation systems. Constructing a database of practical size requires considerable manual sense disambiguation (overhead for supervision). In addition, the time complexity of searching a large database poses a considerable problem (overhead for search). To counter these problems, our method selectively samples a smaller, effective subset from a given example set for use in word sense disambiguation. Our method is characterized by its reliance on the notion of training utility: the degree to which each example is informative for future example sampling when used for the training of the system. The system progressively collects examples by selecting those with the greatest utility. The paper reports the effectiveness of our method through experiments on about one thousand sentences. Compared with other example sampling methods, our method reduced both the overhead for supervision and the overhead for search without degrading the performance of the system.

    Comment: 25 pages, 14 Postscript figures
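The progressive collection step described in the abstract can be pictured as a greedy loop. The sketch below is only an illustration under stated assumptions: the abstract does not define how training utility is computed, so `coverage_utility` is a hypothetical stand-in (it counts how many unselected examples share a context word with the candidate but with nothing selected so far), and the names `selective_sample`, `pool`, and `budget` are likewise invented for this example.

```python
from typing import Callable, FrozenSet, List

# Toy representation (assumption): an example is the set of its context words.
Example = FrozenSet[str]


def coverage_utility(candidate: Example,
                     selected: List[Example],
                     pool: List[Example]) -> int:
    """Hypothetical stand-in for training utility, not the paper's definition.

    Counts pool examples that overlap with the candidate but do not overlap
    with any example that has already been selected.
    """
    covered = set().union(*selected)
    return sum(
        1
        for other in pool
        if other is not candidate and (other & candidate) and not (other & covered)
    )


def selective_sample(pool: List[Example],
                     budget: int,
                     utility: Callable = coverage_utility) -> List[Example]:
    """Greedily pick up to `budget` examples, highest utility first."""
    remaining = list(pool)
    selected: List[Example] = []
    while remaining and len(selected) < budget:
        # Utility is recomputed each round because it depends on what
        # has already been selected.
        best = max(remaining, key=lambda ex: utility(ex, selected, remaining))
        selected.append(best)  # a human annotator would sense-tag it here
        remaining.remove(best)
    return selected


pool = [
    frozenset({"a", "b"}),
    frozenset({"b", "c"}),
    frozenset({"x", "y"}),
    frozenset({"c", "d"}),
]
first_pick = selective_sample(pool, budget=1)
```

Only the selected subset needs manual sense tagging and later searching, which is where the two overheads named in the abstract would shrink.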

    Automatic Thesaurus Construction Based on Grammatical Relations

    We propose a method to build thesauri on the basis of grammatical relations. The proposed method constructs thesauri by using a hierarchical clustering algorithm. An important point in this paper is the claim that thesauri, in order to be efficient, need to take (surface) case information into account. We refer to such a thesaurus as a "relation-based thesaurus" (RBT). In the experiment, four RBTs of Japanese nouns were constructed from 26,023 verb-noun co-occurrences, and each RBT was evaluated by objective criteria. The experiment has shown that the RBTs have better properties for selectional restriction of case frames than conventional ones.

    1 Introduction

    For most natural language processing (NLP) systems, thesauri are one of the basic ingredients. In particular, coupled with case frames, they are useful to guide correct analysis [Allen, 1988]. In the example-based frameworks, thesauri are also used to compensate for insufficient example data [Sato and Nagao, 1990, Nagao and Kurohashi..
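As a rough illustration of building a noun hierarchy from verb-noun co-occurrences, here is a minimal agglomerative clustering sketch. Everything specific in it is an assumption rather than the paper's method: the abstract does not name the clustering algorithm or similarity measure, so the sketch uses cosine similarity over counts of hypothetical `(verb, case)` features and merges the most similar pair by summing their vectors. The function names and the toy data are invented for the example.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def build_vectors(cooccurrences):
    """Map each noun to counts of the (verb, case) slots it appears in."""
    vectors = {}
    for verb, case, noun in cooccurrences:
        vectors.setdefault(noun, Counter())[(verb, case)] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cluster(vectors):
    """Merge the most similar pair repeatedly; return a nested-tuple tree."""
    clusters = [(noun, vec) for noun, vec in sorted(vectors.items())]
    if not clusters:
        return None
    while len(clusters) > 1:
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: cosine(clusters[p[0]][1], clusters[p[1]][1]),
        )
        (tree_i, vec_i), (tree_j, vec_j) = clusters[i], clusters[j]
        merged = ((tree_i, tree_j), vec_i + vec_j)  # summed counts (assumption)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]


# Toy data (invented): (verb, surface case, noun) triples.
cooccurrences = [
    ("drink", "obj", "water"), ("pour", "obj", "water"),
    ("drink", "obj", "beer"), ("pour", "obj", "beer"),
    ("eat", "obj", "bread"), ("bake", "obj", "bread"),
    ("eat", "obj", "rice"), ("cook", "obj", "rice"),
]
tree = cluster(build_vectors(cooccurrences))
```

Filtering the triples to a single case before calling `build_vectors` would yield one tree per grammatical relation, which is one plausible reading of how case information could enter the hierarchy.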