2,271 research outputs found

    Sense Tagging: Semantic Tagging with a Lexicon

    Sense tagging, the automatic assignment of the appropriate sense from some lexicon to each of the words in a text, is a specialised instance of the general problem of semantic tagging by category or type. We discuss which recent word sense disambiguation algorithms are appropriate for sense tagging. It is our belief that sense tagging can be carried out effectively by combining several simple, independent methods, and we include the design of such a tagger. A prototype of this system has been implemented, correctly tagging 86% of polysemous word tokens in a small test set, providing evidence that our hypothesis is correct. Comment: 6 pages, uses aclap LaTeX style file. Also in Proceedings of the SIGLEX Workshop "Tagging Text with Lexical Semantics
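
    The combination of simple, independent methods lends itself to a weighted vote over candidate senses. The sketch below is a toy Python illustration of that idea, not the authors' tagger: the sense inventory, the heuristics (frequency_vote, collocation_vote), and their weights are all invented for the example.

```python
from collections import Counter

# Toy sense inventory; a real tagger would draw senses from a lexicon.
SENSES = {"bank": ["bank/finance", "bank/river"]}

def frequency_vote(word, context):
    # Stand-in for a "most frequent sense" heuristic: always the first sense.
    return SENSES[word][0]

def collocation_vote(word, context):
    # Stand-in for a simple collocation cue; abstains when no cue fires.
    if "river" in context or "water" in context:
        return SENSES[word][1]
    return None

# (heuristic, weight) pairs: more reliable cues get larger weights.
HEURISTICS = [(frequency_vote, 1.0), (collocation_vote, 2.0)]

def tag_sense(word, context):
    """Combine independent heuristics by weighted voting, ignoring abstentions."""
    votes = Counter()
    for heuristic, weight in HEURISTICS:
        sense = heuristic(word, context)
        if sense is not None:
            votes[sense] += weight
    return votes.most_common(1)[0][0] if votes else None

print(tag_sense("bank", "the river bank was muddy".split()))  # -> bank/river
```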

    Retrieving with good sense

    Although always present in text, word sense ambiguity has only recently come to be regarded as a potentially solvable problem for information retrieval. The growth of interest in word senses resulted from new directions taken in disambiguation research. This paper first outlines this research and surveys the resulting efforts in information retrieval. Although the majority of attempts to improve retrieval effectiveness were unsuccessful, much was learnt from the research, most notably a notion of the circumstances under which disambiguation may prove useful to retrieval.

    Sense resolution properties of logical imaging

    The evaluation of an implication by Imaging is a logical technique developed in the framework of modal logic. Its interpretation in the context of a “possible worlds” semantics is very appealing for IR. In 1994, Crestani and Van Rijsbergen proposed an interpretation of Imaging in the context of IR based on the assumption that “a term is a possible world”. This approach enables the exploitation of term–term relationships, which are estimated using an information-theoretic measure. Recent analyses of the probability kinematics of Logical Imaging in IR have suggested that this technique has some interesting sense resolution properties. In this paper we present this new line of research and relate it to more classical research into word senses.
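
    Reading “a term is a possible world” concretely, imaging on a document moves the probability of each absent term to its most similar term that does occur in the document, and a query is then scored by the mass its terms receive. The sketch below mimics only that transfer step; the prior, the similarity table (a stand-in for the information-theoretic estimate), and all names are invented for illustration and are not Crestani and Van Rijsbergen's actual estimates.

```python
# Toy prior over terms ("a term is a possible world"); invented numbers, where
# a real system would estimate these from corpus statistics.
PRIOR = {"bank": 0.3, "river": 0.2, "money": 0.3, "water": 0.2}

# Toy term-term similarities, standing in for an information-theoretic measure
# of term relationships; values are invented.
SIM = {("money", "bank"): 0.9, ("money", "river"): 0.1,
       ("water", "river"): 0.9, ("water", "bank"): 0.2}

def similarity(t, u):
    return 1.0 if t == u else SIM.get((t, u), SIM.get((u, t), 0.0))

def image_on(doc_terms):
    """Imaging on a document: each term's prior probability moves to the most
    similar term that actually occurs in the document."""
    imaged = {t: 0.0 for t in doc_terms}
    for term, p in PRIOR.items():
        nearest = max(doc_terms, key=lambda u: similarity(term, u))
        imaged[nearest] += p
    return imaged

def score(doc_terms, query_terms):
    """After imaging on the document, sum the mass received by query terms."""
    imaged = image_on(doc_terms)
    return sum(imaged.get(t, 0.0) for t in query_terms)

doc = {"bank", "river"}
print(image_on(doc))                  # 'money' mass flows to 'bank', 'water' to 'river'
print(score(doc, {"bank", "water"}))  # -> 0.6
```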

    Selective Sampling for Example-based Word Sense Disambiguation

    This paper proposes an efficient example sampling method for example-based word sense disambiguation systems. To construct a database of practical size, a considerable overhead for manual sense disambiguation (overhead for supervision) is required. In addition, the time complexity of searching a large-sized database poses a considerable problem (overhead for search). To counter these problems, our method selectively samples a smaller-sized effective subset from a given example set for use in word sense disambiguation. Our method is characterized by its reliance on the notion of training utility: the degree to which each example is informative for future example sampling when used for the training of the system. The system progressively collects examples by selecting those with the greatest utility. The paper reports the effectiveness of our method through experiments on about one thousand sentences. Compared to experiments with other example sampling methods, our method reduced both the overhead for supervision and the overhead for search, without degrading the performance of the system. Comment: 25 pages, 14 Postscript figures
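
    The selective-sampling idea can be pictured as a generic loop that repeatedly labels the unlabeled example judged most useful for training and then retrains. The sketch below is such a loop under invented stand-ins: the majority-label "model" and the distance-based utility are placeholders, not the paper's training-utility measure.

```python
from collections import Counter
import random

def selective_sampling(pool, oracle, train, utility, budget):
    """Generic selective-sampling loop: repeatedly label and add the unlabeled
    example with the greatest estimated training utility, then retrain.

    pool    -- unlabeled examples
    oracle  -- example -> sense label (the costly human supervision)
    train   -- labeled (example, label) pairs -> model
    utility -- (model, labeled, candidate) -> informativeness score
    budget  -- how many examples we can afford to label
    """
    labeled, model = [], None
    pool = list(pool)
    for _ in range(min(budget, len(pool))):
        if model is None:                      # no model yet: seed at random
            pick = random.choice(pool)
        else:                                  # otherwise: greatest utility
            pick = max(pool, key=lambda x: utility(model, labeled, x))
        pool.remove(pick)
        labeled.append((pick, oracle(pick)))
        model = train(labeled)
    return model, labeled

# Tiny demonstration with invented stand-ins: the "model" is the majority sense
# seen so far, and the utility prefers candidates far from anything already
# labeled (a crude diversity proxy, not the paper's utility measure).
pool = list(range(100))
oracle = lambda x: "sense_a" if x < 50 else "sense_b"
train = lambda data: Counter(y for _, y in data).most_common(1)[0][0]
utility = lambda model, labeled, x: min(abs(x - seen) for seen, _ in labeled)
model, labeled = selective_sampling(pool, oracle, train, utility, budget=5)
print(model, [x for x, _ in labeled])
```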

    A comparison of homonym meaning frequency estimates derived from movie and television subtitles, free association, and explicit ratings

    First Online: 10 September 2018. Most words are ambiguous, with interpretation dependent on context. Advancing theories of ambiguity resolution is important for any general theory of language processing, and for resolving inconsistencies in observed ambiguity effects across experimental tasks. Focusing on homonyms (words such as bank, with unrelated meanings EDGE OF A RIVER vs. FINANCIAL INSTITUTION), the present work advances theories and methods for estimating the relative frequency of their meanings, a factor that shapes observed ambiguity effects. We develop a new method for estimating meaning frequency based on the meaning of a homonym evoked in lines of movie and television subtitles, according to human raters. We also replicate and extend a measure of meaning frequency derived from the classification of free associates. We evaluate the internal consistency of these measures, compare them to published estimates based on explicit ratings of each meaning’s frequency, and compare each set of norms in predicting performance in lexical and semantic decision mega-studies. All measures have high internal consistency and show agreement, but each is also associated with unique variance, which may be explained by integrating cognitive theories of memory with the demands of different experimental methodologies. To derive frequency estimates, we collected manual classifications of 533 homonyms over 50,000 lines of subtitles, and of 357 homonyms across over 5000 homonym–associate pairs. This database, publicly available at www.blairarmstrong.net/homonymnorms/, constitutes a novel resource for computational cognitive modeling and computational linguistics, and we offer suggestions on good practices for its use in training and testing models on labeled data.
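
    Once lines have been classified by raters, the relative meaning frequency of a homonym reduces to simple proportions over those classifications. The sketch below shows that arithmetic on invented toy data; the actual meaning labels and exclusion rules of the published norms may differ.

```python
from collections import Counter, defaultdict

# Invented example classifications: (homonym, meaning a rater assigned to one
# subtitle line). The real norms cover 533 homonyms over 50,000+ lines.
classifications = [
    ("bank", "FINANCIAL INSTITUTION"), ("bank", "FINANCIAL INSTITUTION"),
    ("bank", "EDGE OF A RIVER"), ("bank", "UNCLEAR"),
]

def meaning_frequencies(classifications, exclude=("UNCLEAR",)):
    """Relative frequency of each meaning of each homonym: the proportion of
    classified lines assigned to that meaning (unclear lines dropped)."""
    counts = defaultdict(Counter)
    for word, meaning in classifications:
        if meaning not in exclude:
            counts[word][meaning] += 1
    return {word: {m: n / sum(c.values()) for m, n in c.items()}
            for word, c in counts.items()}

print(meaning_frequencies(classifications))
# -> {'bank': {'FINANCIAL INSTITUTION': 0.666..., 'EDGE OF A RIVER': 0.333...}}
```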

    Computational Approaches to Measuring the Similarity of Short Contexts : A Review of Applications and Methods

    Measuring the similarity of short written contexts is a fundamental problem in Natural Language Processing. This article provides a unifying framework by which short context problems can be categorized both by their intended application and proposed solution. The goal is to show that various problems and methodologies that appear quite different on the surface are in fact very closely related. The axes by which these categorizations are made include the format of the contexts (headed versus headless), the way in which the contexts are to be measured (first-order versus second-order similarity), and the information used to represent the features in the contexts (micro versus macro views). The unifying thread that binds together many short context applications and methods is the fact that similarity decisions must be made between contexts that share few (if any) words in common. Comment: 23 pages
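
    The first-order versus second-order distinction can be made concrete: first-order similarity compares the words the two contexts themselves contain, while second-order similarity compares averaged co-occurrence vectors of those words, so contexts sharing no words can still be judged similar. The sketch below illustrates both with an invented co-occurrence table; the feature representations in the surveyed systems are richer.

```python
from collections import Counter
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors (dicts of feature counts)."""
    dot = sum(u[k] * v.get(k, 0.0) for k in u)
    norm = lambda w: math.sqrt(sum(x * x for x in w.values()))
    return dot / (norm(u) * norm(v)) if u and v else 0.0

def first_order(context_a, context_b):
    """First-order similarity: compare the words the contexts themselves contain."""
    return cosine(Counter(context_a), Counter(context_b))

def second_order(context_a, context_b, cooc):
    """Second-order similarity: represent each context by the sum of the
    co-occurrence vectors of its words, so contexts can match indirectly."""
    def centroid(context):
        vec = Counter()
        for w in context:
            vec.update(cooc.get(w, {}))
        return vec
    return cosine(centroid(context_a), centroid(context_b))

# Invented toy co-occurrence vectors standing in for corpus-derived features.
cooc = {"physician": {"hospital": 3, "patient": 2},
        "doctor":    {"hospital": 2, "patient": 3},
        "guitar":    {"music": 4, "band": 1}}
a, b = ["physician", "visited"], ["doctor", "arrived"]
print(first_order(a, b))         # 0.0 -- no words in common
print(second_order(a, b, cooc))  # > 0 -- similar co-occurrence profiles
```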

    Semantic Sort: A Supervised Approach to Personalized Semantic Relatedness

    We propose and study a novel supervised approach to learning statistical semantic relatedness models from subjectively annotated training examples. The proposed semantic model consists of parameterized co-occurrence statistics associated with textual units of a large background knowledge corpus. We present an efficient algorithm for learning such semantic models from a training sample of relatedness preferences. Our method is corpus independent and can essentially rely on any sufficiently large (unstructured) collection of coherent texts. Moreover, the approach facilitates the fitting of semantic models for specific users or groups of users. We present the results of an extensive range of experiments, from small to large scale, indicating that the proposed method is effective and competitive with the state-of-the-art. Comment: 37 pages, 8 figures. A short version of this paper was already published at ECML/PKDD 201
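
    As a rough analogue of fitting a parameterized relatedness score to pairwise preferences, the sketch below trains two weights over invented co-occurrence features with a ranking-perceptron update; this is a generic stand-in, not the learning algorithm proposed in the paper.

```python
import math

# Invented background co-occurrence counts standing in for corpus statistics.
COOC = {frozenset(("cat", "dog")): 50, frozenset(("cat", "carburetor")): 1}

def features(pair):
    """Feature vector for a word pair: raw and log-scaled co-occurrence counts."""
    c = COOC.get(frozenset(pair), 0)
    return [c, math.log1p(c)]

def score(weights, pair):
    return sum(w * f for w, f in zip(weights, features(pair)))

def fit(preferences, epochs=10, lr=0.1):
    """Fit weights to relatedness preferences, each a (preferred, other) pair
    of word pairs, with a simple ranking-perceptron update."""
    weights = [0.0, 0.0]
    for _ in range(epochs):
        for better, worse in preferences:
            if score(weights, better) <= score(weights, worse):
                fb, fw = features(better), features(worse)
                # Push the preferred pair's score above the other pair's score.
                weights = [w + lr * (b - o) for w, b, o in zip(weights, fb, fw)]
    return weights

prefs = [(("cat", "dog"), ("cat", "carburetor"))]
w = fit(prefs)
print(w, score(w, ("cat", "dog")) > score(w, ("cat", "carburetor")))  # -> True
```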

    An analysis and comparison of predominant word sense disambiguation algorithms

    This thesis investigates research performed in the area of natural language processing. The aim of this research is to compare a selection of predominant word sense disambiguation algorithms, and to determine whether they can be optimised by small changes to the parameters used by the algorithms. To perform this research, several word sense disambiguation algorithms were implemented in Java and run on a range of test corpora. The algorithms were judged on metrics such as speed and accuracy, and any other results obtained; while an algorithm may be fast and accurate, there may be other factors making it less desirable. Finally, to demonstrate the purpose and usefulness of better algorithms, the algorithms were used in conjunction with a real-world application. Five algorithms were used in this research: the standard Lesk algorithm, the simplified Lesk algorithm, a Lesk algorithm variant using hypernyms, a Lesk algorithm variant using synonyms, and a baseline performance algorithm. While the baseline algorithm should have been less accurate than the other algorithms, testing found that it could disambiguate words more accurately than any of the other algorithms, seemingly because the baseline makes use of statistical data in WordNet, the machine-readable dictionary used for testing, which the other algorithms are unable to exploit. However, with a few modifications, the simplified Lesk algorithm was able to reach performance just a few percent lower than that of the baseline algorithm. A further aim of this research is to apply word sense disambiguation to automatic concept mapping, to determine whether more accurate algorithms produce noticeably better results in a real-world application. Testing found that the overall accuracy of the algorithm had little effect on the quality of the concept maps produced, which instead depended on the text being examined.
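
    The simplified Lesk algorithm compared in the thesis scores each sense by the overlap between its gloss and the target word's context. The sketch below shows that scoring over an invented two-sense gloss table; the thesis itself used WordNet glosses, and its baseline corresponds to always choosing the first (most frequent) sense.

```python
# Minimal simplified-Lesk sketch over an invented two-sense gloss dictionary;
# a real implementation would draw glosses from WordNet.
GLOSSES = {
    "bank": {
        "bank.n.01": "a financial institution that accepts deposits of money",
        "bank.n.02": "sloping land beside a body of water such as a river",
    }
}

STOPWORDS = {"a", "an", "the", "of", "that", "such", "as", "beside", "to"}

def simplified_lesk(word, context):
    """Pick the sense whose gloss shares the most non-stopword words with the
    context; when no gloss overlaps, the first (most frequent) sense wins,
    matching the baseline behaviour."""
    context_words = {w.lower() for w in context} - STOPWORDS
    best_sense, best_overlap = None, -1
    for sense, gloss in GLOSSES[word].items():
        overlap = len(context_words & (set(gloss.split()) - STOPWORDS))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

print(simplified_lesk("bank", "they moored the boat on the river bank".split()))
# -> bank.n.02 (its gloss shares "river" with the context)
```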