101,747 research outputs found

    Computational Approaches to Measuring the Similarity of Short Contexts : A Review of Applications and Methods

    Full text link
    Measuring the similarity of short written contexts is a fundamental problem in Natural Language Processing. This article provides a unifying framework by which short context problems can be categorized both by their intended application and proposed solution. The goal is to show that various problems and methodologies that appear quite different on the surface are in fact very closely related. The axes by which these categorizations are made include the format of the contexts (headed versus headless), the way in which the contexts are to be measured (first-order versus second-order similarity), and the information used to represent the features in the contexts (micro versus macro views). The unifying thread that binds together many short context applications and methods is the fact that similarity decisions must be made between contexts that share few (if any) words in common.Comment: 23 page

    Discriminating word senses with tourist walks in complex networks

    Full text link
    Patterns of topological arrangement are widely used for both animal and human brains in the learning process. Nevertheless, automatic learning techniques frequently overlook these patterns. In this paper, we apply a learning technique based on the structural organization of the data in the attribute space to the problem of discriminating the senses of 10 polysemous words. Using two types of characterization of meanings, namely semantical and topological approaches, we have observed significative accuracy rates in identifying the suitable meanings in both techniques. Most importantly, we have found that the characterization based on the deterministic tourist walk improves the disambiguation process when one compares with the discrimination achieved with traditional complex networks measurements such as assortativity and clustering coefficient. To our knowledge, this is the first time that such deterministic walk has been applied to such a kind of problem. Therefore, our finding suggests that the tourist walk characterization may be useful in other related applications

    Word Sense Determination from Wikipedia Data Using Neural Networks

    Get PDF
    Many words have multiple meanings. For example, “plant” can mean a type of living organism or a factory. Being able to determine the sense of such words is very useful in natural language processing tasks, such as speech synthesis, question answering, and machine translation. For the project described in this report, we used a modular model to classify the sense of words to be disambiguated. This model consisted of two parts: The first part was a neural-network-based language model to compute continuous vector representations of words from data sets created from Wikipedia pages. The second part classified the meaning of the given word without explicitly knowing what the meaning is. In this unsupervised word sense determination task, we did not need human-tagged training data or a dictionary of senses for each word. We tested the model with some naturally ambiguous words, and compared our experimental results with the related work by Schütze in 1998. Our model achieved similar accuracy as Schütze’s work for some words

    The Epistemological Foundations of Knowledge Representations

    Get PDF
    This paper looks at the epistemological foundations of knowledge representations embodied in retrieval languages. It considers questions such as the validity of knowledge representations and their effectiveness for the purposes of retrieval and automation. The knowledge representations it considers are derived from three theories of meaning that have dominated twentieth-century philosophy.published or submitted for publicatio

    From Word to Sense Embeddings: A Survey on Vector Representations of Meaning

    Get PDF
    Over the past years, distributed semantic representations have proved to be effective and flexible keepers of prior knowledge to be integrated into downstream applications. This survey focuses on the representation of meaning. We start from the theoretical background behind word vector space models and highlight one of their major limitations: the meaning conflation deficiency, which arises from representing a word with all its possible meanings as a single vector. Then, we explain how this deficiency can be addressed through a transition from the word level to the more fine-grained level of word senses (in its broader acceptation) as a method for modelling unambiguous lexical meaning. We present a comprehensive overview of the wide range of techniques in the two main branches of sense representation, i.e., unsupervised and knowledge-based. Finally, this survey covers the main evaluation procedures and applications for this type of representation, and provides an analysis of four of its important aspects: interpretability, sense granularity, adaptability to different domains and compositionality.Comment: 46 pages, 8 figures. Published in Journal of Artificial Intelligence Researc
    • …
    corecore