4 research outputs found

    Estimating semantic distance using soft semantic constraints in knowledge-source-corpus hybrid models

    Strictly corpus-based measures of semantic distance conflate co-occurrence information pertaining to the many possible senses of target words. We propose a corpus–thesaurus hybrid method that uses soft constraints to generate word-sense-aware distributional profiles (DPs) from coarser “concept DPs” (derived from a Roget-like thesaurus) and sense-unaware traditional word DPs (derived from raw text). Although it uses a knowledge source, the method is not vocabulary-limited: if the target word is not in the thesaurus, the method falls back gracefully on the word’s co-occurrence information. This allows the method to access valuable information encoded in a lexical resource, such as a thesaurus, while still being able to effectively handle domain-specific terms and named entities. Experiments on word-pair ranking by semantic distance show the new hybrid method to be superior to others.
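The fallback behaviour described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the co-occurrence window size, and the shapes of the thesaurus and concept-DP inputs are all assumptions made for the example.

```python
from collections import Counter

def word_dp(corpus, target, window=2):
    """Traditional, sense-unaware distributional profile: counts of
    words co-occurring with `target` within +/- `window` positions."""
    dp = Counter()
    for sent in corpus:
        for i, w in enumerate(sent):
            if w == target:
                lo, hi = max(0, i - window), min(len(sent), i + window + 1)
                for j in range(lo, hi):
                    if j != i:
                        dp[sent[j]] += 1
    return dp

def hybrid_dp(corpus, target, thesaurus_concepts, concept_dps):
    """Return one DP per thesaurus sense of `target` when it is listed,
    otherwise fall back gracefully to the word's own co-occurrence DP.
    `thesaurus_concepts` maps word -> list of concept ids (hypothetical
    format); `concept_dps` maps concept id -> set of supporting words."""
    wdp = word_dp(corpus, target)
    concepts = thesaurus_concepts.get(target)
    if not concepts:
        # Out-of-vocabulary (e.g. a named entity): plain word DP.
        return {None: wdp}
    # One profile per sense: keep only the co-occurrences that the
    # concept DP for that sense supports (a crude stand-in for the
    # paper's soft constraints, which re-weight rather than filter).
    return {c: Counter({w: n for w, n in wdp.items() if w in concept_dps[c]})
            for c in concepts}
```

For a word like “bank”, this yields separate riverside and financial profiles when the thesaurus lists both senses, and a single undivided profile for a term the thesaurus does not cover.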

    Using a high-dimensional model of semantic space to predict neural activity

    This dissertation research developed the GOLD model (Graph Of Language Distribution), a graph-structured semantic space model constructed based on co-occurrence in a large corpus of natural language, with the intent that it may be used to explore what information may be present about relationships between words in such a model and the degree to which this information may be used to predict brain responses and behavior in language tasks. The present study employed GOLD to examine general relatedness as well as two specific types of relationship between words: semantic similarity, which refers to the degree of overlap in meaning between words, and associative relatedness, which refers to the degree to which two words occur in the same schematic context. It was hypothesized that this graph-structured model of language constructed based on co-occurrence should easily capture associative relatedness, because this type of relationship is thought to be present directly in lexical co-occurrence. Additionally, it was hypothesized that semantic similarity may be extracted from the intersection of the set of first-order connections, because two words that are semantically similar may occupy similar thematic or syntactic roles across contexts and thus would co-occur lexically with the same set of nodes. Based on these hypotheses, a set of relationship metrics was extracted from the GOLD model, and machine learning techniques were used to explore predictive properties of these metrics. GOLD successfully predicted behavioral data as well as neural activity in response to words with varying relationships, and its predictions outperformed those of certain competing models. These results suggest that a single-mechanism account of learning word meaning from context may suffice to account for a variety of relationships between words. Further benefits of graph models of language are discussed, including their transparent record of language experience, easy interpretability, and increased psychological plausibility over models that perform complex transformations of meaning representation.
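The two relationship types the abstract distinguishes can be sketched on a toy co-occurrence graph. This is an illustrative assumption, not the GOLD model itself: sentence-level co-occurrence, raw edge counts for associative relatedness, and Jaccard overlap of first-order neighbourhoods for semantic similarity are stand-ins for the dissertation's actual metrics.

```python
from collections import defaultdict
from itertools import combinations

def build_graph(corpus):
    """Co-occurrence graph: nodes are words, edge weights count how
    often two words appear in the same sentence."""
    g = defaultdict(lambda: defaultdict(int))
    for sent in corpus:
        for a, b in combinations(set(sent), 2):
            g[a][b] += 1
            g[b][a] += 1
    return g

def associative_relatedness(g, a, b):
    """Direct first-order link: how often a and b co-occur."""
    return g[a][b]

def semantic_similarity(g, a, b):
    """Jaccard overlap of first-order neighbourhoods: similar words
    co-occur with the same set of nodes, even if rarely with each
    other (the 'intersection of first-order connections' idea)."""
    na, nb = set(g[a]), set(g[b])
    return len(na & nb) / len(na | nb) if na | nb else 0.0
```

On such a graph, “cat” and “dog” can score high on neighbourhood overlap without ever co-occurring directly, while “cat” and “mouse” score high on the direct associative link, mirroring the abstract's two hypothesized relationship types.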