Measuring Semantic Similarity by Latent Relational Analysis

By Peter D. Turney

Abstract

This paper introduces Latent Relational Analysis (LRA), a method for measuring semantic similarity. LRA measures similarity in the semantic relations between two pairs of words. When two pairs have a high degree of relational similarity, they are analogous. For example, the pair cat:meow is analogous to the pair dog:bark. There is evidence from cognitive science that relational similarity is fundamental to many cognitive and linguistic tasks (e.g., analogical reasoning). In the Vector Space Model (VSM) approach to measuring relational similarity, the similarity between two pairs is calculated as the cosine of the angle between the vectors that represent the two pairs. The elements in the vectors are based on the frequencies of manually constructed patterns in a large corpus. LRA extends the VSM approach in three ways: (1) patterns are derived automatically from the corpus, (2) Singular Value Decomposition is used to smooth the frequency data, and (3) synonyms are used to reformulate word pairs. This paper describes the LRA algorithm and experimentally compares LRA to VSM on two tasks: answering college-level multiple-choice word analogy questions and classifying semantic relations in noun-modifier expressions. LRA achieves state-of-the-art results, reaching human-level performance on the analogy questions and significantly exceeding VSM performance on both tasks.
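
The two numerical ingredients named in the abstract, the cosine between pair vectors and smoothing by Singular Value Decomposition, can be illustrated with a minimal sketch. Everything concrete in it is an assumption made for illustration: the pattern strings, the toy frequency counts, the helper names `cosine` and `smooth_with_svd`, and the choice of rank k=2 do not come from the paper. The sketch also leaves out automatic pattern derivation and synonym reformulation, so it is not the LRA algorithm itself.

```python
import numpy as np

def cosine(u, v):
    """Cosine of the angle between two pair vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def smooth_with_svd(matrix, k):
    """Return the rank-k approximation of a pair-by-pattern matrix (truncated SVD)."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k] @ np.diag(s[:k]) @ vt[:k, :]

# Rows are word pairs, columns are patterns. Both the patterns and the
# counts below are invented toy values, not data from the paper.
pairs = ["cat:meow", "dog:bark", "carpenter:wood"]
patterns = ["X says Y", "the Y of the X", "X works with Y", "X cuts Y"]
freq = np.array([
    [12.0, 7.0, 0.0, 0.0],
    [15.0, 9.0, 1.0, 0.0],
    [0.0, 1.0, 10.0, 6.0],
])

# Smooth the frequency data, then compare pairs by cosine.
smoothed = smooth_with_svd(freq, k=2)
sim_analogous = cosine(smoothed[0], smoothed[1])   # cat:meow vs dog:bark
sim_unrelated = cosine(smoothed[0], smoothed[2])   # cat:meow vs carpenter:wood

print(f"{pairs[0]} vs {pairs[1]}: {sim_analogous:.3f}")
print(f"{pairs[0]} vs {pairs[2]}: {sim_unrelated:.3f}")
```

With these toy counts, the two pairs that share a relation score a cosine close to 1, while the unrelated pair scores close to 0. On a real corpus the matrix would be far larger and sparser, and the rank k would need to be tuned.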

Topics: Language, Computational Linguistics, Semantics, Machine Learning, Artificial Intelligence
Year: 2005
OAI identifier: oai:cogprints.org:4501
Full text:
  • http://cogprints.org/4501/1/NR...
  • http://cogprints.org/4501/
