
Learning Analogies and Semantic Relations

By Peter Turney and Michael Littman


We present an algorithm for learning from unlabeled text, based on the Vector Space Model (VSM) of information retrieval, that can solve verbal analogy questions of the kind found in the Scholastic Aptitude Test (SAT). A verbal analogy has the form A:B::C:D, meaning "A is to B as C is to D"; for example, mason:stone::carpenter:wood. SAT analogy questions provide a word pair, A:B, and the problem is to select the most analogous word pair, C:D, from a set of five choices. The VSM algorithm correctly answers 47% of a collection of 374 college-level analogy questions (random guessing would yield 20% correct). We motivate this research by relating it to work in cognitive science and linguistics, and by applying it to a difficult problem in natural language processing: determining semantic relations in noun-modifier pairs. The problem is to classify a noun-modifier pair, such as "laser printer", according to the semantic relation between the noun (printer) and the modifier (laser). We use a supervised nearest-neighbour algorithm that assigns a class to a given noun-modifier pair by finding the most analogous noun-modifier pair in the training data. With 30 classes of semantic relations, on a collection of 600 labeled noun-modifier pairs, the learning algorithm attains an F value of 26.5% (random guessing: 3.3%). With 5 classes of semantic relations, the F value is 43.2% (random: 20%). The performance is state-of-the-art for these challenging problems.
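The core of the VSM approach described above is to represent each word pair as a vector and pick the choice pair whose vector is most similar to the stem pair's vector by cosine similarity. The sketch below illustrates only that selection step; the vectors, the pattern dimensions, and the example counts are invented for illustration and are not the paper's actual features.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def best_analogy(stem_vec, choice_vecs):
    """Index of the choice pair whose relation vector has the highest
    cosine similarity to the stem pair's vector."""
    return max(range(len(choice_vecs)),
               key=lambda i: cosine(stem_vec, choice_vecs[i]))

# Hypothetical relation vectors: each dimension might count corpus hits
# for a joining pattern such as "X cuts Y" or "X works with Y".
stem = [12, 3, 0, 7]       # stem pair, e.g. mason:stone
choices = [
    [1, 9, 4, 0],
    [10, 2, 1, 8],         # most similar to the stem, e.g. carpenter:wood
    [0, 0, 5, 1],
    [3, 3, 3, 3],
    [0, 8, 0, 2],
]
print(best_analogy(stem, choices))  # -> 1
```

The same similarity measure drives the supervised nearest-neighbour classifier for noun-modifier pairs: instead of five answer choices, the candidate vectors are the labeled training pairs, and the query pair inherits the class of its nearest neighbour.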

Topics: Language, Computational Linguistics, Semantics, Machine Learning
Year: 2003

