Corpus-based Learning of Analogies and Semantic Relations
We present an algorithm for learning from unlabeled text, based on the Vector Space Model (VSM) of information retrieval, that can solve verbal analogy questions of the kind found in the SAT college entrance exam. A verbal analogy has the form A:B::C:D, meaning "A is to B as C is to D"; for example, mason:stone::carpenter:wood. SAT analogy questions provide a word pair, A:B, and the problem is to select the most analogous word pair, C:D, from a set of five choices. The VSM algorithm correctly answers 47% of a collection of 374 college-level analogy questions (random guessing would yield 20% correct; the average college-bound senior high school student answers about 57% correctly). We motivate this research by applying it to a difficult problem in natural language processing, determining semantic relations in noun-modifier pairs. The problem is to classify a noun-modifier pair, such as "laser printer", according to the semantic relation between the noun (printer) and the modifier (laser). We use a supervised nearest-neighbour algorithm that assigns a class to a given noun-modifier pair by finding the most analogous noun-modifier pair in the training data. With 30 classes of semantic relations, on a collection of 600 labeled noun-modifier pairs, the learning algorithm attains an F value of 26.5% (random guessing: 3.3%). With 5 classes of semantic relations, the F value is 43.2% (random: 20%). The performance is state-of-the-art for both verbal analogies and noun-modifier relations.
A Uniform Approach to Analogies, Synonyms, Antonyms, and Associations
Recognizing analogies, synonyms, antonyms, and associations appear to be four
distinct tasks, requiring distinct NLP algorithms. In the past, the four
tasks have been treated independently, using a wide variety of algorithms.
These four semantic classes, however, are a tiny sample of the full
range of semantic phenomena, and we cannot afford to create ad hoc algorithms
for each semantic phenomenon; we need to seek a unified approach.
We propose to subsume a broad range of phenomena under analogies.
To limit the scope of this paper, we restrict our attention to the subsumption
of synonyms, antonyms, and associations. We introduce a supervised corpus-based
machine learning algorithm for classifying analogous word pairs, and we
show that it can solve multiple-choice SAT analogy questions, TOEFL
synonym questions, ESL synonym-antonym questions, and similar-associated-both
questions from cognitive psychology.
Similarity of Semantic Relations
There are at least two kinds of similarity. Relational similarity is
correspondence between relations, in contrast with attributional similarity,
which is correspondence between attributes. When two words have a high
degree of attributional similarity, we call them synonyms. When two pairs
of words have a high degree of relational similarity, we say that their
relations are analogous. For example, the word pair mason:stone is analogous
to the pair carpenter:wood. This paper introduces Latent Relational Analysis (LRA),
a method for measuring relational similarity. LRA has potential applications in many
areas, including information extraction, word sense disambiguation,
and information retrieval. Recently the Vector Space Model (VSM) of information
retrieval has been adapted to measuring relational similarity,
achieving a score of 47% on a collection of 374 college-level multiple-choice
word analogy questions. In the VSM approach, the relation between a pair of words is
characterized by a vector of frequencies of predefined patterns in a large corpus.
LRA extends the VSM approach in three ways: (1) the patterns are derived automatically
from the corpus, (2) the Singular Value Decomposition (SVD) is used to smooth the frequency
data, and (3) automatically generated synonyms are used to explore variations of the
word pairs. LRA achieves 56% on the 374 analogy questions, statistically equivalent to the
average human score of 57%. On the related problem of classifying semantic relations, LRA
achieves similar gains over the VSM.
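
To make the VSM approach described above concrete, here is a minimal sketch of answering one analogy question: each word pair is represented by a vector of pattern frequencies, and the choice whose vector has the highest cosine with the stem pair's vector is selected. The pattern names and all counts are invented for illustration and are not taken from the paper; `answer_analogy` is a hypothetical helper name. The sketch covers only the frequency-vector comparison and omits LRA's automatic pattern derivation, SVD smoothing, and synonym variations.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two equal-length frequency vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# ASSUMPTION: hypothetical joining patterns; one vector slot per pattern.
PATTERNS = ["X cuts Y", "X works with Y", "X eats Y"]

# ASSUMPTION: invented corpus counts, one row per word pair.
TOY_COUNTS = {
    ("mason", "stone"): [12, 30, 0],
    ("carpenter", "wood"): [15, 28, 0],
    ("teacher", "chalk"): [0, 9, 0],
    ("cat", "mouse"): [0, 0, 22],
}

def answer_analogy(stem, choices, counts):
    """Return the choice pair whose relation vector is closest to the stem's."""
    stem_vec = counts[stem]
    return max(choices, key=lambda pair: cosine(stem_vec, counts[pair]))

best = answer_analogy(
    ("mason", "stone"),
    [("carpenter", "wood"), ("teacher", "chalk"), ("cat", "mouse")],
    TOY_COUNTS,
)
print(best)
```

With these toy counts the selected pair is carpenter:wood, since its pattern profile is nearly proportional to that of mason:stone.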
Embedding Semantic Relations into Word Representations
Learning representations for semantic relations is important for various
tasks such as analogy detection, relational search, and relation
classification. Although there have been several proposals for learning
representations for individual words, learning word representations that
explicitly capture the semantic relations between words remains
underdeveloped. We propose an unsupervised method for learning vector
representations for words such that the learnt representations are sensitive to
the semantic relations that exist between two words. First, we extract lexical
patterns from the co-occurrence contexts of two words in a corpus to represent
the semantic relations that exist between those two words. Second, we represent
a lexical pattern as the weighted sum of the representations of the words that
co-occur with that lexical pattern. Third, we train a binary classifier to
detect relationally similar vs. non-similar lexical pattern pairs. The proposed
method is unsupervised in the sense that the lexical pattern pairs we use as
training data are automatically sampled from a corpus, without requiring any
manual intervention. Our proposed method statistically significantly
outperforms the current state-of-the-art word representations on three
benchmark datasets for proportional analogy detection, demonstrating its
ability to accurately capture the semantic relations among words.
Comment: International Joint Conferences in AI (IJCAI) 201
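
The second step in the abstract above, representing a lexical pattern as a weighted sum of word representations, can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how the weights are computed, so they are passed in as plain numbers, and the function name `pattern_vector`, the two-dimensional vectors, and the example words are all hypothetical.

```python
def pattern_vector(cooccurring, word_vectors):
    """Weighted sum of the vectors of the words that co-occur with a pattern.

    cooccurring:  dict mapping word -> weight (ASSUMPTION: weights are given)
    word_vectors: dict mapping word -> list of floats, all of the same length
    """
    dim = len(next(iter(word_vectors.values())))
    total = [0.0] * dim
    for word, weight in cooccurring.items():
        for i, x in enumerate(word_vectors[word]):
            total[i] += weight * x
    return total

# ASSUMPTION: toy two-dimensional word vectors and invented weights.
toy_vectors = {"ostrich": [1.0, 0.0], "bird": [0.0, 1.0]}
toy_weights = {"ostrich": 2.0, "bird": 0.5}

vec = pattern_vector(toy_weights, toy_vectors)
print(vec)
```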
The Latent Relation Mapping Engine: Algorithm and Experiments
Many AI researchers and cognitive scientists have argued that analogy is the
core of cognition. The most influential work on computational modeling of
analogy-making is Structure Mapping Theory (SMT) and its implementation in the
Structure Mapping Engine (SME). A limitation of SME is the requirement for
complex hand-coded representations. We introduce the Latent Relation Mapping
Engine (LRME), which combines ideas from SME and Latent Relational Analysis
(LRA) in order to remove the requirement for hand-coded representations. LRME
builds analogical mappings between lists of words, using a large corpus of raw
text to automatically discover the semantic relations among the words. We
evaluate LRME on a set of twenty analogical mapping problems, ten based on
scientific analogies and ten based on common metaphors. LRME achieves
human-level performance on the twenty problems. We compare LRME with a variety
of alternative approaches and find that they are not able to reach the same
level of performance.
Comment: related work available at http://purl.org/peter.turney
Learning Analogies and Semantic Relations
We present an algorithm for learning from unlabeled text, based on the
Vector Space Model (VSM) of information retrieval, that can solve verbal
analogy questions of the kind found in the Scholastic Aptitude Test (SAT).
A verbal analogy has the form A:B::C:D, meaning "A is to B as C is to D";
for example, mason:stone::carpenter:wood. SAT analogy questions provide
a word pair, A:B, and the problem is to select the most analogous word
pair, C:D, from a set of five choices. The VSM algorithm correctly
answers 47% of a collection of 374 college-level analogy questions
(random guessing would yield 20% correct). We motivate this research by
relating it to work in cognitive science and linguistics, and by applying
it to a difficult problem in natural language processing, determining
semantic relations in noun-modifier pairs. The problem is to classify a
noun-modifier pair, such as "laser printer", according to the semantic
relation between the noun (printer) and the modifier (laser). We use a
supervised nearest-neighbour algorithm that assigns a class to a given
noun-modifier pair by finding the most analogous noun-modifier pair in
the training data. With 30 classes of semantic relations, on a collection
of 600 labeled noun-modifier pairs, the learning algorithm attains an F
value of 26.5% (random guessing: 3.3%). With 5 classes of semantic
relations, the F value is 43.2% (random: 20%). The performance is
state-of-the-art for these challenging problems.
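
The nearest-neighbour classification described above can be sketched roughly as follows: a test noun-modifier pair receives the class of the training pair whose relation vector is most similar to its own. Cosine is used here as the similarity measure. The vectors, the class labels, and the function name `classify_pair` are invented for illustration and are not the paper's data or code.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two equal-length frequency vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def classify_pair(pair_vec, training):
    """Return the label of the single most analogous training pair.

    training: list of (relation_vector, label) tuples.
    """
    _, label = max(training, key=lambda item: cosine(pair_vec, item[0]))
    return label

# ASSUMPTION: invented pattern-frequency vectors and hypothetical class names.
toy_training = [
    ([20, 1, 0], "instrument"),
    ([0, 18, 3], "material"),
]

predicted = classify_pair([15, 2, 0], toy_training)
print(predicted)
```

A single nearest neighbour keeps the sketch close to the description of "finding the most analogous noun-modifier pair in the training data"; voting over several neighbours would be a different design.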
Distributional semantics beyond words: Supervised learning of analogy and paraphrase
There have been several efforts to extend distributional semantics beyond
individual words, to measure the similarity of word pairs, phrases, and
sentences (briefly, tuples; ordered sets of words, contiguous or
noncontiguous). One way to extend beyond words is to compare two tuples using a
function that combines pairwise similarities between the component words in the
tuples. A strength of this approach is that it works with both relational
similarity (analogy) and compositional similarity (paraphrase). However, past
work required hand-coding the combination function for different tasks. The
main contribution of this paper is that combination functions are generated by
supervised learning. We achieve state-of-the-art results in measuring
relational similarity between word pairs (SAT analogies and SemEval 2012 Task
2) and measuring compositional similarity between noun-modifier phrases and
unigrams (multiple-choice paraphrase questions).
Human-Level Performance on Word Analogy Questions by Latent Relational Analysis
This paper introduces Latent Relational Analysis (LRA), a method for measuring relational similarity. LRA has potential applications in many areas, including information extraction, word sense disambiguation, machine translation, and information retrieval. Relational similarity is correspondence between relations, in contrast with attributional similarity, which is correspondence between attributes. When two words have a high degree of attributional similarity, we call them synonyms. When two pairs of words have a high degree of relational similarity, we say that their relations are analogous. For example, the word pair mason/stone is analogous to the pair carpenter/wood; the relations between mason and stone are highly similar to the relations between carpenter and wood. Past work on semantic similarity measures has mainly been concerned with attributional similarity. For instance, Latent Semantic Analysis (LSA) can measure the degree of similarity between two words, but not between two relations. Recently the Vector Space Model (VSM) of information retrieval has been adapted to the task of measuring relational similarity, achieving a score of 47% on a collection of 374 college-level multiple-choice word analogy questions. In the VSM approach, the relation between a pair of words is characterized by a vector of frequencies of predefined patterns in a large corpus. LRA extends the VSM approach in three ways: (1) the patterns are derived automatically from the corpus (they are not predefined), (2) the Singular Value Decomposition (SVD) is used to smooth the frequency data (it is also used this way in LSA), and (3) automatically generated synonyms are used to explore reformulations of the word pairs. LRA achieves 56% on the 374 analogy questions, statistically equivalent to the average human score of 57%. On the related problem of classifying noun-modifier relations, LRA achieves similar gains over the VSM, while using a smaller corpus.