Learning Analogies and Semantic Relations
We present an algorithm for learning from unlabeled text, based on the
Vector Space Model (VSM) of information retrieval, that can solve verbal
analogy questions of the kind found in the Scholastic Aptitude Test (SAT).
A verbal analogy has the form A:B::C:D, meaning "A is to B as C is to D";
for example, mason:stone::carpenter:wood. SAT analogy questions provide
a word pair, A:B, and the problem is to select the most analogous word
pair, C:D, from a set of five choices. The VSM algorithm correctly
answers 47% of a collection of 374 college-level analogy questions
(random guessing would yield 20% correct). We motivate this research by
relating it to work in cognitive science and linguistics, and by applying
it to a difficult problem in natural language processing, determining
semantic relations in noun-modifier pairs. The problem is to classify a
noun-modifier pair, such as "laser printer", according to the semantic
relation between the noun (printer) and the modifier (laser). We use a
supervised nearest-neighbour algorithm that assigns a class to a given
noun-modifier pair by finding the most analogous noun-modifier pair in
the training data. With 30 classes of semantic relations, on a collection
of 600 labeled noun-modifier pairs, the learning algorithm attains an F
value of 26.5% (random guessing: 3.3%). With 5 classes of semantic
relations, the F value is 43.2% (random: 20%). The performance is
state-of-the-art for these challenging problems.
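A minimal sketch of the VSM idea described in this abstract: the relation between a word pair is represented as a vector of pattern frequencies drawn from a large corpus, and the answer to an SAT question is the choice pair whose vector is closest, by cosine, to the stem pair's vector. The pattern counts below are invented placeholders, not real corpus statistics.

```python
# Sketch: pick the most analogous choice pair by cosine similarity between
# pattern-frequency vectors (e.g. counts of "X works with Y", "X cuts Y", ...).
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def answer_analogy(stem_vector, choice_vectors):
    """Return the index of the choice pair most analogous to the stem pair."""
    scores = [cosine(stem_vector, v) for v in choice_vectors]
    return max(range(len(scores)), key=scores.__getitem__)

# mason:stone stem and five candidate pairs, as invented pattern-frequency vectors
stem = [12, 3, 0, 7]
choices = [
    [10, 4, 1, 6],   # carpenter:wood   (intuitively the best match)
    [0, 9, 8, 0],    # teacher:chalk
    [1, 0, 0, 2],    # soldier:gun
    [3, 1, 5, 0],    # photograph:camera
    [0, 0, 2, 1],    # book:word
]
print(answer_analogy(stem, choices))  # -> 0
```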
Using distributional similarity to organise biomedical terminology
We investigate an application of distributional similarity techniques to the problem of structural organisation of biomedical terminology. Our application domain is the relatively small GENIA corpus. Using terms that have been accurately marked up by hand within the corpus, we consider the problem of automatically determining semantic proximity. Terminological units are defined for our purposes as normalised classes of individual terms. Syntactic analysis of the corpus data is carried out using the Pro3Gres parser and provides the data required to calculate distributional similarity using a variety of different measures. Evaluation is performed against a hand-crafted gold standard for this domain in the form of the GENIA ontology. We show that distributional similarity can be used to predict semantic type with a good degree of accuracy.
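A minimal sketch of the distributional-similarity step described here, assuming dependency triples of the kind a parser such as Pro3Gres produces: each term is represented by the counts of its syntactic contexts, and semantic proximity is the cosine of the two context-count vectors. The triples below are invented toy data, not GENIA annotations.

```python
# Sketch: distributional similarity of terms from (term, relation, context word) triples.
from collections import Counter
import math

# toy dependency triples, as might come from a syntactic parser
triples = [
    ("interleukin-2", "obj_of", "activate"),
    ("interleukin-2", "mod_by", "human"),
    ("NF-kappaB", "obj_of", "activate"),
    ("NF-kappaB", "subj_of", "bind"),
    ("T-cell", "mod_by", "human"),
]

def context_vector(term):
    """Count the syntactic contexts in which a term occurs."""
    return Counter((rel, word) for t, rel, word in triples if t == term)

def similarity(term_a, term_b):
    va, vb = context_vector(term_a), context_vector(term_b)
    dot = sum(va[c] * vb[c] for c in va)
    norm = math.sqrt(sum(x * x for x in va.values())) * math.sqrt(sum(x * x for x in vb.values()))
    return dot / norm if norm else 0.0

print(similarity("interleukin-2", "NF-kappaB"))  # terms sharing the "object of activate" context
```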
A Uniform Approach to Analogies, Synonyms, Antonyms, and Associations
Recognizing analogies, synonyms, antonyms, and associations appear to be four distinct tasks, requiring distinct NLP algorithms. In the past, the four tasks have been treated independently, using a wide variety of algorithms. These four semantic classes, however, are a tiny sample of the full range of semantic phenomena, and we cannot afford to create ad hoc algorithms for each semantic phenomenon; we need to seek a unified approach. We propose to subsume a broad range of phenomena under analogies. To limit the scope of this paper, we restrict our attention to the subsumption of synonyms, antonyms, and associations. We introduce a supervised corpus-based machine learning algorithm for classifying analogous word pairs, and we show that it can solve multiple-choice SAT analogy questions, TOEFL synonym questions, ESL synonym-antonym questions, and similar-associated-both questions from cognitive psychology.
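A minimal sketch of the unified, supervised treatment this abstract proposes: synonym, antonym, and associated pairs are all represented by the same kind of feature vector (here hypothetical pattern frequencies) and a single classifier is trained over them. The features, labels, and the choice of logistic regression are illustrative assumptions, not the paper's actual algorithm.

```python
# Sketch: one classifier over word-pair feature vectors for several semantic classes.
from sklearn.linear_model import LogisticRegression

# training pairs as invented pattern-frequency vectors with relation labels
train_X = [
    [9, 1, 0, 2],   # big:large
    [8, 0, 1, 3],   # quick:fast
    [1, 7, 0, 2],   # hot:cold
    [0, 8, 1, 1],   # up:down
    [2, 1, 6, 0],   # doctor:hospital
    [1, 2, 7, 1],   # bread:butter
]
train_y = ["synonym", "synonym", "antonym", "antonym", "associated", "associated"]

clf = LogisticRegression(max_iter=1000).fit(train_X, train_y)
print(clf.predict([[7, 1, 1, 2]]))  # a new pair whose features look synonym-like
```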
Classification of semantic relations in different syntactic structures in medical text using the MeSH hierarchy
Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2005. Includes bibliographical references (leaf 38). Two different classification algorithms are evaluated in recognizing semantic relationships of different syntactic compounds. The compounds, which include noun-noun, adjective-noun, noun-adjective, noun-verb, and verb-noun, were extracted from a set of doctors' notes using a part-of-speech tagger and a parser. Each compound was labeled with a semantic relationship, and each word in the compound was mapped to its corresponding entry in the MeSH hierarchy. MeSH includes only medical terminology, so it was extended to include everyday, non-medical terms. The two classification algorithms, neural networks and a classification tree, were trained and tested on the data set for each type of syntactic compound. Models representing different levels of MeSH were generated and fed into the neural networks. Both algorithms performed better than random guessing, and the classification tree performed better than the neural networks in predicting the semantic relationship between phrases from their syntactic structure. By Neha Bhooshan. M.Eng.
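A minimal sketch of the classification-tree setup this abstract describes: each word of a compound is mapped to a MeSH tree number, the numbers are truncated to a chosen level of the hierarchy, and a decision tree predicts the semantic relation. The codes, relation labels, and truncation depth below are invented placeholders, not the thesis's actual data.

```python
# Sketch: predict a compound's semantic relation from MeSH codes truncated to one hierarchy level.
from sklearn.feature_extraction import DictVectorizer
from sklearn.tree import DecisionTreeClassifier

def mesh_level(code, depth=1):
    """Keep only the first `depth` segments of a MeSH tree number, e.g. C04.557 -> C04."""
    return ".".join(code.split(".")[:depth])

# (modifier MeSH code, head MeSH code, relation label) -- toy examples
compounds = [
    ("D27.505", "E05.200", "used-in"),
    ("D27.720", "E05.318", "used-in"),
    ("C04.557", "A11.251", "located-in"),
    ("C14.280", "A07.541", "located-in"),
]

features = [{"mod": mesh_level(m), "head": mesh_level(h)} for m, h, _ in compounds]
labels = [rel for _, _, rel in compounds]

vec = DictVectorizer(sparse=False)
X = vec.fit_transform(features)
tree = DecisionTreeClassifier().fit(X, labels)

test = vec.transform([{"mod": mesh_level("D27.505"), "head": mesh_level("E05.318")}])
print(tree.predict(test))  # -> ['used-in']
```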
AXEL: A framework to deal with ambiguity in three-noun compounds
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University, 6/12/2010. Cognitive Linguistics has been widely used to deal with the ambiguity generated by words in combination. Although this domain offers many solutions to address this challenge, not all of them can be implemented in a computational environment. The Dynamic Construal of Meaning framework is argued to have this ability because it describes an intrinsic degree of association of meanings, which, in turn, can be translated into computational programs. A limitation towards a computational approach, however, has been the lack of syntactic parameters. This research argues that this limitation could be overcome with the aid of the Generative Lexicon Theory (GLT). Specifically, this dissertation formulated possible means to marry the GLT and Cognitive Linguistics in a novel rapprochement between the two.
This bond between opposing theories provided the means to design a computational template (the AXEL System) by realising syntax and semantics at the software level. An instance of the AXEL system was created using a Design Research approach, with planned development iterations to improve the artefact's performance in accounting for the degree of association of meanings in three-noun compounds.
This dissertation delivered three major contributions on the brink of a so-called turning point in Computational Linguistics (CL). First, the AXEL system was used to disclose hidden lexical patterns of ambiguity. These patterns are difficult, if not impossible, to identify without automatic techniques. This research claimed that such patterns can help linguists review lexical knowledge from a software-based viewpoint.
Following linguistic awareness, the second result advocated the adoption of improved resources by reducing the electronic space required by Sense Enumerative Lexicons (SELs). The AXEL system generates interpretations "at the moment of use", optimising the space needed for lexical storage.
Finally, this research introduced a subsystem of metrics to characterise the ambiguous degree of association of three-noun compounds, enabling ranking methods. These weighting methods delivered mechanisms for classifying meanings towards Word Sense Disambiguation (WSD). Overall, these results attempted to tackle difficulties in the study of Lexical Semantics via software tools.
Typological parameters of genericity
Different languages employ different morphosyntactic devices for expressing genericity. And, of course, they also make use of different morphosyntactic and semantic or pragmatic cues which may contribute to the interpretation of a sentence as generic rather than episodic. [...] We will advance the strong hypothesis that it is a fundamental property of lexical elements in natural language that they are neutral with respect to different modes of reference or non-reference. That is, we reject the idea that a certain use of a lexical element, e.g. a use which allows reference to particular spatio-temporally bounded objects in the world, should be linguistically prior to all other possible uses, e.g. to generic and non-specific uses. From this it follows that we do not consider generic uses as derived from non-generic uses, as is occasionally assumed in the literature. Rather, we regard these two possibilities of use as equivalent alternative uses of lexical elements. The typological differences to be noted therefore concern the formal and semantic relationship of generic and non-generic uses to each other; they do not pertain to the question of whether lexical elements are predetermined for one of these two uses. Even supposing we found a language where generic uses are always zero-marked and identical to lexical stems, we would still not assume that lexical elements in this language primarily have a generic use from which the non-generic uses are derived. (Incidentally, none of the languages examined, not even Vietnamese, meets this criterion.)
Human-Level Performance on Word Analogy Questions by Latent Relational Analysis
This paper introduces Latent Relational Analysis (LRA), a method for measuring relational similarity. LRA has potential applications in many areas, including information extraction, word sense disambiguation, machine translation, and information retrieval. Relational similarity is correspondence between relations, in contrast with attributional similarity, which is correspondence between attributes. When two words have a high degree of attributional similarity, we call them synonyms. When two pairs of words have a high degree of relational similarity, we say that their relations are analogous. For example, the word pair mason/stone is analogous to the pair carpenter/wood; the relations between mason and stone are highly similar to the relations between carpenter and wood. Past work on semantic similarity measures has mainly been concerned with attributional similarity. For instance, Latent Semantic Analysis (LSA) can measure the degree of similarity between two words, but not between two relations. Recently the Vector Space Model (VSM) of information retrieval has been adapted to the task of measuring relational similarity, achieving a score of 47% on a collection of 374 college-level multiple-choice word analogy questions. In the VSM approach, the relation between a pair of words is characterized by a vector of frequencies of predefined patterns in a large corpus. LRA extends the VSM approach in three ways: (1) the patterns are derived automatically from the corpus (they are not predefined), (2) the Singular Value Decomposition (SVD) is used to smooth the frequency data (it is also used this way in LSA), and (3) automatically generated synonyms are used to explore reformulations of the word pairs. LRA achieves 56% on the 374 analogy questions, statistically equivalent to the average human score of 57%. On the related problem of classifying noun-modifier relations, LRA achieves similar gains over the VSM, while using a smaller corpus.
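A minimal sketch of LRA's SVD-smoothing step as summarised in this abstract: a word-pair-by-pattern frequency matrix is decomposed, the top-k singular values are kept, and relational similarity is the cosine between the reduced row vectors. The matrix entries here are invented placeholders rather than counts of automatically derived corpus patterns.

```python
# Sketch: smooth a pair-by-pattern frequency matrix with a truncated SVD,
# then compare pairs by cosine similarity in the reduced space.
import numpy as np

# rows: word pairs, columns: patterns such as "X works with Y", "X cuts Y", ...
pairs = ["mason:stone", "carpenter:wood", "teacher:chalk", "doctor:patient"]
M = np.array([
    [12.0, 5.0, 0.0, 1.0],
    [10.0, 6.0, 1.0, 0.0],
    [ 0.0, 1.0, 9.0, 2.0],
    [ 1.0, 0.0, 2.0, 8.0],
])

U, s, Vt = np.linalg.svd(M, full_matrices=False)
k = 2                               # keep the top-k dimensions
reduced = U[:, :k] * s[:k]          # pair vectors in the smoothed space

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# mason:stone should be far more similar to carpenter:wood than to teacher:chalk
print(cosine(reduced[0], reduced[1]), cosine(reduced[0], reduced[2]))
```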
Similarity of Semantic Relations
There are at least two kinds of similarity. Relational similarity is
correspondence between relations, in contrast with attributional similarity,
which is correspondence between attributes. When two words have a high
degree of attributional similarity, we call them synonyms. When two pairs
of words have a high degree of relational similarity, we say that their
relations are analogous. For example, the word pair mason:stone is analogous
to the pair carpenter:wood. This paper introduces Latent Relational Analysis (LRA),
a method for measuring relational similarity. LRA has potential applications in many
areas, including information extraction, word sense disambiguation,
and information retrieval. Recently the Vector Space Model (VSM) of information
retrieval has been adapted to measuring relational similarity,
achieving a score of 47% on a collection of 374 college-level multiple-choice
word analogy questions. In the VSM approach, the relation between a pair of words is
characterized by a vector of frequencies of predefined patterns in a large corpus.
LRA extends the VSM approach in three ways: (1) the patterns are derived automatically
from the corpus, (2) the Singular Value Decomposition (SVD) is used to smooth the frequency
data, and (3) automatically generated synonyms are used to explore variations of the
word pairs. LRA achieves 56% on the 374 analogy questions, statistically equivalent to the
average human score of 57%. On the related problem of classifying semantic relations, LRA
achieves similar gains over the VSM.
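A minimal sketch of how such a relational-similarity measure can drive the noun-modifier classification mentioned at the end of this abstract, following the nearest-neighbour scheme described in the first abstract: a new pair takes the class of its most relationally similar labelled training pair. The relation vectors and the two classes shown are invented for illustration; the actual task uses 5 or 30 classes of semantic relations.

```python
# Sketch: 1-nearest-neighbour classification of noun-modifier pairs by relational similarity.
import numpy as np

# labelled training pairs with toy relation vectors (in LRA these would come
# from the SVD-smoothed pattern space)
train = {
    "laser:printer":       (np.array([0.9, 0.1, 0.2]), "instrument"),
    "electron:microscope": (np.array([0.8, 0.2, 0.1]), "instrument"),
    "flu:virus":           (np.array([0.1, 0.9, 0.3]), "cause"),
    "storm:damage":        (np.array([0.2, 0.8, 0.2]), "cause"),
}

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def classify(test_vector):
    """Return the class of the most relationally similar training pair."""
    best_pair = max(train, key=lambda p: cosine(train[p][0], test_vector))
    return train[best_pair][1]

print(classify(np.array([0.85, 0.15, 0.15])))  # -> "instrument"
```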