40 research outputs found

    Measuring Semantic Similarity by Latent Relational Analysis

    This paper introduces Latent Relational Analysis (LRA), a method for measuring semantic similarity. LRA measures similarity in the semantic relations between two pairs of words. When two pairs have a high degree of relational similarity, they are analogous. For example, the pair cat:meow is analogous to the pair dog:bark. There is evidence from cognitive science that relational similarity is fundamental to many cognitive and linguistic tasks (e.g., analogical reasoning). In the Vector Space Model (VSM) approach to measuring relational similarity, the similarity between two pairs is calculated by the cosine of the angle between the vectors that represent the two pairs. The elements in the vectors are based on the frequencies of manually constructed patterns in a large corpus. LRA extends the VSM approach in three ways: (1) patterns are derived automatically from the corpus, (2) Singular Value Decomposition is used to smooth the frequency data, and (3) synonyms are used to reformulate word pairs. This paper describes the LRA algorithm and experimentally compares LRA to VSM on two tasks: answering college-level multiple-choice word analogy questions and classifying semantic relations in noun-modifier expressions. LRA achieves state-of-the-art results, reaching human-level performance on the analogy questions and significantly exceeding VSM performance on both tasks.
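    As a rough illustration of the vector-space comparison this abstract describes, the sketch below computes the cosine between pattern-frequency vectors for word pairs after smoothing the frequency matrix with a truncated Singular Value Decomposition. The counts are invented, the function names `cosine` and `smooth_with_svd` are hypothetical, and the automatic pattern derivation and synonym reformulation steps are left out. It is a minimal sketch under those assumptions, not the paper's implementation.

```python
import numpy as np

def cosine(u, v):
    """Cosine of the angle between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def smooth_with_svd(matrix, k):
    """Rank-k reconstruction of a matrix via truncated SVD."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return (u[:, :k] * s[:k]) @ vt[:k, :]

# Toy pattern-frequency matrix (made-up counts, assumption):
# one row per word pair, one column per pattern.
pairs = ["cat:meow", "dog:bark", "bird:nest"]
freq = np.array([
    [8.0, 1.0, 0.0, 2.0],
    [6.0, 2.0, 1.0, 1.0],
    [0.0, 1.0, 9.0, 3.0],
])

# Smooth the frequencies, then compare pairs by cosine.
smoothed = smooth_with_svd(freq, k=2)
sim_analogous = cosine(smoothed[0], smoothed[1])  # cat:meow vs dog:bark
sim_unrelated = cosine(smoothed[0], smoothed[2])  # cat:meow vs bird:nest
```

    With these toy counts, the two pairs that share patterns score a higher cosine than the pair that does not, which is the behaviour the relational-similarity measure relies on.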

    Deductive and Analogical Reasoning on a Semantically Embedded Knowledge Graph

    Representing knowledge as high-dimensional vectors in a continuous semantic vector space can help overcome the brittleness and incompleteness of traditional knowledge bases. We present a method for performing deductive reasoning directly in such a vector space, combining analogy, association, and deduction in a straightforward way at each step in a chain of reasoning, drawing on knowledge from diverse sources and ontologies. Comment: AGI 201

    Embedding Semantic Relations into Word Representations

    Learning representations for semantic relations is important for various tasks such as analogy detection, relational search, and relation classification. Although there have been several proposals for learning representations for individual words, learning word representations that explicitly capture the semantic relations between words remains underdeveloped. We propose an unsupervised method for learning vector representations for words such that the learnt representations are sensitive to the semantic relations that exist between two words. First, we extract lexical patterns from the co-occurrence contexts of two words in a corpus to represent the semantic relations that exist between those two words. Second, we represent a lexical pattern as the weighted sum of the representations of the words that co-occur with that lexical pattern. Third, we train a binary classifier to detect relationally similar vs. non-similar lexical pattern pairs. The proposed method is unsupervised in the sense that the lexical pattern pairs we use as training data are automatically sampled from a corpus, without requiring any manual intervention. Our proposed method statistically significantly outperforms the current state-of-the-art word representations on three benchmark datasets for proportional analogy detection, demonstrating its ability to accurately capture the semantic relations among words. Comment: International Joint Conferences in AI (IJCAI) 201
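    The second step in this abstract, representing a lexical pattern as a weighted sum of word representations, might be sketched as follows. The abstract does not say how a word pair is represented or how the weights are computed, so two assumptions are made here purely for illustration: a pair (a, b) is represented by the difference of its word vectors, and the weights are co-occurrence counts. The function name `pattern_vector`, the vectors, and the counts are all hypothetical.

```python
import numpy as np

def pattern_vector(pair_weights, word_vecs):
    """Weighted sum of word-pair representations for one lexical pattern.

    Assumption: a pair (a, b) is represented by word_vecs[b] - word_vecs[a].
    """
    dim = len(next(iter(word_vecs.values())))
    total = np.zeros(dim)
    for (a, b), weight in pair_weights.items():
        total += weight * (word_vecs[b] - word_vecs[a])
    return total

# Toy 2-d word vectors, invented for illustration.
word_vecs = {
    "cat": np.array([1.0, 0.0]),
    "meow": np.array([1.0, 1.0]),
    "dog": np.array([2.0, 0.0]),
    "bark": np.array([2.0, 1.0]),
}

# Hypothetical co-occurrence weights of each pair with one pattern.
weights = {("cat", "meow"): 3.0, ("dog", "bark"): 1.0}
vec = pattern_vector(weights, word_vecs)
```

    Pattern vectors built this way could then be paired up and fed to a binary classifier, as in the third step of the abstract; that step is not shown here.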

    Efficient Estimation of Word Representations in Vector Space

    We propose two novel model architectures for computing continuous vector representations of words from very large data sets. The quality of these representations is measured in a word similarity task, and the results are compared to the previously best performing techniques based on different types of neural networks. We observe large improvements in accuracy at much lower computational cost, i.e., it takes less than a day to learn high quality word vectors from a 1.6-billion-word data set. Furthermore, we show that these vectors provide state-of-the-art performance on our test set for measuring syntactic and semantic word similarities.

    Détection et classification non supervisées de relations sémantiques dans des articles scientifiques

    English title: Unsupervised Classification of Semantic Relations in Scientific Papers. In this article, we tackle a still little-explored task: automatically extracting the state of the art of a scientific domain from an analysis of research papers in that domain. We reduce it to two basic subtasks: identifying concepts and recognizing the relations between them. Terminology extraction identifies candidate concepts, which are then aligned with external resources. Next, we seek to recognize and classify the semantic relations between concepts automatically and without supervision, using several clustering and biclustering techniques. We apply both steps to a corpus extracted from the ACL Anthology archive. A manual analysis allowed us to propose a typology of semantic relations and to classify a sample of relation instances. The first evaluations suggest that biclustering is useful for detecting new types of relations in the corpus.
    Keywords: scientific literature analysis, relation extraction, clustering, biclustering