Extracting Biomolecular Interactions Using Semantic Parsing of Biomedical Text

Authors: Galstyan Aram, Garg Sahil, Hermjakob Ulf, Marcu Daniel
Publication date: 04/12/2015

We advance the state of the art in biomolecular interaction extraction with three contributions: (i) We show that deep, Abstract Meaning Representations (AMR) significantly improve the accuracy of a biomolecular interaction extraction system when compared to a baseline that relies solely on surface- and syntax-based features; (ii) In contrast with previous approaches that infer relations on a sentence-by-sentence basis, we expand our framework to enable consistent predictions over sets of sentences (documents); (iii) We further modify and expand a graph kernel learning framework to enable concurrent exploitation of automatically induced AMR (semantic) and dependency structure (syntactic) representations. Our experiments show that our approach yields interaction extraction systems that are more robust in environments where there is a significant mismatch between training and test conditions.

Comment: Appearing in Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (AAAI-16).

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Measuring Semantic Similarity: Representations and Methods

Author: Lintean Mihai Cosmin
Publication venue: University of Memphis Digital Commons
Publication date: 25/07/2011

This dissertation investigates and proposes ways to quantify and measure semantic similarity between texts. The general approach is to rely on linguistic information at various levels, including lexical, lexico-semantic, and syntactic. The approach starts by mapping texts onto structured representations that include lexical, lexico-semantic, and syntactic information. The representation is then used as input to methods designed to measure the semantic similarity between texts based on the available linguistic information. While world knowledge is needed to properly assess semantic similarity of texts, in our approach world knowledge is not used, which is a weakness of the approach. We limit ourselves to answering the question of how successfully one can measure the semantic similarity of texts using just linguistic information. The lexical information in the original texts is retained by using the words in the corresponding representations of the texts. Syntactic information is encoded using dependency relation trees, which explicitly represent the syntactic relations between words. Word-level semantic information is relatively encoded through the use of semantic similarity measures like WordNet Similarity or explicitly encoded using vectorial representations such as Latent Semantic Analysis (LSA). Several methods are studied to compare the representations, ranging from simple lexical overlap to more complex methods such as comparing semantic representations in vector spaces as well as syntactic structures. Furthermore, a few powerful kernel models are proposed for use in combination with Support Vector Machine (SVM) classifiers for the case in which the semantic similarity problem is modeled as a classification task.

University of Memphis Digital Commons
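
The simplest comparison method named in the abstract above, lexical overlap, can be illustrated with a minimal sketch. The abstract does not give the dissertation's exact formula, so the Jaccard ratio used here, the whitespace tokenizer, and the function name `lexical_overlap` are all assumptions chosen for illustration.

```python
def lexical_overlap(text_a: str, text_b: str) -> float:
    """Jaccard overlap of the two texts' word sets (an assumed formula,
    not necessarily the one used in the dissertation)."""
    # Naive tokenization: lowercase and split on whitespace (assumption).
    tokens_a = set(text_a.lower().split())
    tokens_b = set(text_b.lower().split())
    union = tokens_a | tokens_b
    if not union:
        # Two empty texts share no words; define the score as 0.0.
        return 0.0
    return len(tokens_a & tokens_b) / len(union)


if __name__ == "__main__":
    print(lexical_overlap("the cat sat", "the cat ran"))
```

A score like this ignores word order, syntax, and synonymy, which is why the abstract goes on to richer representations such as dependency trees and vector spaces.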

Word Combination Kernel for Text Classification with Support Vector Machines

Authors: Hu Xiaohui, Zhang Lujiang
Publication venue: Institute of Informatics, Slovak Academy of Sciences
Publication date: 20/01/2014

In this paper we propose a novel kernel for text categorization. This kernel is an inner product defined in the feature space generated by all word combinations of specified length. A word combination is a collection of unique words co-occurring in the same sentence. The word combination of length k is weighted by the k-th root of the product of the inverse document frequencies (IDF) of its words. By discarding word order, the word combination features are more compatible with the flexibility of natural language and the feature dimensions of documents can be reduced significantly to improve the sparseness of feature representations. By restricting the words to the same sentence and considering multi-word combinations, the word combination features can capture similarity at a more specific level than single words. A computationally simple and efficient algorithm was proposed to calculate this kernel. We conducted a series of experiments on the Reuters-21578 and 20 Newsgroups datasets. This kernel achieves better performance than the word kernel and word-sequence kernel. We also evaluated the computing efficiency of this kernel and observed the impact of the word combination length on performance.

Computing and Informatics (E-Journal - Institute of Informatics, SAS, Bratislava)
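
The word combination kernel described in the abstract above can be sketched by brute-force enumeration. This is not the efficient algorithm the paper proposes, only a direct reading of the definition. Several details are assumptions not stated in the abstract: the function names, the representation of a document as a list of tokenized sentences, the choice to sum a combination's weight once per sentence in which it occurs, and the choice to apply the weight in each document's feature vector (so it enters the inner product squared). IDF values are passed in as a plain dictionary.

```python
from collections import Counter
from itertools import combinations
from math import prod


def combo_features(doc, idf, k):
    """Map a document (list of sentences, each a list of words) to a sparse
    vector indexed by unordered word combinations of length k."""
    feats = Counter()
    for sentence in doc:
        # Unique words, sorted so that word order is discarded.
        words = sorted(set(sentence))
        for combo in combinations(words, k):
            # Weight: k-th root of the product of the words' IDF values.
            # Words missing from the IDF table get weight 0 (assumption).
            weight = prod(idf.get(w, 0.0) for w in combo) ** (1.0 / k)
            feats[combo] += weight
    return feats


def word_combination_kernel(doc_a, doc_b, idf, k):
    """Inner product of the two documents' word-combination feature vectors."""
    fa = combo_features(doc_a, idf, k)
    fb = combo_features(doc_b, idf, k)
    if len(fa) > len(fb):
        fa, fb = fb, fa  # iterate over the smaller vector
    return sum(value * fb[combo] for combo, value in fa.items() if combo in fb)


if __name__ == "__main__":
    # Hypothetical IDF values, chosen only to keep the arithmetic simple.
    idf = {"a": 4.0, "b": 1.0, "c": 9.0}
    doc_a = [["a", "b"]]
    doc_b = [["b", "a", "c"]]
    print(word_combination_kernel(doc_a, doc_b, idf, k=2))
```

In this toy input the only shared combination of length 2 is {a, b}, even though the two sentences list the words in different orders. Enumerating combinations costs on the order of "sentence length choose k" per sentence, which is the cost the paper's efficient algorithm is meant to avoid.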