
58 research outputs found

All-paths graph kernel for protein-protein interaction extraction with evaluation of cross-corpus learning

Author: Airola A
Bjorne J
Ginter F
Pahikkala T
Pyysalo S
Salakoski T
Publication venue: Springer Science and Business Media LLC
Publication date: 28/10/2022

Background: Automated extraction of protein-protein interactions (PPI) is an important and widely studied task in biomedical text mining. We propose a graph kernel based approach for this task. In contrast to earlier approaches to PPI extraction, the introduced all-paths graph kernel has the capability to make use of full, general dependency graphs representing the sentence structure. Results: We evaluate the proposed method on five publicly available PPI corpora, providing the most comprehensive evaluation done for a machine learning based PPI-extraction system. We additionally perform a detailed evaluation of the effects of training and testing on different resources, providing insight into the challenges involved in applying a system beyond the data it was trained on. Our method is shown to achieve state-of-the-art performance with respect to comparable evaluations, with 56.4 F-score and 84.8 AUC on the AImed corpus. Conclusion: We show that the graph kernel approach performs at a state-of-the-art level in PPI extraction, and note the possible extension to the task of extracting complex interactions. Cross-corpus results provide further insight into how the learning generalizes beyond individual corpora. Further, we identify several pitfalls that can make evaluations of PPI-extraction systems incomparable, or even invalid. These include incorrect cross-validation strategies and problems related to comparing F-score results achieved on different evaluation resources. Recommendations for avoiding these pitfalls are provided.

UTUPub

All-paths graph kernel for protein-protein interaction extraction with evaluation of cross-corpus learning

Author: A Airola
A Yakushiji
AB Clegg
Antti Airola
AP Bradley
C Giuliano
C Nédellec
CD Meyer
D Zelenko
Filip Ginter
J Björne
J Ding
J Heimonen
JA Hanley
JAK Suykens
Jari Björne
JD Kim
JG Caporaso
K Fundel
KB Cohen
L Hirschman
L Hunter
M Lease
M Miwa
MC de Marneffe
P Zweigenbaum
R Bunescu
R Rifkin
R Sætre
S Pyysalo
S Van Landeghem
Sampo Pyysalo
T Gärtner
T Mitsumori
T Pahikkala
Tapio Pahikkala
Tapio Salakoski
Y Miyao
Publication venue: BioMed Central
Publication date: 01/01/2008

Crossref

Springer - Publisher Connector

PubMed Central

A fast and effective dependency graph kernel for PPI relation extraction

Author: A Airola
C Giuliano
Domonkos Tikk
Peter Palaga
T Kuboyama
Ulf Leser
Publication venue: BioMed Central
Publication date: 01/01/2010

Crossref

Springer - Publisher Connector

PubMed Central

Semi-supervised prediction of protein interaction sentences exploiting semantically encoded metrics

Author: D.D. Lewis
E.M. Marcotte
J.D. Kim
K. Lund
L. Azzopardi
M. Girolami
M.N. Jones
R. Bunescu
S. Padó
S. Pyysalo
S. Rogers
T. Joachims
T.K. Landauer
Z. Minier
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2009

Protein-protein interaction (PPI) identification is an integral component of many biomedical research and database curation tools. Automation of this task through classification is one of the key goals of text mining (TM). However, labelled PPI corpora required to train classifiers are generally small. In order to overcome this sparsity in the training data, we propose a novel method of integrating corpora that do not contain relevance judgements. Our approach uses a semantic language model to gather word similarity from a large unlabelled corpus. This additional information is integrated into the sentence classification process using kernel transformations and has a re-weighting effect on the training features that leads to an 8% improvement in F-score over the baseline results. Furthermore, we discover that some words which are generally considered indicative of interactions are actually neutralised by this process.

Classification of protein interaction sentences via gaussian processes

Author: A. Aizerman
A.M. Cohen
C.D. Manning
C.E. Rasmussen
C.H. Ding
D.D. Lewis
E.M. Marcotte
H. Chen
J. Huang
J.C. Platt
J.D. Kim
J.H. Albert
K. Crammer
K. Sugiyama
K.M.A. Chai
M. Girolami
N. Lama
N. Lawrence
R. Bunescu
S. Rogers
S.S. Keerthi
Silva
T. Joachims
V. Vapnik
W. Chu
Y. Hao
Y. Lee
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2009

The increase in the availability of protein interaction studies in textual format, coupled with the demand for easier access to the key results, has led to a need for text mining solutions. In the text processing pipeline, classification is a key step for extraction of small sections of relevant text. Consequently, for the task of locating protein-protein interaction sentences, we examine the use of a classifier which has rarely been applied to text, the Gaussian process (GP). GPs are a non-parametric probabilistic analogue to the more popular support vector machines (SVMs). We find that GPs outperform the SVM and naïve Bayes classifiers on binary sentence data, whilst showing equivalent performance on abstract and multiclass sentence corpora. In addition, the lack of the margin parameter, which requires costly tuning, along with the principled multiclass extensions enabled by the probabilistic framework, make GPs an appealing alternative worthy of further adoption.

Biomedical Relationship Extraction from Literature Based on Bio-Semantic Token Subsequences

Author: Katukuri Jayasimha R.
Raghavan Vijay V.
Xie Ying
Publication venue: DigitalCommons@Kennesaw State University
Publication date: 01/01/2010

Relationship Extraction (RE) from biomedical literature is an important and challenging problem in both text mining and bioinformatics. Although various approaches have been proposed to extract protein-protein interaction types, their accuracy leaves considerable room for improvement. In this paper, two supervised learning algorithms based on a newly defined bio-semantic token subsequence are proposed for multi-class biomedical relationship classification. The first approach calculates a bio-semantic token subsequence kernel, whereas the second one explicitly extracts weighted features from bio-semantic token subsequences. The two proposed approaches outperform several alternatives reported in the literature on multi-class protein-protein interaction classification.

DigitalCommons@Kennesaw State University

Extracting Biomolecular Interactions Using Semantic Parsing of Biomedical Text

Author: Galstyan Aram
Garg Sahil
Hermjakob Ulf
Marcu Daniel
Publication venue
Publication date: 04/12/2015

We advance the state of the art in biomolecular interaction extraction with three contributions: (i) We show that deep, Abstract Meaning Representations (AMR) significantly improve the accuracy of a biomolecular interaction extraction system when compared to a baseline that relies solely on surface- and syntax-based features; (ii) In contrast with previous approaches that infer relations on a sentence-by-sentence basis, we expand our framework to enable consistent predictions over sets of sentences (documents); (iii) We further modify and expand a graph kernel learning framework to enable concurrent exploitation of automatically induced AMR (semantic) and dependency structure (syntactic) representations. Our experiments show that our approach yields interaction extraction systems that are more robust in environments where there is a significant mismatch between training and test conditions. Comment: Appearing in Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (AAAI-16)

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Discriminative and informative features for biomolecular text mining with ensemble feature selection

Author: Reverter
S. Van Landeghem
T. Abeel
Y. Saeys
Y. Van de Peer
Publication venue: Oxford University Press
Publication date: 01/01/2010

Motivation: In the field of biomolecular text mining, black box behavior of machine learning systems currently limits understanding of the true nature of the predictions. However, feature selection (FS) is capable of identifying the most relevant features in any supervised learning setting, providing insight into the specific properties of the classification algorithm. This allows us to build more accurate classifiers while at the same time bridging the gap between the black box behavior and the end-user who has to interpret the results.

CiteSeerX

Crossref

Ghent University Academic Bibliography

PubMed Central