Semi-supervised prediction of protein interaction sentences exploiting semantically encoded metrics

D.D. Lewis; E.M. Marcotte; J.D. Kim; K. Lund; L. Azzopardi; M. Girolami; M.N. Jones; R. Bunescu; S. Padó; S. Pyysalo; S. Rogers; T. Joachims; T.K. Landauer; Z. Minier

research

Publication date: 1 January 2009
Publisher: Springer Science and Business Media LLC

Abstract

Protein-protein interaction (PPI) identification is an integral component of many biomedical research and database curation tools. Automation of this task through classification is one of the key goals of text mining (TM). However, the labelled PPI corpora required to train classifiers are generally small. To overcome this sparsity in the training data, we propose a novel method of integrating corpora that do not contain relevance judgements. Our approach uses a semantic language model to gather word similarity from a large unlabelled corpus. This additional information is integrated into the sentence classification process using kernel transformations and has a re-weighting effect on the training features that leads to an 8% improvement in F-score over the baseline results. Furthermore, we discover that some words which are generally considered indicative of interactions are actually neutralised by this process.
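
The general idea of folding word similarity into a sentence kernel can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify the semantic language model or the exact kernel transformation, so the sketch assumes a simple document-level co-occurrence cosine similarity as a stand-in, and the function names (cooccurrence_similarity, bag_of_words, semantic_kernel) and the toy corpus are hypothetical. It shows how a word-similarity matrix S learned from unlabelled text turns the linear kernel X Xᵀ into X S Xᵀ, so that sentences sharing no words can still have non-zero similarity.

```python
import numpy as np

def cooccurrence_similarity(docs, vocab):
    """Cosine similarity between words, from word-by-document counts.

    Assumption: a stand-in for the semantic language model in the abstract.
    """
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for token in doc.lower().split():
            if token in index:
                counts[index[token], j] += 1
    norms = np.linalg.norm(counts, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # leave all-zero rows untouched
    unit = counts / norms
    # unit @ unit.T is symmetric and positive semi-definite by construction.
    return unit @ unit.T

def bag_of_words(sentences, vocab):
    """Term-count matrix with one row per sentence and one column per word."""
    index = {w: i for i, w in enumerate(vocab)}
    X = np.zeros((len(sentences), len(vocab)))
    for i, sentence in enumerate(sentences):
        for token in sentence.lower().split():
            if token in index:
                X[i, index[token]] += 1
    return X

def semantic_kernel(X, S):
    """Sentence kernel that re-weights features through word similarity S."""
    return X @ S @ X.T

# Hypothetical toy data: an unlabelled corpus and two labelled-style sentences.
vocab = ["binds", "interacts", "protein", "cell"]
unlabelled = [
    "protein binds protein",
    "protein interacts protein",
    "binds interacts",
    "cell",
]
sentences = ["A binds B", "A interacts B"]

S = cooccurrence_similarity(unlabelled, vocab)
X = bag_of_words(sentences, vocab)
linear = X @ X.T               # plain bag-of-words kernel
K = semantic_kernel(X, S)      # similarity-aware kernel
```

In this toy setting the two sentences share no vocabulary word, so the linear kernel scores them as unrelated, while the semantic kernel gives them a positive similarity because "binds" and "interacts" co-occur in the unlabelled text. A matrix such as K could then be passed to any classifier that accepts a precomputed kernel.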

Enlighten: Publications

oai:eprints.gla.ac.uk:6454
