Extracting protein-protein interactions from text using rich feature vectors and feature selection

De Baets, Bernard; Saeys, Yvan; Van de Peer, Yves; Van Landeghem, Sofie

Authors: Bernard De Baets
Yvan Saeys
Yves Van de Peer
Sofie Van Landeghem
Publication date: 1 January 2008
Publisher: Turku Centre for Computer Sciences (TUCS)

Abstract

Because of the intrinsic complexity of natural language, automatically extracting accurate information from text remains a challenge. We have applied rich feature vectors derived from dependency graphs to predict protein-protein interactions using machine learning techniques. We present the first extensive analysis of applying feature selection in this domain, and show that it can produce more cost-effective models. For the first time, our technique was also evaluated on several large-scale cross-dataset experiments, which offers a more realistic view on model performance. During benchmarking, we encountered several fundamental problems hindering comparability with other methods. We present a set of practical guidelines to set up a meaningful evaluation. Finally, we have analysed the feature sets from our experiments before and after feature selection, and evaluated the contribution of both lexical and syntactic information to our method. The gained insight will be useful to develop better performing methods in this domain.

Available Versions

Ghent University Academic Bibliography

oai:archive.ugent.be:538895

Last time updated on 12/11/2016