Suggesting valid pharmacogenes by mining linked data

Abstract

International audienceA standard task in pharmacogenomics research is identifying genes that may be involved in drug response variability, i.e., pharmacogenes. Because genomic experiments tended to generate many false positives, computational approaches based on the use of background knowledge have been proposed. Until now, those have used only molecular networks or the biomedical literature. Here we propose a novel method that consumes an eclectic set of linked data sources to help validating uncertain drug–gene relationships. One of the advantages relies on that linked data are implemented in a standard framework that facilitates the joint use of various sources, making easy the consideration of features of various origins. Consequently, we propose an initial selection of linked data sources relevant to pharmacogenomics. We formatted these data to train a random forest algorithm , producing a model that enables classifying drug–gene pairs as related or not, thus confirming the validity of candidate pharmacogenes. Our model achieve the performance of F-measure=0.92, on a 100 folds cross-validation. A list of top candidates is provided and their obtention is discussed

    Similar works