43 research outputs found

Discovering patterns in drug-protein interactions based on their fingerprints

Author: B Chen
DB Osteyee
DC Harris
J Klekota
KA De Jong
Keith CC Chan
L Xue
LM Kauvar
M Gribskov
MA Johnson
MJ Keiser
MJ McGregor
RD Finn
RW DeSimone
SF Sousa
Uniprot Consortium
Weimin Luo
Y Yamanishi
Z Deng
Publication venue: BioMed Central
Publication date: 01/06/2012

Abstract
Background: Discovering interesting patterns in drug-protein interaction data at the molecular level can reveal hidden relationships among drugs and proteins and can therefore be of great importance for applications such as drug design. To discover such patterns, we propose a computational approach to analyzing the molecular data of drugs and proteins that are known to interact with each other. Specifically, we propose a data mining technique called Drug-Protein Interaction Analysis (D-PIA) to determine whether there are any commonalities in the fingerprints of the substructures of interacting drug and protein molecules and, if so, whether any patterns can be generalized from them.
Method: Given a database of drug-protein interactions, D-PIA performs its task in five steps. First, for each drug in the database, the fingerprints of its molecular substructures are obtained. Second, for each protein in the database, the fingerprints of its protein domains are obtained. Third, based on the known interactions between drugs and proteins, an interdependency measure between the fingerprint of each drug substructure and each protein domain is computed. Fourth, based on the interdependency measure, the drug substructures and protein domains that are significantly interdependent are identified. Fifth, the existence of an interaction between a previously unseen drug-protein pair is predicted based on its constituent substructures that are significantly interdependent.
Results: To evaluate the effectiveness of D-PIA, we tested it with real drug-protein interaction data covering enzymes, ion channels, and protein-coupled receptors. The experimental results show that there are indeed discoverable patterns in the interdependency relationships between the drug substructures and protein domains of interacting drugs and proteins. Based on these relationships, a test set of drug-protein data was used to see whether D-PIA can correctly predict the existence of interactions between drug-protein pairs. The results show that the area under the ROC curve (AUC) can reach as high as 75%, which indicates the effectiveness of the classifier.
Conclusions: D-PIA performs its tasks using only the fingerprints of drug and protein molecules, without requiring any 3D information about their structures, and is therefore very fast to compute. It has been tested with real drug-protein interaction data, and the experimental results show that it can be very useful for predicting previously unknown drug-protein as well as protein-ligand interactions. It can also be used to tackle problems such as ligand specificity, which is related directly and indirectly to drug design and discovery.
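The third to fifth steps of the method can be illustrated with a minimal Python sketch. The abstract does not say which interdependency measure D-PIA uses, so this sketch substitutes a standardized (Pearson) residual as an assumed stand-in. The function names, the toy data, and the threshold of 1.2 are all hypothetical and are not taken from the paper.

```python
from itertools import product
from math import sqrt

def interdependency(interactions, drug_subs, protein_doms):
    """Score each (substructure, domain) pair with a standardized residual.

    ASSUMPTION: the residual (observed - expected) / sqrt(expected) stands in
    for the paper's interdependency measure, which the abstract does not define.
    """
    pairs = list(product(drug_subs, protein_doms))
    base_rate = sum(p in interactions for p in pairs) / len(pairs)
    subs = set().union(*drug_subs.values())
    doms = set().union(*protein_doms.values())
    scores = {}
    for s, d in product(subs, doms):
        # Drug-protein pairs where the drug has s and the protein has d.
        carriers = [(dr, pr) for dr, pr in pairs
                    if s in drug_subs[dr] and d in protein_doms[pr]]
        expected = len(carriers) * base_rate
        if expected > 0:
            observed = sum(p in interactions for p in carriers)
            scores[(s, d)] = (observed - expected) / sqrt(expected)
    return scores

def predict(drug, protein, scores, threshold):
    """Predict an interaction if any (substructure, domain) pair scores above threshold."""
    return any(scores.get((s, d), 0.0) > threshold
               for s, d in product(drug, protein))

# Hypothetical toy database: drugs carrying substructure A bind proteins
# carrying domain X, and drugs carrying C bind proteins carrying Y.
drug_subs = {"d1": {"A", "B"}, "d2": {"A"}, "d3": {"C"}, "d4": {"C"}}
protein_doms = {"p1": {"X"}, "p2": {"X"}, "p3": {"Y"}, "p4": {"Y"}}
interactions = ({(d, p) for d in ("d1", "d2") for p in ("p1", "p2")}
                | {(d, p) for d in ("d3", "d4") for p in ("p3", "p4")})

scores = interdependency(interactions, drug_subs, protein_doms)
print(predict({"A"}, {"X"}, scores, threshold=1.2))
print(predict({"A"}, {"Y"}, scores, threshold=1.2))
```

The sketch works purely on sets of substructure and domain identifiers, which mirrors the abstract's point that no 3D structural information is needed. A real application would replace the fixed threshold with a proper significance test.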

The Hong Kong Polytechnic University Pao Yue-kong Library

Crossref

Directory of Open Access Journals

PolyU Institutional Repository

PubMed Central

A comparison of machine learning algorithms for chemical toxicity classification using a simulated multi-scale data model

Author: A Statnikov
AB Okey
AF Fliri
AM Molinaro
C Helma
C Sima
CP Austin
D Krewski
DJ Dix
E Walum
EE Ntzani
Fathi Elloumi
GM Williams
GV Paolini
H Almuallim
H Toivonen
H Wang
Imran Shah
J Inglese
J Klekota
J Lamb
J Zhang
JP Vanden Heuvel
JS Melnick
K Tietjen
LB Moore
LH Li
M Bredel
M McMillian
MT Martin
N Ancona
N Bhogal
N Japkowicz
P Baldi
P Pudil
PJ O'Brien
R Benigni
R Burbridge
R Kikkawa
R Kohavi
R Woodrow Setzer
Richard Judson
RL Strausberg
SC Smith
SG Baker
U Scherf
Y Sun
Z Lepp
Zhen Li
Publication venue: BioMed Central
Publication date: 01/05/2008

Abstract
Background: Bioactivity profiling using high-throughput in vitro assays can reduce the cost and time required for toxicological screening of environmental chemicals and can also reduce the need for animal testing. Several public efforts are aimed at discovering patterns or classifiers in high-dimensional bioactivity space that predict tissue, organ, or whole-animal toxicological endpoints. Supervised machine learning is a powerful approach to discovering combinatorial relationships in complex in vitro/in vivo datasets. We present a novel model to simulate complex chemical-toxicology data sets and use this model to evaluate the relative performance of different machine learning (ML) methods.
Results: The classification performance of Artificial Neural Networks (ANN), K-Nearest Neighbors (KNN), Linear Discriminant Analysis (LDA), Naïve Bayes (NB), Recursive Partitioning and Regression Trees (RPART), and Support Vector Machines (SVM), in the presence and absence of filter-based feature selection, was analyzed using K-way cross-validation testing and independent validation on simulated in vitro assay data sets with varying levels of model complexity, number of irrelevant features, and measurement noise. While the prediction accuracy of all ML methods decreased as non-causal (irrelevant) features were added, some ML methods performed better than others. In the limit of using a large number of features, ANN and SVM were always in the top-performing set of methods, while RPART and KNN (k = 5) were always in the poorest-performing set. The addition of measurement noise and irrelevant features decreased the classification accuracy of all ML methods, with LDA suffering the greatest performance degradation. LDA performance is especially sensitive to the use of feature selection. Filter-based feature selection generally improved performance, most strikingly for LDA.
Conclusion: We have developed a novel simulation model to evaluate machine learning methods for the analysis of data sets in which in vitro bioassay data are used to predict in vivo chemical toxicology. From our analysis, we can recommend several ML methods, most notably SVM and ANN, as good candidates for use in real-world applications in this area.
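The evaluation design described in the abstract (simulated data with irrelevant features, filter-based feature selection, K-way cross-validation) can be sketched in plain Python. This is a simplified illustration and not the paper's simulation model. The data generator, the mean-difference filter score, and all function names and parameter values are assumptions, and only KNN with k = 5 is shown.

```python
import random
from statistics import mean, pstdev

def simulate(n_samples, n_causal, n_irrelevant, noise, rng):
    """Hypothetical generator: causal features shift with the label, the rest are noise."""
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        causal = [3.0 * label + rng.gauss(0, noise) for _ in range(n_causal)]
        X.append(causal + [rng.gauss(0, 1.0) for _ in range(n_irrelevant)])
        y.append(label)
    return X, y

def filter_select(X, y, top_k):
    """Filter-based selection: rank features by |class mean difference| / overall std."""
    scored = []
    for j in range(len(X[0])):
        col0 = [row[j] for row, lab in zip(X, y) if lab == 0]
        col1 = [row[j] for row, lab in zip(X, y) if lab == 1]
        spread = pstdev(col0 + col1) or 1.0  # guard against constant features
        scored.append((abs(mean(col1) - mean(col0)) / spread, j))
    return sorted(j for _, j in sorted(scored, reverse=True)[:top_k])

def knn_predict(train_X, train_y, x, k=5):
    """Majority vote among the k nearest training points (squared Euclidean distance)."""
    nearest = sorted(zip(train_X, train_y),
                     key=lambda t: sum((a - b) ** 2 for a, b in zip(t[0], x)))[:k]
    return 1 if 2 * sum(lab for _, lab in nearest) > k else 0

def cross_validate(X, y, n_folds, top_k=None):
    """K-way cross-validation accuracy; features are selected inside each fold."""
    correct = 0
    for fold in range(n_folds):
        test_idx = set(range(fold, len(X), n_folds))
        tr_X = [X[i] for i in range(len(X)) if i not in test_idx]
        tr_y = [y[i] for i in range(len(X)) if i not in test_idx]
        cols = filter_select(tr_X, tr_y, top_k) if top_k else list(range(len(X[0])))
        tr_proj = [[row[j] for j in cols] for row in tr_X]
        for i in test_idx:
            correct += knn_predict(tr_proj, tr_y, [X[i][j] for j in cols]) == y[i]
    return correct / len(X)

rng = random.Random(0)
X, y = simulate(n_samples=100, n_causal=2, n_irrelevant=30, noise=1.0, rng=rng)
acc_all = cross_validate(X, y, n_folds=5)
acc_selected = cross_validate(X, y, n_folds=5, top_k=2)
print(f"KNN, all features: {acc_all:.2f}; KNN, top-2 filtered: {acc_selected:.2f}")
```

Feature selection runs inside each training fold so that held-out samples never influence which features are kept. An odd fold count is used here because labels alternate by index, and an even stride would put a single class in each fold.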

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Carolina Digital Repository