DeepConv-DTI: Prediction of drug-target interactions via deep learning with convolution on protein sequences
Identification of drug-target interactions (DTIs) plays a key role in drug
discovery. The high cost and labor-intensive nature of in vitro and in vivo
experiments have highlighted the importance of in silico-based DTI prediction
approaches. In several computational models, conventional protein descriptors
are shown to be not informative enough to predict accurate DTIs. Thus, in this
study, we employ a convolutional neural network (CNN) on raw protein sequences
to capture local residue patterns participating in DTIs. With CNN on protein
sequences, our model performs better than previous protein descriptor-based
models. In addition, our model performs better than the previous deep learning
model for massive prediction of DTIs. By examining the pooled convolution
results, we found that our model can detect binding sites of proteins for DTIs.
In conclusion, our prediction model for detecting local residue patterns of
target proteins successfully enriches the protein features of a raw protein
sequence, yielding better prediction results than previous approaches.
Comment: 26 pages, 7 figures
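The idea of convolving over a raw protein sequence and then pooling can be illustrated with a minimal sketch. This is not the DeepConv-DTI architecture: the one-hot encoding, the single hand-made filter (`motif_kernel`), the ReLU, and the example sequence are all assumptions chosen for illustration. It only shows how a global max over window activations yields both a feature value and the position of the local residue pattern that produced it.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def one_hot(sequence):
    """Encode each residue as a 20-dimensional one-hot vector."""
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    encoded = []
    for residue in sequence:
        vec = [0.0] * len(AMINO_ACIDS)
        vec[index[residue]] = 1.0
        encoded.append(vec)
    return encoded


def conv1d_global_max(encoded, kernel, bias=0.0):
    """Slide one filter over the sequence and apply global max pooling.

    Returns (max activation, start position of the best window).
    """
    width = len(kernel)
    best_value, best_pos = float("-inf"), -1
    for start in range(len(encoded) - width + 1):
        act = bias
        for offset in range(width):
            act += sum(w * x for w, x in zip(kernel[offset], encoded[start + offset]))
        act = max(act, 0.0)  # ReLU non-linearity
        if act > best_value:
            best_value, best_pos = act, start
    return best_value, best_pos


def motif_kernel(motif):
    """Hypothetical hand-made filter that responds most strongly to `motif`.

    A trained model would learn its filter weights instead.
    """
    return one_hot(motif)


sequence = "MKTAYIAKQRGDSLLV"  # made-up example sequence
value, position = conv1d_global_max(one_hot(sequence), motif_kernel("RGD"))
print(value, position)
```

In a trained model the filters are learned rather than written by hand, and many filters of several widths are pooled together. Inspecting which window wins the max is the kind of analysis the abstract refers to when it mentions examining the pooled convolution results.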
SELF-BLM: Prediction of drug-target interactions via self-training SVM.
Predicting drug-target interactions is important for the development of novel drugs and the repositioning of drugs. To predict such interactions, there are a number of methods based on drug and target protein similarity. Although these methods, such as the bipartite local model (BLM), show promise, they often categorize unknown interactions as negative interactions. Therefore, these methods are not ideal for finding potential drug-target interactions that have not yet been validated as positive interactions. Thus, here we propose a method that integrates machine learning techniques, such as self-training support vector machine (SVM) and BLM, to develop a self-training bipartite local model (SELF-BLM) that facilitates the identification of potential interactions. The method first categorizes unlabeled interactions and negative interactions among unknown interactions using a clustering method. Then, using the BLM method and self-training SVM, the unlabeled interactions are self-trained and final local classification models are constructed. When applied to four classes of proteins that include enzymes, G-protein coupled receptors (GPCRs), ion channels, and nuclear receptors, SELF-BLM showed the best performance for predicting not only known interactions but also potential interactions in three protein classes compared to other related studies. The implemented software and supporting data are available at https://github.com/GIST-CSBL/SELF-BLM
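The self-training step described in this abstract can be sketched with scikit-learn. This is a generic self-training loop under stated assumptions, not the SELF-BLM implementation: the linear kernel, the confidence `threshold`, the `max_rounds` limit, the label encoding (1 positive, 0 negative, -1 unlabeled), and the toy one-dimensional features are all illustrative choices. BLM-style methods typically work from drug and target similarity matrices instead.

```python
import numpy as np
from sklearn.svm import SVC


def self_training_svm(X, y, threshold=1.0, max_rounds=10):
    """Iteratively pseudo-label confident unlabeled samples with an SVM.

    y uses 1 (positive), 0 (negative) and -1 (unlabeled).
    Returns the final model and the updated label vector.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y).copy()
    model = SVC(kernel="linear")
    for _ in range(max_rounds):
        labeled = y != -1
        model.fit(X[labeled], y[labeled])
        unlabeled_idx = np.flatnonzero(~labeled)
        if unlabeled_idx.size == 0:
            break
        # Signed distance to the separating hyperplane; > 0 means class 1.
        scores = model.decision_function(X[unlabeled_idx])
        confident = np.abs(scores) >= threshold
        if not confident.any():
            break
        y[unlabeled_idx[confident]] = (scores[confident] > 0).astype(int)
    # Refit so the returned model reflects every pseudo-label assigned above.
    labeled = y != -1
    model.fit(X[labeled], y[labeled])
    return model, y


# Toy data: two labeled positives, two labeled negatives, three unlabeled.
X = [[2.0], [3.0], [-2.0], [-3.0], [2.5], [-2.5], [0.1]]
y = [1, 1, 0, 0, -1, -1, -1]
model, labels = self_training_svm(X, y)
print(labels.tolist())
```

Samples that never reach the confidence threshold keep the label -1, which mirrors the idea of leaving uncertain interactions unlabeled instead of forcing them to be negative.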
The AUC and AUPR values of the five methods for the four types of proteins in each validation set (previous and updated dataset).
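For readers unfamiliar with the two metrics, the sketch below shows one common way to compute them from ranked prediction scores. It is an assumption that these estimators match the ones used in the study: AUC is computed here as the pairwise ranking probability, and AUPR is approximated by average precision. The labels and scores are made-up values.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # Ties between a positive and a negative count as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """AUPR estimated as the mean precision at the rank of each true positive."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, precision_sum = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits


# Made-up example: 1 = known interaction, 0 = no known interaction.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.2]
auc = roc_auc(labels, scores)
aupr = average_precision(labels, scores)
print(auc, aupr)
```

AUPR is often reported alongside AUC in interaction prediction because known interactions are rare, and precision-recall curves are more sensitive to how the few positives are ranked.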
Rankings of AUPR trends by the different methods according to the updated dataset.
In each panel, the y-axis shows the rank of the AUPR value. A) Ranking for enzymes, B) ranking for ion channels, C) ranking for GPCRs, D) ranking for nuclear receptors.
The potential AUPRs of the five methods for the four types of proteins.
An example of SELF-BLM predicting the targets of a drug.
In the previous dataset, it was known that proteins (HTR2A and HTR2C) bind to a drug (Olanzapine), but it was not known that other proteins (HTR1B, HTR1D, and HTR1F) also bind to the drug. Thus, in BLM, HTR2A and HTR2C are labeled as positive, and HTR1B, HTR1D and HTR1F are labeled as negative. Because the protein (HTR1E) is more similar to negatively labeled proteins than to positively labeled proteins, the protein is predicted to be negative. However, in SELF-BLM, these proteins (HTR1B, HTR1D, and HTR1F) are unlabeled. Therefore, the protein (HTR1E) is predicted as positive. There was no information suggesting that the protein (HTR1E) binds to the drug (Olanzapine) in the previous data, but it was later revealed that the protein indeed binds to the drug.
The number of drugs, target proteins, interactions and updated interactions of each type.
Overview of the proposed method.
(A) From known information, drug-target interactions are classified into positive and unknown interactions (matrix A). Using similarity scores of drugs (matrix S^d) and targets (matrix S^t), we performed k-medoids clustering. If any of the drugs in a cluster do not interact with the cluster of the target protein, we considered the drugs in the cluster as having a negative interaction with the protein. Finally, drug-target interactions are classified into positive, negative and unknown interactions (matrix A_n). Yellow rectangle: target protein, blue circle: drugs having positive interactions with the target protein, red circle: drugs having negative interactions with the target protein, gray circle: drugs having unknown interactions with the target protein. (B) A self-training SVM repeatedly trains the unlabeled data (unknown) as positive or negative. Finally, local classification models that can find potential interactions are constructed.
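The negative-labeling rule in panel (A) can be sketched as follows. This is one reading of the caption, stated as an assumption: a drug-target pair is labeled negative when no drug in the drug's cluster has a known interaction with any target in the target's cluster. The function name `label_negatives`, the label encoding (1 positive, -1 negative, 0 unknown), and the cluster assignments are hypothetical, and the k-medoids step itself is omitted, so the cluster ids are taken as given.

```python
def label_negatives(A, drug_cluster, target_cluster):
    """Derive A_n (1 positive, -1 negative, 0 unknown) from A (1 known, 0 unknown).

    drug_cluster[d] and target_cluster[t] are precomputed cluster ids.
    """
    n_drugs, n_targets = len(A), len(A[0])
    # Cluster pairs that contain at least one known interaction.
    linked = set()
    for d in range(n_drugs):
        for t in range(n_targets):
            if A[d][t] == 1:
                linked.add((drug_cluster[d], target_cluster[t]))
    A_n = []
    for d in range(n_drugs):
        row = []
        for t in range(n_targets):
            if A[d][t] == 1:
                row.append(1)    # known interaction stays positive
            elif (drug_cluster[d], target_cluster[t]) not in linked:
                row.append(-1)   # no interaction between the two clusters
            else:
                row.append(0)    # left unlabeled for self-training
        A_n.append(row)
    return A_n


# Made-up example: 3 drugs, 3 targets, 2 clusters on each side.
A = [[1, 0, 0],
     [0, 0, 0],
     [0, 0, 1]]
A_n = label_negatives(A, drug_cluster=[0, 0, 1], target_cluster=[0, 0, 1])
print(A_n)
```

Pairs left at 0 are the ones a self-training classifier would later resolve as positive or negative, as panel (B) describes.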
The number of potential interactions found by each method.
The x-axis represents the accumulated percentage of positively predicted interactions for each method; the y-axis represents the number of correctly predicted potential interactions. A) Potential interactions for enzymes. B) Potential interactions for ion channels. C) Potential interactions for GPCRs. D) Potential interactions for nuclear receptors.