10 research outputs found

PyDPI: Freely Available Python Package for Chemoinformatics, Bioinformatics, and Chemogenomics Studies

Author: Dong-Sheng Cao (399743)
Gui-Shan Tan (1880107)
Jun Yan (28467)
Qing-Song Xu (399745)
Shao Liu (399749)
Yi-Zeng Liang (399744)
Publication venue
Publication date
Field of study

The rapidly increasing amount of publicly available data in biology and chemistry enables researchers to revisit interaction problems through systematic integration and analysis of heterogeneous data. Herein, we developed a comprehensive Python package that integrates chemoinformatics and bioinformatics into a molecular informatics platform for drug discovery. PyDPI (drug–protein interaction with Python) is a Python toolkit for computing commonly used structural and physicochemical features of proteins and peptides from amino acid sequences, molecular descriptors of drug molecules from their topology, and protein–protein interaction and protein–ligand interaction descriptors. It computes 6 protein feature groups composed of 14 features that include 52 descriptor types and 9890 descriptors, and 9 drug feature groups composed of 13 descriptor types that include 615 descriptors. In addition, it provides seven types of molecular fingerprint systems for drug molecules: topological fingerprints, electro-topological state (E-state) fingerprints, MACCS keys, FP4 keys, atom pair fingerprints, topological torsion fingerprints, and Morgan/circular fingerprints. By combining different types of descriptors from drugs and proteins in different ways, interaction descriptors representing protein–protein or drug–protein interactions can be conveniently generated. These computed descriptors can be widely used in various fields relevant to chemoinformatics, bioinformatics, and chemogenomics. PyDPI is freely available via https://sourceforge.net/projects/pydpicao/
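To make "features of proteins from amino acid sequences" concrete, here is a minimal sketch of the simplest such feature, amino acid composition (the fraction of each of the 20 standard residues). It does not use PyDPI and is not PyDPI's API: the function name `amino_acid_composition` and the example sequence are hypothetical, chosen only for illustration.

```python
from collections import Counter

# The 20 standard amino acids, as one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence: str) -> dict[str, float]:
    """Return the fraction of each standard amino acid in a sequence.

    Hypothetical helper for illustration; not part of PyDPI.
    """
    counts = Counter(sequence.upper())
    # Count only standard residues so that the fractions sum to 1.
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


# Made-up 10-residue sequence, used only as an example input.
composition = amino_acid_composition("MKTAYIAKQR")
```

The result is a fixed-length vector of 20 values regardless of sequence length, which is what makes a descriptor of this kind usable as input to a statistical model.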

FigShare

Prediction of Peptide Fragment Ion Mass Spectra by Data Mining Techniques

Author: Daniel K. W. Mok (1600981)
Hong-mei Lu (1770070)
Lun-zhao Yi (1770073)
Min He (44052)
Nai-ping Dong (1770076)
Qing-song Xu (1770079)
Wei Fan (109074)
Yi-Zeng Liang (399744)

Accurate prediction of peptide fragment ion mass spectra is one of the critical factors in guaranteeing confident peptide identification by protein sequence database search in bottom-up proteomics. In an attempt to accurately and comprehensively predict this type of mass spectrum, a framework named MS2PBPI is proposed. MS2PBPI first extracts fragment ions from large-scale MS/MS spectra data sets according to the peptide fragmentation pathways and uses binary trees to divide the obtained bulky data into tens to more than 1000 regions. For each adequate region, a stochastic gradient boosting tree regression model is constructed. By constructing hundreds of these models, MS2PBPI is able to predict MS/MS spectra for unmodified and modified peptides with reasonable accuracy. Moreover, high consistency is achieved between predicted and experimental MS/MS spectra derived from different ion trap instruments with low and high resolving power. MS2PBPI outperforms the existing algorithms MassAnalyzer and PeptideART.

FigShare

The predictive probability plot of screening all cross-linking drug-target pairs. The size of predictive probability gradually varies from green to red.

Author: Dong-Sheng Cao (399743)
Guang-Hua Zhou (399746)
Liu-Xia Zhang (399747)
Min He (44052)
Qian-Nan Hu (291453)
Qing-Song Xu (399745)
Shao Liu (399749)
Yi-Zeng Liang (399744)
Zhe Deng (291456)
Zi-xin Deng (399748)

The predictive probability plot of screening all cross-linking drug-target pairs. The size of predictive probability gradually varies from green to red.

FigShare

ROCs and precision-recall curves for Naïve Bayes (green) and random forest (red) with full and selected features.

Author: Dong-Sheng Cao (399743)
Guang-Hua Zhou (399746)
Liu-Xia Zhang (399747)
Min He (44052)
Qian-Nan Hu (291453)
Qing-Song Xu (399745)
Shao Liu (399749)
Yi-Zeng Liang (399744)
Zhe Deng (291456)
Zi-xin Deng (399748)

(A) ROCs. (B) Precision-recall curves.

FigShare

Prediction results of five-fold cross validation using different models.

Author: Dong-Sheng Cao (399743)
Guang-Hua Zhou (399746)
Liu-Xia Zhang (399747)
Min He (44052)
Qian-Nan Hu (291453)
Qing-Song Xu (399745)
Shao Liu (399749)
Yi-Zeng Liang (399744)
Zhe Deng (291456)
Zi-xin Deng (399748)

TP: true positives; FN: false negatives; TN: true negatives; FP: false positives; Sen: sensitivity; Spe: specificity; Acc: accuracy.
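For reference, the three summary statistics follow from the four confusion-matrix counts by their standard definitions: Sen = TP / (TP + FN), Spe = TN / (TN + FP), Acc = (TP + TN) / (TP + FN + TN + FP). The sketch below applies them; the function name and the counts are illustrative assumptions and are not taken from the table.

```python
def classification_metrics(tp: int, fn: int, tn: int, fp: int) -> dict[str, float]:
    """Compute sensitivity, specificity, and accuracy from confusion counts."""
    positives = tp + fn  # all truly interacting pairs
    negatives = tn + fp  # all truly non-interacting pairs
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative sample")
    return {
        "Sen": tp / positives,
        "Spe": tn / negatives,
        "Acc": (tp + tn) / (positives + negatives),
    }


# Illustrative counts only; these are not results from the study.
metrics = classification_metrics(tp=90, fn=10, tn=80, fp=20)
```

With these made-up counts, sensitivity is 0.9, specificity is 0.8, and accuracy is 0.85.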

FigShare

ROCs and precision-recall curves with different Ki thresholds using RF.

Author: Dong-Sheng Cao (399743)
Guang-Hua Zhou (399746)
Liu-Xia Zhang (399747)
Min He (44052)
Qian-Nan Hu (291453)
Qing-Song Xu (399745)
Shao Liu (399749)
Yi-Zeng Liang (399744)
Zhe Deng (291456)
Zi-xin Deng (399748)

(A) ROCs. (B) Precision-recall curves. The auPRCs drop as the Ki threshold decreases, and the auROCs follow the same trend.
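As a reminder of what the auROC summarises, the sketch below computes it through its rank interpretation: the probability that a randomly chosen positive pair receives a higher score than a randomly chosen negative pair, with ties counted as one half. This is a generic illustration, not the evaluation code behind the figure; the function name, labels, and scores are made up.

```python
def au_roc(labels: list[int], scores: list[float]) -> float:
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 for interaction, 0 for non-interaction.
    scores: predicted probabilities, higher meaning more likely to interact.
    """
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    # Each (positive, negative) pair contributes 1 if ranked correctly,
    # 0.5 if tied, and 0 if ranked incorrectly.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical labels and scores for five drug-target pairs.
area = au_roc([1, 1, 0, 0, 1], [0.9, 0.4, 0.35, 0.8, 0.7])
```

The pairwise form is quadratic in the number of samples, so it suits small illustrations; sorting-based implementations are the usual choice for large screens.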

FigShare

Drug-target interaction network using drug-target pairs with prediction probability above 0.99.

Author: Dong-Sheng Cao (399743)
Guang-Hua Zhou (399746)
Liu-Xia Zhang (399747)
Min He (44052)
Qian-Nan Hu (291453)
Qing-Song Xu (399745)
Shao Liu (399749)
Yi-Zeng Liang (399744)
Zhe Deng (291456)
Zi-xin Deng (399748)

Drugs and targets are represented by red circles and blue triangles, respectively. Drug-target interactions are represented by the edges connecting related drugs and targets.

FigShare

The plot of Ki versus prediction probability on 5-fold cross validation.

Author: Dong-Sheng Cao (399743)
Guang-Hua Zhou (399746)
Liu-Xia Zhang (399747)
Min He (44052)
Qian-Nan Hu (291453)
Qing-Song Xu (399745)
Shao Liu (399749)
Yi-Zeng Liang (399744)
Zhe Deng (291456)
Zi-xin Deng (399748)

Non-interactions are shown in red and interactions in green. A linear relationship between Ki and prediction probability can be observed, with a correlation coefficient of 0.65.

FigShare

Outline of our methodology.

Author: Dong-Sheng Cao (399743)
Guang-Hua Zhou (399746)
Liu-Xia Zhang (399747)
Min He (44052)
Qian-Nan Hu (291453)
Qing-Song Xu (399745)
Shao Liu (399749)
Yi-Zeng Liang (399744)
Zhe Deng (291456)
Zi-xin Deng (399748)

(A) Interaction features are calculated by combining the fingerprint descriptors from drugs and the CTD and amino acid composition descriptors from protein sequences. These feature vectors are used to find the optimal RF parameters that most accurately separate the positive and negative training sets. The independent validation sets are used for further validation of the RF model. (B) Once the RF model is constructed, we can predict new unknown drug-target associations or screen all cross-linking associations.
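The caption does not say how the drug and protein descriptors are combined. One common choice, assumed here purely for illustration, is to concatenate the two vectors so that each drug-target pair becomes a single row for the classifier. The function name and the toy vectors below are hypothetical.

```python
def interaction_features(
    drug_fingerprint: list[int],
    protein_descriptors: list[float],
) -> list[float]:
    """Build one feature vector for a drug-target pair by concatenation.

    Assumption: concatenation is only one possible way to combine the
    two descriptor sets; the source does not state which scheme is used.
    """
    return [float(bit) for bit in drug_fingerprint] + list(protein_descriptors)


# Toy inputs: a 4-bit fingerprint and 3 protein descriptor values.
fingerprint = [1, 0, 1, 1]
protein = [0.2, 0.1, 0.7]
features = interaction_features(fingerprint, protein)
```

Under this assumption the pair vector has length equal to the sum of the two input lengths (here 7), and rows built this way for positive and negative pairs would form the training matrix for a random forest.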

FigShare

Prediction statistics on different false discovery rates.

Author: Dong-Sheng Cao (399743)
Guang-Hua Zhou (399746)
Liu-Xia Zhang (399747)
Min He (44052)
Qian-Nan Hu (291453)
Qing-Song Xu (399745)
Shao Liu (399749)
Yi-Zeng Liang (399744)
Zhe Deng (291456)
Zi-xin Deng (399748)

FDR: false discovery rate; Number: number of drug-target pairs predicted as interactions; Ratio: the ratio of drug-target pairs predicted as interactions to all screened pairs at a specific FDR.

FigShare
