Deep Learning Accurately Predicts Estrogen Receptor Status in Breast Cancer Metabolomics Data
Metabolomics holds promise as a new technology for diagnosing highly heterogeneous diseases. Conventionally, metabolomics data are analyzed for diagnosis using various statistical and machine learning based classification methods. However, it remains unknown whether deep neural networks, a class of increasingly popular machine learning methods, are suitable for classifying metabolomics data. Here we use a cohort of 271 breast cancer tissues, 204 estrogen receptor positive (ER+) and 67 estrogen receptor negative (ER−), to test the accuracy of feed-forward networks, a deep learning (DL) framework, as well as six widely used machine learning models, namely random forest (RF), support vector machines (SVM), recursive partitioning and regression trees (RPART), linear discriminant analysis (LDA), prediction analysis for microarrays (PAM), and generalized boosted models (GBM). The DL framework has the highest area under the curve (AUC), 0.93, in classifying ER+/ER− patients, compared to the other six machine learning algorithms. Furthermore, biological interpretation of the first hidden layer reveals eight commonly enriched significant metabolomics pathways (adjusted P-value < 0.05) that cannot be discovered by the other machine learning methods. Among them, the protein digestion and absorption and ATP-binding cassette (ABC) transporters pathways are also confirmed in an integrated analysis of metabolomics and gene expression data in these samples. In summary, the deep learning method shows advantages for metabolomics based breast cancer ER status classification, with both the highest prediction accuracy (AUC = 0.93) and better revelation of disease biology. We encourage the adoption of feed-forward network based deep learning methods in the metabolomics research community for classification.
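The abstract above compares classifiers by AUC. As a minimal sketch of what that metric measures (not the authors' evaluation code), the function below computes AUC as the fraction of positive/negative sample pairs in which the positive sample receives the higher score, counting ties as half. The function name `roc_auc` and the toy labels and scores are illustrative assumptions, not taken from the study.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison (equivalent to the Mann-Whitney U statistic).

    labels: iterable of 0/1 class labels (1 = positive, e.g. ER+; assumed encoding)
    scores: iterable of classifier scores, higher meaning "more likely positive"
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: two positives, two negatives.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
```

The pairwise form is quadratic in the number of samples, which is fine for a cohort of a few hundred; a rank-based implementation would be preferable for large datasets.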
Additional file 1: Figure S1. of Prediction of anticancer molecules using hybrid model developed on molecules screened against NCI-60 cancer cell lines
Counts of functional groups present in anticancer and non-anticancer molecules. Table S1. Frequency of occurrence of MCS in anticancer and non-anticancer compounds according to the LibMCS module of Chemaxon. Structures were searched using the jcsearch module of Chemaxon with the substructure search option. Table S2. The individual performance of the best 126 selected fingerprints using the MCC based approach. Table S3. Performance of the hybrid method developed using 126 fingerprints on different sensitivity. (DOC 356 kb)
The performance of motif-based model developed on main dataset.
PCP, probability of correct prediction.
Sequence logos of (A) first ten residues of N-terminus and (B) last ten residues of C-terminus of toxic peptides, where size of residue is proportional to its propensity (main dataset).
The performance of quantitative matrix based method on various datasets.
MCC, Matthew’s correlation coefficient; AUC, area under the curve.
Schematic representation of ToxinPred webserver.
Data_Sheet_1.DOC
This paper describes in silico models developed using a wide range of peptide features for predicting antifungal peptides (AFPs). Our analyses indicate that certain residues (e.g., C, G, H, K, R, Y) are more abundant in AFPs. Positional residue preference analysis reveals the prominence of particular residues (e.g., R, V, K) at the N-terminus and of others (e.g., C, H) at the C-terminus. In this study, models were developed for predicting AFPs using a wide range of peptide features (such as residue composition, binary profile, and terminal residues). The support vector machine based model developed using compositional features of peptides achieved a maximum accuracy of 88.78% on the training dataset and 83.33% on the independent (validation) dataset. Our model developed using binary patterns of terminal residues of peptides achieved a maximum accuracy of 84.88% on the training dataset and 84.64% on the validation dataset. We benchmarked the models developed in this study and existing methods on a dataset containing compositionally similar antifungal and non-AFPs. The binary based model developed in this study performed better than any other model or method. To support the scientific community, we developed a mobile app, a standalone version, and a user-friendly web server, ‘Antifp’ (http://webs.iiitd.edu.in/raghava/antifp).
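The two feature types named in the abstract above, residue composition and binary profiles of terminal residues, can be sketched as follows. This is an illustrative encoding under stated assumptions, not the Antifp implementation: the function names, the window size `k`, the alphabet ordering, and the example peptide are all hypothetical, and the abstract does not specify how many terminal residues the authors used.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues, alphabetical by one-letter code


def composition(peptide):
    """Percent composition of each standard residue (20 values)."""
    peptide = peptide.upper()
    if not peptide:
        raise ValueError("empty peptide")
    return [100.0 * peptide.count(aa) / len(peptide) for aa in AMINO_ACIDS]


def binary_profile(fragment):
    """One-hot encode each residue as 20 bits and concatenate them.

    Non-standard residues become an all-zero block (an assumption of this sketch).
    """
    vector = []
    for residue in fragment.upper():
        vector.extend(1 if residue == aa else 0 for aa in AMINO_ACIDS)
    return vector


def terminal_binary_profile(peptide, k=5):
    """Binary profile of the first k (N-terminal) and last k (C-terminal) residues.

    k=5 is an arbitrary illustrative default, not a value from the source.
    """
    if len(peptide) < k:
        raise ValueError("peptide shorter than window size k")
    return binary_profile(peptide[:k]) + binary_profile(peptide[-k:])


# Hypothetical peptide, chosen only to exercise the functions.
example = "RVKCCHH"
comp = composition(example)
profile = terminal_binary_profile(example, k=3)
```

Composition gives a fixed-length vector regardless of peptide length, whereas the binary profile preserves positional information at the termini; either vector could then be passed to a classifier such as a support vector machine.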
Maximum and minimum scoring residues at every position as observed in quantitative matrix (main dataset).
Overall architecture of TumorHoPe database.