15 research outputs found
Cancer driver mutation prediction through Bayesian integration of multi-omic data
<div><p>Identification of cancer driver mutations is critical for advancing cancer research and personalized medicine. Due to inter-tumor genetic heterogeneity, many driver mutations occur at low frequencies, which makes it challenging to distinguish them from passenger mutations. Here, we show that a novel Bayesian hierarchical modeling approach, named rDriver, can achieve enhanced prediction accuracy by identifying mutations that not only have high functional impact scores but are also associated with systemic variation in gene expression levels. In examining 3,080 tumor samples from 8 cancer types in The Cancer Genome Atlas, rDriver predicted 1,389 driver mutations. Compared with existing tools, rDriver identified more low-frequency mutations associated with lineage-specific functional properties, timing of occurrence and patient survival. Evaluation of rDriver predictions using engineered cell-line models resulted in a positive predictive value of 0.94 in <i>PIK3CA</i> genes. Our study highlights the importance of integrating multi-omic data in predicting cancer driver mutations and provides a statistically rigorous solution for cancer target discovery and development.</p></div>
Integrative prognostic scores in two cancer types.
<p>(A) Kaplan-Meier plot showing that IPS can significantly separate the tumors in terms of overall survival in BRCA, log-rank test p = 0.041. (B) BRCA samples are classified into 2 clusters based on their IPS values. (C) Kaplan-Meier plot showing that IPS can significantly separate the tumors in terms of overall survival in GBM, log-rank test p = 0.0044. (D) GBM samples are classified into 2 clusters based on their IPS values.</p>
Functional annotations of rDriver predictions in 8 cancer types.
<p>(A) The predicted drivers present in the Cancer Gene Census (CGC) and in at least 2 cancer types. (B) The top 10 ranked predicted drivers in novel genes. Colors represent different rank ranges.</p>
Outline of rDriver.
<p>We define a Bayesian hierarchical model that predicts mRNA expression levels from mutations and related functional genomic annotations. The regulatory features, such as the evolutionary conservation or the physicochemical properties of a mutation, are integrated into the model by a weight prior vector w. The program proceeds to learn these parameters by iterating the following three steps: (i) rDriver takes as input the regulatory priors for each mutation and constructs a set of regularization penalties δ for the mutations. In the first iteration, the regulatory priors are assumed to be uniform. (ii) rDriver takes as input the mutations X and their mutation-specific regularization penalties to learn the regression coefficients β, representing the predictability of an mRNA expression level from a mutation. (iii) rDriver takes as input the output of the previous steps and updates the regulatory prior parameter of each mutation through minimization of the objective function. The final converged solution results in a mutation and expression association matrix. The likelihood (score) that a mutation is a driver is computed based on the number of non-zero regression coefficients between the mutation and the set of mRNAs, followed by permutation tests.</p>
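The three-step loop in this outline can be illustrated with a small sketch. It is not the rDriver implementation: it assumes the per-mutation penalty is δ_j = λ / w_j, fits each mRNA with an L1-penalized regression by coordinate descent, and replaces the prior update in step (iii) with a simple stand-in that raises the weight of mutations with many non-zero coefficients. The names `weighted_lasso`, `driver_scores`, `lam` and the toy data are all hypothetical, and the permutation tests are omitted.

```python
def soft_threshold(z, t):
    """Soft-thresholding operator used in L1-penalized coordinate descent."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def weighted_lasso(X, y, penalties, n_sweeps=50):
    """Minimize 0.5*||y - X b||^2 + n * sum_j penalties[j] * |b_j|."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta (beta starts at zero)
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Inner product of column j with the partial residual.
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n))
            new = soft_threshold(rho, n * penalties[j]) / col_sq[j]
            if new != beta[j]:
                diff = new - beta[j]
                for i in range(n):
                    resid[i] -= X[i][j] * diff
                beta[j] = new
    return beta


def driver_scores(X, Y, lam=0.1, n_iter=3):
    """X: samples x mutations (0/1); Y: samples x mRNAs. Returns (scores, B)."""
    p, m = len(X[0]), len(Y[0])
    w = [1.0] * p  # uniform priors in the first iteration
    for _ in range(n_iter):
        # Step (i): priors -> per-mutation penalties (assumed form lam / w_j).
        delta = [lam / wj for wj in w]
        # Step (ii): one penalized regression per mRNA; B[k][j] links mRNA k to mutation j.
        B = [weighted_lasso(X, [row[k] for row in Y], delta) for k in range(m)]
        # Score: number of mRNAs with a non-zero coefficient for each mutation.
        counts = [sum(1 for k in range(m) if B[k][j] != 0.0) for j in range(p)]
        # Step (iii): stand-in prior update (NOT the objective used by rDriver).
        w = [1.0 + c / m for c in counts]
    return counts, B


# Toy data (made up): 16 samples, 4 mutations, 5 mRNAs; only mutation 0 shifts expression.
X = [[(i >> j) & 1 for j in range(4)] for i in range(16)]
Y = [[2.0 * X[i][0] + 0.01 * ((i + 2 * k) % 3 - 1) for k in range(5)] for i in range(16)]
scores, B = driver_scores(X, Y)
print(scores)
```

In this toy setting the score is highest for the one mutation that was constructed to affect expression. A real analysis would also need the functional annotations in the prior and a permutation-based null distribution for the scores.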
Clone and subtype analysis.
<p>(A) The proportion of clonal and subclonal mutations in the driver and passenger groups for each cancer type. Significance from Fisher’s exact test is indicated. Exact p values are as follows: BLCA, p = 4.53e-03; HNSC, p = 4.70e-02; SKCM, p = 1.46e-05; GBM, p = 5.45e-03; BRCA, p = 2.89e-03; LUSC, p = 6.686e-02; KIRC, p = 2.62e-03; LUAD, p = 1.372e-02 by Chi-squared test. (B) The fraction of cancer cells mutated for <i>PIK3CA</i> E545K and <i>BRAF</i> V600E in affected cancers. (C) The distribution of mutation subtypes in the driver and the passenger groups predicted by rDriver. Stars indicate significance of enrichment of stop-gained or splice-related variants in the driver group (**p<0.01, *p<0.05; Fisher’s exact test). (D) Altered <i>TP53</i> gene expression associated with different types of mutations in BRCA (**p<0.01, *p<0.05; t-test).</p>
rDriver output of BRCA and its comparison with other methods.
<p>(A) The frequency (blue) and GERP score (red) distribution across the 528 mutations in TCGA BRCA data (the SIFT score is not shown due to missing values). (B) The association matrix between the mRNAs and the mutations, with brown dots representing non-zero association coefficients. (C) The total number of non-zero values in the association matrix column-wise, representing the likelihood of driver mutations. A few known driver mutation hotspots in <i>PIK3CA</i> and <i>TP53</i> are labelled (red text). (D) Receiver operating characteristic (ROC) curves comparing the sets of driver genes predicted by various programs against a set of 17 known cancer driver genes in the Cancer Gene Census. (E) ROC curves comparing the sets of mutations predicted by rDriver, frequency, GERP, Condel, SIFT and PolyPhen against a set of 42 known driver mutations.</p>
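As a rough illustration of the ROC comparisons in panels (D) and (E), the sketch below builds ROC points by sweeping a threshold over per-mutation scores and computes the area under the curve with the trapezoidal rule. The scores and known-driver labels are made-up toy values, not data from the study, and the function names `roc_points` and `auc` are hypothetical.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) points obtained by thresholding scores from high to low."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        # Everything scoring at least t is called a driver at this threshold.
        tp = sum(1 for s, l in zip(scores, labels) if s >= t and l == 1)
        fp = sum(1 for s, l in zip(scores, labels) if s >= t and l == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Toy example: 6 mutations, 3 of them labelled as known drivers (1).
toy_scores = [9, 7, 7, 3, 1, 0]
toy_labels = [1, 1, 0, 0, 1, 0]
toy_auc = auc(roc_points(toy_scores, toy_labels))
print(round(toy_auc, 4))
```

Tied scores are handled by moving along both axes at once, which is equivalent to counting a tied driver and non-driver pair as half correct.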
Demographic and tumor characteristics of CRC patients according to SMAD4 mutation status.
<p>Demographic and tumor characteristics of CRC patients according to SMAD4 mutation status.</p>
The prevalence and spectrum of SMAD4 mutations in the MD Anderson study patients and TCGA data.
<p>(A) The prevalence and spectrum of SMAD4 mutations in the study patients who underwent full-length sequencing (n = 49). (B) The prevalence and spectrum of SMAD4 mutations in TCGA data (n = 220 patients).</p>
Comparison of the incidences of SMAD4 mutations across different molecular subtypes.
<p>Comparison of the incidences of SMAD4 mutations across different molecular subtypes.</p>
Univariate and multivariate Cox regression analyses of OS in metastatic CRC patients (n = 600).
<p>Univariate and multivariate Cox regression analyses of OS in metastatic CRC patients (n = 600).</p>