41 research outputs found

    Bayesian Graphical Regression

    No full text
    <p>We consider the problem of modeling conditional independence structures in heterogeneous data in the presence of additional subject-level covariates, a problem termed graphical regression. We propose a novel specification of a conditional (in)dependence function of covariates that allows the structure of a directed graph to vary flexibly with the covariates; imposes sparsity in both edge and covariate selection; produces both subject-specific and predictive graphs; and is computationally tractable. We provide theoretical justification for our modeling approach in terms of graphical model selection consistency. We demonstrate the performance of our method through rigorous simulation studies. We illustrate our approach in a cancer genomics-based precision medicine paradigm, wherein we explore gene regulatory networks in multiple myeloma, taking prognostic clinical factors into account, to obtain both population-level and subject-level gene regulatory networks. Supplementary materials for this article are available online.</p>

    A Two-Sample Test for Equality of Means in High Dimension

    No full text
    <div><p>We develop a test statistic for testing the equality of two population mean vectors in the “large-<i>p</i>-small-<i>n</i>” setting. Such a test must surmount the rank-deficiency of the sample covariance matrix, which causes the classic Hotelling <i>T</i><sup>2</sup> test to break down. The proposed procedure, called the generalized component test, avoids full estimation of the covariance matrix by assuming that the <i>p</i> components admit a logical ordering such that the dependence between components is related to their displacement. The test is shown to be competitive with other recently developed methods under ARMA and long-range dependence structures and to achieve superior power for heavy-tailed data. The test does not assume equality of covariance matrices between the two populations, is robust to heteroscedasticity in the component variances, and requires very little computation time, which allows its use in settings with very large <i>p</i>. Analyses of mitochondrial calcium concentration in mouse cardiac muscles over time and of copy number variations in a glioblastoma multiforme dataset from The Cancer Genome Atlas are carried out to illustrate the test. Supplementary materials for this article are available online.</p></div>
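The rank-deficiency mentioned in the abstract above can be checked numerically. The following is a minimal sketch, not the authors' generalized component test: it uses a single sample rather than two, and the sizes `n = 5` and `p = 12` and both helper functions are hypothetical choices made for illustration. With fewer observations than components, the sample covariance matrix has rank at most n - 1, so it cannot be inverted as the Hotelling statistic requires.

```python
from fractions import Fraction
import random


def sample_covariance(rows):
    """Unbiased sample covariance matrix of a list of equal-length observations."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
            for i in range(p)]


def matrix_rank(matrix):
    """Rank by exact Gaussian elimination (Fraction entries, so no tolerance is needed)."""
    m = [row[:] for row in matrix]
    rank = 0
    for col in range(len(m[0])):
        # Find a row at or below the current pivot position with a nonzero entry.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


random.seed(0)
n, p = 5, 12  # hypothetical sizes with p > n
data = [[Fraction(random.randint(-9, 9)) for _ in range(p)] for _ in range(n)]
cov = sample_covariance(data)
cov_rank = matrix_rank(cov)
print(f"p = {p}, n = {n}, rank of sample covariance = {cov_rank}")
```

Because the centered observations sum to zero, the rank can never exceed n - 1 regardless of the data drawn, which is why tests of this kind avoid inverting the full sample covariance matrix.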

    Bayesian Hierarchical Varying-Sparsity Regression Models with Application to Cancer Proteogenomics

    No full text
    <p>Identifying patient-specific prognostic biomarkers is of critical importance in developing personalized treatment for clinically and molecularly heterogeneous diseases such as cancer. In this article, we propose a novel regression framework, <i>Bayesian hierarchical varying-sparsity regression</i> (BEHAVIOR) models, to select clinically relevant disease markers by integrating proteogenomic (proteomic+genomic) and clinical data. Our methods allow flexible modeling of protein–gene relationships and induce sparsity in both protein–gene and protein–survival relationships, to select genomically driven prognostic protein markers at the patient level. Simulation studies demonstrate the superior performance of BEHAVIOR against competing methods in terms of both protein marker selection and survival prediction. We apply BEHAVIOR to The Cancer Genome Atlas (TCGA) proteogenomic pan-cancer data and find several interesting prognostic proteins and pathways that are shared across multiple cancers and some that exclusively pertain to specific cancers. Supplementary materials for this article, including a standardized description of the materials available for reproducing the work, are available online.</p>

    Bayesian variable selection with graphical structure learning: Applications in integrative genomics

    No full text
    <div><p>Significant advances in biotechnology have allowed for simultaneous measurement of molecular data across multiple genomic, epigenomic and transcriptomic levels from a single tumor/patient sample. This has motivated systematic data-driven approaches to integrate multi-dimensional structured datasets, since cancer development and progression are driven by numerous co-ordinated molecular alterations and the interactions between them. We propose a novel multi-scale Bayesian approach that combines integrative graphical structure learning from multiple sources of data with a variable selection framework to determine the key genomic drivers of cancer progression. The integrative structure learning is first accomplished through novel joint graphical models for heterogeneous (mixed scale) data, allowing for flexible and interpretable incorporation of existing prior knowledge. This subsequently informs a variable selection step to identify groups of co-ordinated molecular features within and across platforms associated with clinical outcomes of cancer progression, while making appropriate adjustments for multicollinearity and multiplicities. We evaluate our methods through rigorous simulations to establish superiority over existing methods that do not take the network and/or prior information into account. Our methods are motivated by and applied to a glioblastoma multiforme (GBM) dataset from The Cancer Genome Atlas to predict patient survival times integrating gene expression, copy number and methylation data. We find a high concordance between our selected prognostic gene network modules and known associations with GBM. In addition, our model discovers several novel cross-platform network interactions (both cis and trans acting) between gene expression, copy number variation associated gene dosing and epigenetic regulation through promoter methylation, some with known implications in the etiology of GBM. Our framework provides a useful tool for biomedical researchers, since clinical prediction using multi-platform genomic information is an important step towards personalized treatment of many cancers.</p></div>

    Simulations for Case I(a)-I(b), training sample size = 100, test sample size = 100.

    No full text
    <p>BVS-SL(<i>κ</i>) represents the Bayes variable selection with belief parameter <i>κ</i> for all edges. Pencred, SSVS, Lasso, EL, SCAD, SSL, and Flasso represent the penalized joint credible regions approach, stochastic search variable selection, <i>L</i><sub>1</sub> penalized regression, elastic net, smoothly clipped absolute deviation, spike-and-slab lasso, and sparse fused lasso, respectively. MSPE: out-of-sample predictive MSE; Pwr(10% FDR) is sensitivity controlling for 90% specificity; MS: estimated model size; FP: false positives; and <i>Cov</i><sub>95</sub> is coverage under 95% predictive intervals. The true model size for Cases I(a)-(b) is 10.</p>
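The summary measures named in this caption can be sketched as follows. This is an illustrative reading of the caption, not the authors' evaluation code: the function names, the toy selected set, and the toy predictions are hypothetical, and the sensitivity here is plain sensitivity for one selected set rather than the thresholded Pwr(10% FDR) measure. Only the true model size of 10 is taken from the caption.

```python
def selection_metrics(selected, true_support):
    """Model size (MS), false positives (FP), and sensitivity for one selected set."""
    selected, true_support = set(selected), set(true_support)
    return {
        "MS": len(selected),
        "FP": len(selected - true_support),
        "sensitivity": len(selected & true_support) / len(true_support),
    }


def mspe(y_true, y_pred):
    """Mean squared prediction error on held-out responses."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def coverage(y_true, lower, upper):
    """Fraction of held-out responses falling inside their predictive intervals."""
    hits = sum(lo <= y <= hi for y, lo, hi in zip(y_true, lower, upper))
    return hits / len(y_true)


# Hypothetical example: the true model contains predictors 0..9 (size 10).
true_support = range(10)
selected = [0, 1, 2, 3, 4, 5, 6, 7, 12, 15]  # 8 true predictors, 2 false positives
metrics = selection_metrics(selected, true_support)

# Hypothetical held-out responses, predictions, and interval bounds.
toy_mspe = mspe([1.0, 2.0, 3.0], [1.5, 2.0, 2.0])
toy_cov = coverage([1.0, 2.0, 3.0], [0.0, 2.5, 2.0], [2.0, 3.0, 4.0])
print(metrics, toy_mspe, toy_cov)
```

In a simulation study these quantities would typically be averaged over many replicated datasets before being tabulated.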

    Simulations for Case II, training sample size = 100, test sample size = 100.

    No full text
    <p>BVS-SL(<i>κ</i>) represents the Bayes variable selection with belief parameter <i>κ</i> for all edges. Pencred, SSVS, Lasso, EL, SCAD, SSL, and Flasso represent the penalized joint credible regions approach, stochastic search variable selection, <i>L</i><sub>1</sub> penalized regression, elastic net, smoothly clipped absolute deviation, spike-and-slab lasso, and sparse fused lasso, respectively. MSPE: out-of-sample predictive MSE; Pwr(10% FDR) is sensitivity controlling for 90% specificity; MS: estimated model size; FP: false positives; and <i>Cov</i><sub>95</sub> is coverage under 95% predictive intervals. The true model size is 10.</p>

    Precision recall characteristic plots for <i>p</i> = 80 under Models 1(a)-(d).

    No full text
    <p>BVS-SL + <i>G</i>0_<i>κ</i> represents the Bayes variable selection with structure learning with belief parameter <i>κ</i> for all edges. Pencred, SSVS, Lasso, and ENET represent the penalized credible regions approach, stochastic search variable selection, <i>L</i><sub>1</sub> penalized regression, and elastic net, respectively. The curves for SSL and SCAD are omitted to keep the plot legible.</p>

    A schematic diagram of our integrative modeling approach.

    No full text
    <p>Panel (a) shows heatmaps of the gene-by-sample matrices constructed from data for three platforms; panel (b) depicts the prior graph constructed using previous studies; and panel (c) is the estimated graph of the genes within and across the platforms. The dashed arrows determine graphical structure and the solid arrows represent the regression model incorporating graphical dependencies. Red and green lines in panel (c) represent high negative and positive partial correlations under the estimated graph, while all other edges with lower absolute partial correlations are depicted with watermark lines. We have also provided an interactive version of <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0195070#pone.0195070.s001" target="_blank">S1 Interactive Plot</a>.</p>