83 research outputs found

    BAYESIAN INTEGRATIVE ANALYSIS OF OMICS DATA

    Technological innovations have produced large multi-modal datasets that span multiplatform genomic, pathway, proteomic, imaging, and clinical data. Integrative analysis of such data sets has the potential to reveal important biological and clinical insights into complex diseases like cancer. This dissertation focuses on developing Bayesian methodology for the integrative analysis of radiogenomic data and for pathway driver detection, with applications in cancer. We first present Radio-iBAG, a Bayesian approach to analyzing radiological imaging and multi-platform genomic data, in which we establish a multi-scale Bayesian hierarchical model that simultaneously identifies genomic and radiomic (i.e., radiology-based imaging) markers, along with the latent associations between these two modalities, and detects the overall prognostic relevance of the combined markers. Our method is motivated by and applied to The Cancer Genome Atlas glioblastoma multiforme data set, wherein it identifies important magnetic resonance imaging features and the associated genomic platforms that are also significantly associated with patient survival times. As another aspect of integrative analysis, we then present pathDrive, which aims to detect key genetic and epigenetic upstream drivers that influence pathway activity. The method is applied to colorectal cancer, incorporating its four molecular subtypes. For each of the pathways that significantly differentiate subgroups, we detect important genomic drivers that can be viewed as “switches” for the pathway activity. Finally, to extend the analysis, we develop a proteomic-based pathway driver analysis for multiple cancer types, wherein we simultaneously detect genomic upstream factors that influence a specific pathway for each cancer type within the cancer group. With a Bayesian hierarchical model, we detect signals by borrowing strength from common cancer types to inform rare cancer types, and simultaneously estimate their selection similarity. A simulation study shows that our method offers several advantages, including increased power and lower false discovery rates. We then apply the method to the analysis of multiple cancer groups, wherein we detect key genomic upstream drivers with proper biological interpretation. The overall framework and methodologies established in this dissertation support further investigation in the field of integrative analysis of omics data and provide more comprehensive insight into biological mechanisms and processes and into cancer development and progression.

    Knowledge-Guided Bayesian Support Vector Machine Methods For High-Dimensional Data

    The support vector machine (SVM) is a popular classification method for the analysis of high-dimensional data such as genomics data. Recently, new SVM methods have been developed to achieve variable selection through either frequentist regularization or Bayesian shrinkage. The Bayesian framework provides a probabilistic interpretation for SVM and allows direct uncertainty quantification. In this dissertation, we develop four knowledge-guided SVM methods for the analysis of high-dimensional data. In Chapter 1, I first review the theory of SVM and existing methods for incorporating prior knowledge, represented by graphs, into SVM. Second, I review the terminology of variable selection and the limitations of existing methods for SVM variable selection. Last, I introduce some Bayesian variable selection techniques as well as Markov chain Monte Carlo (MCMC) algorithms. In Chapter 2, we develop a new Bayesian SVM method that enables variable selection guided by structural information among predictors, e.g., biological pathways among genes. This method uses a spike-and-slab prior for feature selection combined with an Ising prior for incorporating structural information. The performance of the proposed method is evaluated against existing SVM methods in terms of prediction and feature selection in extensive simulations. Furthermore, the proposed method is illustrated in an analysis of genomic data from a cancer study, demonstrating its advantage in generating biologically meaningful results and identifying potentially important features. The model developed in Chapter 2 might suffer from the issue of phase transition \citep{li2010bayesian} when the number of variables becomes extremely large. In Chapter 3, we propose another Bayesian SVM method that assigns an adaptive structured shrinkage prior to the coefficients, with the graph information incorporated via the hyper-priors imposed on the precision matrix of the log-transformed shrinkage parameters. This method is shown to outperform the method in Chapter 2 in both simulations and real data analysis. In Chapter 4, to relax the linearity assumption in Chapters 2 and 3, we develop a novel knowledge-guided Bayesian non-linear SVM. The proposed method uses a diagonal matrix, with ones representing selected features and zeros representing unselected features, and combines it with the Ising prior to perform feature selection. The performance of our method is evaluated and compared with several penalized linear SVM methods and the standard kernel SVM method in terms of prediction and feature selection in extensive simulation settings. In addition, analyses of genomic data from a cancer study show that our method yields a more accurate prediction model for patient survival and reveals more biologically meaningful results than the existing methods. In Chapter 5, we extend the work of Chapter 4 and use a joint model to identify the relevant features and learn the structural information among them simultaneously. This model does not require the structural information among the predictors to be known, which makes it more useful when prior knowledge about pathways is limited or inaccurate. We demonstrate in simulations that our method outperforms the method developed in Chapter 4 when the prior knowledge is only partially true or inaccurate, and we illustrate the proposed model with an application to a glioblastoma data set. In Chapter 6, we propose future work, including extending our methods to more general types of outcomes such as categorical or continuous variables.
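
    The Ising prior mentioned in this abstract can be illustrated with a small sketch. The abstract does not give the exact parameterization, so the code below assumes one common form for binary inclusion indicators gamma_j in {0, 1} on a known graph: log p(gamma) = a * sum_j gamma_j + b * sum over edges (j, k) of gamma_j * gamma_k, up to a constant. The function names, the parameters `a` and `b`, and their default values are hypothetical and chosen only for illustration.

```python
import numpy as np


def ising_log_prior(gamma, edges, a=-2.0, b=0.5):
    """Unnormalized log prior of inclusion indicators under the assumed Ising form.

    gamma : sequence of 0/1 indicators (1 = feature selected)
    edges : list of (j, k) index pairs describing the known graph
    a     : sparsity parameter (assumed; negative values favor exclusion)
    b     : smoothness parameter (assumed; positive values favor selecting
            connected features together)
    """
    gamma = np.asarray(gamma, dtype=float)
    sparsity_term = a * gamma.sum()
    smoothness_term = b * sum(gamma[j] * gamma[k] for j, k in edges)
    return float(sparsity_term + smoothness_term)


def conditional_inclusion_prob(j, gamma, edges, a=-2.0, b=0.5):
    """Prior probability that gamma_j = 1 given all other indicators."""
    gamma = np.asarray(gamma, dtype=float)
    # Sum the indicators of the graph neighbors of feature j.
    neighbor_sum = 0.0
    for u, v in edges:
        if u == j:
            neighbor_sum += gamma[v]
        elif v == j:
            neighbor_sum += gamma[u]
    # Logistic function of the local field a + b * (number of selected neighbors).
    return float(1.0 / (1.0 + np.exp(-(a + b * neighbor_sum))))


# Toy chain graph 0 - 1 - 2 with an isolated feature 3.
edges = [(0, 1), (1, 2)]
gamma = [1, 1, 0, 0]
log_prior = ising_log_prior(gamma, edges)
p_connected = conditional_inclusion_prob(2, gamma, edges)  # neighbor 1 is selected
p_isolated = conditional_inclusion_prob(3, gamma, edges)   # no neighbors at all
```

    Under these assumptions, a feature whose graph neighbors are already selected receives a higher prior inclusion probability than an isolated one, which is the sense in which pathway structure guides the selection.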

    Approaches for Analyzing Multivariate Mixed Endpoints With High-Dimensional Covariates.
