    Comment on article by Scutari

    Conditional Distance Correlation Test for Gene Expression Level, DNA Methylation Level and Copy Number

    Over the past years, efforts have been devoted to the genome-wide analysis of genetic and epigenetic profiles to better understand the biological mechanisms underlying complex diseases such as cancer. It is of great importance to unravel the complex dependence structure among biological factors, and many conditional dependence tests have been developed to meet this need. The traditional partial correlation method captures only linear partial correlation and misses nonlinear dependence. To overcome this limitation, we propose to use the conditional distance correlation (CDC), which measures the conditional dependence between random vectors and can detect nonlinear relations. In this thesis, the CDC measure is applied to the rich ovarian cancer data from The Cancer Genome Atlas (TCGA), and we identify a list of interesting genes with nonlinear features. We integrate three important types of molecular features, namely gene expression, DNA methylation and copy number variation, and apply both the partial correlation test and the CDC test to infer the relations among the three measurements for each gene. Out of 196 candidate oncogenes and tumor suppressors, we identify 19 genes in which two of the molecular features are nonlinearly dependent given the third. Many of these 19 genes have been reported in the literature to be associated with ovarian or breast cancer. Our findings could shed new light on the biological relations among these three molecular aspects. The thesis is structured as follows. Chapter 1 gives a brief introduction to ovarian cancer, the TCGA data, the three molecular measurements, and the two testing methods. Chapter 2 reviews the statistical methods, including Pearson's partial correlation and conditional distance correlation. Chapter 3 presents an extensive simulation study comparing the empirical performance of the methods. Chapter 4 applies the new method to the TCGA ovarian data, and Chapter 5 concludes the thesis with future directions.
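    The abstract contrasts the linear partial correlation test with the conditional distance correlation (CDC). Below is a minimal NumPy sketch of that contrast on simulated data; it is not the thesis's implementation. The function names `partial_corr`, `weighted_dcov2` and `cdcor` are hypothetical. The Gaussian kernel, the rule-of-thumb bandwidth and the averaging of local values over the sample points are assumptions of this sketch, and no p-value or resampling calibration is computed, so it illustrates the statistics rather than a full test.

```python
import numpy as np


def partial_corr(x, y, z):
    """Pearson partial correlation of x and y given z (1-D arrays).

    Regress x and y on [1, z] by least squares and correlate the residuals;
    this captures only the linear part of the conditional dependence.
    """
    design = np.column_stack([np.ones_like(z), z])
    rx = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    ry = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return float(np.corrcoef(rx, ry)[0, 1])


def weighted_dcov2(a, b, w):
    """Squared distance covariance of a weighted sample (hypothetical helper).

    a, b: pairwise distance matrices |x_i - x_j| and |y_i - y_j|.
    w: nonnegative weights summing to 1 (here, kernel weights around z_u).
    """
    s1 = w @ (a * b) @ w              # E|X-X'||Y-Y'|
    s2 = (w @ a @ w) * (w @ b @ w)    # E|X-X'| * E|Y-Y'|
    s3 = w @ ((a @ w) * (b @ w))      # E|X-X'||Y-Y''|
    return s1 + s2 - 2.0 * s3


def cdcor(x, y, z, bandwidth=None):
    """Average over the sample points z_u of a kernel-localized distance
    correlation between x and y (a simplified CDC-style summary)."""
    n = len(z)
    if bandwidth is None:
        # Assumed Silverman-type rule of thumb, not taken from the thesis.
        bandwidth = 1.06 * np.std(z) * n ** (-1 / 5)
    a = np.abs(x[:, None] - x[None, :])
    b = np.abs(y[:, None] - y[None, :])
    local = []
    for zu in z:
        k = np.exp(-0.5 * ((z - zu) / bandwidth) ** 2)  # Gaussian kernel weights
        w = k / k.sum()
        vxy = max(weighted_dcov2(a, b, w), 0.0)  # clip round-off negatives
        denom = np.sqrt(weighted_dcov2(a, a, w) * weighted_dcov2(b, b, w))
        # Local distance correlation R = sqrt(V^2(X,Y) / sqrt(V^2(X) V^2(Y))).
        local.append(np.sqrt(vxy / denom) if denom > 0 else 0.0)
    return float(np.mean(local))


# Toy data (simulated, not TCGA): x and y both depend on z, and y_alt also
# depends on x only through the nonlinear term e1**2.
rng = np.random.default_rng(0)
n = 300
z = rng.normal(size=n)
e1 = rng.normal(size=n)
x = z + e1
y_alt = z + e1 ** 2 + 0.5 * rng.normal(size=n)  # nonlinearly dependent given z
y_null = z + rng.normal(size=n)                   # independent of x given z

pc_alt = partial_corr(x, y_alt, z)
cd_alt = cdcor(x, y_alt, z)
cd_null = cdcor(x, y_null, z)
print(f"partial corr, nonlinear case:     {pc_alt:.3f}")
print(f"CDC-style summary, nonlinear case: {cd_alt:.3f}")
print(f"CDC-style summary, null case:      {cd_null:.3f}")
```

    On data like this toy example, the partial correlation in the nonlinear case is expected to sit near zero, because the residual of x is roughly symmetric and uncorrelated with its own square, while the CDC-style summary should be noticeably larger than in the null case. This mirrors the limitation of the linear test described in the abstract.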

    A Bayesian graphical model for integrative analysis of TCGA data

    Application of Bayesian Modeling in High-throughput Genomic Data and Clinical Trial Design

    My dissertation mainly focuses on developing Bayesian models for high-throughput data and clinical trial design. Next-generation sequencing (NGS) technology generates millions of short reads, which provide valuable information about various aspects of cellular activities and biological functions. So far, NGS techniques have been applied to quantitative measurement on diverse platforms, such as RNA expression, DNA copy number variation (CNV) and DNA methylation. Although NGS is powerful and has greatly expedited biomedical research in various fields, challenges remain due to the many modalities of disparate high-throughput data, the high variability of data acquisition, the high dimensionality of biomedical data, and the high complexity of genomics and proteomics: for example, how to extract useful information from the enormous data produced by NGS, or how to effectively integrate information from different platforms. Bayesian methods have the potential to fill these gaps. In my dissertation, I will propose Bayesian approaches to address the above challenges so that we can take full advantage of NGS technology. The work covers three specific topics: (1) BM-Map, a Bayesian mapping of multireads for NGS data; (2) a Bayesian graphical model for integrative analysis of TCGA data; and (3) a nonparametric Bayesian bi-clustering method for next-generation sequencing count data. For clinical trial design, I will propose a latent Gaussian process model with application to monitoring clinical trials.