Statistical Methods for Multi-Omics Data in Observational Studies

Abstract

With the advent of high-throughput technology, the availability of various omics data types has presented an unprecedented opportunity to answer questions that were previously out of reach. However, data analytic techniques for omics data lag behind the demands of high-dimensional data: few adequate statistical methods and software tools exist to address the complexity and multimodality of omics data. Valid statistical toolboxes are essential for exploring and understanding the underlying biology, generating new hypotheses, and designing new experiments that may deliver effective new therapeutics. In Chapter 2, I develop a statistical method, termed DrFARM, to identify and infer pleiotropic genes and variants in multi-trait genome-wide association studies (GWAS). In a standard analysis, pleiotropic variants are identified by combining results from separate GWASes post hoc, but such two-stage procedures may lead to spurious results. DrFARM instead employs a joint regression model that analyzes high-dimensional genetic variants simultaneously while incorporating multilevel dependencies. This joint modeling approach permits universal control of the false discovery rate (FDR). DrFARM combines the strengths of the debiasing technique and the Cauchy combination test, both theoretically justified, to establish valid post-selection inference on pleiotropic variants. Through extensive simulations, I show that DrFARM controls the overall FDR. Applying DrFARM to data on 1,031 metabolites measured on 6,135 men from the METSIM study, I find 288 new metabolite associations at loci that did not reach statistical significance in prior METSIM metabolite GWAS. In addition, I discover new pleiotropic loci for 16 metabolite pairs.

Chapter 3 is devoted to the development of a statistical approach, termed CAMP, for differential abundance analysis of microbiome data. Microbiome data analysis faces the challenge of sparsity, with many entries being zeros.
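The DrFARM summary names the Cauchy combination test without spelling it out. As a hedged illustration, the sketch below implements the standard form of that test in Python: each p-value p_i is mapped to tan((0.5 - p_i)π), the weighted average T is treated as approximately standard Cauchy under the null, and the combined p-value is 0.5 - arctan(T)/π. The function name `cauchy_combination`, the equal default weights, and the numerical cutoffs are assumptions for illustration; this is not DrFARM's implementation, and it omits the debiasing step entirely.

```python
import math
from typing import Optional, Sequence


def cauchy_combination(pvalues: Sequence[float],
                       weights: Optional[Sequence[float]] = None) -> float:
    """Combine p-values with the Cauchy combination test (illustrative sketch).

    T = sum_i w_i * tan((0.5 - p_i) * pi), with weights normalized to sum to 1.
    Under the null, T is approximately standard Cauchy, so the combined
    p-value is P(C > T) = 0.5 - arctan(T) / pi.
    """
    k = len(pvalues)
    if k == 0:
        raise ValueError("need at least one p-value")
    if weights is None:
        weights = [1.0 / k] * k  # assumption: equal weights by default
    if len(weights) != k:
        raise ValueError("weights and pvalues must have the same length")
    if any(w <= 0 for w in weights):
        raise ValueError("weights must be positive")
    total_w = sum(weights)

    stat = 0.0
    for p, w in zip(pvalues, weights):
        if not 0.0 < p < 1.0:
            raise ValueError("p-values must lie strictly between 0 and 1")
        w = w / total_w  # normalize so the weights sum to 1
        if p < 1e-16:
            # tan((0.5 - p) * pi) = cot(p * pi) ~ 1 / (p * pi) for tiny p
            stat += w / (p * math.pi)
        else:
            stat += w * math.tan((0.5 - p) * math.pi)

    if stat > 1e15:
        # Upper Cauchy tail: 0.5 - arctan(T) / pi ~ 1 / (pi * T) for large T
        return 1.0 / (stat * math.pi)
    return 0.5 - math.atan(stat) / math.pi


if __name__ == "__main__":
    print(cauchy_combination([0.01, 0.5, 0.9]))
```

For example, `cauchy_combination([0.01, 0.5, 0.9])` returns roughly 0.033: the result is driven mainly by the smallest p-value, which is what makes the test attractive when only a few of the combined signals are non-null. The Cauchy null approximation is known to be most accurate in the tail, that is, for small combined p-values, even when the inputs are correlated.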
In differential abundance analysis, excessive zeros violate distributional assumptions and create ties, increasing the risk of type I errors and reducing statistical power. To address these limitations, I introduce the concept of censoring normalization, which treats zeros as censored observations and transforms raw read counts into tie-free, time-to-event-like data. This novel transformation also enables the use of survival analysis techniques, such as the Cox proportional hazards model, for differential abundance analysis. Through extensive simulations, I demonstrate that CAMP achieves desirable statistical properties, including proper type I error control and high power. Applying CAMP to a human gut microbiome dataset identifies 60 new differentially abundant taxa across geographic locations, showcasing its usefulness.

Chapter 4 concerns the development of residual diagnostics to scrutinize two critical assumptions of instrumental variable analysis, a methodology widely used in Mendelian randomization: exclusion restriction and exchangeability. Despite the prominence of these assumptions, no existing methods test them. I develop two residual diagnostic plots, termed the Y-Y plot and the Y-X plot, which serve as visual aids for detecting potential departures from these assumptions and allow candidate instrumental variables to be screened before downstream analysis. Furthermore, I propose a procedure for ranking multiple candidate instruments, giving researchers a practical means of selecting the most promising instrumental variables for their analyses.
Applying this diagnostic methodology to the METSIM dataset, I find that 128 of 317 (40.4%) instrument candidates violate either the exclusion restriction or the exchangeability assumption, providing empirical support for the majority-valid assumption (at least 50% of instruments are valid) that underlies robust Mendelian randomization methods.

PhD, Biostatistics, University of Michigan, Horace H. Rackham School of Graduate Studies
http://deepblue.lib.umich.edu/bitstream/2027.42/177907/1/lapsum_1.pd
