18 research outputs found
Statistical tools for general association testing and control of false discoveries in group testing
In modern applications of high-throughput technologies, it is important to identify pairwise associations between variables, and desirable to use methods that are powerful and sensitive to a variety of association relationships. In the first part of the dissertation, we describe RankCover, a new non-parametric test for association between two variables that measures the concentration of paired ranked points. Here 'concentration' is quantified using a disk-covering statistic that is similar to those employed in spatial data analysis. Analysis of simulated datasets demonstrates that the method is robust and often powerful in comparison to competing general association tests. We also illustrate RankCover in the analysis of several real datasets. Using RankCover, we also propose a method of testing the association of two variables while controlling the effect of a third variable. In the second part of the dissertation, we describe statistical methodologies for testing hypotheses that can be collected into groups, with each group showing potentially different characteristics. Methods to control family-wise error rate or false discovery rate for group testing have been proposed earlier, but may not easily apply to expression quantitative trait loci (eQTL) data, for which certain structured alternatives may be defensible and enable the researcher to avoid overly conservative approaches. In an empirical Bayesian setting, we propose a new method to control the false discovery rate (FDR) for grouped hypothesis data. Here, each gene forms a group, with SNPs annotated to the gene corresponding to individual hypotheses. Heterogeneity of effect sizes in different groups is considered by the introduction of a random effects component. Our method, entitled Random Effects model and testing procedure for Group-level FDR control (REG-FDR), assumes a model for alternative hypotheses for the eQTL data and controls the FDR by adaptive thresholding. Finally, we propose Z-REG-FDR, an approximate version of REG-FDR that uses only Z-statistics of association between genotype and expression at each SNP. Simulations demonstrate that Z-REG-FDR performed similarly to REG-FDR, but with much improved computational speed. We further propose an extension of Z-REG-FDR to a multi-tissue setting, providing the basis for gene-based multi-tissue analysis.
Doctor of Philosophy
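For readers unfamiliar with FDR control from per-SNP Z-statistics, the following is a minimal sketch of the standard Benjamini-Hochberg step-up procedure. It is a generic baseline, not REG-FDR or Z-REG-FDR: it ignores the gene-level grouping and the random effects model described in the abstract. The function names and the example Z-values are hypothetical.

```python
import math


def two_sided_p(z):
    """Two-sided p-value for a Z-statistic under a standard normal null."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def benjamini_hochberg(pvalues, alpha=0.05):
    """Benjamini-Hochberg step-up procedure.

    Returns a list of booleans, True where the hypothesis is rejected
    while targeting a false discovery rate of alpha.
    """
    m = len(pvalues)
    # Indices of the hypotheses, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k such that p_(k) <= k * alpha / m.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / m:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


# Hypothetical Z-statistics for five SNPs (illustrative values only).
z_scores = [4.5, 0.3, -3.9, 1.0, 0.1]
p_values = [two_sided_p(z) for z in z_scores]
decisions = benjamini_hochberg(p_values, alpha=0.05)
```

In this toy input only the two SNPs with large absolute Z-statistics are rejected. A group-aware procedure such as the one the abstract describes would instead make its decisions at the gene level.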
Model based heritability scores for high-throughput sequencing data
Supplementary materials. (PDF 1370 KB)
Testing cross-phenotype effects of rare variants in longitudinal studies of complex traits
Many gene mapping studies of complex traits have identified genes or variants that influence multiple phenotypes. With the advent of next-generation sequencing technology, there has been substantial interest in identifying rare variants in genes that possess cross-phenotype effects. In the presence of such effects, modeling both the phenotypes and rare variants collectively using multivariate models can achieve higher statistical power compared to univariate methods that either model each phenotype separately or perform separate tests for each variant. Several studies collect phenotypic data over time and using such longitudinal data can further increase the power to detect genetic associations. Although rare-variant approaches exist for testing cross-phenotype effects at a single time point, there is no analogous method for performing such analyses using longitudinal outcomes. In order to fill this important gap, we propose an extension of the Gene Association with Multiple Traits (GAMuT) test, a method for cross-phenotype analysis of rare variants using a framework based on the distance covariance. The approach allows for both binary and continuous phenotypes and can also adjust for covariates. Our simple adjustment to the GAMuT test allows it to handle longitudinal data and to gain power by exploiting temporal correlation. The approach is computationally efficient and applicable on a genome-wide scale due to the use of a closed-form test whose significance can be evaluated analytically. We use simulated data to demonstrate that our method has favorable power over competing approaches and also apply our approach to exome chip data from the Genetic Epidemiology Network of Arteriopathy.
Peer Reviewed
https://deepblue.lib.umich.edu/bitstream/2027.42/144294/1/gepi22121_am.pdf
https://deepblue.lib.umich.edu/bitstream/2027.42/144294/2/gepi22121.pdf
https://deepblue.lib.umich.edu/bitstream/2027.42/144294/3/gepi22121-sup-0001-SuppMat.pd
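Since the abstract builds on the distance covariance, a minimal sketch of the plain sample (squared) distance covariance may help. It is not the GAMuT test or its longitudinal extension: there is no kernel choice, covariate adjustment, or analytic p-value here. The function names and toy data are hypothetical, and each observation is assumed to be a tuple of floats.

```python
import math


def _centered_distances(points):
    """Pairwise Euclidean distance matrix, double-centered."""
    n = len(points)
    d = [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]
    row_means = [sum(row) / n for row in d]
    grand_mean = sum(row_means) / n
    # The distance matrix is symmetric, so column means equal row means.
    return [
        [d[i][j] - row_means[i] - row_means[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def distance_covariance_sq(x, y):
    """Squared sample distance covariance between paired observations.

    x and y are equal-length lists of tuples; the tuple lengths may differ
    between x and y (for example, several phenotypes versus several variants).
    """
    if len(x) != len(y):
        raise ValueError("x and y must contain the same number of observations")
    n = len(x)
    a = _centered_distances(x)
    b = _centered_distances(y)
    return sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / (n * n)


# Toy data: four subjects, one variable each (illustrative values only).
x = [(0.0,), (1.0,), (2.0,), (3.0,)]
constant = [(5.0,)] * 4
dcov_self = distance_covariance_sq(x, x)
dcov_constant = distance_covariance_sq(x, constant)
```

The statistic is zero when one variable does not vary across subjects and positive when a non-constant variable is compared with itself, which is the property that makes it useful as a general dependence measure.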
Predictive modeling of miRNA-mediated predisposition to alcohol-related phenotypes in mouse
Background: MicroRNAs (miRNAs) are small non-coding RNAs that bind messenger RNAs and promote their degradation or repress their translation. There is increasing evidence of miRNAs playing an important role in alcohol-related disorders. However, the role of miRNAs as mediators of the genetic effect on alcohol phenotypes is not fully understood. We conducted a high-throughput sequencing study to measure miRNA expression levels in alcohol-naïve animals in the LXS panel of recombinant inbred (RI) mouse strains. We then combined the sequencing data with genotype data, microarray gene expression data, and data on alcohol-related behavioral phenotypes such as "Drinking in the dark", "Sleep time", and "Low dose activation" from the same RI panel. SNP-miRNA-gene triplets with strong association within the triplet that were also associated with one of the 4 alcohol phenotypes were selected and a Bayesian network analysis was used to aggregate results into a directed network model. Results: We found several triplets with strong association within the triplet that were also associated with one of the alcohol phenotypes. The Bayesian network analysis found two networks where a miRNA mediates the genetic effect on the alcohol phenotype. The miRNAs were found to influence the expression of protein-coding genes, which in turn influences the quantitative phenotypes. The pathways in which these genes are enriched have been previously associated with alcohol-related traits. Conclusion: This work enhances association studies by identifying miRNAs that may be mediating the association between genetic markers (SNPs) and the alcohol phenotypes. It suggests a mechanism of how genetic variants are affecting traits of interest through the modification of miRNA expression.
miR-MaGiC improves quantification accuracy for small RNA-seq
Objective: Many tools have been developed to profile microRNA (miRNA) expression from small RNA-seq data. These tools must contend with several issues: the small size of miRNAs, the small number of unique miRNAs, the fact that similar miRNAs can be transcribed from multiple loci, and the presence of miRNA isoforms known as isomiRs. Methods failing to address these issues can return misleading information. We propose a novel quantification method designed to address these concerns. Results: We present miR-MaGiC, a novel miRNA quantification method, implemented as a cross-platform tool in Java. miR-MaGiC performs stringent mapping to a core region of each miRNA and defines a meaningful set of target miRNA sequences by collapsing the miRNA space to "functional groups". We hypothesize that these two features, mapping stringency and collapsing, provide better quantification of a more meaningful unit (i.e., miRNA family). We test miR-MaGiC and several published methods on 210 small RNA-seq libraries, evaluating each method's ability to accurately reflect global miRNA expression profiles. We define accuracy as total counts close to the total number of input reads originating from miRNAs. We find that miR-MaGiC, which incorporates both stringency and collapsing, provides the most accurate counts.
Systems genetics analysis of the LXS recombinant inbred mouse strains: Genetic and molecular insights into acute ethanol tolerance.
We have been using the Inbred Long- and Short-Sleep mouse strains (ILS, ISS) and a recombinant inbred panel derived from them, the LXS, to investigate the genetic underpinnings of acute ethanol tolerance, which is considered to be a risk factor for alcohol use disorders (AUDs). Here, we have used RNA-seq to examine the transcriptome of whole brain in 40 of the LXS strains 8 hours after a saline or ethanol "pretreatment" as in previous behavioral studies. Approximately 1/3 of the 14,184 expressed genes were significantly heritable and many were unique to the pretreatment. Several thousand cis- and trans-eQTLs were mapped; a portion of these also were unique to pretreatment. Ethanol pretreatment caused differential expression (DE) of 1,230 genes. Gene Ontology (GO) enrichment analysis suggested involvement in numerous biological processes including astrocyte differentiation, histone acetylation, mRNA splicing, and neuron projection development. Genetic correlation analysis identified hundreds of genes that were correlated to the behaviors. GO analysis indicated that these genes are involved in gene expression, chromosome organization, and protein transport, among others. The expression profiles of the DE genes and genes correlated to AFT in the ethanol pretreatment group (AFT-Et) were found to be similar to profiles of HDAC inhibitors. Hdac1, a cis-regulated gene that is located at the peak of a previously mapped QTL for AFT-Et, was correlated to 437 genes, most of which were also correlated to AFT-Et. GO analysis of these genes identified several enriched biological process terms including neuron-neuron synaptic transmission and potassium transport. In summary, the results suggest widespread genetic effects on gene expression, including effects that are pretreatment-specific. A number of candidate genes and biological functions were identified that could be mediating the behavioral responses. The most prominent of these was Hdac1, which may be regulating genes associated with glutamatergic signaling and potassium conductance.