
    A Distance-Based Test of Association Between Paired Heterogeneous Genomic Data

    Due to rapid technological advances, a wide range of different measurements can be obtained from a given biological sample, including single nucleotide polymorphisms, copy number variation, gene expression levels, DNA methylation and proteomic profiles. Each of these distinct measurements provides the means to characterize a certain aspect of biological diversity, and a fundamental problem of broad interest concerns the discovery of shared patterns of variation across different data types. Such data types are heterogeneous in the sense that they represent measurements taken at very different scales or described by very different data structures. We propose a distance-based statistical test, the generalized RV (GRV) test, to assess whether there is a common and non-random pattern of variability between paired biological measurements obtained from the same random sample. The measurements enter the test through distance measures, which can be chosen to capture particular aspects of the data. An approximate null distribution is proposed to compute p-values in closed form, without the need to perform costly Monte Carlo permutation procedures. Compared to the classical Mantel test for association between distance matrices, the GRV test has been found to be more powerful in a number of simulation settings. We also report on an application of the GRV test to detect biological pathways in which genetic variability is associated with variation in gene expression levels in ovarian cancer samples, and present results obtained from two independent cohorts.
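As a rough illustration of a distance-based RV-type statistic, the sketch below converts each distance matrix to a Gower-centered inner-product matrix and computes a normalized trace correlation between the two. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, and the p-value is obtained by plain Monte Carlo permutation rather than by the closed-form approximate null distribution the abstract describes.

```python
import numpy as np

def gower_center(d):
    """Turn a distance matrix into a doubly centered inner-product matrix."""
    d = np.asarray(d, dtype=float)
    n = d.shape[0]
    h = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    return -0.5 * h @ (d ** 2) @ h

def rv_from_distances(dx, dy):
    """RV-type coefficient between two distance matrices on the same samples."""
    gx, gy = gower_center(dx), gower_center(dy)
    # For symmetric matrices, trace(gx @ gy) equals the elementwise sum below.
    return np.sum(gx * gy) / np.sqrt(np.sum(gx * gx) * np.sum(gy * gy))

def permutation_p_value(dx, dy, n_perm=999, seed=0):
    """Assumption: permutation reference, NOT the paper's closed-form null."""
    rng = np.random.default_rng(seed)
    dy = np.asarray(dy, dtype=float)
    observed = rv_from_distances(dx, dy)
    n = dy.shape[0]
    hits = 0
    for _ in range(n_perm):
        p = rng.permutation(n)
        # Relabel the samples of the second data type (rows and columns together).
        if rv_from_distances(dx, dy[np.ix_(p, p)]) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

The two inputs only need to be square distance matrices over the same samples in the same order, so each data type can use whatever distance suits it.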

    Matrix eQTL: Ultra fast eQTL analysis via large matrix operations

    Expression quantitative trait loci (eQTL) mapping aims to determine genomic regions that regulate gene transcription. Expression QTL is used to study the regulatory structure of normal tissues and to search for genetic factors in complex diseases such as cancer, diabetes, and cystic fibrosis. A modern eQTL dataset contains millions of SNPs and thousands of transcripts measured for hundreds of samples. This makes the analysis computationally complex, as it involves independent testing for association for every transcript-SNP pair. The heavy computational burden makes eQTL analysis less popular and often forces analysts to restrict their attention to just a subset of transcripts and SNPs. As larger genotype and gene expression datasets become available, the demand for fast tools for eQTL analysis increases. We present a new method for fast eQTL analysis via linear models, called Matrix eQTL. Matrix eQTL can model and test for association using both linear regression and ANOVA models. The models can include covariates to account for such factors as population structure, gender, and clinical variables. It also supports testing of heteroscedastic models and models with correlated errors. In our experiment on large datasets, Matrix eQTL was thousands of times faster than the existing popular software for QTL/eQTL analysis. Matrix eQTL is implemented as both Matlab and R packages and thus can easily be run on Windows, Mac OS, and Linux systems. The software is freely available at the following address: http://www.bios.unc.edu/research/genomic_software/Matrix_eQTL
    Comment: 9 pages, 1 figure
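The idea of testing every transcript-SNP pair through large matrix operations can be sketched for the simplest case, a simple linear regression with no covariates. If every SNP row and every transcript row is centered and scaled to unit norm, one matrix product yields all pairwise correlations, and each correlation maps directly to a t-statistic. This is a minimal sketch, not the Matrix eQTL package API: the function names are hypothetical, and covariates, ANOVA models, and correlated errors are left out.

```python
import numpy as np

def standardize_rows(m):
    """Center each row and scale it to unit Euclidean norm."""
    m = np.asarray(m, dtype=float)
    centered = m - m.mean(axis=1, keepdims=True)
    norms = np.linalg.norm(centered, axis=1, keepdims=True)
    return centered / norms  # assumes no row is constant across samples

def all_pairs_t_stats(genotypes, expression):
    """genotypes: (n_snps, n_samples); expression: (n_genes, n_samples).

    Returns the (n_snps, n_genes) matrices of correlations and t-statistics.
    """
    g = standardize_rows(genotypes)
    e = standardize_rows(expression)
    n = g.shape[1]
    # One matrix product gives the correlation for every SNP-transcript pair.
    r = np.clip(g @ e.T, -1.0, 1.0)
    # t-statistic of the regression slope, with n - 2 degrees of freedom.
    with np.errstate(divide="ignore"):
        t = r * np.sqrt((n - 2) / (1.0 - r ** 2))
    return r, t
```

Because the per-pair work reduces to a single product of two standardized matrices, the computation can be delegated to an optimized linear algebra library instead of a loop over pairs.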

    Accounting for Population Structure in Gene-by-Environment Interactions in Genome-Wide Association Studies Using Mixed Models.

    Although genome-wide association studies (GWASs) have discovered numerous novel genetic variants associated with many complex traits and diseases, those genetic variants typically explain only a small fraction of phenotypic variance. Factors that account for phenotypic variance include environmental factors and gene-by-environment interactions (GEIs). Recently, several studies have conducted genome-wide gene-by-environment association analyses and demonstrated important roles of GEIs in complex traits. One of the main challenges in these association studies is to control the effects of population structure, which may cause spurious associations. Many studies have analyzed how population structure influences statistics of genetic variants and developed several statistical approaches to correct for population structure. However, the impact of population structure on GEI statistics in GWASs has not been extensively studied, nor have methods been designed to correct for population structure on GEI statistics. In this paper, we show both analytically and empirically that population structure may cause spurious GEIs and use both simulation and two GWAS datasets to support our finding. We propose a statistical approach based on mixed models to account for population structure on GEI statistics. We find that our approach effectively controls population structure on statistics for GEIs as well as for genetic variants.

    Dopamine perturbation of gene co-expression networks reveals differential response in schizophrenia for translational machinery.

    The dopaminergic hypothesis of schizophrenia (SZ) postulates that positive symptoms of SZ, in particular psychosis, are due to disturbed neurotransmission via the dopamine (DA) receptor D2 (DRD2). However, DA is a reactive molecule that yields various oxidative species, and thus has important non-receptor-mediated effects, with empirical evidence of cellular toxicity and neurodegeneration. Here we examine non-receptor-mediated effects of DA on gene co-expression networks and its potential role in SZ pathology. Transcriptomic profiles were measured by RNA-seq in B-cell transformed lymphoblastoid cell lines from 514 SZ cases and 690 controls, both before and after exposure to DA ex vivo (100 μM). Gene co-expression modules were identified using Weighted Gene Co-expression Network Analysis for both baseline and DA-stimulated conditions, with each module characterized for biological function and tested for association with SZ status and SNPs from a genome-wide panel. We identified seven co-expression modules under baseline, of which six were preserved in DA-stimulated data. One module shows significantly increased association with SZ after DA perturbation (baseline: P = 0.023; DA-stimulated: P = 7.8 × 10⁻⁵; ΔAIC = -10.5) and is highly enriched for genes related to ribosomal proteins and translation (FDR = 4 × 10⁻¹⁴¹), mitochondrial oxidative phosphorylation, and neurodegeneration. SNP association testing revealed tentative QTLs underlying module co-expression, notably at FASTKD2 (top P = 2.8 × 10⁻⁶), a gene involved in mitochondrial translation. These results substantiate the role of translational machinery in SZ pathogenesis, providing insights into a possible dopaminergic mechanism disrupting mitochondrial function, and demonstrate the utility of disease-relevant functional perturbation in the study of complex genetic etiologies.