104 research outputs found

    Matrix eQTL: Ultra fast eQTL analysis via large matrix operations

    Get PDF
    Expression quantitative trait loci (eQTL) mapping aims to determine genomic regions that regulate gene transcription. Expression QTL is used to study the regulatory structure of normal tissues and to search for genetic factors in complex diseases such as cancer, diabetes, and cystic fibrosis. A modern eQTL dataset contains millions of SNPs and thousands of transcripts measured for hundreds of samples. This makes the analysis computationally complex as it involves independent testing for association for every transcript-SNP pair. The heavy computational burden makes eQTL analysis less popular, often forces analysts to restrict their attention to just a subset of transcripts and SNPs. As larger genotype and gene expression datasets become available, the demand for fast tools for eQTL analysis increases. We present a new method for fast eQTL analysis via linear models, called Matrix eQTL. Matrix eQTL can model and test for association using both linear regression and ANOVA models. The models can include covariates to account for such factors as population structure, gender, and clinical variables. It also supports testing of heteroscedastic models and models with correlated errors. In our experiment on large datasets Matrix eQTL was thousands of times faster than the existing popular software for QTL/eQTL analysis. Matrix eQTL is implemented as both Matlab and R packages and thus can easily be run on Windows, Mac OS, and Linux systems. The software is freely available at the following address: http://www.bios.unc.edu/research/genomic_software/Matrix_eQTLComment: 9 pages, 1 figur

    An Empirical Bayes Approach for Multiple Tissue eQTL Analysis

    Full text link
    Expression quantitative trait loci (eQTL) analyses, which identify genetic markers associated with the expression of a gene, are an important tool in the understanding of diseases in human and other populations. While most eQTL studies to date consider the connection between genetic variation and expression in a single tissue, complex, multi-tissue data sets are now being generated by the GTEx initiative. These data sets have the potential to improve the findings of single tissue analyses by borrowing strength across tissues, and the potential to elucidate the genotypic basis of differences between tissues. In this paper we introduce and study a multivariate hierarchical Bayesian model (MT-eQTL) for multi-tissue eQTL analysis. MT-eQTL directly models the vector of correlations between expression and genotype across tissues. It explicitly captures patterns of variation in the presence or absence of eQTLs, as well as the heterogeneity of effect sizes across tissues. Moreover, the model is applicable to complex designs in which the set of donors can (i) vary from tissue to tissue, and (ii) exhibit incomplete overlap between tissues. The MT-eQTL model is marginally consistent, in the sense that the model for a subset of tissues can be obtained from the full model via marginalization. Fitting of the MT-eQTL model is carried out via empirical Bayes, using an approximate EM algorithm. Inferences concerning eQTL detection and the configuration of eQTLs across tissues are derived from adaptive thresholding of local false discovery rates, and maximum a-posteriori estimation, respectively. We investigate the MT-eQTL model through a simulation study, and rigorously establish the FDR control of the local FDR testing procedure under mild assumptions appropriate for dependent data.Comment: accepted by Biostatistic

    Reconstruction of a low-rank matrix in the presence of Gaussian noise

    Get PDF
    This paper addresses the problem of reconstructing a low-rank signal matrix observed with additive Gaussian noise. We first establish that, under mild assumptions, one can restrict attention to orthogonally equivariant reconstruction methods, which act only on the singular values of the observed matrix and do not affect its singular vectors. Using recent results in random matrix theory, we then propose a new reconstruction method that aims to reverse the effect of the noise on the singular value decomposition of the signal matrix. In conjunction with the proposed reconstruction method we also introduce a Kolmogorovā€“Smirnov based estimator of the noise variance

    Finding large average submatrices in high dimensional data

    Get PDF
    The search for sample-variable associations is an important problem in the exploratory analysis of high dimensional data. Biclustering methods search for sample-variable associations in the form of distinguished submatrices of the data matrix. (The rows and columns of a submatrix need not be contiguous.) In this paper we propose and evaluate a statistically motivated biclustering procedure (LAS) that finds large average submatrices within a given real-valued data matrix. The procedure operates in an iterative-residual fashion, and is driven by a Bonferroni-based significance score that effectively trades off between submatrix size and average value. We examine the performance and potential utility of LAS, and compare it with a number of existing methods, through an extensive three-part validation study using two gene expression datasets. The validation study examines quantitative properties of biclusters, biological and clinical assessments using auxiliary information, and classification of disease subtypes using bicluster membership. In addition, we carry out a simulation study to assess the effectiveness and noise sensitivity of the LAS search procedure. These results suggest that LAS is an effective exploratory tool for the discovery of biologically relevant structures in high dimensional data. Software is available at https://genome.unc.edu/las/.Comment: Published in at http://dx.doi.org/10.1214/09-AOAS239 the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org

    Computational tools for discovery and interpretation of expression quantitative trait loci

    Get PDF
    Expression quantitative trait locus (eQTL) analysis is rapidly moving from a cutting-edge concept in genomics to a mature area of investigation, with important connections to genome-wide association studies for human disease, pharmacogenomics and toxicogenomics. Despite the importance of the topic, many investigators must develop their own code or use tools not specifically suited for eQTL analysis. Convenient computational tools are becoming available, but they are not widely publicized, and investigators who are interested in discovery or eQTL, or in using them to interpret genome-wide association study results may have difficulty navigating the available resources. The purpose of this review is to help investigators find appropriate programs for eQTL analysis and interpretation

    Genome-wide association study meta-analysis of suicide death and suicidal behavior

    Get PDF
    Suicide is a worldwide health crisis. We aimed to identify genetic risk variants associated with suicide death and suicidal behavior. Meta-analysis for suicide death was performed using 3765 cases from Utah and matching 6572 controls of European ancestry. Meta-analysis for suicidal behavior using data across five cohorts (n = 8315 cases and 256,478 psychiatric or populational controls of European ancestry) was also performed. One locus in neuroligin 1 (NLGN1) passing the genome-wide significance threshold for suicide death was identified (top SNP rs73182688, with p = 5.48 x 10(-8) before and p = 4.55 x 10(-8) after mtCOJO analysis conditioning on MDD to remove genetic effects on suicide mediated by MDD). Conditioning on suicidal attempts did not significantly change the association strength (p = 6.02 x 10(-8)), suggesting suicide death specificity. NLGN1 encodes a member of a family of neuronal cell surface proteins. Members of this family act as splice site-specific ligands for beta-neurexins and may be involved in synaptogenesis. The NRXN-NLGN pathway was previously implicated in suicide, autism, and schizophrenia. We additionally identified ROBO2 and ZNF28 associations with suicidal behavior in the meta-analysis across five cohorts in gene-based association analysis using MAGMA. Lastly, we replicated two loci including variants near SOX5 and LOC101928519 associated with suicidal attempts identified in the ISGC and MVP meta-analysis using the independent FinnGen samples. Suicide death and suicidal behavior showed positive genetic correlations with depression, schizophrenia, pain, and suicidal attempt, and negative genetic correlation with educational attainment. These correlations remained significant after conditioning on depression, suggesting pleiotropic effects among these traits. Bidirectional generalized summary-data-based Mendelian randomization analysis suggests that genetic risk for the suicidal attempt and suicide death are both bi-directionally causal for MDD.Peer reviewe

    seeQTL: a searchable database for human eQTLs

    Get PDF
    Summary: seeQTL is a comprehensive and versatile eQTL database, including various eQTL studies and a meta-analysis of HapMap eQTL information. The database presents eQTL association results in a convenient browser, using both segmented local-association plots and genome-wide Manhattan plots

    Deep Sequencing of Three Loci Implicated in Large-Scale Genome-Wide Association Study Smoking Meta-Analyses

    Get PDF
    Genome-wide association study meta-analyses have robustly implicated three loci that affect susceptibility for smoking: CHRNA5\CHRNA3\CHRNB4, CHRNB3\CHRNA6 and EGLN2\CYP2A6. Functional follow-up studies of these loci are needed to provide insight into biological mechanisms. However, these efforts have been hampered by a lack of knowledge about the specific causal variant(s) involved. In this study, we prioritized variants in terms of the likelihood they account for the reported associations. We employed targeted capture of the CHRNA5\CHRNA3\CHRNB4, CHRNB3\CHRNA6, and EGLN2\CYP2A6 loci and flanking regions followed by next-generation deep sequencing (mean coverage 78Ɨ) to capture genomic variation in 363 individuals. We performed single locus tests to determine if any single variant accounts for the association, and examined if sets of (rare) variants that overlapped with biologically meaningful annotations account for the associations. In total, we investigated 963 variants, of which 71.1% were rare (minor allele frequency < 0.01), 6.02% were insertion/deletions, and 51.7% were catalogued in dbSNP141. The single variant results showed that no variant fully accounts for the association in any region. In the variant set results, CHRNB4 accounts for most of the signal with significant sets consisting of directly damaging variants. CHRNA6 explains most of the signal in the CHRNB3\CHRNA6 locus with significant sets indicating a regulatory role for CHRNA6. Significant sets in CYP2A6 involved directly damaging variants while the significant variant sets suggested a regulatory role for EGLN2. We found that multiple variants implicating multiple processes explain the signal. Some variants can be prioritized for functional follow-up. Ā© The Author 2015. Published by Oxford University Press on behalf of the Society for Research on Nicotine and Tobacco. All rights reserved. For permissions, please e-mail: [email protected]

    FastMap: Fast eQTL mapping in homozygous populations

    Get PDF
    Motivation: Gene expression Quantitative Trait Locus (eQTL) mapping measures the association between transcript expression and genotype in order to find genomic locations likely to regulate transcript expression. The availability of both gene expression and high-density genotype data has improved our ability to perform eQTL mapping in inbred mouse and other homozygous populations. However, existing eQTL mapping software does not scale well when the number of transcripts and markers are on the order of 105 and 105ā€“106, respectively
    • ā€¦
    corecore