Matrix eQTL: Ultra fast eQTL analysis via large matrix operations
Expression quantitative trait loci (eQTL) mapping aims to determine genomic
regions that regulate gene transcription. Expression QTL is used to study the
regulatory structure of normal tissues and to search for genetic factors in
complex diseases such as cancer, diabetes, and cystic fibrosis. A modern eQTL
dataset contains millions of SNPs and thousands of transcripts measured for
hundreds of samples. This makes the analysis computationally complex as it
involves independent testing for association for every transcript-SNP pair. The
heavy computational burden makes eQTL analysis less popular and often forces
analysts to restrict their attention to just a subset of transcripts and SNPs.
As larger genotype and gene expression datasets become available, the demand
for fast tools for eQTL analysis increases. We present a new method for fast
eQTL analysis via linear models, called Matrix eQTL. Matrix eQTL can model and
test for association using both linear regression and ANOVA models. The models
can include covariates to account for such factors as population structure,
gender, and clinical variables. It also supports testing of heteroscedastic
models and models with correlated errors. In our experiment on large datasets
Matrix eQTL was thousands of times faster than the existing popular software
for QTL/eQTL analysis. Matrix eQTL is implemented as both Matlab and R packages
and thus can easily be run on Windows, Mac OS, and Linux systems. The software
is freely available at the following address:
http://www.bios.unc.edu/research/genomic_software/Matrix_eQTL
Comment: 9 pages, 1 figure
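The "large matrix operations" idea in the abstract above can be illustrated for the simplest case, a simple linear regression with no covariates. The sketch below is not the Matrix eQTL package code; it only shows, under that simplifying assumption, how standardizing each SNP row and each transcript row turns all pairwise correlations into one matrix product, from which the usual regression t-statistic follows. The function names and the toy data are hypothetical.

```python
import math

def standardize(row):
    """Center a row and scale it to unit Euclidean length."""
    mean = sum(row) / len(row)
    centered = [x - mean for x in row]
    norm = math.sqrt(sum(c * c for c in centered))
    if norm == 0.0:
        raise ValueError("constant row has no variance to test")
    return [c / norm for c in centered]

def all_pair_correlations(snps, genes):
    """Pearson correlation of every SNP row with every gene row.

    After standardization the correlation matrix is the product
    S * G^T, written here with plain lists for self-containment.
    """
    s = [standardize(r) for r in snps]
    g = [standardize(r) for r in genes]
    return [[sum(a * b for a, b in zip(srow, grow)) for grow in g] for srow in s]

def t_statistic(r, n_samples):
    """t-statistic of the slope in simple linear regression, from correlation r."""
    return r * math.sqrt((n_samples - 2) / (1.0 - r * r))

# Toy data (made up): 2 SNPs and 2 transcripts measured on 6 samples.
snps = [[0, 1, 2, 1, 0, 2], [2, 2, 1, 0, 0, 1]]
genes = [[1.0, 2.1, 2.9, 2.0, 1.2, 3.1], [5.0, 4.8, 5.1, 5.3, 4.9, 5.0]]
corr = all_pair_correlations(snps, genes)
for i, row in enumerate(corr):
    for j, r in enumerate(row):
        print(f"SNP {i} vs transcript {j}: r = {r:.3f}, t = {t_statistic(r, 6):.2f}")
```

With a numerical library the nested list comprehension would be a single matrix multiplication, which is where the speed-up for millions of SNP-transcript pairs would come from.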
An Empirical Bayes Approach for Multiple Tissue eQTL Analysis
Expression quantitative trait loci (eQTL) analyses, which identify genetic
markers associated with the expression of a gene, are an important tool in the
understanding of diseases in human and other populations. While most eQTL
studies to date consider the connection between genetic variation and
expression in a single tissue, complex, multi-tissue data sets are now being
generated by the GTEx initiative. These data sets have the potential to improve
the findings of single tissue analyses by borrowing strength across tissues,
and the potential to elucidate the genotypic basis of differences between
tissues.
In this paper we introduce and study a multivariate hierarchical Bayesian
model (MT-eQTL) for multi-tissue eQTL analysis. MT-eQTL directly models the
vector of correlations between expression and genotype across tissues. It
explicitly captures patterns of variation in the presence or absence of eQTLs,
as well as the heterogeneity of effect sizes across tissues. Moreover, the
model is applicable to complex designs in which the set of donors can (i) vary
from tissue to tissue, and (ii) exhibit incomplete overlap between tissues. The
MT-eQTL model is marginally consistent, in the sense that the model for a
subset of tissues can be obtained from the full model via marginalization.
Fitting of the MT-eQTL model is carried out via empirical Bayes, using an
approximate EM algorithm. Inferences concerning eQTL detection and the
configuration of eQTLs across tissues are derived from adaptive thresholding of
local false discovery rates, and maximum a-posteriori estimation, respectively.
We investigate the MT-eQTL model through a simulation study, and rigorously
establish the FDR control of the local FDR testing procedure under mild
assumptions appropriate for dependent data.
Comment: accepted by Biostatistics
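The abstract above mentions adaptive thresholding of local false discovery rates but does not spell out the rule. One common form of such a procedure, shown here as an assumption rather than as the MT-eQTL implementation, sorts hypotheses by increasing local FDR and rejects the largest set whose average local FDR does not exceed the target level. The function name and the example values are hypothetical.

```python
def lfdr_rejections(lfdr, alpha):
    """Indices rejected by adaptive thresholding of local FDR values.

    Hypotheses are sorted by increasing local FDR, and the largest
    prefix whose running mean stays at or below alpha is rejected.
    """
    order = sorted(range(len(lfdr)), key=lambda i: lfdr[i])
    running_sum = 0.0
    n_reject = 0
    for k, idx in enumerate(order, start=1):
        running_sum += lfdr[idx]
        # The running mean is non-decreasing because values are sorted.
        if running_sum / k <= alpha:
            n_reject = k
    return sorted(order[:n_reject])

# Made-up local FDR values for six gene-SNP pairs.
example = [0.01, 0.40, 0.03, 0.90, 0.08, 0.20]
print(lfdr_rejections(example, alpha=0.10))
```

The threshold is adaptive in the sense that the cutoff on individual local FDR values (0.20 in this toy example) depends on the data, not only on the target level.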
Reconstruction of a low-rank matrix in the presence of Gaussian noise
This paper addresses the problem of reconstructing a low-rank signal matrix observed with additive Gaussian noise. We first establish that, under mild assumptions, one can restrict attention to orthogonally equivariant reconstruction methods, which act only on the singular values of the observed matrix and do not affect its singular vectors. Using recent results in random matrix theory, we then propose a new reconstruction method that aims to reverse the effect of the noise on the singular value decomposition of the signal matrix. In conjunction with the proposed reconstruction method we also introduce a Kolmogorov-Smirnov based estimator of the noise variance.
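The class of reconstruction methods described in the abstract above can be written compactly. The notation below is assumed for illustration and is not taken from the paper: $Y$ is the observed matrix, $A$ the low-rank signal, $W$ the Gaussian noise, and $c_j$ the modified singular values chosen by the method.

```latex
% Observation model: low-rank signal plus Gaussian noise (notation assumed)
Y = A + W

% Singular value decomposition of the observed matrix
Y = \sum_{j} \sigma_j(Y)\, u_j v_j^{\top}

% A reconstruction that acts only on the singular values keeps the
% singular vectors u_j, v_j of Y and replaces each \sigma_j(Y) by c_j,
% where c_j depends only on the singular values of Y
\widehat{A} = \sum_{j} c_j\, u_j v_j^{\top},
\qquad c_j = c_j\bigl(\sigma_1(Y), \sigma_2(Y), \dots\bigr) \ge 0
```

Truncated SVD is one member of this family, with $c_j = \sigma_j(Y)$ for the leading components and $c_j = 0$ otherwise.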
Finding large average submatrices in high dimensional data
The search for sample-variable associations is an important problem in the
exploratory analysis of high dimensional data. Biclustering methods search for
sample-variable associations in the form of distinguished submatrices of the
data matrix. (The rows and columns of a submatrix need not be contiguous.) In
this paper we propose and evaluate a statistically motivated biclustering
procedure (LAS) that finds large average submatrices within a given real-valued
data matrix. The procedure operates in an iterative-residual fashion, and is
driven by a Bonferroni-based significance score that effectively trades off
between submatrix size and average value. We examine the performance and
potential utility of LAS, and compare it with a number of existing methods,
through an extensive three-part validation study using two gene expression
datasets. The validation study examines quantitative properties of biclusters,
biological and clinical assessments using auxiliary information, and
classification of disease subtypes using bicluster membership. In addition, we
carry out a simulation study to assess the effectiveness and noise sensitivity
of the LAS search procedure. These results suggest that LAS is an effective
exploratory tool for the discovery of biologically relevant structures in high
dimensional data. Software is available at https://genome.unc.edu/las/.
Comment: Published at http://dx.doi.org/10.1214/09-AOAS239 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
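The abstract above describes a Bonferroni-based significance score that trades off submatrix size against average value. A minimal sketch of such a score follows, assuming a null model in which the entries of the m x n data matrix are independent standard normal variables: the tail probability of the observed average is multiplied by the number of k x l submatrices, and the negative logarithm is taken. The function names are hypothetical and this is not the LAS software itself.

```python
import math

def log_binom(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)

def log_normal_upper_tail(x):
    """log P(Z > x) for a standard normal Z.

    Uses erfc directly, so it underflows for very large x; adequate
    for the moderate values in this illustration.
    """
    return math.log(0.5 * math.erfc(x / math.sqrt(2.0)))

def las_score(m, n, k, l, avg):
    """Bonferroni-style score of a k x l submatrix with average `avg`.

    Under the assumed null, the average of k*l entries is normal with
    variance 1/(k*l), so its upper tail is P(Z > avg * sqrt(k*l)).
    The Bonferroni factor is the count C(m, k) * C(n, l) of submatrices.
    """
    log_count = log_binom(m, k) + log_binom(n, l)
    log_tail = log_normal_upper_tail(avg * math.sqrt(k * l))
    return -(log_count + log_tail)

# Compare a small, high-average submatrix with a larger, lower-average one
# in a 1000 x 200 matrix (made-up numbers).
print(las_score(1000, 200, 5, 5, 2.5))
print(las_score(1000, 200, 30, 20, 1.0))
```

Higher scores indicate submatrices that are less likely to arise by chance under the assumed null, after accounting for how many submatrices of that size exist.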
Computational tools for discovery and interpretation of expression quantitative trait loci
Expression quantitative trait locus (eQTL) analysis is rapidly moving from a cutting-edge concept in genomics to a mature area of investigation, with important connections to genome-wide association studies for human disease, pharmacogenomics and toxicogenomics. Despite the importance of the topic, many investigators must develop their own code or use tools not specifically suited for eQTL analysis. Convenient computational tools are becoming available, but they are not widely publicized, and investigators who are interested in discovery of eQTL, or in using them to interpret genome-wide association study results, may have difficulty navigating the available resources. The purpose of this review is to help investigators find appropriate programs for eQTL analysis and interpretation.
Genome-wide association study meta-analysis of suicide death and suicidal behavior
Suicide is a worldwide health crisis. We aimed to identify genetic risk variants associated with suicide death and suicidal behavior. Meta-analysis for suicide death was performed using 3765 cases from Utah and matching 6572 controls of European ancestry. Meta-analysis for suicidal behavior using data across five cohorts (n = 8315 cases and 256,478 psychiatric or populational controls of European ancestry) was also performed. One locus in neuroligin 1 (NLGN1) passing the genome-wide significance threshold for suicide death was identified (top SNP rs73182688, with p = 5.48 × 10^-8 before and p = 4.55 × 10^-8 after mtCOJO analysis conditioning on MDD to remove genetic effects on suicide mediated by MDD). Conditioning on suicidal attempts did not significantly change the association strength (p = 6.02 × 10^-8), suggesting suicide death specificity. NLGN1 encodes a member of a family of neuronal cell surface proteins. Members of this family act as splice site-specific ligands for beta-neurexins and may be involved in synaptogenesis. The NRXN-NLGN pathway was previously implicated in suicide, autism, and schizophrenia. We additionally identified ROBO2 and ZNF28 associations with suicidal behavior in the meta-analysis across five cohorts in gene-based association analysis using MAGMA. Lastly, we replicated two loci including variants near SOX5 and LOC101928519 associated with suicidal attempts identified in the ISGC and MVP meta-analysis using the independent FinnGen samples. Suicide death and suicidal behavior showed positive genetic correlations with depression, schizophrenia, pain, and suicidal attempt, and negative genetic correlation with educational attainment. These correlations remained significant after conditioning on depression, suggesting pleiotropic effects among these traits.
Bidirectional generalized summary-data-based Mendelian randomization analysis suggests that genetic risk for the suicidal attempt and suicide death are both bi-directionally causal for MDD.
Peer reviewed
seeQTL: a searchable database for human eQTLs
Summary: seeQTL is a comprehensive and versatile eQTL database, including various eQTL studies and a meta-analysis of HapMap eQTL information. The database presents eQTL association results in a convenient browser, using both segmented local-association plots and genome-wide Manhattan plots.
Deep Sequencing of Three Loci Implicated in Large-Scale Genome-Wide Association Study Smoking Meta-Analyses
Genome-wide association study meta-analyses have robustly implicated three loci that affect susceptibility for smoking: CHRNA5\CHRNA3\CHRNB4, CHRNB3\CHRNA6 and EGLN2\CYP2A6. Functional follow-up studies of these loci are needed to provide insight into biological mechanisms. However, these efforts have been hampered by a lack of knowledge about the specific causal variant(s) involved. In this study, we prioritized variants in terms of the likelihood they account for the reported associations. We employed targeted capture of the CHRNA5\CHRNA3\CHRNB4, CHRNB3\CHRNA6, and EGLN2\CYP2A6 loci and flanking regions followed by next-generation deep sequencing (mean coverage 78×) to capture genomic variation in 363 individuals. We performed single locus tests to determine if any single variant accounts for the association, and examined if sets of (rare) variants that overlapped with biologically meaningful annotations account for the associations. In total, we investigated 963 variants, of which 71.1% were rare (minor allele frequency < 0.01), 6.02% were insertion/deletions, and 51.7% were catalogued in dbSNP141. The single variant results showed that no variant fully accounts for the association in any region. In the variant set results, CHRNB4 accounts for most of the signal with significant sets consisting of directly damaging variants. CHRNA6 explains most of the signal in the CHRNB3\CHRNA6 locus with significant sets indicating a regulatory role for CHRNA6. Significant sets in CYP2A6 involved directly damaging variants while the significant variant sets suggested a regulatory role for EGLN2. We found that multiple variants implicating multiple processes explain the signal. Some variants can be prioritized for functional follow-up. © The Author 2015. Published by Oxford University Press on behalf of the Society for Research on Nicotine and Tobacco. All rights reserved.
FastMap: Fast eQTL mapping in homozygous populations
Motivation: Gene expression Quantitative Trait Locus (eQTL) mapping measures the association between transcript expression and genotype in order to find genomic locations likely to regulate transcript expression. The availability of both gene expression and high-density genotype data has improved our ability to perform eQTL mapping in inbred mouse and other homozygous populations. However, existing eQTL mapping software does not scale well when the number of transcripts and markers are on the order of 10^5 and 10^5–10^6, respectively.