Heritability in the genome-wide association era
Heritability, the fraction of phenotypic variation explained by genetic variation, has been estimated for many phenotypes in a range of populations, organisms, and time points. The recent development of efficient genotyping and sequencing technology has led researchers to attempt to identify the genetic variants responsible for the genetic component of phenotype directly via genome-wide association studies (GWAS). The gap between the phenotypic variance explained by GWAS results and that estimated from classical heritability methods has been termed the “missing heritability problem”. In this work, we examine modern methods for estimating heritability, which use the genotype and sequence data directly. We discuss them in the context of classical heritability methods and the missing heritability problem, and describe their implications for understanding the genetic architecture of complex phenotypes. Funding: National Institutes of Health (U.S.) (fellowship 5T32ES007142-27); National Institutes of Health (U.S.) (grant R21 DK084529)
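As a reading aid, the definition in this abstract can be written as a ratio of variances. The symbols below are notation assumed here for illustration, not taken from the paper: $\sigma_g^2$ is the variance attributable to genetic variation, $\sigma_p^2$ is the total phenotypic variance, and the subscripts "classical" and "GWAS" label the two kinds of estimate that the abstract contrasts.

```latex
h^2 = \frac{\sigma_g^2}{\sigma_p^2},
\qquad
\text{missing heritability} \approx h^2_{\text{classical}} - h^2_{\text{GWAS}}
```

The second expression only restates the "gap" described above; the paper itself may define or decompose it differently.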
CRISPR-Cas9-mediated functional dissection of 3'-UTRs.
Many studies using reporter assays have demonstrated that 3' untranslated regions (3'-UTRs) regulate gene expression by controlling mRNA stability and translation. Due to intrinsic limitations of heterologous reporter assays, we sought to develop a gene editing approach to investigate the regulatory activity of 3'-UTRs in their native context. We initially used dual-CRISPR (clustered, regularly interspaced, short palindromic repeats)-Cas9 targeting to delete DNA regions corresponding to nine chemokine 3'-UTRs that destabilized mRNA in a reporter assay. Targeting six chemokine 3'-UTRs increased chemokine mRNA levels as expected. However, targeting CXCL1, CXCL6 and CXCL8 3'-UTRs unexpectedly led to substantial mRNA decreases. Metabolic labeling assays showed that targeting these three 3'-UTRs increased mRNA stability, as predicted by the reporter assay, while also markedly decreasing transcription, demonstrating an unexpected role for 3'-UTR sequences in transcriptional regulation. We further show that CRISPR-Cas9 targeting of specific 3'-UTR elements can be used for modulating gene expression and for highly parallel localization of active 3'-UTR elements in the native context. Our work demonstrates the duality and complexity of 3'-UTR sequences in regulation of gene expression and provides a useful approach for modulating gene expression and for functional annotation of 3'-UTRs in the native context.
Multiple breast cancer risk variants are associated with differential transcript isoform expression in tumors.
Genome-wide association studies have identified over 70 single-nucleotide polymorphisms (SNPs) associated with breast cancer. A subset of these SNPs are associated with quantitative expression of nearby genes, but the functional effects of the majority remain unknown. We hypothesized that some risk SNPs may regulate alternative splicing. Using RNA-sequencing data from breast tumors and germline genotypes from The Cancer Genome Atlas, we tested the association between each risk SNP genotype and exon-, exon-exon junction- or transcript-specific expression of nearby genes. Six SNPs were associated with differential transcript expression of seven nearby genes at FDR < 0.05 (BABAM1, DCLRE1B/PHTF1, PEX14, RAD51L1, SRGAP2D and STXBP4). We next developed a Bayesian approach to evaluate, for each SNP, the overlap between the signal of association with breast cancer and the signal of association with alternative splicing. At one locus (SRGAP2D), this method eliminated the possibility that the breast cancer risk and the alternative splicing event were due to the same causal SNP. Lastly, at two loci, we identified the likely causal SNP for the alternative splicing event, and at one, functionally validated the effect of that SNP on alternative splicing using a minigene reporter assay. Our results suggest that the regulation of differential transcript isoform expression is the functional mechanism of some breast cancer risk SNPs and that we can use these associations to identify causal SNPs, target genes and the specific transcripts that may mediate breast cancer risk.
Fast and accurate imputation of summary statistics enhances evidence of functional enrichment
Imputation using external reference panels is a widely used approach for
increasing power in GWAS and meta-analysis. Existing HMM-based imputation
approaches require individual-level genotypes. Here, we develop a new method
for Gaussian imputation from summary association statistics, a type of data
that is becoming widely available. In simulations using 1000 Genomes (1000G)
data, this method recovers 84% (54%) of the effective sample size for common
(>5%) and low-frequency (1-5%) variants (increasing to 87% (60%) when summary
LD information is available from target samples) versus 89% (67%) for HMM-based
imputation, which cannot be applied to summary statistics. Our approach
accounts for the limited sample size of the reference panel, a crucial step to
eliminate false-positive associations, and is computationally very fast. As an
empirical demonstration, we apply our method to 7 case-control phenotypes from
the WTCCC data and a study of height in the British 1958 birth cohort (1958BC).
Gaussian imputation from summary statistics recovers 95% (105%) of the
effective sample size (as quantified by the ratio of association
statistics) compared to HMM-based imputation from individual-level genotypes at
the 227 (176) published SNPs in the WTCCC (1958BC height) data. In addition,
for publicly available summary statistics from large meta-analyses of 4 lipid
traits, we publicly release imputed summary statistics at 1000G SNPs, which
could not have been obtained using previously published methods, and
demonstrate their accuracy by masking subsets of the data. We show that 1000G
imputation using our approach increases the magnitude and statistical evidence
of enrichment at genic vs. non-genic loci for these traits, as compared to an
analysis without 1000G imputation. Thus, imputation of summary statistics will
be a valuable tool in future functional enrichment analyses. Comment: 32 pages, 4 figures
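To make the idea of Gaussian imputation from summary statistics concrete, here is a minimal sketch, not the authors' implementation. It assumes z-scores at typed SNPs are jointly Gaussian with the untyped ones, with correlations given by reference-panel LD, and it imputes the conditional mean. The function name `impute_z`, its parameters, and the `ridge` term (a simple stand-in for the abstract's adjustment for limited reference-panel size) are all hypothetical.

```python
import numpy as np

def impute_z(z_typed, ld_tt, ld_ut, ridge=0.1):
    """Conditional-mean imputation of z-scores at untyped SNPs (illustrative).

    z_typed: (t,) observed association z-scores at typed SNPs
    ld_tt:   (t, t) reference-panel LD (correlation) among typed SNPs
    ld_ut:   (u, t) reference-panel LD between untyped and typed SNPs
    ridge:   hypothetical regularization added to the diagonal of ld_tt,
             assumed here to stabilize a noisy, finite-sample LD estimate
    """
    z_typed = np.asarray(z_typed, dtype=float)
    ld_tt = np.asarray(ld_tt, dtype=float)
    ld_ut = np.asarray(ld_ut, dtype=float)

    # Regularized LD among typed SNPs.
    reg = ld_tt + ridge * np.eye(ld_tt.shape[0])

    # weights = ld_ut @ inv(reg), computed with a linear solve (reg is symmetric).
    weights = np.linalg.solve(reg, ld_ut.T).T

    # Conditional mean of the untyped z-scores given the typed ones.
    z_imputed = weights @ z_typed

    # Null variance of each imputed statistic: diag(W @ ld_tt @ W.T).
    # Values well below 1 flag SNPs that the typed set predicts poorly.
    variance = np.einsum("ij,jk,ik->i", weights, ld_tt, weights)
    return z_imputed, variance
```

Calling `impute_z(z, ld_tt, ld_ut)` returns the imputed z-scores together with their null variances; dividing an imputed z-score by the square root of its variance would rescale it to unit variance under the null, under the same assumptions.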
Genotyping common and rare variation using overlapping pool sequencing
Background: Recent advances in sequencing technologies set the stage for large, population-based studies, in which the DNA or RNA of thousands of individuals will be sequenced. Currently, however, such studies are still infeasible using a straightforward sequencing approach; as a result, a few multiplexing schemes have recently been suggested, in which a small number of DNA pools are sequenced, and the results are then deconvoluted using compressed sensing or similar approaches. These methods, however, are limited to the detection of rare variants. Results: In this paper we provide a new algorithm for the deconvolution of DNA pool multiplexing schemes. The presented algorithm utilizes a likelihood model and linear programming. The approach allows for the addition of external data, particularly imputation data, resulting in a flexible environment that is suitable for different applications. Conclusions: In particular, we demonstrate that both low and high allele frequency SNPs can be accurately genotyped when the DNA pooling scheme is performed in conjunction with microarray genotyping and imputation. Additionally, we demonstrate the use of our framework for the detection of cancer fusion genes from RNA sequences.
Accurate Liability Estimation Improves Power in Ascertained Case Control Studies
Linear mixed models (LMMs) have emerged as the method of choice for
confounded genome-wide association studies. However, the performance of LMMs in
non-randomly ascertained case-control studies deteriorates with increasing
sample size. We propose a framework called LEAP (Liability Estimator As a
Phenotype, https://github.com/omerwe/LEAP) that tests for association with
estimated latent values corresponding to severity of phenotype, and demonstrate
that this can lead to a substantial power increase.
A scalable and robust variance components method reveals insights into the architecture of gene-environment interactions underlying complex traits.
Understanding the contribution of gene-environment interactions (GxE) to complex trait variation can provide insights into disease mechanisms, explain sources of heritability, and improve genetic risk prediction. While large biobanks with genetic and deep phenotypic data hold promise for obtaining novel insights into GxE, our understanding of GxE architecture in complex traits remains limited. We introduce a method to estimate the proportion of trait variance explained by GxE (GxE heritability) and additive genetic effects (additive heritability) across the genome and within specific genomic annotations. We show that our method is accurate in simulations and computationally efficient for biobank-scale datasets. We applied our method to common array SNPs (MAF ≥1%), fifty quantitative traits, and four environmental variables (smoking, sex, age, and statin usage) in unrelated white British individuals in the UK Biobank. We found 68 trait-E pairs with significant genome-wide GxE heritability (p<0.05/200) with a ratio of GxE to additive heritability of ≈6.8% on average. Analyzing ≈8 million imputed SNPs (MAF ≥0.1%), we documented an approximate 28% increase in genome-wide GxE heritability compared to array SNPs. We partitioned GxE heritability across minor allele frequency (MAF) and local linkage disequilibrium (LD) values, revealing that, like additive allelic effects, GxE allelic effects tend to increase with decreasing MAF and LD. Analyzing GxE heritability near genes highly expressed in specific tissues, we find significant brain-specific enrichment for body mass index (BMI) and basal metabolic rate in the context of smoking and adipose-specific enrichment for waist-hip ratio (WHR) in the context of sex.
Dual gene activation and knockout screen reveals directional dependencies in genetic networks.
Understanding the direction of information flow is essential for characterizing how genetic networks affect phenotypes. However, methods to find genetic interactions largely fail to reveal directional dependencies. We combine two orthogonal Cas9 proteins from Streptococcus pyogenes and Staphylococcus aureus to carry out a dual screen in which one gene is activated while a second gene is deleted in the same cell. We analyze the quantitative effects of activation and knockout to calculate genetic interaction and directionality scores for each gene pair. Based on the results from over 100,000 perturbed gene pairs, we reconstruct a directional dependency network for human K562 leukemia cells and demonstrate how our approach allows the determination of directionality in activating genetic interactions. Our interaction network connects previously uncharacterized genes to well-studied pathways and identifies targets relevant for therapeutic intervention.
Genotype Error Due to Low-Coverage Sequencing Induces Uncertainty in Polygenic Scoring
Polygenic scores (PGSs) have emerged as a standard approach to predict phenotypes from genotype data in a wide array of applications from socio-genomics to personalized medicine. Traditional PGSs assume genotype data to be error-free, ignoring possible errors and uncertainties introduced from genotyping, sequencing, and/or imputation. In this work, we investigate the effects of genotyping error due to low-coverage sequencing on PGS estimation. We leverage SNP array and low-coverage whole-genome sequencing data (lcWGS, median coverage 0.04×) of 802 individuals from the Dana-Farber PROFILE cohort to show that PGS error correlates with sequencing depth (p = 1.2 × 1
Combining effects from rare and common genetic variants in an exome-wide association study of sequence data
Recent breakthroughs in next-generation sequencing technologies allow cost-effective methods for measuring a growing list of cellular properties, including DNA sequence and structural variation. Next-generation sequencing has the potential to revolutionize complex trait genetics by directly measuring common and rare genetic variants within a genome-wide context. Because both rare and common causal variants can coexist in a given gene and have independent effects on a trait, strategies that model the effects of both common and rare variants could enhance the power of identifying disease-associated genes. To date, little work has been done on integrating signals from common and rare variants into powerful statistics for finding disease genes in genome-wide association studies. In this analysis of the Genetic Analysis Workshop 17 data, we evaluate various strategies for association of rare, common, or a combination of both rare and common variants on quantitative phenotypes in unrelated individuals. We show that analyzing common variants alone with classical approaches can achieve higher power to detect causal genes than recently proposed rare variant methods, and that strategies that combine association signals derived independently in rare and common variants can slightly increase the power compared to strategies that focus on the effect of either the rare variants or the common variants.