475 research outputs found

Heritability in the genome-wide association era

Author: Kraft Peter
Zaitlen Noah
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 18/08/2016

Heritability, the fraction of phenotypic variation explained by genetic variation, has been estimated for many phenotypes in a range of populations, organisms, and time points. The recent development of efficient genotyping and sequencing technology has led researchers to attempt to identify the genetic variants responsible for the genetic component of phenotype directly via GWAS. The gap between the phenotypic variance explained by GWAS results and that estimated from classical heritability methods has been termed the “missing heritability problem”. In this work, we examine modern methods for estimating heritability, which use the genotype and sequence data directly. We discuss them in the context of classical heritability methods and the missing heritability problem, and describe their implications for understanding the genetic architecture of complex phenotypes. National Institutes of Health (U.S.) (fellowship 5T32ES007142-27) National Institutes of Health (U.S.) (grant R21 DK084529
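
For orientation, the definition in the abstract above can be written out in standard quantitative-genetics notation. This is a textbook sketch and not taken from the paper: the symbols are assumptions, and the decomposition assumes no gene-environment covariance or interaction.

```latex
% Heritability as a variance ratio (assumed notation):
%   \sigma_G^2 = genetic variance, \sigma_E^2 = environmental variance,
%   \sigma_P^2 = total phenotypic variance.
\[
  h^2 = \frac{\sigma_G^2}{\sigma_P^2},
  \qquad
  \sigma_P^2 = \sigma_G^2 + \sigma_E^2 .
\]
% "Missing heritability" is then the gap between a classical (e.g. family-based)
% estimate and the variance explained by GWAS-identified variants:
\[
  h^2_{\text{missing}} = h^2_{\text{classical}} - h^2_{\text{GWAS}} .
\]
```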

DSpace@MIT

CRISPR-Cas9-mediated functional dissection of 3'-UTRs.

Author: Ahituv Nadav
Biton Anne
Erle David J
Siegel David
Tonqueze Olivier Le
Zaitlen Noah
Zhao Wenxue
Publication venue: eScholarship, University of California
Publication date: 03/08/2017

Many studies using reporter assays have demonstrated that 3' untranslated regions (3'-UTRs) regulate gene expression by controlling mRNA stability and translation. Due to intrinsic limitations of heterologous reporter assays, we sought to develop a gene editing approach to investigate the regulatory activity of 3'-UTRs in their native context. We initially used dual-CRISPR (clustered, regularly interspaced, short palindromic repeats)-Cas9 targeting to delete DNA regions corresponding to nine chemokine 3'-UTRs that destabilized mRNA in a reporter assay. Targeting six chemokine 3'-UTRs increased chemokine mRNA levels as expected. However, targeting CXCL1, CXCL6 and CXCL8 3'-UTRs unexpectedly led to substantial mRNA decreases. Metabolic labeling assays showed that targeting these three 3'-UTRs increased mRNA stability, as predicted by the reporter assay, while also markedly decreasing transcription, demonstrating an unexpected role for 3'-UTR sequences in transcriptional regulation. We further show that CRISPR-Cas9 targeting of specific 3'-UTR elements can be used for modulating gene expression and for highly parallel localization of active 3'-UTR elements in the native context. Our work demonstrates the duality and complexity of 3'-UTR sequences in regulation of gene expression and provides a useful approach for modulating gene expression and for functional annotation of 3'-UTRs in the native context

Crossref

eScholarship - University of California

Multiple breast cancer risk variants are associated with differential transcript isoform expression in tumors.

Author: Brenner Steven E
Camarda Roman
Caswell Jennifer L
Goga Andrei
Hu Donglei
Huntsman Scott
Zaitlen Noah
Zhou Alicia Y
Ziv Elad
Publication venue: eScholarship, University of California
Publication date: 15/10/2015

Genome-wide association studies have identified over 70 single-nucleotide polymorphisms (SNPs) associated with breast cancer. A subset of these SNPs are associated with quantitative expression of nearby genes, but the functional effects of the majority remain unknown. We hypothesized that some risk SNPs may regulate alternative splicing. Using RNA-sequencing data from breast tumors and germline genotypes from The Cancer Genome Atlas, we tested the association between each risk SNP genotype and exon-, exon-exon junction- or transcript-specific expression of nearby genes. Six SNPs were associated with differential transcript expression of seven nearby genes at FDR < 0.05 (BABAM1, DCLRE1B/PHTF1, PEX14, RAD51L1, SRGAP2D and STXBP4). We next developed a Bayesian approach to evaluate, for each SNP, the overlap between the signal of association with breast cancer and the signal of association with alternative splicing. At one locus (SRGAP2D), this method eliminated the possibility that the breast cancer risk and the alternate splicing event were due to the same causal SNP. Lastly, at two loci, we identified the likely causal SNP for the alternative splicing event, and at one, functionally validated the effect of that SNP on alternative splicing using a minigene reporter assay. Our results suggest that the regulation of differential transcript isoform expression is the functional mechanism of some breast cancer risk SNPs and that we can use these associations to identify causal SNPs, target genes and the specific transcripts that may mediate breast cancer risk

PubMed Central

eScholarship - University of California

Fast and accurate imputation of summary statistics enhances evidence of functional enrichment

Author: Bhatia Gaurav
Gusev Alexander
Hirschhorn Joel
Pasaniuc Bogdan
Patterson Nick
Pickrell Joseph
Price Alkes L.
Shi Huwenbo
Strachan David P
Zaitlen Noah
Publication venue: 'Oxford University Press (OUP)'
Publication date: 12/09/2013

Imputation using external reference panels is a widely used approach for increasing power in GWAS and meta-analysis. Existing HMM-based imputation approaches require individual-level genotypes. Here, we develop a new method for Gaussian imputation from summary association statistics, a type of data that is becoming widely available. In simulations using 1000 Genomes (1000G) data, this method recovers 84% (54%) of the effective sample size for common (>5%) and low-frequency (1-5%) variants (increasing to 87% (60%) when summary LD information is available from target samples) versus 89% (67%) for HMM-based imputation, which cannot be applied to summary statistics. Our approach accounts for the limited sample size of the reference panel, a crucial step to eliminate false-positive associations, and is computationally very fast. As an empirical demonstration, we apply our method to 7 case-control phenotypes from the WTCCC data and a study of height in the British 1958 birth cohort (1958BC). Gaussian imputation from summary statistics recovers 95% (105%) of the effective sample size (as quantified by the ratio of $\chi^2$ association statistics) compared to HMM-based imputation from individual-level genotypes at the 227 (176) published SNPs in the WTCCC (1958BC height) data. In addition, for publicly available summary statistics from large meta-analyses of 4 lipid traits, we publicly release imputed summary statistics at 1000G SNPs, which could not have been obtained using previously published methods, and demonstrate their accuracy by masking subsets of the data. We show that 1000G imputation using our approach increases the magnitude and statistical evidence of enrichment at genic vs. non-genic loci for these traits, as compared to an analysis without 1000G imputation. Thus, imputation of summary statistics will be a valuable tool in future functional enrichment analyses. Comment: 32 pages, 4 figure
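
The core idea of Gaussian imputation from summary statistics can be illustrated with the conditional mean of a multivariate normal. The sketch below is not the authors' implementation: the function name `impute_z`, its arguments, and the `ridge` constant are hypothetical. The ridge term is an assumed stand-in for the abstract's adjustment for the limited size of the reference panel.

```python
import numpy as np

def impute_z(z_typed, ld_typed, ld_cross, ridge=0.1):
    """Conditional-mean imputation of z-scores at untyped SNPs (illustrative).

    z_typed  : (t,) observed association z-scores at typed SNPs
    ld_typed : (t, t) LD (correlation) matrix among typed SNPs, from a reference panel
    ld_cross : (u, t) LD between untyped and typed SNPs, from the same panel
    ridge    : hypothetical regularisation constant added to the diagonal
    """
    z_typed = np.asarray(z_typed, dtype=float)
    # Regularised typed-SNP LD matrix: ld_typed + ridge * I
    a = np.asarray(ld_typed, dtype=float) + ridge * np.eye(len(z_typed))
    # weights = ld_cross @ inv(a), computed with a linear solve instead of an inverse
    # (valid because a is symmetric)
    weights = np.linalg.solve(a, np.asarray(ld_cross, dtype=float).T).T
    return weights @ z_typed
```

With uncorrelated typed SNPs and no regularisation, an untyped SNP in LD 0.5 with a typed SNP whose z-score is 2.0 receives an imputed z-score of 1.0. A larger `ridge` shrinks the imputed value toward zero.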

arXiv.org e-Print Archive

Crossref

PubMed Central

eScholarship - University of California

Genotyping common and rare variation using overlapping pool sequencing

Author: Eskin Eleazar
Halperin Eran
He Dan
Pasaniuc Bogdan
Zaitlen Noah
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: Recent advances in sequencing technologies set the stage for large, population-based studies, in which the DNA or RNA of thousands of individuals will be sequenced. Currently, however, such studies are still infeasible using a straightforward sequencing approach; as a result, a few multiplexing schemes have recently been suggested, in which a small number of DNA pools are sequenced and the results are then deconvoluted using compressed sensing or similar approaches. These methods, however, are limited to the detection of rare variants. Results: In this paper we provide a new algorithm for the deconvolution of DNA pool multiplexing schemes. The presented algorithm utilizes a likelihood model and linear programming. The approach allows for the addition of external data, particularly imputation data, resulting in a flexible environment that is suitable for different applications. Conclusions: In particular, we demonstrate that both low and high allele frequency SNPs can be accurately genotyped when the DNA pooling scheme is performed in conjunction with microarray genotyping and imputation. Additionally, we demonstrate the use of our framework for the detection of cancer fusion genes from RNA sequences.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

Accurate Liability Estimation Improves Power in Ascertained Case Control Studies

Author: AL Price
C Lippert
C Widmer
Christoph Lippert
D Golan
D Welter
Dan Geiger
David Heckerman
DJ Balding
ER Dempster
J Listgarten
J Yang
LA Hindorff
LC Tsoi
M Fakiola
N Fusi
N Patterson
N Zaitlen
Omer Weissbrod
S Sawcer
S Wright
SH Lee
X Zhou
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/04/2015

Linear mixed models (LMMs) have emerged as the method of choice for confounded genome-wide association studies. However, the performance of LMMs in non-randomly ascertained case-control studies deteriorates with increasing sample size. We propose a framework called LEAP (Liability Estimator As a Phenotype, https://github.com/omerwe/LEAP) that tests for association with estimated latent values corresponding to severity of phenotype, and demonstrate that this can lead to a substantial power increase

arXiv.org e-Print Archive

Crossref

MDC Repository

A scalable and robust variance components method reveals insights into the architecture of gene-environment interactions underlying complex traits.

Author: Dahl Andrew
Liu Zhengtong
Pazokitoroudi Ali
Rosset Saharon
Sankararaman Sriram
Zaitlen Noah
Publication venue: eScholarship, University of California
Publication date: 11/07/2024

Understanding the contribution of gene-environment interactions (GxE) to complex trait variation can provide insights into disease mechanisms, explain sources of heritability, and improve genetic risk prediction. While large biobanks with genetic and deep phenotypic data hold promise for obtaining novel insights into GxE, our understanding of GxE architecture in complex traits remains limited. We introduce a method to estimate the proportion of trait variance explained by GxE (GxE heritability) and additive genetic effects (additive heritability) across the genome and within specific genomic annotations. We show that our method is accurate in simulations and computationally efficient for biobank-scale datasets. We applied our method to common array SNPs (MAF ≥1%), fifty quantitative traits, and four environmental variables (smoking, sex, age, and statin usage) in unrelated white British individuals in the UK Biobank. We found 68 trait-E pairs with significant genome-wide GxE heritability (p<0.05/200) with a ratio of GxE to additive heritability of ≈6.8% on average. Analyzing ≈8 million imputed SNPs (MAF ≥0.1%), we documented an approximate 28% increase in genome-wide GxE heritability compared to array SNPs. We partitioned GxE heritability across minor allele frequency (MAF) and local linkage disequilibrium (LD) values, revealing that, like additive allelic effects, GxE allelic effects tend to increase with decreasing MAF and LD. Analyzing GxE heritability near genes highly expressed in specific tissues, we find significant brain-specific enrichment for body mass index (BMI) and basal metabolic rate in the context of smoking and adipose-specific enrichment for waist-hip ratio (WHR) in the context of sex

eScholarship - University of California

Dual gene activation and knockout screen reveals directional dependencies in genetic networks.

Author: Biton Anne
Blau James A
Boettcher Michael
Fu Haian
Kampmann Martin
Markegard Evan
McCormick Frank
McManus Michael T
Mo Xiulei
Tian Ruilin
Wagner Ryan T
Wu David
Zaitlen Noah
Publication venue: eScholarship, University of California
Publication date: 01/02/2018

Understanding the direction of information flow is essential for characterizing how genetic networks affect phenotypes. However, methods to find genetic interactions largely fail to reveal directional dependencies. We combine two orthogonal Cas9 proteins from Streptococcus pyogenes and Staphylococcus aureus to carry out a dual screen in which one gene is activated while a second gene is deleted in the same cell. We analyze the quantitative effects of activation and knockout to calculate genetic interaction and directionality scores for each gene pair. Based on the results from over 100,000 perturbed gene pairs, we reconstruct a directional dependency network for human K562 leukemia cells and demonstrate how our approach allows the determination of directionality in activating genetic interactions. Our interaction network connects previously uncharacterized genes to well-studied pathways and identifies targets relevant for therapeutic intervention

eScholarship - University of California

Genotype Error Due to Low-Coverage Sequencing Induces Uncertainty in Polygenic Scoring

Author: Bhattacharya Arjun
Ding Yi
Gusev Alexander
Hou Kangcheng
Pasaniuc Bogdan
Petter Ella
Zaitlen Noah
Publication venue: DigitalCommons@TMC
Publication date: 03/08/2023

Polygenic scores (PGSs) have emerged as a standard approach to predict phenotypes from genotype data in a wide array of applications from socio-genomics to personalized medicine. Traditional PGSs assume genotype data to be error-free, ignoring possible errors and uncertainties introduced from genotyping, sequencing, and/or imputation. In this work, we investigate the effects of genotyping error due to low coverage sequencing on PGS estimation. We leverage SNP array and low-coverage whole-genome sequencing data (lcWGS, median coverage 0.04×) of 802 individuals from the Dana-Farber PROFILE cohort to show that PGS error correlates with sequencing depth (p = 1.2 × 1

DigitalCommons@The Texas Medical Center

Combining effects from rare and common genetic variants in an exome-wide association study of sequence data

Author: Aschard Hugues
Carey Vincent
Cho Michael H
Pasaniuc Bogdan
Qiu Weiliang
Zaitlen Noah
Publication venue: BioMed Central
Publication date: 01/11/2011

Recent breakthroughs in next-generation sequencing technologies allow cost-effective methods for measuring a growing list of cellular properties, including DNA sequence and structural variation. Next-generation sequencing has the potential to revolutionize complex trait genetics by directly measuring common and rare genetic variants within a genome-wide context. Because for a given gene both rare and common causal variants can coexist and have independent effects on a trait, strategies that model the effects of both common and rare variants could enhance the power of identifying disease-associated genes. To date, little work has been done on integrating signals from common and rare variants into powerful statistics for finding disease genes in genome-wide association studies. In this analysis of the Genetic Analysis Workshop 17 data, we evaluate various strategies for association of rare, common, or a combination of both rare and common variants on quantitative phenotypes in unrelated individuals. We show that the analysis of common variants only using classical approaches can achieve higher power to detect causal genes than recently proposed rare variant methods and that strategies that combine association signals derived independently in rare and common variants can slightly increase the power compared to strategies that focus on the effect of either the rare variants or the common variants

Crossref

Harvard University - DASH

Springer - Publisher Connector

PubMed Central

eScholarship - University of California