108 research outputs found
Assessing allele-specific expression across multiple tissues from RNA-seq read data
Motivation: RNA sequencing enables allele-specific expression (ASE) studies that complement standard genotype expression studies for common variants and, importantly, also allow measuring the regulatory impact of rare variants. The Genotype-Tissue Expression (GTEx) project is collecting RNA-seq data on multiple tissues of the same set of individuals, and novel methods are required for the analysis of these data. Results: We present a statistical method to compare different patterns of ASE across tissues and to classify genetic variants according to their impact on the tissue-wide expression profile. We focus on the strong ASE effects that we expect to see for protein-truncating variants, but our method can also be adjusted for other types of ASE effects. We illustrate the method with a real data example on a tissue-wide expression profile of a variant causal for lipoid proteinosis, and with a simulation study to assess our method more generally. Availability and implementation: http://www.well.ox.ac.uk/~rivas/mamba/. R sources and data examples: http://www.iki.fi/mpirinen/ Contact: [email protected] or [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.
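As a rough illustration of what testing for ASE at a single heterozygous site can look like, the sketch below applies an exact two-sided binomial test to reference and alternate read counts in each tissue. This is a simple stand-in, not the method presented in the abstract above; the function names and the read counts are hypothetical.

```python
from math import comb


def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value: total probability of all outcomes
    that are no more likely than the observed count k."""
    if not 0 <= k <= n:
        raise ValueError("k must lie between 0 and n")
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    cutoff = probs[k] * (1 + 1e-9)  # small tolerance for floating-point ties
    return min(1.0, sum(q for q in probs if q <= cutoff))


def ase_by_tissue(counts):
    """Map tissue -> (reference-allele read fraction, p-value against 50:50)."""
    results = {}
    for tissue, (ref, alt) in counts.items():
        total = ref + alt
        if total == 0:
            raise ValueError(f"no reads observed for {tissue}")
        results[tissue] = (ref / total, binomial_two_sided_p(ref, total))
    return results


# Hypothetical read counts (reference, alternate) at one heterozygous site.
example_counts = {"tissue_A": (48, 52), "tissue_B": (90, 10)}
for tissue, (fraction, p_value) in ase_by_tissue(example_counts).items():
    print(f"{tissue}: ref fraction={fraction:.2f}, p={p_value:.3g}")
```

A per-tissue test like this treats tissues independently; it does not compare patterns of ASE across tissues, which is the problem the abstract describes.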
Dual gene activation and knockout screen reveals directional dependencies in genetic networks.
Understanding the direction of information flow is essential for characterizing how genetic networks affect phenotypes. However, methods to find genetic interactions largely fail to reveal directional dependencies. We combine two orthogonal Cas9 proteins from Streptococcus pyogenes and Staphylococcus aureus to carry out a dual screen in which one gene is activated while a second gene is deleted in the same cell. We analyze the quantitative effects of activation and knockout to calculate genetic interaction and directionality scores for each gene pair. Based on the results from over 100,000 perturbed gene pairs, we reconstruct a directional dependency network for human K562 leukemia cells and demonstrate how our approach allows the determination of directionality in activating genetic interactions. Our interaction network connects previously uncharacterized genes to well-studied pathways and identifies targets relevant for therapeutic intervention.
Analysis of case-control association studies with known risk variants
Motivation: The question of how to best use information from known associated variants when conducting disease association studies has yet to be answered. Some studies compute a marginal P-value for each single nucleotide polymorphism (SNP) independently, ignoring previously discovered variants. Other studies include known variants as covariates in logistic regression, but a weakness of this standard conditioning strategy is that it does not account for disease prevalence and non-random ascertainment, which can induce a correlation structure between candidate variants and known associated variants even if the variants lie on different chromosomes. Here, we propose a new conditioning approach, which is based in part on the classical technique of liability threshold modeling. Roughly, this method estimates model parameters for each known variant while accounting for the published disease prevalence from the epidemiological literature. Results: We show via simulation and application to empirical datasets that our approach outperforms both the no conditioning strategy and the standard conditioning strategy, with a properly controlled false-positive rate. Furthermore, in multiple data sets involving diseases of low prevalence, standard conditioning produces a severe drop in test statistics whereas our approach generally performs as well as or better than no conditioning. Our approach may substantially improve disease gene discovery for diseases with many known risk variants. Availability: LTSOFT software is available online: http://www.hsph.harvard.edu/faculty/alkes-price/software/ Contact: [email protected]; [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.
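For readers unfamiliar with liability threshold modeling, the following minimal sketch shows the basic idea under textbook assumptions: total liability is standard normal, an individual is a case when liability exceeds a threshold set by the disease prevalence, and the known variants contribute a component with a given variance explained. It is not the LTSOFT implementation; all function names and the example numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def liability_threshold(prevalence):
    """Threshold t such that P(liability > t) equals the disease prevalence."""
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must lie strictly between 0 and 1")
    return STANDARD_NORMAL.inv_cdf(1.0 - prevalence)


def disease_risk(genetic_liability, variance_explained, prevalence):
    """P(case | liability component from known variants).

    Assumes the residual liability is normal with variance
    1 - variance_explained, so total liability has unit variance.
    """
    if not 0.0 <= variance_explained < 1.0:
        raise ValueError("variance_explained must lie in [0, 1)")
    threshold = liability_threshold(prevalence)
    residual_sd = sqrt(1.0 - variance_explained)
    z = (threshold - genetic_liability) / residual_sd
    return 1.0 - STANDARD_NORMAL.cdf(z)


# Hypothetical example: 1% prevalence, known variants explain 5% of liability.
for score in (-0.5, 0.0, 0.5):
    risk = disease_risk(score, variance_explained=0.05, prevalence=0.01)
    print(f"liability score {score:+.1f}: risk {risk:.4f}")
```

The point of the sketch is that, for a rare disease, the same shift in liability changes the risk very differently from what a logistic model with a covariate would assume, which is why prevalence enters the conditioning.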
Leveraging population admixture to characterize the heritability of complex traits.
Despite recent progress on estimating the heritability explained by genotyped SNPs ($h^2_g$), a large gap between $h^2_g$ and estimates of total narrow-sense heritability ($h^2$) remains. Explanations for this gap include rare variants or upward bias in family-based estimates of $h^2$ due to shared environment or epistasis. We estimate $h^2$ from unrelated individuals in admixed populations by first estimating the heritability explained by local ancestry ($h^2_\gamma$). We show that $h^2_\gamma = 2 F_{STC}\,\theta(1-\theta)\,h^2$, where $F_{STC}$ measures frequency differences between populations at causal loci and $\theta$ is the genome-wide ancestry proportion. Our approach is not susceptible to biases caused by epistasis or shared environment. We applied this approach to the analysis of 13 phenotypes in 21,497 African-American individuals from 3 cohorts. For height and body mass index (BMI), we obtained $h^2$ estimates of 0.55 ± 0.09 and 0.23 ± 0.06, respectively, which are larger than estimates of $h^2_g$ in these and other data but smaller than family-based estimates of $h^2$.
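The relation between local-ancestry heritability and total narrow-sense heritability stated in this abstract can be evaluated directly. The sketch below is a minimal worked example of that one formula, not the authors' estimation procedure; the function names are hypothetical, and the values chosen for the causal-locus differentiation and the ancestry proportion are illustrative assumptions rather than figures from the paper.

```python
def local_ancestry_heritability(h2, fst_c, theta):
    """Forward relation from the abstract:
    h2_gamma = 2 * F_STC * theta * (1 - theta) * h2."""
    return 2.0 * fst_c * theta * (1.0 - theta) * h2


def total_heritability(h2_gamma, fst_c, theta):
    """Invert the relation to recover h2 from an estimate of h2_gamma."""
    scale = 2.0 * fst_c * theta * (1.0 - theta)
    if scale <= 0.0:
        raise ValueError("need fst_c > 0 and 0 < theta < 1")
    return h2_gamma / scale


# Assumed inputs for illustration only: F_STC = 0.16 and theta = 0.8.
# h2 = 0.55 is the height estimate quoted in the abstract.
h2_gamma = local_ancestry_heritability(h2=0.55, fst_c=0.16, theta=0.8)
print(f"h2_gamma = {h2_gamma:.5f}")
print(f"recovered h2 = {total_heritability(h2_gamma, 0.16, 0.8):.2f}")
```

Because the scaling factor is small for realistic inputs, a modest error in the local-ancestry estimate translates into a much larger error in the recovered total heritability, which is consistent with the need for a sample of tens of thousands of individuals.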
Quantifying Missing Heritability at Known GWAS Loci
Recent work has shown that much of the missing heritability of complex traits can be resolved by estimates of heritability explained by all genotyped SNPs. However, it is currently unknown how much heritability is missing due to poor tagging or additional causal variants at known GWAS loci. Here, we use variance components to quantify the heritability explained by all SNPs at known GWAS loci in nine diseases from WTCCC1 and WTCCC2. After accounting for expectation, we observed all SNPs at known GWAS loci to explain 1.29× more heritability than GWAS-associated SNPs on average ($P = 3.3 \times 10^{-5}$). For some diseases, this increase was individually significant: 2.07× for Multiple Sclerosis (MS) ($P = 6.5 \times 10^{-9}$) and for Crohn's Disease (CD) ($P = 1.3 \times 10^{-3}$); all analyses of autoimmune diseases excluded the well-studied MHC region. Additionally, we found that GWAS loci from other related traits also explained significant heritability. The union of all autoimmune disease loci explained 7.15× more MS heritability than known MS SNPs (P […] 20,000 Rheumatoid Arthritis (RA) samples typed on ImmunoChip, with 2.37× more heritability from all SNPs at GWAS loci ($P = 2.3 \times 10^{-6}$) and more heritability from all autoimmune disease loci ($P < 1 \times 10^{-16}$) compared to known RA SNPs (including those identified in this cohort). Our methods adjust for LD between SNPs, which can bias standard estimates of heritability from SNPs even if all causal variants are typed. By comparing adjusted estimates, we hypothesize that the genome-wide distribution of causal variants is enriched for low-frequency alleles, but that causal variants at known GWAS loci are skewed towards common alleles. These findings have important ramifications for fine-mapping study design and our understanding of complex disease architecture. National Institutes of Health (U.S.) (Grant R03HG006731); National Institutes of Health (U.S.) (Fellowship F32GM106584)
Using Extended Genealogy to Estimate Components of Heritability for 23 Quantitative and Dichotomous Traits
Important knowledge about the determinants of complex human phenotypes can be obtained from the estimation of heritability, the fraction of phenotypic variation in a population that is determined by genetic factors. Here, we make use of extensive phenotype data in Iceland, long-range phased genotypes, and a population-wide genealogical database to examine the heritability of 11 quantitative and 12 dichotomous phenotypes in a sample of 38,167 individuals. Most previous estimates of heritability are derived from family-based approaches such as twin studies, which may be biased upwards by epistatic interactions or shared environment. Our estimates of heritability, based on both closely and distantly related pairs of individuals, are significantly lower than those from previous studies. We examine phenotypic correlations across a range of relationships, from siblings to first cousins, and find that the excess phenotypic correlation in these related individuals is predominantly due to shared environment as opposed to dominance or epistasis. We also develop a new method to jointly estimate narrow-sense heritability and the heritability explained by genotyped SNPs. Unlike existing methods, this approach permits the use of information from both closely and distantly related pairs of individuals, thereby reducing the variance of estimates of heritability explained by genotyped SNPs while preventing upward bias. Our results show that common SNPs explain a larger proportion of the heritability than previously thought, with SNPs present on Illumina 300K genotyping arrays explaining more than half of the heritability for the 23 phenotypes examined in this study. Much of the remaining heritability is likely to be due to rare alleles that are not captured by standard genotyping arrays.
Test chamber investigation of the volatilization from source materials of brominated flame retardants and their subsequent deposition to indoor dust
Numerous studies have reported elevated concentrations of brominated flame retardants (BFRs) in dust from indoor micro-environments. Limited information is available, however, on the pathways via which BFRs in source materials transfer to indoor dust. The most likely hypothesized pathways are (a) volatilization from the source with subsequent partitioning to dust, (b) abrasion of the treated product, transferring microscopic fibers or particles to the dust, and (c) direct uptake to dust via contact between source and dust. This study reports the development and application of an in-house test chamber for investigating BFR volatilization from source materials and subsequent partitioning to dust. The performance of the chamber was evaluated against that of a commercially available chamber, and inherent issues with such chambers were investigated, such as loss due to sorption of BFRs to chamber surfaces (so-called sink effects). The partitioning of polybrominated diphenyl ethers to dust following volatilization from an artificial source was demonstrated, while analysis in the test chamber of a fabric curtain treated with the hexabromocyclododecane formulation resulted in dust concentrations substantially exceeding those detected in the dust pre-experiment. These results provide the first experimental evidence of BFR volatilization followed by deposition to dust.
Whole-genome sequencing of pharmacogenetic drug response in racially diverse children with asthma
RATIONALE: Albuterol, a bronchodilator medication, is the first-line therapy for asthma worldwide. There are significant racial/ethnic differences in albuterol drug response.
OBJECTIVES: To identify genetic variants important for bronchodilator drug response (BDR) in racially diverse children.
METHODS: We performed the first whole-genome sequencing pharmacogenetics study from 1,441 children with asthma from the tails of the BDR distribution to identify genetic association with BDR.
MEASUREMENTS AND MAIN RESULTS: We identified population-specific and shared genetic variants associated with BDR, including genome-wide significant ($P < 3.53 \times 10^{-7}$) and suggestive ($P < 7.06 \times 10^{-6}$) loci near genes previously associated with lung capacity (DNAH5), immunity (NFKB1 and PLCB1), and β-adrenergic signaling (ADAMTS3 and COX18). Functional analyses of the BDR-associated SNP in NFKB1 revealed potential regulatory function in bronchial smooth muscle cells. The SNP is also an expression quantitative trait locus for a neighboring gene, SLC39A8. The lack of other asthma study populations with BDR and whole-genome sequencing data on minority children makes it impossible to perform replication of our rare variant associations. Minority underrepresentation also poses significant challenges to identify age-matched and population-matched cohorts of sufficient sample size for replication of our common variant findings.
CONCLUSIONS: The lack of minority data, despite a collaboration of eight universities and 13 individual laboratories, highlights the urgent need for a dedicated national effort to prioritize diversity in research. Our study expands the understanding of pharmacogenetic analyses in racially/ethnically diverse populations and advances the foundation for precision medicine in at-risk and understudied minority populations.
Informed Conditioning on Clinical Covariates Increases Power in Case-Control Association Studies
Genetic case-control association studies often include data on clinical covariates, such as body mass index (BMI), smoking status, or age, that may modify the underlying genetic risk of case or control samples. For example, in type 2 diabetes, odds ratios for established variants estimated from low-BMI cases are larger than those estimated from high-BMI cases. An unanswered question is how to use this information to maximize statistical power in case-control studies that ascertain individuals on the basis of phenotype (case-control ascertainment) or phenotype and clinical covariates (case-control-covariate ascertainment). While current approaches improve power in studies with random ascertainment, they often lose power under case-control ascertainment and fail to capture available power increases under case-control-covariate ascertainment. We show that an informed conditioning approach, based on the liability threshold model with parameters informed by external epidemiological information, fully accounts for disease prevalence and non-random ascertainment of phenotype as well as covariates and provides a substantial increase in power while maintaining a properly controlled false-positive rate. Our method outperforms standard case-control association tests with or without covariates, tests of gene × covariate interaction, and previously proposed tests for dealing with covariates in ascertained data, with especially large improvements in the case of case-control-covariate ascertainment. We investigate empirical case-control studies of type 2 diabetes, prostate cancer, lung cancer, breast cancer, rheumatoid arthritis, age-related macular degeneration, and end-stage kidney disease over a total of 89,726 samples. In these datasets, informed conditioning outperforms logistic regression for 115 of the 157 known associated variants investigated (P-value = $1 \times 10^{-9}$).
The improvement varied across diseases with a 16% median increase in $\chi^2$ test statistics and a commensurate increase in power. This suggests that applying our method to existing and future association studies of these diseases may identify novel disease loci.