69 research outputs found

    Rare coding variants in 35 genes associate with circulating lipid levels-A multi-ancestry analysis of 170,000 exomes

    Large-scale gene sequencing studies for complex traits have the potential to identify causal genes with therapeutic implications. We performed gene-based association testing of blood lipid levels with rare (minor allele frequency < 1%) predicted damaging coding variation by using sequence data from 170,000 individuals from multiple ancestries: 97,493 European, 30,025 South Asian, 16,507 African, 16,440 Hispanic/Latino, 10,420 East Asian, and 1,182 Samoan. We identified 35 genes associated with circulating lipid levels; some of these genes have not been previously associated with lipid levels when using rare coding variation from population-based samples. We prioritize 32 genes in array-based genome-wide association study (GWAS) loci based on aggregations of rare coding variants; three (EVI5, SH2B3, and PLIN1) had no prior association of rare coding variants with lipid levels. Most of our associated genes showed evidence of association among multiple ancestries. Finally, we observed an enrichment of gene-based associations for low-density lipoprotein cholesterol drug target genes and for genes closest to GWAS index single-nucleotide polymorphisms (SNPs). Our results demonstrate that gene-based associations can be beneficial for drug target development and provide evidence that the gene closest to the array-based GWAS index SNP is often the functional gene for blood lipid levels.

    GAWMerge expands GWAS sample size and diversity by combining array-based genotyping and whole-genome sequencing

    Genome-wide association studies (GWAS) have made impactful discoveries for complex diseases, often by amassing very large sample sizes. Yet, GWAS of many diseases remain underpowered, especially for non-European ancestries. One cost-effective approach to increase sample size is to combine existing cohorts, which may have limited sample size or be case-only, with public controls, but this approach is limited by the need for a large overlap in variants across genotyping arrays and the scarcity of non-European controls. We developed and validated a protocol, Genotyping Array-WGS Merge (GAWMerge), for combining genotypes from arrays and whole-genome sequencing, ensuring complete variant overlap, and allowing for diverse samples like Trans-Omics for Precision Medicine to be used. Our protocol involves phasing, imputation, and filtering. We illustrated its ability to control technology-driven artifacts and type-I error, as well as recover known disease-associated signals across technologies, independent datasets, and ancestries in smoking-related cohorts. GAWMerge enables genetic studies to leverage existing cohorts to validly increase sample size and enhance discovery for understudied traits and ancestries.

    Epigenome-Wide Association Study of Kidney Function Identifies Trans-Ethnic and Ethnic-Specific Loci

    BACKGROUND: DNA methylation (DNAm) is associated with gene regulation and estimated glomerular filtration rate (eGFR), a measure of kidney function. Decreased eGFR is more common among US Hispanics and African Americans. The causes for this are poorly understood. We aimed to identify trans-ethnic and ethnic-specific differentially methylated positions (DMPs) associated with eGFR using an agnostic, genome-wide approach. METHODS: The study included up to 5428 participants from multi-ethnic studies for discovery and 8109 participants for replication. We tested the associations between whole blood DNAm and eGFR using beta values from Illumina 450K or EPIC arrays. Ethnicity-stratified analyses were performed using linear mixed models adjusting for age, sex, smoking, and study-specific and technical variables. Summary results were meta-analyzed within and across ethnicities. Findings were assessed using integrative epigenomics methods and pathway analyses. RESULTS: We identified 93 DMPs associated with eGFR at an FDR of 0.05 and replicated 13 and 1 DMPs across independent samples in trans-ethnic and African American meta-analyses, respectively. The study also validated 6 previously published DMPs. Identified DMPs showed significant overlap enrichment with DNase I hypersensitive sites in kidney tissue, sites associated with the expression of proximal genes, and transcription factor motifs and pathways associated with kidney tissue and kidney development. CONCLUSIONS: We uncovered trans-ethnic and ethnic-specific DMPs associated with eGFR, including DMPs enriched in regulatory elements in kidney tissue and pathways related to kidney development. These findings shed light on epigenetic mechanisms associated with kidney function, bridging the gap between population-specific eGFR-associated DNAm and tissue-specific regulatory context

    Canonical correlation analysis for multi-omics: Application to cross-cohort analysis

    Integrative approaches that simultaneously model multi-omics data have gained increasing popularity because they provide holistic system biology views of multiple or all components in a biological system of interest. Canonical correlation analysis (CCA) is a correlation-based integrative method designed to extract latent features shared between multiple assays by finding the linear combinations of features, referred to as canonical variables (CVs), within each assay that achieve maximal across-assay correlation. Although widely acknowledged as a powerful approach for multi-omics data, CCA has not been systematically applied to multi-omics data in large cohort studies, which have only recently become available. Here, we adapted sparse multiple CCA (SMCCA), a widely used derivative of CCA, to proteomics and methylomics data from the Multi-Ethnic Study of Atherosclerosis (MESA) and Jackson Heart Study (JHS). To tackle challenges encountered when applying SMCCA to MESA and JHS, our adaptations include the incorporation of the Gram-Schmidt (GS) algorithm with SMCCA to improve orthogonality among CVs, and the development of Sparse Supervised Multiple CCA (SSMCCA) to allow supervised integration analysis for more than two assays. Effective application of SMCCA to the two real datasets reveals important findings. Applying our SMCCA-GS to MESA and JHS, we identified strong associations between blood cell counts and protein abundance, suggesting that adjustment of blood cell composition should be considered in protein-based association studies. Importantly, CVs obtained from two independent cohorts also demonstrate transferability across the cohorts. For example, proteomic CVs learned from JHS, when transferred to MESA, explain similar amounts of blood cell count phenotypic variance: 39.0% to 50.0% of the variation in JHS and 38.9% to 49.1% in MESA. Similar transferability was observed for other omics-CV-trait pairs. This suggests that biologically meaningful and cohort-agnostic variation is captured by CVs. We anticipate that applying our SMCCA-GS and SSMCCA on various cohorts would help identify cohort-agnostic biologically meaningful relationships between multi-omics data and phenotypic traits.
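
The abstract above mentions using the Gram-Schmidt algorithm to improve orthogonality among canonical variables. The sketch below only illustrates the general Gram-Schmidt step on a list of score vectors. How the authors combine it with SMCCA is not described in the abstract, so the function name `gram_schmidt`, the tolerance, and the toy vectors are all assumptions made for illustration.

```python
def gram_schmidt(vectors, tol=1e-12):
    """Orthonormalise equal-length vectors (modified Gram-Schmidt).

    Hypothetical helper: each input vector stands in for one canonical
    variable's scores across samples. Vectors that become (numerically)
    zero after projection are dropped.
    """
    basis = []
    for v in vectors:
        w = [float(x) for x in v]
        for b in basis:
            # Remove the component of w that lies along the basis vector b.
            proj = sum(wi * bi for wi, bi in zip(w, b))
            w = [wi - proj * bi for wi, bi in zip(w, b)]
        norm = sum(wi * wi for wi in w) ** 0.5
        if norm > tol:
            basis.append([wi / norm for wi in w])
    return basis


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


# Toy example: two correlated score vectors become orthonormal.
cvs = gram_schmidt([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0]])
```

After the call, `dot(cvs[0], cvs[1])` is zero up to floating-point error, which is the orthogonality property the abstract refers to.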

    Whole-genome association analyses of sleep-disordered breathing phenotypes in the NHLBI TOPMed program

    Background: Sleep-disordered breathing is a common disorder associated with significant morbidity. The genetic architecture of sleep-disordered breathing remains poorly understood. Through the NHLBI Trans-Omics for Precision Medicine (TOPMed) program, we performed the first whole-genome sequence analysis of sleep-disordered breathing. Methods: The study sample comprised 7988 individuals of diverse ancestry. Common-variant and pathway analyses included an additional 13,257 individuals. We examined five complementary traits describing different aspects of sleep-disordered breathing: the apnea-hypopnea index, average oxyhemoglobin desaturation per event, average and minimum oxyhemoglobin saturation across the sleep episode, and the percentage of sleep with oxyhemoglobin saturation < 90%. We adjusted for age, sex, BMI, study, and family structure using MMSKAT and EMMAX mixed linear model approaches. Additional bioinformatics analyses were performed with MetaXcan, GIGSEA, and ReMap. Results: We identified a multi-ethnic set-based rare-variant association (p = 3.48 × 10⁻⁸) on chromosome X with ARMCX3. Additional rare-variant associations include ARMCX3-AS1, MRPS33, and C16orf90. Novel common-variant loci were identified in the NRG1 and SLC45A2 regions, and previously associated loci in the IL18RAP and ATP2B4 regions were associated with novel phenotypes. Transcription factor binding site enrichment identified associations with genes implicated in respiratory and craniofacial traits. Additional analyses identified significantly associated pathways. Conclusions: We have identified the first gene-based rare-variant associations with objectively measured sleep-disordered breathing traits. Our results increase the understanding of the genetic architecture of sleep-disordered breathing and highlight associations in genes that modulate lung development, inflammation, respiratory rhythmogenesis, and HIF1A-mediated hypoxic response.

    Clonal hematopoiesis associated with epigenetic aging and clinical outcomes

    Clonal hematopoiesis of indeterminate potential (CHIP) is a common precursor state for blood cancers that most frequently occurs due to mutations in the DNA-methylation modifying enzymes DNMT3A or TET2. We used DNA-methylation array and whole-genome sequencing data from four cohorts together comprising 5522 persons to study the association between CHIP, epigenetic clocks, and health outcomes. CHIP was strongly associated with epigenetic age acceleration, defined as the residual after regressing epigenetic clock age on chronological age, in several clocks, ranging from 1.31 years (GrimAge, p < 8.6 × 10⁻⁷) to 3.08 years (EEAA, p < 3.7 × 10⁻¹⁸). Mutations in most CHIP genes except DNA-damage response genes were associated with increases in several measures of age acceleration. CHIP carriers with mutations in multiple genes had the largest increases in age acceleration and decrease in estimated telomere length. Finally, we found that ~40% of CHIP carriers had acceleration >0 in both Hannum and GrimAge (referred to as AgeAccelHG+). This group was at high risk of all-cause mortality (hazard ratio 2.90, p < 4.1 × 10⁻⁸) and coronary heart disease (CHD) (hazard ratio 3.24, p < 9.3 × 10⁻⁶) compared to those who were CHIP−/AgeAccelHG−. In contrast, the other ~60% of CHIP carriers who were AgeAccelHG− were not at increased risk of these outcomes. In summary, CHIP is strongly linked to age acceleration in multiple clocks, and the combination of CHIP and epigenetic aging may be used to identify a population at high risk for adverse outcomes and who may be a target for clinical interventions.
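
The abstract above defines epigenetic age acceleration as the residual after regressing epigenetic clock age on chronological age. A minimal sketch of that definition follows, assuming a simple one-predictor ordinary least-squares fit with no further covariates. The function name `age_acceleration` and the toy ages are hypothetical and are not taken from the study.

```python
def age_acceleration(clock_age, chrono_age):
    """Residuals from an OLS fit of clock age on chronological age.

    Hypothetical helper: a positive residual means the epigenetic clock
    reads older than expected for that person's chronological age.
    """
    n = len(chrono_age)
    if n != len(clock_age) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(chrono_age) / n
    mean_y = sum(clock_age) / n
    sxx = sum((x - mean_x) ** 2 for x in chrono_age)
    if sxx == 0:
        raise ValueError("chronological ages must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(chrono_age, clock_age))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual = observed clock age minus the age predicted by the fit.
    return [y - (intercept + slope * x) for x, y in zip(chrono_age, clock_age)]


# Toy data (made up): four people, chronological vs. clock age in years.
chrono = [40, 50, 60, 70]
clock = [42, 49, 63, 70]
accel = age_acceleration(clock, chrono)
```

By construction the residuals sum to zero across the sample, so "acceleration > 0" in the abstract's sense picks out people whose clock age sits above the fitted line.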

    Comparison of Proteomic Assessment Methods in Multiple Cohort Studies

    Novel proteomics platforms, such as the aptamer-based SOMAscan platform, can quantify large numbers of proteins efficiently and cost-effectively and are rapidly growing in popularity. However, comparisons to conventional immunoassays remain underexplored, leaving investigators unsure when cross-assay comparisons are appropriate. The correlation of results from immunoassays with relative protein quantification by SOMAscan is explored. For 63 proteins assessed in two chronic obstructive pulmonary disease (COPD) cohorts, the Subpopulations and Intermediate Outcome Measures in COPD Study (SPIROMICS) and COPDGene, using Myriad Rules Based Medicine multiplex immunoassays and SOMAscan, Spearman correlation coefficients range from −0.13 to 0.97, with a median correlation coefficient of ≈0.5 and consistent results across cohorts. A similar range is observed for immunoassays in the population-based Multi-Ethnic Study of Atherosclerosis and for other assays in COPDGene and SPIROMICS. Comparisons of relative quantification from the antibody-based Olink platform and SOMAscan in a small cohort of myocardial infarction patients also show a wide correlation range. Finally, cis pQTL data, mass spectrometry aptamer confirmation, and other publicly available data are integrated to assess relationships with observed correlations. Correlation between proteomics assays shows a wide range and should be carefully considered when comparing and meta-analyzing proteomics data across assays and studies.
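
The comparison above rests on Spearman correlation between two assays' measurements of the same protein. As a rough sketch of that statistic, assuming tied values receive their average rank, the code below ranks each series and takes the Pearson correlation of the ranks. The function names and sample values are illustrative only and do not come from the study.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up paired readings for one protein on two platforms.
immunoassay = [1.2, 3.4, 2.2, 5.0, 4.1]
aptamer = [110.0, 300.0, 150.0, 900.0, 310.0]
rho = spearman(immunoassay, aptamer)
```

Because only ranks enter the calculation, the two platforms can report in different units or on different scales, which is why a rank-based measure suits cross-assay comparisons of relative quantification.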

    Allelic Heterogeneity at the CRP Locus Identified by Whole-Genome Sequencing in Multi-ancestry Cohorts

    Whole-genome sequencing (WGS) can improve assessment of low-frequency and rare variants, particularly in non-European populations that have been underrepresented in existing genomic studies. The genetic determinants of C-reactive protein (CRP), a biomarker of chronic inflammation, have been extensively studied, with existing genome-wide association studies (GWASs) conducted in >200,000 individuals of European ancestry. In order to discover novel loci associated with CRP levels, we examined a multi-ancestry population (n = 23,279) with WGS (∼38× coverage) from the Trans-Omics for Precision Medicine (TOPMed) program. We found evidence for eight distinct associations at the CRP locus, including two variants that have not been identified previously (rs11265259 and rs181704186), both of which are non-coding and more common in individuals of African ancestry (∼10% and ∼1% minor allele frequency, respectively, and rare or monomorphic in 1000 Genomes populations of East Asian, South Asian, and European ancestry). We show that the minor (G) allele of rs181704186 is associated with lower CRP levels and decreased transcriptional activity and protein binding in vitro, providing a plausible molecular mechanism for this African ancestry-specific signal. The individuals homozygous for rs181704186-G have a mean CRP level of 0.23 mg/L, in contrast to individuals heterozygous for rs181704186 with mean CRP of 2.97 mg/L and major allele homozygotes with mean CRP of 4.11 mg/L. This study demonstrates the utility of WGS in multi-ethnic populations to drive discovery of complex trait associations of large effect and to identify functional alleles in noncoding regulatory regions

    Whole-exome sequencing study identifies four novel gene loci associated with diabetic kidney disease

    Diabetic kidney disease (DKD) is recognized as an important public health challenge. However, its genomic mechanisms are poorly understood. To identify rare variants for DKD, we conducted a whole-exome sequencing (WES) study leveraging large cohorts well-phenotyped for chronic kidney disease and diabetes. Our two-stage WES study included 4372 European and African ancestry participants from the Chronic Renal Insufficiency Cohort and Atherosclerosis Risk in Communities studies (stage 1) and 11 487 multi-ancestry Trans-Omics for Precision Medicine participants (stage 2). Generalized linear mixed models, which accounted for genetic relatedness and adjusted for age, sex and ancestry, were used to test associations between single variants and DKD. Gene-based aggregate rare variant analyses were conducted using an optimized sequence kernel association test implemented within our mixed model framework. We identified four novel exome-wide significant DKD-related loci through initiating diabetes. In single-variant analyses, participants carrying a rare, in-frame insertion in the DIS3L2 gene (rs141560952) exhibited a 193-fold increased odds [95% confidence interval (CI): 33.6, 1105] of DKD compared with noncarriers (P = 3.59 × 10⁻⁹). Likewise, each copy of a low-frequency KRT6B splice-site variant (rs425827) conferred a 5.31-fold higher odds (95% CI: 3.06, 9.21) of DKD (P = 2.72 × 10⁻⁹). Aggregate gene-based analyses further identified ERAP2 (P = 4.03 × 10⁻⁸) and NPEPPS (P = 1.51 × 10⁻⁷), which are both expressed in the kidney and implicated in renin-angiotensin-aldosterone system modulated immune response. In the largest WES study of DKD, we identified novel rare variant loci attaining exome-wide significance. These findings provide new insights into the molecular mechanisms underlying DKD.