Estimating relationships between phenotypes and subjects drawn from admixed families.
Background: Estimating relationships among subjects in a sample, within family structures or caused by population substructure, is complicated in admixed populations. Inaccurate allele frequencies can bias both kinship estimates and tests for association between subjects and a phenotype. We analyzed the simulated and real family data from Genetic Analysis Workshop 19, and were aware of the simulation model.
Results: We found that kinship estimation is more accurate when marker data include common variants whose frequencies are less variable across populations. Estimates of heritability and association vary with age for longitudinally measured traits. Accounting for local ancestry identified different true associations than those identified by a traditional approach. Principal components aid kinship estimation and tests for association, but their utility is influenced by the frequency of the markers used to generate them.
Conclusions: Admixed families can provide a powerful resource for detecting disease loci, as well as analytical challenges. Allele frequencies, although difficult to adequately estimate in admixed populations, have a strong impact on the estimation of kinship, ancestry, and association with phenotypes. Approaches that acknowledge population structure in admixed families outperform those that ignore it.
Variant-specific inflation factors for assessing population stratification at the phenotypic variance level
In modern Whole Genome Sequencing (WGS) epidemiological studies, participant-level data from multiple studies are often pooled and results are obtained from a single analysis. We consider the impact of differential phenotype variances by study, which we term 'variance stratification'. If unaccounted for, variance stratification can lead to both decreased statistical power and increased false positive rates, depending on how allele frequencies, sample sizes, and phenotypic variances vary across the studies that are pooled. We develop a procedure to compute variant-specific inflation factors, and show how it can be used for diagnosis of genetic association analyses on pooled individual-level data from multiple studies. We describe a WGS-appropriate analysis approach, implemented in freely available software, which allows study-specific variances and thereby improves performance in practice. We illustrate the variance stratification problem, its solutions, and the proposed diagnostic procedure, in simulations and in data from the Trans-Omics for Precision Medicine Whole Genome Sequencing Program (TOPMed), used in association tests for hemoglobin concentrations and BMI.
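To make the dependence on allele frequencies, sample sizes, and phenotypic variances concrete, here is a minimal sketch in Python. It is a simplified illustration under stated assumptions, not the procedure or software described in the abstract: it assumes a score test on unrelated individuals, genotypes centered within each study, Hardy-Weinberg genotype variance 2p(1-p), and known per-study residual variances. The function name and all inputs are hypothetical.

```python
def variance_stratification_inflation(n, sigma2, maf):
    """Approximate inflation of a score test that assumes one pooled residual variance.

    n, sigma2, maf: per-study sample sizes, phenotype residual variances, and
    allele frequencies of the tested variant (hypothetical illustrative inputs).
    Returns (true variance of the score) / (variance assumed by the pooled model).
    """
    # Per-study genotype variance, assuming Hardy-Weinberg equilibrium.
    geno_var = [2.0 * p * (1.0 - p) for p in maf]
    # True variance of the score statistic: study k contributes n_k * sigma2_k * v_k.
    true_var = sum(nk * s2 * v for nk, s2, v in zip(n, sigma2, geno_var))
    # A homogeneous-variance model estimates a single sample-size-weighted variance.
    pooled_sigma2 = sum(nk * s2 for nk, s2 in zip(n, sigma2)) / sum(n)
    assumed_var = pooled_sigma2 * sum(nk * v for nk, v in zip(n, geno_var))
    return true_var / assumed_var


# The high-variance study also has the higher allele frequency: ratio above 1.
inflated = variance_stratification_inflation([1000, 1000], [1.0, 4.0], [0.1, 0.3])
# The high-variance study has the lower allele frequency: ratio below 1.
deflated = variance_stratification_inflation([1000, 1000], [1.0, 4.0], [0.3, 0.1])
```

In this toy setting the ratio is about 1.24 in the first case (test statistics too large, so more false positives) and about 0.76 in the second (test statistics too small, so lost power). When all studies share the same variance or the same allele frequency, the ratio is 1.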
Genome-wide association study of dental caries in the Hispanic Communities Health Study/Study of Latinos (HCHS/SOL)
Dental caries is the most common chronic disease worldwide, and exhibits profound disparities in the USA with racial and ethnic minorities experiencing disproportionate disease burden. Though heritable, the specific genes influencing risk of dental caries remain largely unknown. Therefore, we performed genome-wide association scans (GWASs) for dental caries in a population-based cohort of 12 000 Hispanic/Latino participants aged 18–74 years from the HCHS/SOL. Intra-oral examinations were used to generate two common indices of dental caries experience which were tested for association with 27.7 M genotyped or imputed single-nucleotide polymorphisms separately in the six ancestry groups. A mixed-models approach was used, which adjusted for age, sex, recruitment site, five principal components of ancestry and additional features of the sampling design. Meta-analyses were used to combine GWAS results across ancestry groups. Heritability estimates ranged from 20–53% in the six ancestry groups. The most significant association observed via meta-analysis for both phenotypes was in the region of the NAMPT gene (rs190395159; P-value = 6 × 10⁻¹⁰), which is involved in many biological processes including periodontal healing. Another significant association was observed for rs72626594 (P-value = 3 × 10⁻⁸) downstream of BMP7, a tooth development gene. Other associations were observed in genes lacking known or plausible roles in dental caries. In conclusion, this was the largest GWAS of dental caries to date, and was the first to target Hispanic/Latino populations. Understanding the factors influencing dental caries susceptibility may lead to improvements in prediction, prevention and disease management, which may ultimately reduce the disparities in oral health across racial, ethnic and socioeconomic strata.
Genome-wide association study of iron traits and relation to diabetes in the Hispanic Community Health Study/Study of Latinos (HCHS/SOL): potential genomic intersection of iron and glucose regulation?
Genetic variants contribute to normal variation of iron-related traits and may also cause clinical syndromes of iron deficiency or excess. Iron overload and deficiency can adversely affect human health. For example, elevated iron storage is associated with increased diabetes risk, although mechanisms are still being investigated. We conducted the first genome-wide association study of serum iron, total iron binding capacity (TIBC), transferrin saturation, and ferritin in a Hispanic/Latino cohort, the Hispanic Community Health Study/Study of Latinos (>12 000 participants) and also assessed the generalization of previously known loci to this population. We then evaluated whether iron-associated variants were associated with diabetes and glycemic traits. We found evidence for a novel association between TIBC and a variant near the gene for protein phosphatase 1, regulatory subunit 3B (PPP1R3B; rs4841132, β = -0.116, P = 7.44 × 10⁻⁸). The effect strengthened when iron deficient individuals were excluded (β = -0.121, P = 4.78 × 10⁻⁹). Ten of sixteen variants previously associated with iron traits generalized to HCHS/SOL, including variants at the transferrin (TF), hemochromatosis (HFE), fatty acid desaturase 2 (FADS2)/myelin regulatory factor (MYRF), transmembrane protease, serine 6 (TMPRSS6), transferrin receptor 2 (TFR2), N-acetyltransferase 2 (arylamine N-acetyltransferase) (NAT2), ABO blood group (ABO), and GRB2 associated binding protein 3 (GAB3) loci. In examining iron variant associations with glucose homeostasis, an iron-raising variant of TMPRSS6 was associated with lower HbA1c levels (P = 8.66 × 10⁻¹⁰). This association was attenuated upon adjustment for iron measures. In contrast, the iron-raising allele of PPP1R3B was associated with higher levels of fasting glucose (P = 7.70 × 10⁻⁷) and fasting insulin (P = 4.79 × 10⁻⁶), but these associations were not attenuated upon adjustment for TIBC, so iron is not likely a mediator. These results provide new genetic information on iron traits and their connection with glucose homeostasis.
Associations of variants in the hexokinase 1 and interleukin 18 receptor regions with oxyhemoglobin saturation during sleep
Sleep disordered breathing (SDB)-related overnight hypoxemia is associated with cardiometabolic disease and other comorbidities. Understanding the genetic bases for variations in nocturnal hypoxemia may help elucidate mechanisms influencing oxygenation and SDB-related mortality. We conducted genome-wide association tests across 10 cohorts and 4 populations to identify genetic variants associated with three correlated measures of overnight oxyhemoglobin saturation: average and minimum oxyhemoglobin saturation during sleep and the percent of sleep with oxyhemoglobin saturation under 90%. The discovery sample consisted of 8,326 individuals. Variants with p < 1 × 10⁻⁶ were analyzed in a replication group of 14,410 individuals. We identified 3 significantly associated regions, including 2 regions in multi-ethnic analyses (2q12, 10q22). SNPs in the 2q12 region were associated with minimum SpO2 (rs78136548 p = 2.70 × 10⁻¹⁰). SNPs at 10q22 were associated with all three traits including average SpO2 (rs72805692 p = 4.58 × 10⁻⁸). SNPs in both regions were associated in over 20,000 individuals and are supported by prior associations or functional evidence. Four additional significant regions were detected in secondary sex-stratified and combined discovery and replication analyses, including a region overlapping Reelin, a known marker of respiratory complex neurons. These are the first genome-wide significant findings reported for oxyhemoglobin saturation during sleep, a phenotype of high clinical interest. Our replicated associations with HK1 and IL18R1 suggest that variants in inflammatory pathways, such as the biologically plausible NLRP3 inflammasome, may contribute to nocturnal hypoxemia.
Genome-wide Association Study of Platelet Count Identifies Ancestry-Specific Loci in Hispanic/Latino Americans
Platelets play an essential role in hemostasis and thrombosis. We performed a genome-wide association study of platelet count in 12,491 participants of the Hispanic Community Health Study/Study of Latinos by using a mixed-model method that accounts for admixture and family relationships. We discovered and replicated associations with five genes (ACTN1, ETV7, GABBR1-MOG, MEF2C, and ZBTB9-BAK1). Our strongest association was with Amerindian-specific variant rs117672662 (p value = 1.16 × 10⁻²⁸) in ACTN1, a gene implicated in congenital macrothrombocytopenia. rs117672662 exhibited allelic differences in transcriptional activity and protein binding in hematopoietic cells. Our results underscore the value of diverse populations to extend insights into the allelic architecture of complex traits.
Genetic Diversity and Association Studies in US Hispanic/Latino Populations: Applications in the Hispanic Community Health Study/Study of Latinos
US Hispanic/Latino individuals are diverse in genetic ancestry, culture, and environmental exposures. Here, we characterized and controlled for this diversity in genome-wide association studies (GWASs) for the Hispanic Community Health Study/Study of Latinos (HCHS/SOL). We simultaneously estimated population-structure principal components (PCs) robust to familial relatedness and pairwise kinship coefficients (KCs) robust to population structure, admixture, and Hardy-Weinberg departures. The PCs revealed substantial genetic differentiation within and among six self-identified background groups (Cuban, Dominican, Puerto Rican, Mexican, and Central and South American). To control for variation among groups, we developed a multi-dimensional clustering method to define a “genetic-analysis group” variable that retains many properties of self-identified background while achieving substantially greater genetic homogeneity within groups and including participants with non-specific self-identification. In GWASs of 22 biomedical traits, we used a linear mixed model (LMM) including pairwise empirical KCs to account for familial relatedness, PCs for ancestry, and genetic-analysis groups for additional group-associated effects. Including the genetic-analysis group as a covariate accounted for significant trait variation in 8 of 22 traits, even after we fit 20 PCs. Additionally, genetic-analysis groups had significant heterogeneity of residual variance for 20 of 22 traits, and modeling this heteroscedasticity within the LMM reduced genomic inflation for 19 traits. Furthermore, fitting an LMM that utilized a genetic-analysis group rather than a self-identified background group achieved higher power to detect previously reported associations. We expect that the methods applied here will be useful in other studies with multiple ethnic groups, admixture, and relatedness.
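The genomic inflation mentioned above is commonly summarized by the genomic control factor, the median observed 1-df chi-square statistic divided by its expected median under the null (about 0.455). The following is a minimal sketch of that standard calculation from a list of p-values. It is not code from the study, and the function name and input are hypothetical.

```python
from statistics import NormalDist, median


def genomic_inflation(p_values):
    """Genomic control lambda from two-sided p-values of 1-df association tests.

    p_values: iterable of p-values, each strictly between 0 and 1 inclusive of 1
    (a p-value of exactly 0 cannot be converted and raises an error).
    """
    nd = NormalDist()
    # Convert each p-value back to a 1-df chi-square statistic: chi2 = z^2,
    # where z is the normal quantile at p/2 (the lower tail avoids rounding to 1).
    chi2 = [nd.inv_cdf(p / 2.0) ** 2 for p in p_values]
    # Median of a 1-df chi-square distribution, about 0.4549.
    expected_median = nd.inv_cdf(0.75) ** 2
    return median(chi2) / expected_median


# Evenly spaced p-values mimic a well-calibrated null: lambda is close to 1.
null_like = [(i + 0.5) / 1001 for i in range(1001)]
lam = genomic_inflation(null_like)
```

Values well above 1 suggest inflated test statistics, for example from unmodeled structure or heteroscedasticity. Values near 1 are consistent with a calibrated analysis, although lambda alone cannot distinguish confounding from true polygenic signal.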
Mosaic Chromosomal alterations in Blood across ancestries Using Whole-Genome Sequencing
Megabase-scale mosaic chromosomal alterations (mCAs) in blood are prognostic markers for a host of human diseases. Here, to gain a better understanding of mCA rates in genetically diverse populations, we analyzed whole-genome sequencing data from 67,390 individuals from the National Heart, Lung, and Blood Institute Trans-Omics for Precision Medicine program. We observed higher sensitivity with whole-genome sequencing data, compared with array-based data, in uncovering mCAs at low mutant cell fractions and found that individuals of European ancestry have the highest rates of autosomal mCAs and the lowest rates of chromosome X mCAs, compared with individuals of African or Hispanic ancestry. Although further studies in diverse populations will be needed to replicate our findings, we report three loci associated with loss of chromosome X, associations between autosomal mCAs and rare variants in DCPS, ADM17, PPP1R16B and TET2 and ancestry-specific variants in ATM and MPL with mCAs in cis.
Detectable Clonal Mosaicism from Birth to Old Age and its Relationship to Cancer
Clonal mosaicism for large chromosomal anomalies (duplications, deletions and uniparental disomy) was detected using SNP microarray data from over 50,000 subjects recruited for genome-wide association studies. This detection method requires a relatively high frequency of cells (>5–10%) with the same abnormal karyotype (presumably of clonal origin) in the presence of normal cells. The frequency of detectable clonal mosaicism in peripheral blood is low (<0.5%) from birth until 50 years of age, after which it rises rapidly to 2–3% in the elderly. Many of the mosaic anomalies are characteristic of those found in hematological cancers and identify common deleted regions that pinpoint the locations of genes previously associated with hematological cancers. Although only 3% of subjects with detectable clonal mosaicism had any record of hematological cancer prior to DNA sampling, those without a prior diagnosis have an estimated 10-fold higher risk of a subsequent hematological cancer (95% confidence interval = 6–18).
GWAS of the electrocardiographic QT interval in Hispanics/Latinos generalizes previously identified loci and identifies population-specific signals
QT interval prolongation is a heritable risk factor for ventricular arrhythmias and can predispose to sudden death. Most genome-wide association studies (GWAS) of QT were performed in European ancestral populations, leaving other groups uncharacterized. Herein we present the first QT GWAS of Hispanics/Latinos using data on 15,997 participants from four studies. Study-specific summary results of the association between 1000 Genomes Project (1000G) imputed SNPs and electrocardiographically measured QT were combined using fixed-effects meta-analysis. We identified 41 genome-wide significant SNPs that mapped to 13 previously identified QT loci. Conditional analyses distinguished six secondary signals at NOS1AP (n = 2), ATP1B1 (n = 2), SCN5A (n = 1), and KCNQ1 (n = 1). Comparison of linkage disequilibrium patterns between the 13 lead SNPs and six secondary signals with previously reported index SNPs in 1000G super populations suggested that the SCN5A and KCNE1 lead SNPs were potentially novel and population-specific. Finally, of the 42 suggestively associated loci, AJAP1 was suggestively associated with QT in a prior East Asian GWAS; in contrast, BVES and CAP2 murine knockouts caused cardiac conduction defects. Our results indicate that whereas the same loci influence QT across populations, population-specific variation exists, motivating future trans-ethnic and ancestrally diverse QT GWAS.
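For readers unfamiliar with how study-specific summary results are combined, the following is a minimal sketch of a fixed-effects meta-analysis for a single SNP. It assumes the common inverse-variance weighting scheme, which the abstract does not specify, and the function name and example numbers are hypothetical rather than taken from the study.

```python
import math


def fixed_effects_meta(betas, ses):
    """Inverse-variance-weighted fixed-effects meta-analysis for one variant.

    betas: per-study effect estimates; ses: their standard errors (all > 0).
    Returns the pooled effect, its standard error, and the z-score.
    """
    # Each study is weighted by the inverse of its sampling variance.
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled_beta, pooled_se, pooled_beta / pooled_se


# Hypothetical per-study estimates for one SNP from four studies.
beta, se, z = fixed_effects_meta([1.2, 0.8, 1.0, 1.5], [0.5, 0.4, 0.6, 0.9])
```

More precise studies (smaller standard errors) dominate the pooled estimate, and the pooled standard error is never larger than the smallest study-specific one. The model assumes a single shared effect across studies, so heterogeneity between studies would need to be checked separately.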