336 research outputs found

    Principal-component-based population structure adjustment in the North American Rheumatoid Arthritis Consortium data: impact of single-nucleotide polymorphism set and analysis method

    Population structure occurs when a sample is composed of individuals with different ancestries and can result in excess type I error in genome-wide association studies. Genome-wide principal-component analysis (PCA) has become a popular method for identifying and adjusting for subtle population structure in association studies. Using the Genetic Analysis Workshop 16 (GAW16) NARAC data, we explore two unresolved issues concerning the use of genome-wide PCA to account for population structure in genetic association studies: the choice of single-nucleotide polymorphism (SNP) subset and the choice of adjustment model. We computed PCs for subsets of genome-wide SNPs with varying levels of linkage disequilibrium (LD). The first two PCs were similar for all subsets and the first three PCs were associated with case status for all subsets. When the PCs associated with case status were included as covariates in an association model, the reduction in genomic inflation factor was similar for all SNP sets. Several models have been proposed to account for structure using PCs, but it is not yet clear whether the different methods will result in substantively different results for association studies with individuals of European descent. We compared genome-wide association p-values and results for two positive-control SNPs previously associated with rheumatoid arthritis using four PC adjustment methods as well as no adjustment and genomic control. We found that in this sample, adjusting for the continuous PCs or adjusting for discrete clusters identified using the PCs adequately accounts for the case-control population structure, but that a recently proposed randomization test performs poorly.
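
The genomic inflation factor referred to in this abstract is commonly computed as the median of the 1-df chi-square association statistics divided by the median of the null chi-square distribution (about 0.4549). The following is a minimal sketch of that calculation, assuming two-sided p-values from 1-df tests as input. The function name and the p-values are illustrative assumptions, not data or code from the study.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2  # about 0.4549


def genomic_inflation_factor(p_values):
    """Return lambda_GC from two-sided p-values of 1-df association tests."""
    # A two-sided p-value p corresponds to |z| = Phi^-1(1 - p/2); chi2 = z^2.
    chi2 = [NormalDist().inv_cdf(1.0 - p / 2.0) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN


# Hypothetical p-values spread evenly over (0, 1): no inflation expected.
null_p = [(i + 0.5) / 1000 for i in range(1000)]
# Hypothetical p-values shifted toward zero, as unadjusted structure might produce.
inflated_p = [p ** 1.5 for p in null_p]

lambda_null = genomic_inflation_factor(null_p)
lambda_inflated = genomic_inflation_factor(inflated_p)
```

In this toy setup `lambda_null` is close to 1 and `lambda_inflated` is well above 1. Comparing the factor before and after adding PCs as covariates is one way to judge how much structure an adjustment removes.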

    Screening large-scale association study data: exploiting interactions using random forests

    BACKGROUND: Genome-wide association studies for complex diseases will produce genotypes on hundreds of thousands of single nucleotide polymorphisms (SNPs). A logical first approach to dealing with massive numbers of SNPs is to use some test to screen the SNPs, retaining only those that meet some criterion for further study. For example, SNPs can be ranked by p-value, and those with the lowest p-values retained. When SNPs have large interaction effects but small marginal effects in a population, they are unlikely to be retained when univariate tests are used for screening. However, model-based screens that pre-specify interactions are impractical for data sets with thousands of SNPs. Random forest analysis is an alternative method that produces a single measure of importance for each predictor variable that takes into account interactions among variables without requiring model specification. Interactions increase the importance for the individual interacting variables, making them more likely to be given high importance relative to other variables. We test the performance of random forests as a screening procedure to identify small numbers of risk-associated SNPs from among large numbers of unassociated SNPs using complex disease models with up to 32 loci, incorporating both genetic heterogeneity and multi-locus interaction. RESULTS: Keeping other factors constant, if risk SNPs interact, the random forest importance measure significantly outperforms the Fisher Exact test as a screening tool. As the number of interacting SNPs increases, the improvement in performance of random forest analysis relative to Fisher Exact test for screening also increases. Random forests perform similarly to the univariate Fisher Exact test as a screening tool when SNPs in the analysis do not interact.
CONCLUSIONS: In the context of large-scale genetic association studies where unknown interactions exist among true risk-associated SNPs or SNPs and environmental covariates, screening SNPs using random forest analyses can significantly reduce the number of SNPs that need to be retained for further study compared to standard univariate screening methods.
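
To see why a univariate screen can miss interacting SNPs, consider a toy two-locus model in which disease status depends only on the combination of two binary genotypes. This is an illustrative sketch with a hypothetical, deterministic XOR-style penetrance and a balanced population. It is not one of the simulation models used in the study, and it does not implement a random forest.

```python
from itertools import product

# Hypothetical balanced population: every combination of two binary
# genotypes (carrier = 1, non-carrier = 0) appears equally often.
population = [(a, b) for a, b in product((0, 1), repeat=2) for _ in range(250)]


def is_case(a, b):
    """Toy pure-interaction model: affected only if exactly one locus is a carrier."""
    return (a ^ b) == 1


def case_rate(rows):
    """Fraction of individuals in rows who are cases."""
    return sum(is_case(a, b) for a, b in rows) / len(rows)


# Marginal view: case rate by genotype at each SNP taken alone.
marginal_snp_a = {g: case_rate([r for r in population if r[0] == g]) for g in (0, 1)}
marginal_snp_b = {g: case_rate([r for r in population if r[1] == g]) for g in (0, 1)}

# Joint view: case rate for each two-locus genotype combination.
joint = {ab: case_rate([r for r in population if r == ab])
         for ab in product((0, 1), repeat=2)}
```

Both marginal tables show the same case rate for either genotype, so a single-SNP test has nothing to detect, while the joint table separates cases from controls completely. The signal is visible only when the two loci are considered together.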

    Genetic correlates of longevity and selected age-related phenotypes: a genome-wide association study in the Framingham Study

    BACKGROUND: Family studies and heritability estimates provide evidence for a genetic contribution to variation in the human life span. METHODS: We conducted a genome-wide association study (Affymetrix 100K SNP GeneChip) for longevity-related traits in a community-based sample. We report on 5 longevity and aging traits in up to 1345 Framingham Study participants from 330 families. Multivariable-adjusted residuals were computed using appropriate models (Cox proportional hazards, logistic, or linear regression) and the residuals from these models were used to test for association with qualifying SNPs (70,987 autosomal SNPs with genotypic call rate ≥ 80%, minor allele frequency ≥ 10%, Hardy-Weinberg test p ≥ 0.001). RESULTS: In family-based association test (FBAT) models, 8 SNPs in two regions approximately 500 kb apart on chromosome 1 (physical positions 73,091,610 and 73,527,652) were associated with age at death (p-value < 10⁻⁵). The two sets of SNPs were in high linkage disequilibrium (minimum r² = 0.58). The top 30 SNPs for generalized estimating equation (GEE) tests of association with age at death included rs10507486 (p = 0.0001) and rs4943794 (p = 0.0002), SNPs intronic to FOXO1A, a gene implicated in lifespan extension in animal models. FBAT models identified 7 SNPs and GEE models identified 9 SNPs associated with both age at death and morbidity-free survival at age 65 including rs2374983 near PON1. In the analysis of selected candidate genes, SNP associations (FBAT or GEE p-value < 0.01) were identified for age at death in or near the following genes: FOXO1A, GAPDH, KL, LEPR, PON1, PSEN1, SOD2, and WRN. Top ranked SNP associations in the GEE model for age at natural menopause included rs6910534 (p = 0.00003) near FOXO3a and rs3751591 (p = 0.00006) in CYP19A1.
Results of all longevity phenotype-genotype associations for all autosomal SNPs are web posted at http://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?id=phs000007. CONCLUSION: Longevity and aging traits are associated with SNPs on the Affymetrix 100K GeneChip. None of the associations achieved genome-wide significance. These data generate hypotheses and serve as a resource for replication as more genes and biologic pathways are proposed as contributing to longevity and healthy aging.
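
The SNP qualification criteria quoted in this abstract (call rate of at least 80%, minor allele frequency of at least 10%, Hardy-Weinberg p of at least 0.001) can be sketched as a filter over per-SNP genotype counts. The abstract does not say which Hardy-Weinberg test was used, so the 1-df chi-square goodness-of-fit test below is an assumption, as are the function names and the example counts.

```python
import math


def hwe_chi2_p_value(n_aa, n_ab, n_bb):
    """P-value of a 1-df chi-square goodness-of-fit test for Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))


def passes_qc(n_aa, n_ab, n_bb, n_missing,
              min_call_rate=0.80, min_maf=0.10, min_hwe_p=0.001):
    """Apply the three thresholds quoted in the abstract to one SNP's genotype counts."""
    n_called = n_aa + n_ab + n_bb
    if n_called == 0:
        return False
    call_rate = n_called / (n_called + n_missing)
    freq_a = (2 * n_aa + n_ab) / (2 * n_called)
    maf = min(freq_a, 1.0 - freq_a)
    if call_rate < min_call_rate or maf < min_maf:
        return False
    return hwe_chi2_p_value(n_aa, n_ab, n_bb) >= min_hwe_p


# Hypothetical genotype counts (AA, AB, BB, missing), not from the study.
ok_snp = passes_qc(360, 480, 160, 20)      # near-perfect HWE, MAF 0.4
rare_snp = passes_qc(950, 49, 1, 0)        # MAF below the 10% threshold
hwe_violation = passes_qc(500, 0, 500, 0)  # no heterozygotes at all
```

Only the first hypothetical SNP passes. The second fails on allele frequency and the third on the Hardy-Weinberg test.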

    Performance of random forest when SNPs are in linkage disequilibrium

    BACKGROUND: Single nucleotide polymorphisms (SNPs) may be correlated due to linkage disequilibrium (LD). Association studies look for both direct and indirect associations with disease loci. In a Random Forest (RF) analysis, correlation between a true risk SNP and SNPs in LD may lead to diminished variable importance for the true risk SNP. One approach to address this problem is to select SNPs in linkage equilibrium (LE) for analysis. Here, we explore alternative methods for dealing with SNPs in LD: change the tree-building algorithm by building each tree in an RF only with SNPs in LE, modify the importance measure (IM), and use haplotypes instead of SNPs to build an RF. RESULTS: We evaluated the performance of our alternative methods by simulation of a spectrum of complex genetics models. When a haplotype rather than an individual SNP is the risk factor, we find that the original Random Forest method performed on SNPs provides good performance. When individual, genotyped SNPs are the risk factors, we find that the stronger the genetic effect, the stronger the effect LD has on the performance of the original RF. A revised importance measure used with the original RF is relatively robust to LD among SNPs; this revised importance measure used with the revised RF is sometimes inflated. Overall, we find that the revised importance measure used with the original RF is the best choice when the genetic model and the number of SNPs in LD with risk SNPs are unknown. For the haplotype-based method, under a multiplicative heterogeneity model, we observed a decrease in the performance of RF with increasing LD among the SNPs in the haplotype. CONCLUSION: Our results suggest that by strategically revising the Random Forest method tree-building or importance measure calculation, power can increase when LD exists between SNPs.
We conclude that the revised Random Forest method performed on SNPs offers an advantage of not requiring genotype phase, making it a viable tool for use in the context of thousands of SNPs, such as candidate gene studies and follow-up of top candidates from genome-wide association studies.
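
The strength of LD between two biallelic SNPs is often summarized by r², the squared correlation between alleles on the same haplotype. The sketch below computes it from phased haplotype counts. The function name, the haplotype labels, and the counts are assumptions for illustration, and the paper may have quantified LD differently.

```python
def ld_r_squared(hap_counts):
    """Compute r^2 between two biallelic SNPs from phased haplotype counts.

    hap_counts maps the haplotype labels 'AB', 'Ab', 'aB', 'ab' to counts.
    """
    total = sum(hap_counts.values())
    freq = {h: c / total for h, c in hap_counts.items()}
    p_a = freq.get("AB", 0.0) + freq.get("Ab", 0.0)  # frequency of allele A
    p_b = freq.get("AB", 0.0) + freq.get("aB", 0.0)  # frequency of allele B
    d = freq.get("AB", 0.0) - p_a * p_b              # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when either SNP is monomorphic")
    return d * d / denom


# Hypothetical haplotype counts.
r2_equilibrium = ld_r_squared({"AB": 25, "Ab": 25, "aB": 25, "ab": 25})
r2_complete = ld_r_squared({"AB": 50, "ab": 50})
```

With all four haplotypes equally frequent the SNPs are in linkage equilibrium and r² is 0. When only `AB` and `ab` occur, each SNP predicts the other perfectly and r² is 1.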

    Genome-Wide Association with Select Biomarker Traits in the Framingham Heart Study

    BACKGROUND: Systemic biomarkers provide insights into disease pathogenesis, diagnosis, and risk stratification. Many systemic biomarker concentrations are heritable phenotypes. Genome-wide association studies (GWAS) provide mechanisms to investigate the genetic contributions to biomarker variability unconstrained by current knowledge of physiological relations. METHODS: We examined the association of Affymetrix 100K GeneChip single nucleotide polymorphisms (SNPs) to 22 systemic biomarker concentrations in 4 biological domains: inflammation/oxidative stress; natriuretic peptides; liver function; and vitamins. Related members of the Framingham Offspring cohort (n = 1012; mean age 59 ± 10 years, 51% women) had both phenotype and genotype data (minimum-maximum per phenotype n = 507–1008). We used Generalized Estimating Equations (GEE), Family Based Association Tests (FBAT) and variance components linkage to relate SNPs to multivariable-adjusted biomarker residuals. Autosomal SNPs (n = 70,987) meeting the following criteria were studied: minor allele frequency ≥ 10%, call rate ≥ 80% and Hardy-Weinberg equilibrium p ≥ 0.001. RESULTS: With GEE, 58 SNPs had p < 10⁻⁶: the top SNPs were rs2494250 (p = 1.00 × 10⁻¹⁴) and rs4128725 (p = 3.68 × 10⁻¹²) for monocyte chemoattractant protein-1 (MCP1), and rs2794520 (p = 2.83 × 10⁻⁸) and rs2808629 (p = 3.19 × 10⁻⁸) for C-reactive protein (CRP) averaged from 3 examinations (over about 20 years). With FBAT, 11 SNPs had p < 10⁻⁶: the top SNPs were the same for MCP1 (rs4128725, p = 3.28 × 10⁻⁸, and rs2494250, p = 3.55 × 10⁻⁸), and also included B-type natriuretic peptide (rs437021, p = 1.01 × 10⁻⁶) and Vitamin K percent undercarboxylated osteocalcin (rs2052028, p = 1.07 × 10⁻⁶). The peak LOD (logarithm of the odds) scores were for MCP1 (4.38, chromosome 1) and CRP (3.28, chromosome 1; previously described) concentrations; of note the 1.5 support interval included the MCP1 and CRP SNPs reported above (GEE model).
Previous candidate SNP associations with circulating CRP concentrations were replicated at p < 0.05; the SNPs rs2794520 and rs2808629 are in linkage disequilibrium with previously reported SNPs. GEE, FBAT and linkage results are posted at . CONCLUSION: The Framingham GWAS represents a resource to describe potentially novel genetic influences on systemic biomarker variability. The newly described associations will need to be replicated in other studies. Funding: National Heart, Lung, and Blood Institute's Framingham Heart Study (N01-HC25195); National Institutes of Health National Center for Research Resources Shared Instrumentation grant (1S10RR163736-01A1); National Institutes of Health (HL064753, HL076784, AG028321, HL71039, 2 K24HL04334, 1K23 HL083102); Doris Duke Charitable Foundation; American Diabetes Association Career Development Award; National Center for Research Resources (GCRC M01-RR01066); US Department of Agriculture Agricultural Research Service (58-1950-001, 58-1950-401); National Institute of Aging (AG14759

    Genome-wide association study of opioid cessation

    The United States is experiencing an epidemic of opioid use disorder (OUD) and overdose-related deaths. However, the genetic basis for the ability to discontinue opioid use has not been investigated. We performed a genome-wide association study (GWAS) of opioid cessation (defined as abstinence from illicit opioids for >1 year or <6 months before the interview date) in 1130 African American (AA) and 2919 European ancestry (EA) participants recruited for genetic studies of substance use disorders and who met lifetime Diagnostic and Statistical Manual of Mental Disorders, 5th Edition (DSM-5) criteria for OUD. Association tests performed separately within each ethnic group were combined by meta-analysis with results obtained from the Comorbidity and Trauma Study. Although there were no genome-wide significant associations, we found suggestive associations with nine independent loci, including three which are biologically relevant: rs4740988 i

    Comparison of On-Site Versus Remote Mobile Device Support in the Framingham Heart Study Using the Health eHeart Study for Digital Follow-up: Randomized Pilot Study Set Within an Observational Study Design

    BACKGROUND: New electronic cohort (e-Cohort) study designs provide resource-effective methods for collecting participant data. It is unclear if implementing an e-Cohort study without direct, in-person participant contact can achieve successful participation rates. OBJECTIVE: The objective of this study was to compare 2 distinct enrollment methods for setting up mobile health (mHealth) devices and to assess the ongoing adherence to device use in an e-Cohort pilot study. METHODS: We coenrolled participants from the Framingham Heart Study (FHS) into the FHS-Health eHeart (HeH) pilot study, a digital cohort with infrastructure for collecting mHealth data. FHS participants who had an email address and smartphone were randomized to our FHS-HeH pilot study into 1 of 2 study arms: remote versus on-site support. We oversampled older adults (age ≥65 years), with a target of enrolling 20% of our sample as older adults. In the remote arm, participants received an email containing a link to the enrollment website and, upon enrollment, were sent 4 smartphone-connectable sensor devices. Participants in the on-site arm were invited to visit an in-person FHS facility and were provided in-person support for enrollment and connecting the devices. Device data were tracked for at least 5 months. RESULTS: Compared with the individuals who declined, individuals who consented to our pilot study (on-site, n=101; remote, n=93) were more likely to be women, highly educated, and younger. In the on-site arm, the connection and initial use of devices was ≥20% higher than the remote arm (mean percent difference was 25% [95% CI 17-35] for activity monitor, 22% [95% CI 12-32] for blood pressure cuff, 20% [95% CI 10-30] for scale, and 43% [95% CI 30-55] for electrocardiogram), with device connection rates in the on-site arm of 99%, 95%, 95%, and 84%. Once connected, continued device use over the 5-month study period was similar between the study arms.
CONCLUSIONS: Our pilot study demonstrated that the deployment of mobile devices among middle-aged and older adults in the context of an on-site clinic visit was associated with higher initial rates of device use as compared with offering only remote support. Once connected, the device use was similar in both groups.

    Lifetime risk of atrial fibrillation according to optimal, borderline, or elevated levels of risk factors: cohort study based on longitudinal data from the Framingham Heart Study

    OBJECTIVE: To examine the association between risk factor burdens (categorized as optimal, borderline, or elevated) and the lifetime risk of atrial fibrillation. DESIGN: Community based cohort study. SETTING: Longitudinal data from the Framingham Heart Study. PARTICIPANTS: Individuals free of atrial fibrillation at index ages 55, 65, and 75 years were assessed. Smoking, alcohol consumption, body mass index, blood pressure, diabetes, and history of heart failure or myocardial infarction were assessed as being optimal (that is, all risk factors were optimal), borderline (presence of borderline risk factors and absence of any elevated risk factor), or elevated (presence of at least one elevated risk factor) at index age. MAIN OUTCOME MEASURE: Lifetime risk of atrial fibrillation at index age up to 95 years, accounting for the competing risk of death. RESULTS: At index age 55 years, the study sample comprised 5338 participants (2531 (47.4%) men). In this group, 247 (4.6%) had an optimal risk profile, 1415 (26.5%) had a borderline risk profile, and 3676 (68.9%) an elevated risk profile. The prevalence of elevated risk factors increased gradually when the index ages rose. For index age of 55 years, the lifetime risk of atrial fibrillation was 37.0% (95% confidence interval 34.3% to 39.6%). The lifetime risk of atrial fibrillation was 23.4% (12.8% to 34.5%) with an optimal risk profile, 33.4% (27.9% to 38.9%) with a borderline risk profile, and 38.4% (35.5% to 41.4%) with an elevated risk profile. Overall, participants with at least one elevated risk factor were associated with at least 37.8% lifetime risk of atrial fibrillation. The gradient in lifetime risk across risk factor burden was similar at index ages 65 and 75 years.
CONCLUSIONS: Regardless of index ages at 55, 65, or 75 years, an optimal risk factor profile was associated with a lifetime risk of atrial fibrillation of about one in five; this risk rose to more than one in three in individuals with at least one elevated risk factor.
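
Accounting for the competing risk of death matters because people who die without atrial fibrillation can no longer develop it. The sketch below contrasts a discrete-interval cumulative incidence estimate, which removes deaths from the event-free pool, with a naive one-minus-Kaplan-Meier estimate that treats deaths as censoring. The abstract does not name the estimator used, so this form is an assumption. The function names and the follow-up counts are hypothetical and are not Framingham data.

```python
def cumulative_incidence(intervals):
    """Cumulative incidence of the event of interest with death as a competing risk.

    intervals is a list of (n_at_risk, n_events, n_competing_deaths) tuples,
    one per consecutive follow-up interval.
    """
    event_free = 1.0  # probability of being alive and event-free so far
    incidence = 0.0
    for n_at_risk, n_events, n_deaths in intervals:
        incidence += event_free * n_events / n_at_risk
        event_free *= 1.0 - (n_events + n_deaths) / n_at_risk
    return incidence


def one_minus_kaplan_meier(intervals):
    """Naive estimate that treats competing deaths as censoring."""
    survival = 1.0
    for n_at_risk, n_events, _ in intervals:
        survival *= 1.0 - n_events / n_at_risk
    return 1.0 - survival


# Hypothetical follow-up data in two intervals.
toy = [(100, 10, 10), (80, 8, 8)]
risk_competing = cumulative_incidence(toy)
risk_naive = one_minus_kaplan_meier(toy)
```

For these toy counts the competing-risk estimate is about 0.18 and the naive estimate about 0.19. The naive version overstates risk because it implicitly assumes that those who died could still have gone on to develop the event.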