
    Identification of polymorphisms explaining a linkage signal: application to the GAW14 simulated data

    We applied three approaches for identifying polymorphisms that explain the linkage evidence to the Genetic Analysis Workshop 14 simulated data: 1) the genotype-IBD sharing test (GIST); 2) an approach suggested by Horikawa and colleagues; and 3) the homozygote sharing test (HST). These tests were compared with a family-based association test. The two linked regions with the highest nonparametric linkage scores were selected for applying these methods. In the first region, Horikawa's method identified the most SNPs within the region containing the disease susceptibility locus, while HST performed best in the second region. However, Horikawa's method also had the most type I errors. These methods show potential as additional tools to complement family-based association tests for the identification of disease susceptibility variants.

    Screening large-scale association study data: exploiting interactions using random forests

    BACKGROUND: Genome-wide association studies for complex diseases will produce genotypes on hundreds of thousands of single nucleotide polymorphisms (SNPs). A logical first approach to dealing with massive numbers of SNPs is to use some test to screen the SNPs, retaining only those that meet some criterion for further study. For example, SNPs can be ranked by p-value, and those with the lowest p-values retained. When SNPs have large interaction effects but small marginal effects in a population, they are unlikely to be retained when univariate tests are used for screening. However, model-based screens that pre-specify interactions are impractical for data sets with thousands of SNPs. Random forest analysis is an alternative method that produces a single measure of importance for each predictor variable, which accounts for interactions among variables without requiring model specification. Interactions increase the importance of the individual interacting variables, making them more likely to be given high importance relative to other variables. We test the performance of random forests as a screening procedure to identify small numbers of risk-associated SNPs from among large numbers of unassociated SNPs, using complex disease models with up to 32 loci that incorporate both genetic heterogeneity and multi-locus interaction. RESULTS: Keeping other factors constant, if risk SNPs interact, the random forest importance measure significantly outperforms the Fisher exact test as a screening tool. As the number of interacting SNPs increases, the improvement in performance of random forest analysis relative to the Fisher exact test for screening also increases. Random forests perform similarly to the univariate Fisher exact test as a screening tool when SNPs in the analysis do not interact.
CONCLUSIONS: In the context of large-scale genetic association studies where unknown interactions exist among true risk-associated SNPs, or between SNPs and environmental covariates, screening SNPs using random forest analyses can significantly reduce the number of SNPs that need to be retained for further study compared to standard univariate screening methods.
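
The univariate screen that the random forest importance measure is compared against (rank SNPs by Fisher exact test p-value, keep the smallest) can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes each SNP has already been summarized as a hypothetical 2x2 table of carrier counts in cases and controls, and the function names and counts are invented for the example.

```python
from math import comb

def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    With row and column totals fixed, the top-left cell is hypergeometric;
    the p-value sums the probabilities of all tables no more likely than
    the observed one.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, col1)

    def prob(x: int) -> float:
        return comb(row1, x) * comb(n - row1, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    probs = [prob(x) for x in range(lo, hi + 1)]
    # Small relative tolerance so tables tied with the observed one are counted.
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-7)))

def screen_snps(tables: dict, keep: int) -> list:
    """Rank SNPs by Fisher p-value (smallest first) and retain the top `keep`."""
    return sorted(tables, key=lambda snp: fisher_exact_two_sided(*tables[snp]))[:keep]

# Hypothetical counts per SNP: (case carriers, case non-carriers,
#                               control carriers, control non-carriers)
tables = {
    "snp_a": (30, 10, 12, 28),
    "snp_b": (20, 20, 19, 21),
    "snp_c": (25, 15, 15, 25),
}
print(screen_snps(tables, keep=2))  # ['snp_a', 'snp_c']
```

Because each SNP is tested on its own, a SNP whose effect appears only in combination with another SNP produces a near-null table and is dropped; this is the weakness of univariate screening that the abstract describes.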

    Mapping complex traits using Random Forests

    Random Forest is a prediction technique based on growing trees on bootstrap samples of the data, combined with a random selection of explanatory variables to define the best split at each node. For a quantitative outcome, the tree predictor takes a numerical value. We applied Random Forest to the first replicate of the Genetic Analysis Workshop 13 simulated data set, with sibling pairs as our units of analysis and identity by descent (IBD) at selected loci as our explanatory variables. With knowledge of the true model, we performed two sets of analyses on three phenotypes: HDL, triglycerides, and glucose. The goal was to approach the mapping of complex traits from a multivariate perspective. The first set of analyses mimics a candidate gene approach, with a high proportion of true genes among the predictors, while the second set represents a genome scan analysis using microsatellite markers. Random Forest was able to identify a few of the major genes influencing the phenotypes, such as those for baseline HDL and triglycerides, but failed to identify the major genes regulating baseline glucose levels.
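
The procedure described above (trees grown on bootstrap samples, a random subset of predictors considered at each node, numerical predictions for a quantitative outcome) can be sketched as a small pure-Python regression forest. This is a minimal illustration under stated assumptions, not the software used in the study: the split criterion (within-node sum of squares), the depth limit, the parameter names `mtry`, `depth` and `n_trees`, and the IBD-sharing data are all hypothetical choices made for brevity.

```python
import random
from statistics import mean

def node_sse(values):
    """Within-node sum of squares: the impurity used for a quantitative outcome."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def best_split(X, y, mtry, rng):
    """Best (feature, threshold, score) among `mtry` randomly chosen predictors, or None."""
    best = None
    for j in rng.sample(range(len(X[0])), mtry):
        for t in sorted({row[j] for row in X}):
            left = [v for row, v in zip(X, y) if row[j] <= t]
            right = [v for row, v in zip(X, y) if row[j] > t]
            if not right:  # t is the largest observed value: no real split
                continue
            score = node_sse(left) + node_sse(right)
            if best is None or score < best[2]:
                best = (j, t, score)
    return best

def grow(X, y, mtry, rng, depth):
    """Grow a depth-limited regression tree; each leaf holds the mean outcome."""
    split = best_split(X, y, mtry, rng) if depth > 0 and len(y) > 1 else None
    if split is None:
        return mean(y)
    j, t, _ = split

    def subset(keep):
        idx = [i for i, row in enumerate(X) if keep(row[j])]
        return [X[i] for i in idx], [y[i] for i in idx]

    return (j, t,
            grow(*subset(lambda x: x <= t), mtry, rng, depth - 1),
            grow(*subset(lambda x: x > t), mtry, rng, depth - 1))

def predict_tree(node, row):
    while isinstance(node, tuple):
        j, t, left, right = node
        node = left if row[j] <= t else right
    return node

def random_forest(X, y, n_trees=100, mtry=2, depth=3, seed=0):
    """Each tree is grown on a bootstrap sample (n rows drawn with replacement)."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        forest.append(grow([X[i] for i in idx], [y[i] for i in idx], mtry, rng, depth))
    return forest

def predict(forest, row):
    """Forest prediction for a quantitative outcome: the average over trees."""
    return mean(predict_tree(tree, row) for tree in forest)

# Hypothetical data: IBD sharing (0, 0.5, 1) at three loci for 18 sib pairs;
# the outcome (e.g. a squared trait difference) depends only on locus 0.
X = [[s, a, b] for s in (0, 0.5, 1) for a in (0, 0.5, 1) for b in (0, 1)]
y = [4.0 - 3.0 * row[0] for row in X]
forest = random_forest(X, y)
print(predict(forest, [1, 0, 0]), predict(forest, [0, 0, 0]))
```

Pairs sharing more alleles IBD at the trait locus get a smaller predicted outcome, while the two unlinked loci contribute little. Production implementations grow much deeper trees and also report variable-importance measures, which this sketch omits.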

    Joint modeling of linkage and association using affected sib-pair data

    There has been growing interest in developing strategies for identifying single-nucleotide polymorphisms (SNPs) that explain a linkage signal by jointly modeling linkage and association. We compare several existing methods and propose a new method, the homozygote sharing transmission-disequilibrium test (HSTDT), to detect linkage and association, or to identify SNPs explaining the linkage signal, on chromosome 6 for rheumatoid arthritis, using 100 replicates of the Genetic Analysis Workshop (GAW) 15 simulated affected sib-pair data. The existing methods considered were the family-based tests of association implemented in FBAT, a transmission-disequilibrium test, a conditional logistic regression approach, a likelihood-based approach implemented in LAMP, and the homozygote sharing test (HST). We compared type I error rates and power for tests classified into three categories according to their null hypotheses: 1) no association in the presence of linkage (i.e., a SNP explains none of the linkage evidence), 2) no linkage after adjusting for association (i.e., a SNP explains all of the linkage evidence), and 3) no linkage and no association. For testing association in the presence of linkage, all tests had similar power except the homozygote sharing test, which had lower power. For testing linkage adjusted for association, LAMP and HST had similar power, while the conditional logistic regression method had lower power. For testing linkage or association, the conditional logistic regression method was more powerful than FBAT.
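
Of the existing methods listed, the classical transmission-disequilibrium test has a simple closed form (a McNemar statistic on transmissions from heterozygous parents). The sketch below assumes those transmission counts have already been tallied and treats each transmission as independent; the counts are hypothetical, and this is neither the HSTDT proposed in the study nor the FBAT implementation.

```python
from math import erfc, sqrt

def tdt(b: int, c: int) -> tuple:
    """Classical transmission-disequilibrium test (McNemar form).

    b: number of times heterozygous parents transmitted allele 1 to an affected child
    c: number of times they transmitted allele 2
    Returns the 1-df chi-square statistic and its p-value.
    """
    if b + c == 0:
        raise ValueError("no transmissions from heterozygous parents")
    stat = (b - c) ** 2 / (b + c)
    # Upper tail of a chi-square with 1 df: P(Z**2 > stat) = erfc(sqrt(stat / 2)).
    return stat, erfc(sqrt(stat / 2))

stat, p = tdt(60, 40)  # hypothetical counts
print(f"chi2 = {stat:.2f}, p = {p:.4f}")  # chi2 = 4.00, p = 0.0455
```

Only heterozygous parents are informative, which is why homozygous parents do not enter the counts at all.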

    Mechanistic role of a disease-associated genetic variant within the ADAM33 asthma susceptibility gene

    BACKGROUND: ADAM33 has been identified as an asthma-associated gene in an outbred population. Genetic studies suggested that the functional role of this metalloprotease was in airway remodeling. However, the mechanistic roles of the disease-associated SNPs have yet to be elucidated, especially in the context of the pathophysiology of asthma. One disease-associated SNP, BC+1, which resides in intron BC toward the 5' end of ADAM33, is highly associated with the disease. METHODS: The region surrounding this genetic variant was cloned into a model system to determine whether this intron contains a regulatory element that influences transcription. RESULTS: The BC+1 protective allele had no effect on transcription of the reporter gene. However, the at-risk allele exerted such a repressive effect on the promoter that no protein product from the reporter gene was detected. These results indicate that intron BC contains a regulatory element that acts as a repressor of gene expression. Moreover, since SNP BC+1 is a common genetic variant, this region may interact with other undefined regulatory elements within ADAM33 to provide a rheostat effect that modulates pre-mRNA processing. Thus, SNP BC+1 may have an important role in modulating ADAM33 gene expression. CONCLUSION: These data provide, for the first time, a functional role for a disease-associated SNP in ADAM33 and begin to shed light on the deregulation of this gene in the pathophysiology of asthma.

    Meta-Analysis for Genome-Wide Association Study Identifies Multiple Variants at the BIN1 Locus Associated with Late-Onset Alzheimer's Disease

    Recent genome-wide association studies (GWAS) aimed at uncovering novel genetic loci related to Alzheimer's disease (AD) have revealed associations with variants near CLU, CR1, PICALM and BIN1. In this study, we conducted a GWAS in an independent set of 1034 cases and 1186 controls using Illumina genotyping platforms. By combining our data with available GWAS datasets from ADNI and GenADA, we replicated the original associations at both PICALM (rs3851179) and CR1 (rs3818361). The PICALM association appears to become non-significant after adjustment for APOE e4 status. We further tested our top markers in 751 independent cases and 751 matched controls. Besides the markers close to the APOE locus, a marker (rs12989701) upstream of the BIN1 locus was replicated, and the combined analysis reached genome-wide significance (p = 5E-08). We combined our data with those of the published Harold et al. study; a meta-analysis of all 6521 available cases and 10360 controls at the BIN1 locus revealed two significant variants (rs12989701, p = 1.32E-10; rs744373, p = 3.16E-10) in limited linkage disequilibrium with each other (r2 = 0.05). The independent contribution of both SNPs was supported by haplotype conditional analysis. We also conducted multivariate analysis of canonical pathways and identified a consistent signal in the downstream pathways targeted by Gleevec (P = 0.004 in Pfizer; P = 0.028 in ADNI; P = 0.04 in GenADA). Finally, we tested variants in CLU, PICALM, BIN1 and CR1 for association with disease progression in 597 AD patients with sufficient longitudinal cognitive measures. Both the PICALM and CLU variants showed nominally significant association with cognitive decline, measured as the change from baseline in the Clinical Dementia Rating sum of boxes (CDR-SB) score, but neither passed multiple-testing correction. Future experiments will help clarify the potential roles of these genetic loci in AD pathology.
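
The r2 reported for rs12989701 and rs744373 is the standard squared correlation between alleles at two loci, r2 = D^2 / (pA(1 - pA) pB(1 - pB)) with D = pAB - pA pB. A minimal sketch of that calculation from hypothetical haplotype and allele frequencies follows; in practice haplotype frequencies must first be estimated from unphased genotypes, which this sketch does not attempt.

```python
def ld_r2(p_ab: float, p_a: float, p_b: float) -> float:
    """Squared allelic correlation r^2 between two biallelic SNPs.

    p_ab: frequency of haplotypes carrying allele A at SNP 1 and allele B at SNP 2
    p_a, p_b: frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic")
    return d * d / denom

print(round(ld_r2(0.15, 0.4, 0.3), 4))  # 0.0179
```

A value near zero, as in this hypothetical example, means that one SNP carries little information about the other, which is why two such variants can make separate contributions to the association signal.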

    Joint Multipoint Linkage Analysis of Multivariate Qualitative and Quantitative Traits. I. Likelihood Formulation and Simulation Results

    We describe a variance-components method for multipoint linkage analysis that allows joint consideration of a discrete trait and a correlated continuous biological marker (e.g., a disease precursor or associated risk factor) in pedigrees of arbitrary size and complexity. The continuous trait is assumed to be multivariate normally distributed within pedigrees, and the discrete trait is modeled by a threshold process acting on an underlying multivariate normal liability distribution. The liability is allowed to be correlated with the quantitative trait, and the liability and quantitative phenotype may each include covariate effects. Bivariate discrete-continuous observations will be common, but the method easily accommodates qualitative and quantitative phenotypes that are themselves multivariate. Formal likelihood-based tests are described for coincident linkage (i.e., linkage of the traits to distinct quantitative-trait loci [QTLs] that happen to be linked) and pleiotropy (i.e., the same QTL influences both discrete-trait status and the correlated continuous phenotype). The properties of the method are demonstrated using simulated data from Genetic Analysis Workshop 10. In a companion paper, the method is applied to data from the Collaborative Study on the Genetics of Alcoholism, in a bivariate linkage analysis of alcoholism diagnoses and P300 amplitude of event-related brain potentials.
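
The threshold model for the discrete trait can be made concrete for a single individual: with a standardized liability L, the probability of being affected is P(L > threshold), and if L has correlation rho with the standardized quantitative trait z, then given z the liability is normal with mean rho * z and variance 1 - rho^2. The sketch below illustrates only this single-person relationship under those assumptions; the full method evaluates a multivariate normal likelihood over all pedigree members, which this does not attempt. Function names and values are hypothetical.

```python
from math import erfc, sqrt

def upper_tail(x: float) -> float:
    """P(Z > x) for a standard normal Z."""
    return 0.5 * erfc(x / sqrt(2))

def p_affected(threshold: float, rho: float = 0.0, z: float = 0.0) -> float:
    """Probability of being affected under a single-individual threshold model.

    The liability L is standard normal and the person is affected when L > threshold.
    rho is the correlation between L and the standardized quantitative trait,
    z the observed standardized trait value; given z, L ~ N(rho * z, 1 - rho**2).
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    return upper_tail((threshold - rho * z) / sqrt(1.0 - rho * rho))

t = 1.645  # threshold giving a prevalence of about 5%
print(round(p_affected(t), 3))  # 0.05
print(round(p_affected(t, rho=0.5, z=2.0), 3))
```

A high value of the correlated quantitative trait raises the conditional risk well above the population prevalence, which is the dependence that lets the continuous marker carry information about the discrete trait.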