
    Haplotype sharing analysis with SNPs in candidate genes: The Genetic Analysis Workshop 12 example

    Haplotype sharing analysis was used to investigate the association of affection status with single nucleotide polymorphism (SNP) haplotypes within candidate gene 1, using one sample each from the isolated and the general population of the Genetic Analysis Workshop (GAW) 12 simulated data. Gene 1 directly influences affection status and harbors more than 70 SNPs. Haplotype sharing analysis depends heavily on prior haplotype estimation. Using haplotypes estimated with GENEHUNTER, strong evidence of association was found for most SNPs in the isolated population sample, supporting an involvement of this gene; however, the maximum -log10(p) values of the haplotype sharing statistic (HSS) did not coincide with the location of the true variant in either population. In comparison, transmission disequilibrium test (TDT) analysis gave the strongest results at the disease-causing variant in both populations, and these results were outstanding in the general population. In this example, TDT analysis appears to perform better than HSS at identifying the disease-causing variant when SNPs within a candidate gene are analyzed in an outbred population. Simulations showed that the performance of HSS is hampered by closely spaced SNPs in strong linkage disequilibrium with the functional variant and by ambiguous haplotypes.
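    For readers unfamiliar with the TDT comparison above, the classic single-SNP TDT can be sketched as a McNemar-type statistic on transmissions from heterozygous parents to affected offspring: chi2 = (b - c)^2 / (b + c) with 1 degree of freedom, reported here as -log10(p) to match the scale used in the abstract. This is a minimal illustrative sketch, not the workshop authors' implementation; the function name `tdt` and the counts used are hypothetical.

```python
import math


def tdt(transmitted: int, untransmitted: int) -> tuple[float, float]:
    """Hypothetical helper: classic TDT for one biallelic SNP.

    transmitted (b): heterozygous parents transmitting the tested allele
        to an affected child.
    untransmitted (c): heterozygous parents transmitting the other allele.
    Returns (chi-square statistic with 1 df, -log10 of its p-value).
    """
    total = transmitted + untransmitted
    if total == 0:
        raise ValueError("no informative (heterozygous) transmissions")
    chi2 = (transmitted - untransmitted) ** 2 / total
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    # erfc underflows to 0.0 for very large statistics; report infinity then.
    neg_log10_p = math.inf if p == 0.0 else -math.log10(p)
    return chi2, neg_log10_p


# Illustrative counts only (not GAW 12 data): 60 vs 40 transmissions.
chi2, score = tdt(60, 40)
print(f"chi2 = {chi2:.2f}, -log10(p) = {score:.3f}")
```

    Scanning every SNP in a candidate gene with such a function and locating the peak -log10(p) mirrors, in simplified form, how the TDT and HSS results above are compared by position.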

    Data mining, neural nets, trees–problems 2 and 3 of Genetic Analysis Workshop 15

    Genome-wide association studies using thousands to hundreds of thousands of single nucleotide polymorphism (SNP) markers, and region-wide association studies using a dense panel of SNPs, are already in use to identify disease susceptibility genes and to predict disease risk in individuals. Because these tasks are becoming increasingly important, three different data sets were provided for Genetic Analysis Workshop 15, allowing examination of various novel and existing data mining methods both for classification and for identification of disease susceptibility genes and of gene-by-gene or gene-by-environment interactions. The approach most often applied in this presentation group was random forests, because of its simplicity, elegance, and robustness. It was used for prediction and, as a first step, for screening for interesting SNPs. The logistic tree with unbiased selection approach appeared to be an interesting alternative for efficiently selecting interesting SNPs. Machine learning, specifically ensemble methods, might be useful as a pre-screening tool for large-scale association studies because, compared with standard statistical approaches, these methods can be less prone to overfitting, can be less CPU-time intensive, can easily include pair-wise and higher-order interactions, and can also have a high capability for classification. However, improved implementations that can handle hundreds of thousands of SNPs at a time are required.