2 research outputs found

    TAGSNP SELECTION BASED ON PAIRWISE LD CRITERIA AND POWER ANALYSIS IN ASSOCIATION STUDIES

    No full text
    TagSNP selection is an important step in designing case-control association studies. Among selection methods that have proliferated, the ones based on pairwise LD measurement are attractive for the purpose of designing association studies. The goal is to minimize the number of markers selected for genotyping in a particular platform and therefore reduce genotyping cost while simultaneously representing information provided by all other markers. Depending on the platform, it is also important to select sets that are robust against occasional genotyping failure. An array of methods has been proposed to effectively select these tagSNPs using various criteria. In this study, we extend the algorithms used in FESTA, a computer program we previously developed for picking tagSNPs using r² criteria. We applied FESTA to the HapMap whole chromosome data in two different populations, and we also performed a power analysis for case-control association studies using simulated data. FESTA chooses 294,322 tagSNPs in the autosomes in the CEPH samples. The YORUBA samples require 61.5% more tagSNPs than the CEPH samples. The power study showed that limiting ourselves to only tagSNPs, instead of choosing all SNPs in the interval for an association study, results in a power loss of only about 5-10%.
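To make the idea of pairwise-LD tagging concrete, here is a minimal sketch of a common greedy approach: repeatedly pick the marker that covers the most still-untagged markers at a given r² threshold. This is not FESTA's algorithm, which the abstract does not describe in detail. The function names, the 0.8 threshold, the toy SNP identifiers, and the use of squared Pearson correlation between genotype dosages as a stand-in for r² are all assumptions made for illustration.

```python
from itertools import combinations


def r_squared(x, y):
    """Squared Pearson correlation between two genotype dosage vectors.

    Assumption: dosage correlation is used here as a simple proxy for
    the haplotype-based r^2 LD measure.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic marker: treat as tagging nothing else
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy * sxy / (sxx * syy)


def greedy_tag_snps(genotypes, threshold=0.8):
    """Greedy tag selection (illustrative, not FESTA's method).

    genotypes: dict mapping a SNP id to a list of dosages (0, 1, 2).
    Returns tag SNP ids in the order they were chosen.
    """
    snps = list(genotypes)
    # covers[s] is the set of SNPs (including s itself) that s can tag.
    covers = {s: {s} for s in snps}
    for a, b in combinations(snps, 2):
        if r_squared(genotypes[a], genotypes[b]) >= threshold:
            covers[a].add(b)
            covers[b].add(a)

    untagged = set(snps)
    tags = []
    while untagged:
        # Pick the SNP covering the most untagged SNPs (first one on ties).
        best = max(snps, key=lambda s: len(covers[s] & untagged))
        tags.append(best)
        untagged -= covers[best]
    return tags


# Hypothetical toy data: snp1, snp2 and snp3 are in perfect LD; snp4 is not.
toy_genotypes = {
    "snp1": [0, 1, 2, 0, 1, 2],
    "snp2": [0, 1, 2, 0, 1, 2],
    "snp3": [2, 1, 0, 2, 1, 0],
    "snp4": [0, 0, 1, 1, 2, 0],
}
toy_tags = greedy_tag_snps(toy_genotypes)
print(toy_tags)
```

On this toy input, one tag represents the three correlated markers and the uncorrelated marker must be genotyped directly, which is the cost saving the abstract refers to.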

    The Road to Identifying Disease Causing Genes: Association Tests, Genotype Imputations, and Sampling Strategies for Sequencing Studies.

    Full text link
    Technological advances now allow investigators to use sequencing data to identify genetic risk variants for complex diseases. However, it is still expensive to sequence a large sample of individuals. While genotype imputation can augment sequencing studies, challenges remain, such as imputation with population or family structures and imputation of rare variants. This dissertation aims to tackle these two challenges. The first project considers imputation with family structures, extending an existing imputation program that assumes unrelated individuals in a sample. I propose a strategy for imputing data with family structures and apply it to a family-based association study for bipolar disorder. The results suggest the involvement of ion channelopathy in bipolar pathogenesis. The second and third projects provide sampling strategies for next-generation sequencing. The goal is to select a subset of a study sample that captures the maximal number of variants when sequenced, that achieves maximal imputation accuracy when the sequences of the rest of the study sample are imputed from the sequenced subset, or both. In the second project, I propose the “most diverse panel” by adapting the concept of phylogenetic diversity. This strategy assumes that the panel with the greatest overall tree length in the phylogenetic tree represents the longest evolutionary time, allowing the maximal number of mutation events to occur. Sequencing such a panel can thus identify the maximal number of variants. In the third project, I propose the “most representative panel” by considering both the selected and unselected haplotypes. The goal is to identify at least one optimal selected reference haplotype for each unselected haplotype. Because an exhaustive search is computationally infeasible for a large sample, I develop a hill-climbing algorithm that updates a randomly selected panel for a predefined number of iterations or until it converges. Using simulated sequence data and real sequence data from the 1000 Genomes Project, I compare the two proposed panels to randomly selected panels and provide suggestions on which algorithm to use when planning sequencing studies with specific study samples.
    PhD, Bioinformatics, University of Michigan, Horace H. Rackham School of Graduate Studies
    http://deepblue.lib.umich.edu/bitstream/2027.42/99798/1/penzhang_1.pd
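The hill-climbing search for a “most representative panel” might look roughly like the sketch below. The abstract does not specify the objective or the update move, so both are assumptions here: the cost is the summed Hamming distance from each unselected haplotype to its closest selected haplotype, and each step tries swapping one panel member for one non-member, keeping the swap only if the cost drops. The function names and the `max_iter`, `patience`, and `seed` parameters are hypothetical.

```python
import random


def hamming(a, b):
    """Number of positions at which two equal-length haplotypes differ."""
    return sum(x != y for x, y in zip(a, b))


def panel_cost(haplotypes, panel):
    """Assumed objective: for each unselected haplotype, the distance to
    its closest selected haplotype, summed over all unselected ones."""
    return sum(
        min(hamming(h, haplotypes[p]) for p in panel)
        for i, h in enumerate(haplotypes)
        if i not in panel
    )


def hill_climb_panel(haplotypes, k, max_iter=1000, patience=100, seed=0):
    """Swap-based hill climbing from a random panel of size k (k < n).

    Stops after max_iter proposals, or once `patience` consecutive
    proposals fail to improve the cost (treated as convergence).
    Returns (sorted panel indices, cost).
    """
    rng = random.Random(seed)
    n = len(haplotypes)
    panel = set(rng.sample(range(n), k))
    cost = panel_cost(haplotypes, panel)
    stale = 0
    for _ in range(max_iter):
        if stale >= patience:
            break
        leaving = rng.choice(sorted(panel))
        entering = rng.choice(sorted(set(range(n)) - panel))
        candidate = (panel - {leaving}) | {entering}
        candidate_cost = panel_cost(haplotypes, candidate)
        if candidate_cost < cost:
            panel, cost, stale = candidate, candidate_cost, 0
        else:
            stale += 1
    return sorted(panel), cost


# Hypothetical toy haplotypes forming two loose clusters.
toy_haplotypes = ["0000", "0001", "1111", "1110", "0000", "1111"]
toy_panel, toy_cost = hill_climb_panel(toy_haplotypes, k=2)
print(toy_panel, toy_cost)
```

Because hill climbing only accepts improving swaps, it can stall in a local optimum; rerunning with several seeds and keeping the lowest-cost panel is a common mitigation.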