5 research outputs found

    Genotype determination for polymorphisms in linkage disequilibrium

    Background: Genome-wide association studies with single nucleotide polymorphisms (SNPs) show great promise for identifying genetic determinants of complex human traits. In current analyses, genotype calling and imputation of missing genotypes are usually treated as two separate tasks. The genotypes of SNPs are first determined one at a time from allele signal intensities. Then the missing genotypes, i.e., no-calls caused by imperfectly separated signal clouds, are imputed based on the linkage disequilibrium (LD) between multiple SNPs. Although many statistical methods have been developed to improve either genotype calling or imputation of missing genotypes, treating the two steps independently can lead to loss of genetic information. Results: We propose a novel genotype calling framework that considers the signal intensities and the underlying LD structure of SNPs simultaneously by estimating both cluster parameters and haplotype frequencies. As a result, our new method outperforms some existing algorithms in terms of both call rates and genotyping accuracy. Our studies also suggest that jointly analyzing multiple SNPs in LD provides more accurate estimation of haplotypes than haplotype reconstruction methods that only use called genotypes. Conclusion: Our study demonstrates that jointly analyzing signal intensities and LD structure of multiple SNPs is a better way to determine genotypes and estimate LD parameters.
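As a concrete illustration of the LD between two SNPs mentioned in this abstract, the sketch below computes the standard coefficients D and r² from the four haplotype frequencies of a pair of biallelic SNPs. It is not the paper's calling framework. The function name `ld_stats` and the argument layout (`p00`, `p01`, `p10`, `p11`, indexed by the allele at SNP 1 and then SNP 2) are assumptions made for this example.

```python
def ld_stats(p00, p01, p10, p11):
    """Return (D, r2) for two biallelic SNPs.

    pXY is the frequency of the haplotype carrying allele X at SNP 1
    and allele Y at SNP 2 (hypothetical argument layout).
    """
    total = p00 + p01 + p10 + p11
    if abs(total - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")

    # Marginal frequency of allele 1 at each SNP.
    p_snp1 = p10 + p11
    p_snp2 = p01 + p11

    # D: deviation of the observed 1-1 haplotype frequency from independence.
    d = p11 - p_snp1 * p_snp2

    # r2: squared correlation between the alleles at the two SNPs.
    denom = p_snp1 * (1 - p_snp1) * p_snp2 * (1 - p_snp2)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, r2
```

With frequencies (0.5, 0, 0, 0.5) the two SNPs are perfectly correlated, and with four equal frequencies they are independent.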

    Shape-IT: new rapid and accurate algorithm for haplotype inference

    Background: We have developed a new computational algorithm, Shape-IT, to infer haplotypes under the genetic model of coalescence with recombination developed by Stephens et al. in Phase v2.1. It runs much faster than Phase v2.1 while exhibiting the same accuracy. The major algorithmic improvements rely on the use of binary trees to represent the sets of candidate haplotypes for each individual. These binary tree representations (1) speed up the computation of the posterior probabilities of the haplotypes by avoiding the redundant operations made in Phase v2.1, and (2) overcome the exponential aspect of the haplotype inference problem by smart exploration of the most plausible pathways (i.e., haplotypes) in the binary trees. Results: Our results show that Shape-IT is several orders of magnitude faster than Phase v2.1 while being as accurate. For instance, Shape-IT runs 50 times faster than Phase v2.1 when computing the haplotypes of 200 subjects on 6,000 segments of 50 SNPs extracted from a standard Illumina 300 K chip (13 days instead of 630 days). We also compared Shape-IT with other widely used software (Gerbil, PL-EM, Fastphase, 2SNP, and Ishape) in various tests: Shape-IT and Phase v2.1 were the most accurate in all cases, followed by Ishape and Fastphase. In terms of speed, Shape-IT was faster than Ishape and Fastphase for datasets smaller than 100 SNPs, but Fastphase became faster (though still less accurate) at inferring haplotypes on larger SNP datasets. Conclusion: Shape-IT deserves to be used extensively for regular haplotype inference, and also in the context of the new high-throughput genotyping chips, since it makes it possible to fit the genetic model of Phase v2.1 on large datasets. This new algorithm based on tree representations could be used in other HMM-based haplotype inference software and may apply more broadly to other fields using HMMs.
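The "exponential aspect" referred to in this abstract can be seen by listing every haplotype pair compatible with one unphased genotype. The sketch below does this by brute force; it does not reproduce Shape-IT's binary-tree representation. The function name and the 0/1/2 genotype coding (number of copies of the alternate allele) are assumptions made for this example.

```python
from itertools import product


def compatible_haplotype_pairs(genotype):
    """List the unordered haplotype pairs consistent with a diploid genotype.

    genotype: sequence of 0, 1 or 2 (copies of the alternate allele per SNP).
    """
    if any(g not in (0, 1, 2) for g in genotype):
        raise ValueError("genotype entries must be 0, 1 or 2")

    het_sites = [i for i, g in enumerate(genotype) if g == 1]
    # Homozygous sites are fixed: 0 -> allele 0, 2 -> allele 1.
    # Heterozygous sites get a placeholder 0 that is overwritten below.
    base = [g // 2 for g in genotype]

    if not het_sites:
        return [(tuple(base), tuple(base))]

    pairs = []
    # Pin the first heterozygous site to allele 0 on the first haplotype so
    # that each unordered pair is produced exactly once.
    for bits in product((0, 1), repeat=len(het_sites) - 1):
        h1, h2 = list(base), list(base)
        for site, allele in zip(het_sites, (0,) + bits):
            h1[site] = allele
            h2[site] = 1 - allele
        pairs.append((tuple(h1), tuple(h2)))
    return pairs
```

A genotype with k heterozygous sites yields 2^(k-1) candidate pairs, so a 50-SNP segment with many heterozygous sites cannot be enumerated this way. That growth is why a method has to restrict itself to the most plausible candidates.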

    Analysis of association studies and inference of haplotypic phase using hidden Markov models

    In this thesis I focus on the development and application of hidden Markov models (HMMs) to solve problems in statistical genetics. Our method, based on an HMM, models the joint haplotype structure in the samples, with the model parameters estimated by the Baum-Welch EM algorithm. The model does not require pre-defined blocks or a sliding-window scheme to define haplotype boundaries. Thus our method is computationally efficient and applicable to either whole-genome or candidate-gene sequences. The first application of this model is disease association testing using inferred ancestral haplotypes. We employed an HMM to cluster haplotypes into groups of predicted common ancestral haplotypes from diploid genotypes. The results from simulation studies show that our method greatly outperforms single-SNP analyses and has greater power than a haplotype-based method, CLADHC, in most simulation scenarios. The second application is inferring haplotypic phase and predicting missing genotypes in polyploid organisms. Using a simulation study, we demonstrate that the method provides accurate estimates of haplotypic phase and missing genotypes for diploids, triploids and tetraploids. The third application is joint CNV/SNP haplotype and missing data inference. The results are very encouraging for this application. With the increasing availability of genotype data in both diploid and polyploid organisms, we believe that our programs can facilitate the investigation of genetic variation in genome-wide studies.
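Baum-Welch estimation, mentioned in this abstract, repeatedly evaluates how likely the observed data are under the current HMM parameters. The sketch below shows only that building block, the forward algorithm, for a generic discrete HMM. It is not the thesis's haplotype model, and the function name and the list-of-lists parameter layout are assumptions made for this example.

```python
def forward_likelihood(init, trans, emit, observations):
    """Likelihood of an observation sequence under a discrete HMM.

    init[s]      : probability of starting in hidden state s
    trans[r][s]  : probability of moving from state r to state s
    emit[s][o]   : probability that state s emits symbol o
    observations : non-empty sequence of symbol indices
    """
    if not observations:
        raise ValueError("observations must be non-empty")

    n_states = len(init)
    # alpha[s] = P(observations so far, current hidden state = s)
    alpha = [init[s] * emit[s][observations[0]] for s in range(n_states)]

    for obs in observations[1:]:
        alpha = [
            sum(alpha[r] * trans[r][s] for r in range(n_states)) * emit[s][obs]
            for s in range(n_states)
        ]

    # Summing over the final hidden state gives the sequence likelihood.
    return sum(alpha)
```

The cost is linear in the sequence length and quadratic in the number of hidden states. For long sequences a practical implementation would rescale `alpha` at each step or work in log space to avoid numerical underflow.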