Search for Risk Haplotype Segments with GWAS Data by Use of Finite Mixture Models
Region-based association analysis has been proposed to capture the collective behavior of sets of variants by testing the association of each set, instead of individual variants, with the disease. Such an analysis typically involves a list of unphased multiple-locus genotypes with potentially sparse frequencies in cases and controls. To tackle the problem of the sparse distribution, a two-stage approach has been proposed in the literature: in the first stage, haplotypes are computationally inferred from genotypes, followed by a haplotype co-classification; in the second stage, the association analysis is performed on the inferred haplotype groups. If a haplotype is unevenly distributed between the case and control samples, it is labeled a risk haplotype. Unfortunately, the in-silico reconstruction of haplotypes might produce a proportion of false haplotypes, which hamper the detection of rare but true haplotypes. To address this issue, we propose an alternative approach: in Stage 1, we cluster genotypes instead of inferred haplotypes and estimate the risk genotypes based on a finite mixture model; in Stage 2, we infer risk haplotypes from the risk genotypes estimated in the previous stage. To estimate the finite mixture model, we propose an EM algorithm with a novel data partition-based initialization. The performance of the proposed procedure is assessed by simulation studies and a real data analysis. Compared to the existing multiple Z-test procedure, we find that the power of genome-wide association studies can be increased by using the proposed procedure.
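The abstract does not spell out the mixture model or the initialization, so the following is only a minimal sketch of the general idea under stated assumptions: each distinct genotype is summarized by its number of carriers among cases and its total number of carriers, a two-component binomial mixture separates "background" from "risk" genotypes, and EM is started from a simple partition of the data (genotypes sorted by case proportion and split in half). The function name `em_binomial_mixture`, the binomial model, and the half-split initialization are all hypothetical illustrations, not the authors' procedure.

```python
import numpy as np

def em_binomial_mixture(cases, totals, n_iter=500, tol=1e-10):
    """EM for a two-component binomial mixture (illustrative assumption).

    cases[i]  : carriers of genotype i among cases
    totals[i] : carriers of genotype i among cases and controls (must be > 0)
    Needs at least two genotypes. Returns (weights, case_probs, risk_posterior).
    """
    k = np.asarray(cases, dtype=float)
    n = np.asarray(totals, dtype=float)
    eps = 1e-9

    # Partition-based start (assumed scheme): sort genotypes by observed case
    # proportion and use the lower and upper halves as initial components.
    order = np.argsort(k / n)
    half = len(order) // 2
    low, high = order[:half], order[half:]
    p = np.array([k[low].sum() / n[low].sum(), k[high].sum() / n[high].sum()])
    w = np.array([len(low), len(high)], dtype=float) / len(order)

    prev_ll = -np.inf
    for _ in range(n_iter):
        p = np.clip(p, eps, 1 - eps)
        w = np.clip(w, eps, 1.0)
        # E-step: log-likelihood per genotype and component. The binomial
        # coefficient is omitted because it cancels in the responsibilities.
        log_comp = np.log(w) + k[:, None] * np.log(p) + (n - k)[:, None] * np.log1p(-p)
        log_mix = np.logaddexp(log_comp[:, 0], log_comp[:, 1])
        resp = np.exp(log_comp - log_mix[:, None])
        # M-step: update mixing weights and per-component case probabilities.
        w = resp.mean(axis=0)
        p = (resp * k[:, None]).sum(axis=0) / np.maximum((resp * n[:, None]).sum(axis=0), eps)
        ll = log_mix.sum()
        if ll - prev_ll < tol:
            break
        prev_ll = ll

    # Report the component with the larger case probability as the "risk" one.
    if p[0] > p[1]:
        p, w, resp = p[::-1], w[::-1], resp[:, ::-1]
    return w, p, resp[:, 1]
```

The returned responsibilities come from the final E-step; genotypes whose risk posterior exceeds a chosen threshold would be the candidates passed on to the haplotype-inference stage.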
Screening tests for Disease Risk Haplotype Segments in Genome by Use of Permutation
Haplotype association analysis has been proposed to capture the collective behavior of sets of variants by testing the association of each set, instead of individual variants, with the disease. Such an analysis typically involves a list of unphased multiple-locus genotypes with potentially sparse frequencies in cases and controls. It starts with inferring haplotypes from genotypes, followed by a haplotype co-classification and marginal screening for disease-associated haplotypes. Unfortunately, phasing uncertainty may have a strong effect on the haplotype co-classification and therefore on the accuracy of predicting risk haplotypes. To address this issue, we propose an alternative approach: in Stage 1, we select potential risk genotypes instead of co-classifying the inferred haplotypes; in Stage 2, we infer risk haplotypes from the genotypes selected in the previous stage. The performance of the proposed procedure is assessed by simulation studies and a real data analysis. Compared to the existing multiple Z-test procedure, we find that the power of genome-wide association studies can be increased by using the proposed procedure.
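The abstract names marginal screening and a multiple Z-test comparator but gives no formulas, so the sketch below is a generic illustration rather than the authors' method. It assumes a binary carrier matrix (one row per individual, one column per candidate genotype), computes a pooled two-proportion Z statistic per column, and adjusts for multiplicity by permuting case/control labels and comparing against the maximum absolute Z. The names `two_proportion_z` and `max_z_permutation` are hypothetical.

```python
import numpy as np

def two_proportion_z(carrier, is_case):
    """Pooled two-proportion Z per column: carrier frequency in cases minus controls."""
    carrier = np.asarray(carrier, dtype=float)
    is_case = np.asarray(is_case, dtype=bool)
    n1, n0 = is_case.sum(), (~is_case).sum()
    p1 = carrier[is_case].mean(axis=0)
    p0 = carrier[~is_case].mean(axis=0)
    pooled = carrier.mean(axis=0)
    se = np.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n0))
    with np.errstate(divide="ignore", invalid="ignore"):
        # Columns with no variation get Z = 0 instead of a division error.
        return np.where(se > 0, (p1 - p0) / se, 0.0)

def max_z_permutation(carrier, is_case, n_perm=1000, seed=0):
    """Observed Z statistics and max-|Z| permutation-adjusted p-values."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(is_case, dtype=bool)
    observed = two_proportion_z(carrier, labels)
    max_null = np.empty(n_perm)
    for b in range(n_perm):
        # Shuffling labels breaks any genotype-disease association while
        # keeping the correlation between columns intact.
        max_null[b] = np.abs(two_proportion_z(carrier, rng.permutation(labels))).max()
    exceed = (max_null[:, None] >= np.abs(observed)[None, :]).sum(axis=0)
    adj_p = (1 + exceed) / (n_perm + 1)  # +1 keeps p-values strictly positive
    return observed, adj_p
```

Using the maximum statistic across columns controls the family-wise error rate without assuming independence between candidate genotypes, which is one common reason to prefer permutation over a fixed Z threshold.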