21 research outputs found

    PRIMAL: Fast and Accurate Pedigree-based Imputation from Sequence Data in a Founder Population

    <div><p>Founder populations and large pedigrees offer many well-known advantages for genetic mapping studies, including cost-efficient study designs. Here, we describe PRIMAL (<u>P</u>edig<u>R</u>ee <u>IM</u>putation <u>AL</u>gorithm), a fast and accurate pedigree-based phasing and imputation algorithm for founder populations. PRIMAL incorporates both existing and original ideas, such as a novel indexing strategy of Identity-By-Descent (IBD) segments based on clique graphs. We were able to impute the genomes of 1,317 South Dakota Hutterites, who had genome-wide genotypes for ~300,000 common single nucleotide variants (SNVs), from 98 whole genome sequences. Using a combination of pedigree-based and LD-based imputation, we were able to assign 87% of genotypes with >99% accuracy over the full range of allele frequencies. Using the IBD cliques, we were also able to infer the parental origin of 83% of alleles, as well as the genotypes of deceased recent ancestors for whom no genotype information was available. This imputed data set will enable us to better study the relative contribution of rare and common variants to human phenotypes, as well as parental origin effects of disease risk alleles, in >1,000 individuals at minimal cost.</p></div>

    Population genetic parameters of the <i>CFTR</i> Met470Val locus.

    <p>(A) Haplotype blocks +/− 500 kb around the Met470Val locus in HapMap CEU (phase II) samples. The arrow indicates the location of Met470Val; the blue vertical line shows the ancestral Met470 allele and the red vertical line shows the derived Val470 allele. A continuous block of the same color represents haplotypes shared between individuals. Haplotypes on the Met470 background are shorter and more variable than those on the Val470 background. (B) Decay of extended haplotype homozygosity (EHH) around the Met470Val locus in the same data as in (A). The blue plot represents the decay of haplotypes on the ancestral (Met) allele background; the red plot represents the decay of haplotypes on the derived (Val) allele background. The Y-axis shows the EHH, defined as the probability that two randomly chosen chromosomes are homozygous at all SNPs for the entire interval from the core SNP to distance <i>x</i> <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1000974#pgen.1000974-Sabeti2" target="_blank">[37]</a>. EHH probability drops below 0.5 at approximately 300 kb around Met470Val on haplotypes carrying the Val470 allele, compared to <20 kb on haplotypes carrying the Met allele. The iHS corresponds to the natural logarithm of the ratio of areas under the ancestral and derived allele EHH curves, standardized to be independent of the allele frequencies. A negative iHS implies that the haplotypes on the derived allele background are longer than those on the ancestral background <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1000974#pgen.1000974-Voight1" target="_blank">[23]</a>. (C–E) Genome-wide distributions of (C) Fst values, (D) iHS scores and (E) Fst and iHS scores for SNPs in HapMap phase II data. Black lines (and the filled circle in E) show the location of the Met470Val SNP in each distribution. Proportions of SNPs with more extreme values are shown on the plots as empirical <i>P</i>-values.</p>
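
The EHH definition in panel (B) can be illustrated with a minimal Python sketch. It assumes phased haplotypes are given as equal-length strings of allele codes, all carrying the same core allele, and it uses the common sample estimator (the fraction of chromosome pairs that are identical over the whole interval). The function name `ehh` and the toy data in the usage line are hypothetical and are not taken from the listed work; the standardization step needed for iHS is not shown.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core, end):
    """Estimate EHH between SNP index `core` and SNP index `end` (inclusive).

    `haplotypes` is a list of equal-length strings, one per chromosome,
    all carrying the same allele at the core SNP (an assumption of this sketch).
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two chromosomes")
    lo, hi = sorted((core, end))
    # Group chromosomes by their exact allele sequence over the interval.
    counts = Counter(h[lo:hi + 1] for h in haplotypes)
    # Pairs drawn from the same group are identical at every SNP in the interval.
    identical_pairs = sum(comb(c, 2) for c in counts.values())
    return identical_pairs / comb(n, 2)
```

Calling `ehh(["1010", "1010", "1011", "1110"], 0, 1)` considers the first two SNPs only; evaluating the function at increasing distances from the core SNP traces a decay curve of the kind plotted in (B).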

    Results of association tests with birth rate in Hutterite men.

    <p><sup>1</sup>All three genotypes were tested individually.</p>
    <p><sup>2</sup>People carrying Met/Val and Val/Val genotypes were combined and tested against Met/Met homozygotes.</p>

    Partitioning an IBD-sharing graph into cliques.

    <p>(1) IBD segments are indexed into a graph at each SNV. Nodes represent haplotypes (denoted A-H). Each pair of haplotypes that share an IBD segment at the SNV is connected with a link whose weight equals the HMM posterior probability. (2) Link weights are replaced by affinities. Links with small original weight or affinity are removed (3); all nodes within each of the resulting connected components are connected (4).</p>
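
Steps (3) and (4) of this caption can be sketched in Python as a threshold followed by a connected-components pass. This is an illustration under stated assumptions, not the PRIMAL implementation: the affinity transformation of step (2) is not defined in the caption and is skipped, the cutoff `min_weight=0.5` is an arbitrary placeholder, and the function name `ibd_cliques` is hypothetical.

```python
def ibd_cliques(nodes, links, min_weight=0.5):
    """Partition haplotypes into cliques at one SNV.

    `nodes` is an iterable of haplotype labels; `links` maps a pair
    (a, b) to a weight such as an HMM posterior probability.
    `min_weight` is a hypothetical cutoff, not a value from the source.
    """
    parent = {n: n for n in nodes}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Step 3: ignore weak links; merge the endpoints of the remaining ones.
    for (a, b), weight in links.items():
        if weight >= min_weight:
            parent[find(a)] = find(b)

    # Step 4: every connected component is treated as a complete clique,
    # so returning its member set is equivalent to linking all its nodes.
    groups = {}
    for n in parent:
        groups.setdefault(find(n), set()).add(n)
    return sorted(sorted(g) for g in groups.values())
```

Representing each clique as a member set rather than as an explicit list of pairwise links keeps storage linear in the number of haplotypes, which is one plausible reason to index IBD sharing this way.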

    Parental origin assignment process.

    <p>For a given quasi-founder, we denote his/her haplotypes by A and B, and (by convention) the first is paternal and the second is maternal. At each SNV, we calculate a 2×2 matrix of kinships (Step 1) between each of the proband’s parents and each subject in the A and B IBD cliques. Using these, we generate a parental haplotype separation measure m (Step 2). If m≈1, A and B are already correctly ordered; if m≈−1, they should be swapped. If the majority of the SNVs agree on the same swapping (indicated by a sample separation M sufficiently close to 1 in Step 3), we assign paternal origin and reorder A and B accordingly (Step 4).</p>
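
The caption does not give formulas for m or M, so the following Python sketch fills them in with explicitly hypothetical choices: m is taken as the normalized difference between the "aligned" kinships (father with clique A, mother with clique B) and the "swapped" kinships, and M as the absolute value of the mean of m over SNVs. The names `separation` and `assign_origin` and the threshold 0.9 are likewise assumptions made only to show the shape of Steps 2 to 4.

```python
def separation(k):
    """Hypothetical separation measure m in [-1, 1] for one SNV.

    k[i][j] is the kinship between parent i (0 = father, 1 = mother)
    and the members of clique j (0 = A, 1 = B).
    """
    aligned = k[0][0] + k[1][1]   # supports A paternal, B maternal
    swapped = k[0][1] + k[1][0]   # supports A maternal, B paternal
    total = aligned + swapped
    return 0.0 if total == 0 else (aligned - swapped) / total


def assign_origin(kinship_matrices, threshold=0.9):
    """Return 'keep', 'swap', or None when the SNVs do not agree clearly."""
    ms = [separation(k) for k in kinship_matrices]
    if not ms:
        return None
    mean_m = sum(ms) / len(ms)
    if abs(mean_m) < threshold:   # assumed stand-in for "M close to 1"
        return None
    return "keep" if mean_m > 0 else "swap"
```

Returning `None` when the evidence is mixed mirrors the caption's requirement that parental origin is assigned only when most SNVs agree.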

    Geographic distribution of the Met470Val polymorphism in HGDP samples.

    <p>The relative frequencies of each allele are shown as blue (ancestral Met470 allele) and orange (derived Val470 allele) pie slices.</p>

    The imputation pipeline.

    <p>Given a pedigree tree of 3,671 Hutterites (1), 1,415 individuals in the three most recent generations (within the red box) were genotyped with framework markers (2). The first part of the pipeline (steps 2–6) depends only on the framework marker data; the second part (steps 7–9) imputes the whole genome sequence variants. First, estimates of identity coefficients and the transition rate parameter λ [<a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1004139#pcbi.1004139.ref024" target="_blank">24</a>] between each pair of the 1,415 individuals are calculated (3). The framework genotypes are then phased (4), IBD segments between haplotypes are identified using an HMM (5), and indexed into an efficient data structure consisting of IBD cliques (6). Haplotypes are assigned parental origins consistent across the pedigree using the cliques (7). Then, the whole genome sequences of 98 Hutterites (8) are cleaned using several filters, including a novel generalized Mendelian error check (9), and imputed to the remaining 1,317 Hutterites using IBD cliques (10). Call rates are boosted by imputing as many of the remaining genotypes as possible using an LD-based imputation method, IMPUTE2 (11). To ensure that accuracy is not compromised, we calculate the concordance of the shared genotypes between the two methods and keep only variants that are highly concordant (12).</p>
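
The concordance filter in step (12) can be sketched as follows. This is a minimal Python illustration, assuming genotypes are coded as allele counts 0, 1, 2 with `None` for an uncalled genotype; the function names, the variant identifiers, and the cutoff 0.99 are hypothetical, since the caption only says that "highly concordant" variants are kept.

```python
def concordance(calls_a, calls_b):
    """Fraction of matching genotypes among samples called by both methods.

    Returns None when the two methods share no called genotypes.
    """
    shared = [(a, b) for a, b in zip(calls_a, calls_b)
              if a is not None and b is not None]
    if not shared:
        return None
    return sum(a == b for a, b in shared) / len(shared)


def keep_concordant(variants, min_concordance=0.99):
    """Return IDs of variants whose two call sets agree well enough.

    `variants` maps a variant ID to a pair (pedigree_calls, ld_calls);
    `min_concordance` is a placeholder cutoff, not a value from the source.
    """
    kept = []
    for variant_id, (ped_calls, ld_calls) in variants.items():
        c = concordance(ped_calls, ld_calls)
        if c is not None and c >= min_concordance:
            kept.append(variant_id)
    return kept
```

Variants with no shared genotypes are dropped here because their agreement cannot be measured; a real pipeline might handle that case differently.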