
    Hemoglobin and oxygen saturation measurements.

    <p>Box plots describe variation in Hb concentration (g/dL) among males (A) and females (B), and in O<sub>2</sub> saturation among males (E) and females (F), for the Amhara 05 (dark grey boxes), Amhara 95 (grey boxes) and Oromo (white boxes) samples. Box plots show the median (horizontal line), interquartile range (box) and range (whiskers), excluding extreme values, which are shown as circles. Statistically significant differences between groups after multiple test correction (unpaired two-sided two-sample t-test) are shown in bold in C, D, G, and H.</p>
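As a rough illustration of the group comparisons in C, D, G, and H, the sketch below computes a pooled-variance (Student) two-sample t statistic using only the Python standard library. The caption does not state whether variances were pooled or which multiple test correction was applied, so the pooled form and the Bonferroni helper are assumptions; the two-sided p-value would still require a t-distribution CDF, for example from SciPy.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_test(x: list[float], y: list[float]) -> tuple[float, int]:
    """Student's two-sample t statistic with pooled variance (assumed form).

    Returns (t, degrees_of_freedom). The two-sided p-value is
    2 * P(T_df > |t|), which needs a t-distribution CDF (not in the stdlib).
    """
    n1, n2 = len(x), len(y)
    df = n1 + n2 - 2
    # Pooled sample variance; statistics.variance uses the n - 1 denominator.
    sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / df
    t = (mean(x) - mean(y)) / sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, df


def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction (assumed)."""
    return alpha / n_tests
```

For example, `pooled_t_test([1, 2, 3], [4, 5, 6])` gives t ≈ −3.67 with 4 degrees of freedom.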

    Hemoglobin association test within Amhara.

    <p>The QQ plot shows an excess of strong associations with Hb among Amhara individuals (A). The observed −log10 p-values, ranked from smallest to largest, are plotted in black (y-axis) against the expected −log10 p-values (x-axis). The grey area indicates the 95% confidence interval (see methods). The genome-wide (GW) significance level after multiple test correction is indicated by the dashed line. The Manhattan plot (B) shows that a set of SNPs in high LD on chromosome 1 reaches GW significance. The box plots show the relationship between hemoglobin levels and the three genotypes of the top, GW-significant SNP (rs10803083) among high-altitude (C) and low-altitude (D) Amhara.</p>
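The QQ plot construction in panel A can be sketched in a few lines of Python. Expected p-values are taken as the uniform order statistics i/(n+1), which is one common convention and an assumption here; the 95% confidence band (based on Beta-distributed order statistics) and the GW threshold line are not reproduced.

```python
import math


def qq_points(pvalues: list[float]) -> list[tuple[float, float]]:
    """Return (expected, observed) -log10 p-value pairs for a QQ plot.

    Observed p-values are sorted from smallest to largest so that the largest
    -log10 values pair with the largest expected values. Expected p-values are
    the uniform order statistics i / (n + 1), i = 1..n (an assumed convention).
    """
    n = len(pvalues)
    observed = sorted(pvalues)  # smallest p first -> largest -log10 first
    points = []
    for i, p in enumerate(observed, start=1):
        expected = i / (n + 1)
        points.append((-math.log10(expected), -math.log10(p)))
    return points
```

Points lying above the diagonal (observed > expected) indicate more strong associations than expected under the null.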

    Power plots.

    <p>The effect of <i>β</i> and MAF on the power of association tests, given the Ethiopian sample size and corrected for the number of SNPs tested within 10 kb of each gene, is illustrated for <i>EPAS1</i> (A, 72 SNPs), <i>EGLN1</i> (B, 38 SNPs) and any gene in the Response to Hypoxia gene ontology category (C, 1,309 SNPs).</p>
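How β, MAF, sample size and the number of SNPs tested combine into power can be sketched with a normal approximation. This is not the authors' calculation: it assumes an additive model with residual SD `sigma`, Hardy-Weinberg genotype frequencies, a Wald test, and a Bonferroni correction over `n_snps`; the function and its parameters are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def association_power(beta: float, maf: float, n: int, n_snps: int,
                      sigma: float = 1.0, alpha: float = 0.05) -> float:
    """Approximate power of an additive-model test of one SNP.

    Assumptions (not from the figure): Hb = beta * allele_count + noise with
    residual SD `sigma`, Hardy-Weinberg genotype frequencies, a Wald test with a
    normal approximation, and a Bonferroni correction over `n_snps` SNPs.
    """
    z = NormalDist()
    alpha_per_snp = alpha / n_snps
    z_crit = z.inv_cdf(1 - alpha_per_snp / 2)
    # Variance of the allele count under HWE is 2 * maf * (1 - maf).
    genotype_var = 2 * maf * (1 - maf)
    # Expected Wald statistic: beta divided by its approximate standard error.
    ncp = beta * sqrt(n * genotype_var) / sigma
    # Two-sided test: reject if |Z| > z_crit where Z ~ N(ncp, 1).
    return (1 - z.cdf(z_crit - ncp)) + z.cdf(-z_crit - ncp)
```

With `n_snps=72`, the call corresponds to panel A's correction for EPAS1; `n`, `sigma` and β must come from the study. At β = 0 the function returns the per-SNP α, and power falls as `n_snps` grows (panel C's 1,309 SNPs versus panel B's 38).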

    Association test of Tibetan <i>EGLN1</i> and <i>EPAS1</i> SNPs within Amhara, Oromo, and combined Ethiopians.

    1<p>Genotype-phenotype association β coefficients for <i>EGLN1</i> were obtained from Simonson <i>et al.</i> <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1003110#pgen.1003110-Simonson1" target="_blank">[16]</a>, while those for <i>EPAS1</i> were obtained from Beall <i>et al.</i> <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1003110#pgen.1003110-Beall5" target="_blank">[17]</a>.</p>
    2<p>β indicates the observed linear coefficient for the relationship between SNP genotype and Hb level.</p>
    3<p>Power refers to the probability of detecting a significant association (p<0.05) between SNP genotype and Hb level, given the MAF and the sample size of the Ethiopian populations, assuming that the β coefficient is at least as large as that observed in Tibetans.</p>
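Footnote 2 describes β as a linear coefficient between SNP genotype and Hb level. Assuming an additive coding (0, 1 or 2 copies of the allele), the unadjusted least-squares slope can be computed as follows; the function name is hypothetical, and the published models may include covariates that this sketch omits.

```python
from statistics import mean


def additive_beta(genotypes: list[int], hb: list[float]) -> float:
    """Least-squares slope of Hb on allele count (0, 1 or 2 copies).

    This is the simple unadjusted linear coefficient only; any covariates
    used in the published analyses are not included here.
    """
    gx, hy = mean(genotypes), mean(hb)
    cov = sum((g - gx) * (h - hy) for g, h in zip(genotypes, hb))
    var = sum((g - gx) ** 2 for g in genotypes)
    return cov / var
```

For genotypes `[0, 1, 2, 0, 1, 2]` and Hb values `[14, 14.5, 15, 14, 14.5, 15]`, the slope is 0.5 g/dL per allele copy.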

    PRIMAL: Fast and Accurate Pedigree-based Imputation from Sequence Data in a Founder Population

    <div><p>Founder populations and large pedigrees offer many well-known advantages for genetic mapping studies, including cost-efficient study designs. Here, we describe PRIMAL (<u>P</u>edig<u>R</u>ee <u>IM</u>putation <u>AL</u>gorithm), a fast and accurate pedigree-based phasing and imputation algorithm for founder populations. PRIMAL incorporates both existing and original ideas, such as a novel indexing strategy for Identity-By-Descent (IBD) segments based on clique graphs. Using 98 whole genome sequences, we imputed the genomes of 1,317 South Dakota Hutterites who had genome-wide genotypes for ~300,000 common single nucleotide variants (SNVs). Using a combination of pedigree-based and LD-based imputation, we assigned 87% of genotypes with >99% accuracy over the full range of allele frequencies. Using the IBD cliques, we also inferred the parental origin of 83% of alleles, as well as the genotypes of deceased recent ancestors for whom no genotype information was available. This imputed data set will enable us to better study the relative contributions of rare and common variants to human phenotypes, as well as parent-of-origin effects of disease risk alleles, in >1,000 individuals at minimal cost.</p></div>

    Parental origin assignment process.

    <p>For a given quasi-founder, we denote his/her haplotypes by A and B; by convention, the first is paternal and the second is maternal. At each SNV, we calculate a 2×2 matrix of kinships (Step 1) between each of the proband’s parents and each subject in the A and B IBD cliques. From these, we compute a parental haplotype separation measure m (Step 2). If m≈1, A and B are already correctly ordered; if m≈−1, they should be swapped. If the majority of the SNVs agree on the same ordering (indicated by a sample separation M sufficiently close to 1 in Step 3), we assign parental origin and reorder A and B accordingly (Step 4).</p>
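The caption does not give formulas for m or M, so the following is only a hypothetical sketch of Steps 2–4. Here m contrasts the kinships that support the current ordering (father–A, mother–B) with those that support a swap, and M is taken as the absolute mean of m across SNVs, so that M near 1 means most SNVs agree on one ordering. The kinship input format, both formulas and the 0.9 threshold are assumptions, not PRIMAL's implementation.

```python
from statistics import mean

# Hypothetical per-SNV input: mean kinship between each parent and the members
# of the IBD clique containing haplotype A or B, keyed like ("father", "A").
Kinships = dict[tuple[str, str], float]


def separation(k: Kinships) -> float:
    """Assumed per-SNV separation m in [-1, 1] (not the published formula).

    m is +1 when only the father relates to clique A and only the mother to
    clique B (A is paternal), and -1 in the swapped configuration.
    """
    agree = k[("father", "A")] + k[("mother", "B")]
    swap = k[("father", "B")] + k[("mother", "A")]
    total = agree + swap
    return 0.0 if total == 0 else (agree - swap) / total


def assign_origin(per_snv: list[Kinships], threshold: float = 0.9) -> str:
    """Return 'keep', 'swap' or 'undetermined' for the haplotype pair (A, B)."""
    mean_m = mean(separation(k) for k in per_snv)
    big_m = abs(mean_m)  # assumed sample separation M: near 1 when SNVs agree
    if big_m < threshold:
        return "undetermined"
    return "keep" if mean_m > 0 else "swap"
```

When SNVs disagree (for instance half support each ordering), M falls below the threshold and no parental origin is assigned, mirroring the "sufficiently close to 1" condition in the caption.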

    Imputation performance.

    <p>Performance and call rates of PRIMAL alone and of PRIMAL combined with LD-based imputation (PRIMAL+LD) for the 1,317 Hutterites whose genomes were not sequenced. Concordance and heterozygote concordance values marked with asterisks were computed between PRIMAL and LD-based imputation on the set of genotypes called by both methods. Cross-validation SNVs were chosen from the framework SNVs as explained in the text.</p>
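The call rate and concordance figures in this table can be illustrated with a small standard-library sketch. Genotypes are coded as allele counts with `None` for missing calls, and heterozygote concordance is computed over shared calls where at least one method reports a heterozygote; both the encoding and that definition are assumptions, since the caption defers the details to the text.

```python
from typing import Optional

Genotype = Optional[int]  # allele count 0/1/2, or None if not called


def call_rate(calls: list[Genotype]) -> float:
    """Fraction of genotypes that were assigned a call."""
    return sum(g is not None for g in calls) / len(calls)


def concordance(a: list[Genotype], b: list[Genotype]) -> tuple[float, float]:
    """Overall and heterozygote concordance on genotypes called by both methods.

    Het concordance is computed here over shared calls where at least one
    method reports a heterozygote (an assumed definition).
    """
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return float("nan"), float("nan")
    overall = sum(x == y for x, y in shared) / len(shared)
    het = [(x, y) for x, y in shared if x == 1 or y == 1]
    het_conc = sum(x == y for x, y in het) / len(het) if het else float("nan")
    return overall, het_conc
```

Heterozygote concordance is usually the stricter measure, because most genotypes at rare variants are homozygous reference and agree trivially.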

    Partitioning an IBD-sharing graph into cliques.

    <p>(1) IBD segments are indexed into a graph at each SNV. Nodes represent haplotypes (denoted A–H), and each pair of haplotypes that share an IBD segment at the SNV is connected by a link whose weight equals the HMM posterior probability. (2) Link weights are replaced by affinities. (3) Links with a small original weight or affinity are removed. (4) All nodes within each of the resulting connected components are connected to one another, forming IBD cliques.</p>
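Steps (1)–(4) of this partitioning can be sketched as follows. The caption does not define the affinity, so this example uses the Jaccard similarity of the two haplotypes' closed neighborhoods as a stand-in, with illustrative thresholds; the function name and its parameters are hypothetical.

```python
Edge = tuple[str, str]


def ibd_cliques(weights: dict[Edge, float], min_weight: float = 0.5,
                min_affinity: float = 0.3) -> list[set[str]]:
    """Partition an IBD-sharing graph into cliques (sketch of steps 1-4).

    `weights` maps a haplotype pair to its HMM posterior IBD probability.
    The affinity used here (Jaccard similarity of closed neighborhoods) and
    both thresholds are illustrative assumptions, not PRIMAL's definitions.
    """
    # (1) Build closed neighborhoods (each node includes itself).
    nbrs: dict[str, set[str]] = {}
    for u, v in weights:
        nbrs.setdefault(u, {u}).add(v)
        nbrs.setdefault(v, {v}).add(u)

    # (2)-(3) Compute affinities and keep only links passing both thresholds.
    kept = []
    for (u, v), w in weights.items():
        affinity = len(nbrs[u] & nbrs[v]) / len(nbrs[u] | nbrs[v])
        if w >= min_weight and affinity >= min_affinity:
            kept.append((u, v))

    # Union-find over the kept links to obtain connected components.
    parent = {node: node for node in nbrs}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in kept:
        parent[find(u)] = find(v)

    # (4) Each component becomes a clique: all members are treated as mutually IBD.
    groups: dict[str, set[str]] = {}
    for node in nbrs:
        groups.setdefault(find(node), set()).add(node)
    return list(groups.values())
```

With strong links A–B, B–C and A–C, a weak C–D link (weight 0.1) and a strong E–F link, the function returns the groups {A, B, C}, {D} and {E, F}.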

    The imputation pipeline.

    <p>Given a pedigree of 3,671 Hutterites (1), 1,415 individuals in the three most recent generations (within the red box) were genotyped with framework markers (2). The first part of the pipeline (steps 2–6) depends only on the framework marker data; the second part (steps 7–12) imputes the whole genome sequence variants. First, estimates of identity coefficients and the transition rate parameter λ [<a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1004139#pcbi.1004139.ref024" target="_blank">24</a>] between each pair of the 1,415 individuals are calculated (3). The framework genotypes are then phased (4), IBD segments between haplotypes are identified using an HMM (5), and these segments are indexed into an efficient data structure consisting of IBD cliques (6). Using the cliques, haplotypes are assigned parental origins that are consistent across the pedigree (7). Then, the whole genome sequences of 98 Hutterites (8) are cleaned using several filters, including a novel generalized Mendelian error check (9), and imputed to the remaining 1,317 Hutterites using IBD cliques (10). Call rates are boosted by imputing as many of the remaining genotypes as possible with an LD-based imputation method, IMPUTE2 (11). To ensure that accuracy is not compromised, we calculate the concordance of the genotypes called by both methods and keep only variants that are highly concordant (12).</p>
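Steps 10–12 (pedigree-based calls, LD-based gap filling and a concordance filter) can be sketched for a single variant as below. The caption only states that highly concordant variants are kept, so the precedence rule, the 0.99 threshold and the decision to drop variants with no overlapping calls are assumptions for illustration; the function name is hypothetical.

```python
from typing import Optional

Genotype = Optional[int]  # allele count 0/1/2, or None if not called


def merge_calls(primal: list[Genotype], ld: list[Genotype],
                min_concordance: float = 0.99) -> Optional[list[Genotype]]:
    """Combine pedigree-based and LD-based calls for one variant.

    PRIMAL calls take precedence and LD-based calls fill the gaps. The variant
    is dropped (None) if the two methods agree on fewer than `min_concordance`
    of the genotypes they both called, or if they share no calls at all.
    """
    shared = [(p, q) for p, q in zip(primal, ld) if p is not None and q is not None]
    if not shared:
        return None  # concordance cannot be checked (assumed conservative choice)
    agree = sum(p == q for p, q in shared) / len(shared)
    if agree < min_concordance:
        return None
    return [p if p is not None else q for p, q in zip(primal, ld)]
```

Giving pedigree-based calls precedence reflects the caption's ordering, in which LD-based imputation only boosts call rates for genotypes the pedigree could not resolve.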