
14 research outputs found

Hemoglobin and oxygen saturation measurements.

Author: Amha Gebremedhin (113909)
Anna Di Rienzo (43550)
Cynthia M. Beall (113906)
David B. Witonsky (113907)
Gorka Alkorta-Aranburu (113905)
Jonathan K. Pritchard (113911)

Box plots describe variation in the Amhara 05 (dark grey boxes), Amhara 95 (grey boxes) and Oromo (white boxes) for Hb concentration (g/dL) among males (A) and females (B), and for O2 saturation among males (E) and females (F). Box plots show the median (horizontal line), interquartile range (box), and range (whiskers); extreme values are excluded from the whiskers and shown as circles. Statistically significant differences between groups after multiple test correction (unpaired two-sided two-sample t-test) are bolded in C, D, G, and H.

FigShare

Hemoglobin association test within Amhara.

Author: Amha Gebremedhin (113909)
Anna Di Rienzo (43550)
Cynthia M. Beall (113906)
David B. Witonsky (113907)
Gorka Alkorta-Aranburu (113905)
Jonathan K. Pritchard (113911)

The QQ plot (A) shows the excess of strong associations with Hb among Amhara individuals. The observed −log10 p-values are ranked from smallest to largest and plotted (y-axis) against the expected −log10 p-values (x-axis) in black. The grey area indicates the 95% confidence interval (see methods). The genome-wide (GW) significance level (after multiple test correction) is indicated by the dashed line. The Manhattan plot (B) shows the GW significance achieved by a set of high-LD SNPs on chromosome 1. The box plots describe the correlation between hemoglobin levels and the 3 genotypes of the top, GW-significant SNP (rs10803083) among high-altitude (C) and low-altitude (D) Amhara.
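As a rough illustration of how the points of such a QQ plot can be computed, the Python sketch below pairs each ranked observed −log10 p-value with its expected value under the null. It is not the authors' code: the function names are hypothetical, the expected values use the uniform order-statistic mean rank/(n+1), and a Bonferroni correction is assumed for the significance line because the caption does not name the correction used.

```python
import math

def qq_points(p_values):
    """Return (expected, observed) -log10 p-value pairs for a QQ plot.

    Assumption: under the null, p-values are uniform, so the k-th smallest
    of n p-values has expected value k / (n + 1).
    """
    n = len(p_values)
    ordered = sorted(p_values)  # smallest p-value (strongest signal) first
    points = []
    for rank, p in enumerate(ordered, start=1):
        expected = -math.log10(rank / (n + 1))
        observed = -math.log10(p)
        points.append((expected, observed))
    return points

def bonferroni_line(alpha, n_tests):
    """-log10 of a Bonferroni-corrected threshold (assumed correction)."""
    return -math.log10(alpha / n_tests)

# Hypothetical p-values, for illustration only.
points = qq_points([0.5, 0.01, 0.2])
line = bonferroni_line(0.05, 1000)
```

Points lying above the diagonal (observed greater than expected) correspond to the excess of strong associations that the caption describes.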

FigShare

Power plots.

Author: Amha Gebremedhin (113909)
Anna Di Rienzo (43550)
Cynthia M. Beall (113906)
David B. Witonsky (113907)
Gorka Alkorta-Aranburu (113905)
Jonathan K. Pritchard (113911)

The effect of β and MAF on the power of association tests based on the Ethiopian sample size (corrected for the number of SNPs tested within 10 kb of the gene) is illustrated for EPAS1 (A, 72 SNPs), EGLN1 (B, 38 SNPs) and any gene within the Response to Hypoxia gene ontology category (C, 1309 SNPs).
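To make the dependence of power on β and MAF concrete, here is a minimal Python sketch using a standard normal approximation for a two-sided test of an additive SNP effect. It is an illustration under stated assumptions, not the method behind the figure: the function name, the residual standard deviation `sigma`, the sample size passed in, Hardy-Weinberg genotype variance, and the Bonferroni correction over the SNP count are all assumptions.

```python
from statistics import NormalDist

def association_power(beta, maf, n, n_snps, sigma=1.0, alpha=0.05):
    """Approximate power to detect an additive SNP effect on a trait.

    Assumptions (not from the source): normal approximation, genotype
    variance 2*maf*(1-maf) under Hardy-Weinberg equilibrium, residual
    standard deviation `sigma`, Bonferroni correction over `n_snps`.
    """
    norm = NormalDist()
    alpha_corrected = alpha / n_snps
    z_crit = norm.inv_cdf(1 - alpha_corrected / 2)
    genotype_var = 2 * maf * (1 - maf)
    # Expected value of the test statistic when the true effect is beta.
    ncp = abs(beta) * (n * genotype_var) ** 0.5 / sigma
    # Two-sided rejection probability.
    return norm.cdf(ncp - z_crit) + norm.cdf(-ncp - z_crit)

# Hypothetical inputs: 200 individuals, 72 SNPs tested (as for EPAS1).
low_maf = association_power(beta=0.5, maf=0.05, n=200, n_snps=72)
high_maf = association_power(beta=0.5, maf=0.30, n=200, n_snps=72)
```

With β fixed, power rises as MAF approaches 0.5, and testing more SNPs lowers the corrected threshold and therefore the power.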

FigShare

Association test of Tibetan EGLN1 and EPAS1 SNPs within Amhara, Oromo, and combined Ethiopians.

Author: Amha Gebremedhin (113909)
Anna Di Rienzo (43550)
Cynthia M. Beall (113906)
David B. Witonsky (113907)
Gorka Alkorta-Aranburu (113905)
Jonathan K. Pritchard (113911)

1. Genotype-phenotype association β coefficients for EGLN1 were obtained from Simonson et al. [16], while those for EPAS1 were obtained from Beall et al. [17].
2. β indicates the observed linear coefficient for the relationship between SNP genotype and Hb levels.
3. Power refers to the probability of detecting a significant association (p < 0.05) between SNP genotype and Hb level given the MAF and the sample size in the Ethiopian populations, assuming that the β coefficient is as high as or higher than that observed in Tibetans.

FigShare

PRIMAL: Fast and Accurate Pedigree-based Imputation from Sequence Data in a Founder Population

Author: Carole Ober (259474)
Dan L. Nicolae (230903)
Gorka Alkorta-Aranburu (113905)
Lide Han (697344)
Mark Abney (364371)
Oren E. Livne (697343)
William Wentworth-Sheilds (697345)
Publication date: 01/03/2015

Founder populations and large pedigrees offer many well-known advantages for genetic mapping studies, including cost-efficient study designs. Here, we describe PRIMAL (PedigRee IMputation ALgorithm), a fast and accurate pedigree-based phasing and imputation algorithm for founder populations. PRIMAL incorporates both existing and original ideas, such as a novel indexing strategy of Identity-By-Descent (IBD) segments based on clique graphs. We were able to impute the genomes of 1,317 South Dakota Hutterites, who had genome-wide genotypes for ~300,000 common single nucleotide variants (SNVs), from 98 whole genome sequences. Using a combination of pedigree-based and LD-based imputation, we were able to assign 87% of genotypes with >99% accuracy over the full range of allele frequencies. Using the IBD cliques we were also able to infer the parental origin of 83% of alleles, and genotypes of deceased recent ancestors for whom no genotype information was available. This imputed data set will enable us to better study the relative contribution of rare and common variants on human phenotypes, as well as parental origin effect of disease risk alleles in >1,000 individuals at minimal cost.

Directory of Open Access Journals

PubMed Central

FigShare

Parental origin assignment process.

Author: Carole Ober (259474)
Dan L. Nicolae (230903)
Gorka Alkorta-Aranburu (113905)
Lide Han (697344)
Mark Abney (364371)
Oren E. Livne (697343)
William Wentworth-Sheilds (697345)

For a given quasi-founder, we denote his/her haplotypes by A and B, and (by convention) the first is paternal and the second is maternal. At each SNV, we calculate a 2×2 matrix of kinships (Step 1) between each of the proband’s parents and each subject in the A and B IBD cliques. Using these, we generate a parental haplotype separation measure m (Step 2). If m≈1, A and B are already correctly ordered; if m≈−1, they should be swapped. If the majority of the SNVs agree on the same swapping (indicated by a sample separation M sufficiently close to 1 in Step 3), we assign paternal origin and reorder A and B accordingly (Step 4).
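The caption does not give the formulas for m or M, so the Python sketch below only illustrates the shape of the four steps with stand-in definitions. Everything here is hypothetical: m is taken as a normalized contrast of the 2×2 kinship entries, M as the absolute signed sum of m divided by the sum of |m| across SNVs, and the 0.9 cut-off is an arbitrary placeholder.

```python
def separation(k_father_a, k_father_b, k_mother_a, k_mother_b):
    """Hypothetical separation measure m in [-1, 1] for one SNV.

    Inputs are the four entries of the 2x2 kinship matrix (Step 1).
    m near 1: A looks paternal and B maternal; m near -1: the reverse.
    """
    total = k_father_a + k_father_b + k_mother_a + k_mother_b
    if total == 0:
        return 0.0  # no kinship signal at this SNV
    return (k_father_a + k_mother_b - k_father_b - k_mother_a) / total

def assign_origin(measures, min_agreement=0.9):
    """Combine per-SNV measures into a keep/swap decision (Steps 3-4).

    Returns "keep", "swap", or None when the SNVs disagree too much.
    The agreement statistic and threshold are illustrative assumptions.
    """
    weight = sum(abs(m) for m in measures)
    if weight == 0:
        return None
    signed = sum(measures)
    agreement = abs(signed) / weight  # stand-in for the sample separation M
    if agreement < min_agreement:
        return None
    return "keep" if signed > 0 else "swap"

# Hypothetical per-SNV measures for one quasi-founder.
decision = assign_origin([-0.9, -0.8, -1.0])  # consistent evidence to swap
```

Returning `None` when agreement is low mirrors the caption's requirement that origin is assigned only when most SNVs agree.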

FigShare

Imputation performance.

Author: Carole Ober (259474)
Dan L. Nicolae (230903)
Gorka Alkorta-Aranburu (113905)
Lide Han (697344)
Mark Abney (364371)
Oren E. Livne (697343)
William Wentworth-Sheilds (697345)

Performance and call rates of PRIMAL and of PRIMAL combined with LD-based imputation (PRIMAL+LD) for the 1,317 Hutterites whose genomes were not sequenced. The concordance and het concordance figures marked by asterisks were based on the concordance of the PRIMAL and LD-based imputation on the set of genotypes called by both. Cross-validation SNVs were chosen from the framework SNVs as explained in the text.

FigShare

Partitioning an IBD-sharing graph into cliques.

Author: Carole Ober (259474)
Dan L. Nicolae (230903)
Gorka Alkorta-Aranburu (113905)
Lide Han (697344)
Mark Abney (364371)
Oren E. Livne (697343)
William Wentworth-Sheilds (697345)

(1) IBD segments are indexed into a graph at each SNV. Nodes represent haplotypes (denoted A-H). Each pair of haplotypes that share an IBD segment at the SNV is connected with a link whose weight equals the HMM posterior probability. (2) Link weights are replaced by affinities. Links with small original weight or affinity are removed (3); all nodes within each of the resulting connected components are connected (4).
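Steps (3) and (4) above can be sketched in Python as a threshold followed by a connected-components search. This is a simplified illustration, not PRIMAL's implementation: the affinity transformation of step (2) is skipped because the caption does not define it, and the function name, input layout, and threshold value are assumptions.

```python
def ibd_cliques(links, min_weight):
    """Partition haplotypes at one SNV into cliques.

    links: dict mapping (hap_a, hap_b) -> link weight (e.g. a posterior
    probability). Weak links are dropped, and each remaining connected
    component is returned as a set, i.e. treated as fully connected.
    """
    adjacency = {}
    for (a, b), weight in links.items():
        adjacency.setdefault(a, set())
        adjacency.setdefault(b, set())
        if weight >= min_weight:  # step (3): remove links with small weight
            adjacency[a].add(b)
            adjacency[b].add(a)

    seen = set()
    cliques = []
    for start in sorted(adjacency):
        if start in seen:
            continue
        component = set()
        stack = [start]
        while stack:  # depth-first search over the surviving links
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        cliques.append(component)  # step (4): component becomes a clique
    return cliques

# Hypothetical weights between haplotypes A-E.
example = ibd_cliques(
    {("A", "B"): 0.99, ("B", "C"): 0.95, ("C", "D"): 0.10, ("D", "E"): 0.97},
    min_weight=0.5,
)
```

Storing each component as a set is what makes the index compact: membership in a clique implies IBD sharing with every other member, so pairwise links need not be kept.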

FigShare

The imputation pipeline.

Author: Carole Ober (259474)
Dan L. Nicolae (230903)
Gorka Alkorta-Aranburu (113905)
Lide Han (697344)
Mark Abney (364371)
Oren E. Livne (697343)
William Wentworth-Sheilds (697345)

Given a pedigree tree of 3,671 Hutterites (1), 1,415 individuals in the three most recent generations (within the red box) were genotyped with framework markers (2). The first part of the pipeline (steps 2–6) depends only on the framework marker data; the second part (steps 7–9) imputes the whole genome sequence variants. First, estimates of identity coefficients and the transition rate parameter λ [24] between each pair of the 1,415 individuals are calculated (3). The framework genotypes are then phased (4), IBD segments between haplotypes are identified using an HMM (5), and indexed into an efficient data structure consisting of IBD cliques (6). Haplotypes are assigned parental origins consistent across the pedigree using the cliques (7). Then, the whole genome sequences of 98 Hutterites (8) are cleaned using several filters, including a novel generalized Mendelian error check (9), and imputed to the remaining 1,317 Hutterites using IBD cliques (10). Call rates are boosted by imputing as many of the remaining genotypes as possible using an LD-based imputation method, IMPUTE2 (11). To ensure that accuracy is not compromised, we calculate the concordance of the shared genotypes between the two methods and keep only variants that are highly concordant (12).
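The final concordance filter (12) can be illustrated with a short Python sketch. It is a sketch under assumptions rather than the pipeline's code: genotypes are encoded as 0/1/2 with `None` for uncalled, the function names and the 0.99 threshold are hypothetical, and variants with no genotype called by both methods are dropped here as a design choice the caption does not specify.

```python
def concordance(calls_a, calls_b):
    """Fraction of matching genotypes among samples called by both methods.

    calls_a, calls_b: dict mapping sample id -> genotype (0, 1, 2) or None.
    Returns None when no sample is called by both methods.
    """
    shared = [
        s for s in calls_a
        if calls_a[s] is not None and calls_b.get(s) is not None
    ]
    if not shared:
        return None
    matches = sum(calls_a[s] == calls_b[s] for s in shared)
    return matches / len(shared)

def keep_concordant(variants, threshold=0.99):
    """Return ids of variants whose two call sets agree at >= threshold.

    variants: dict mapping variant id -> (pedigree_calls, ld_calls).
    """
    kept = []
    for variant_id, (pedigree_calls, ld_calls) in variants.items():
        score = concordance(pedigree_calls, ld_calls)
        if score is not None and score >= threshold:
            kept.append(variant_id)
    return kept

# Hypothetical calls for two variants.
pedigree = {"s1": 0, "s2": 1, "s3": None, "s4": 2}
ld_based = {"s1": 0, "s2": 2, "s3": 1, "s4": 2}
kept = keep_concordant({
    "v1": (pedigree, ld_based),   # 2 of 3 shared calls agree
    "v2": ({"s1": 1}, {"s1": 1}),  # fully concordant
})
```

Only genotypes called by both methods enter the score, which matches the caption's reference to "shared genotypes".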

FigShare

Variance in mAO for each CAGexp allele (40–44 range) in normotensives and in pre-HD AHT patients.

Author: Ana Aguirre (765363)
Ana M. Zubiaga (429975)
Asier Fullaondo (765356)
Carsten Saft (42954)
Gorka Alkorta-Aranburu (113905)
Hugh Rickards (5264378)
Leire Valcárcel-Ocete (765355)
Lena E. Hjermind (5264375)
Marina Frontali (352662)
María García-Barcina (22356)
Ralf Reilmann (728313)
Raymund A. C. Roos (435493)

Numbers above the square brackets are the difference in median years between Pre-HD AHT and normotensives for each CAGexp allele.

FigShare