28 research outputs found

    P-values from association tests of jointly analyzing CYL and SPH.

    Boldface text highlights where ATeMP tests may be superior to MultiPhen.

    Resolution for varying relatedness using GRM, encGRM and <i>encG-reg</i>.

    The figure shows the resolution for detecting relatives or overlapping samples, with a different number of markers in each row (for better illustration, me was set to twice the value given by Eq 3) and for each degree of relatedness to be detected (r = 0, 1, and 2). The y axis is the relatedness calculated from GRM and the x axis is the relatedness estimated from encG-reg (A) and encGRM (B). Each point represents an individual pair between cohort 1 and cohort 2 (200 × 200 = 40,000 pairs in total), given the simulated relatedness. The dotted lines indicate the 95% confidence interval of the relatedness estimated directly from the original genotypes (blue) and from the encrypted genotypes (red). The table shows how m and k are estimated. The columns “under minimal me” provide a benchmark for each parameter; in practice, one chooses 2×me and then estimates k as shown under the column “practical me”.

    Workflow of <i>encG-reg</i> and its practical timeline as exercised in Chinese cohorts.

    The mathematical details of encG-reg are simple algebra, but its inter-cohort implementation requires coordination. (A) We illustrate its key steps; the time cost of each step was taken from the present exercise on 9 Chinese datasets (simplified here as three cohorts). Cohort assembly: it took about a week to contact our collaborators (see Table 3) and receive positive responses from those who agreed with our research plan. Inter-cohort QC: we received allele frequency reports from each cohort and performed inter-cohort QC according to the “geo-geno” analysis (see Fig 6). This step took about two weeks. Encrypt genotypes: depending on the exercise, one may use an exhaustive design (see the UKB example), which may maximize statistical power at the cost of heavier logistics, such as generating a pairwise Sij; in the Chinese cohorts study we used a parsimony design and generated a single S for 500 SNPs chosen from the 7,009 common SNPs. It took about a week to determine the number of SNPs and the dimension k according to Eq 3 and 4, and to evaluate the effective number of markers. Perform encG-reg and validation: we conducted inter-cohort encG-reg and validated the results (see Fig 7 and Table 4). This step took one week. (B) The two interactions between data owners and the central analyst, including example data for exchange, possible attacks, and the corresponding preventative strategies.
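The "encrypt genotypes" step can be illustrated with a minimal sketch. It assumes that encryption means multiplying each cohort's standardized genotype matrix by a shared random matrix S of dimension m × k, and that relatedness is then estimated from cross-products of the encrypted matrices, in the spirit of the encGRM estimator named above. The function names, the N(0, 1/m) scaling of S, the value k = 2000, and the simulated data are illustrative assumptions, not the authors' implementation; only m = 500 comes from the caption.

```python
import numpy as np

def standardize(G, p):
    """Standardize 0/1/2 genotypes (individuals x SNPs) using allele frequencies p."""
    p = np.asarray(p, dtype=float)
    return (np.asarray(G, dtype=float) - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))

def make_projection(m, k, rng):
    """Random m x k matrix S with N(0, 1/m) entries (assumed scaling)."""
    return rng.normal(0.0, np.sqrt(1.0 / m), size=(m, k))

def encrypted_relatedness(E1, E2):
    """Cross-cohort relatedness estimated from encrypted genotypes only."""
    k = E1.shape[1]
    return E1 @ E2.T / k

def demo(seed=0, m=500, k=2000, n=5):
    """Simulate two small cohorts that share one individual and compare them."""
    rng = np.random.default_rng(seed)
    p = rng.uniform(0.05, 0.5, size=m)        # hypothetical allele frequencies
    G1 = rng.binomial(2, p, size=(n, m))      # cohort 1 genotypes
    G2 = rng.binomial(2, p, size=(n, m))      # cohort 2 genotypes
    G2[0] = G1[0]                             # plant one overlapping sample
    S = make_projection(m, k, rng)            # shared by both cohorts
    E1 = standardize(G1, p) @ S               # each cohort encrypts locally
    E2 = standardize(G2, p) @ S
    return encrypted_relatedness(E1, E2)      # analyst sees only E1 and E2

if __name__ == "__main__":
    print(np.round(demo(), 2))
```

Under these assumptions the expected value of S Sᵀ / k is I / m, so the encrypted cross-product approximates the ordinary GRM entry: the planted pair (row 0, column 0) should come out near 1 and unrelated pairs near 0, with noise that shrinks as m and k grow.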

    Cohort-level genetic background analyses for Chinese cohorts under parsimony encG-reg analysis.

    (A) Overview of the intersected SNPs across cohorts. Each row represents one cohort and each column represents one combination of cohorts. A black dot indicates that the corresponding cohort is included, and dots linked by lines mark the cohorts in a given combination. The height of the bars represents the cohort’s SNP number (rows) or the SNP intersection number (columns). Inset histograms show the distribution of the 7,009 intersected SNPs and of the 500 SNPs randomly chosen from them for encG-reg analysis. (B) The 7,009 SNPs in the intersection of the 9 cohorts were used to estimate fPC. Each triangle represents one Chinese cohort and is placed according to its first two principal component scores (fPC1 and fPC2) derived from the received allele frequencies. (C) Five private datasets pinned onto the base map from GADM (https://gadm.org/data.html) using R. The size of each point indicates the sample size of the dataset. (D) The global fStructure plot indicates the global-level Fst-derived genetic composition projected onto three external reference populations: 1KG-CHN (CHB and CHS), 1KG-EUR (CEU and TSI), and 1KG-AFR (YRI); 4,296 of the 7,009 SNPs intersected with the three reference populations and were used. (E) The within-China fStructure plot indicates the within-China genetic composition. The three external references are 1KG-CHB (North Chinese), 1KG-CHS (South Chinese), and 1KG-CDX (Southwest minority Chinese Dai); 4,809 of the 7,009 SNPs intersected with these three reference populations and were used. Along the x axis are the 9 Chinese cohorts, and the height of each bar represents the cohort’s proportional genetic composition with respect to the three reference populations. Cohort codes: YRI, Yoruba in Ibadan, representing African samples; CHB, Han Chinese in Beijing; CHS, Southern Han Chinese; CHN, CHB and CHS together; CEU, Utah Residents with Northern and Western European Ancestry; TSI, Toscani in Italy; CDX, Chinese Dai in Xishuangbanna.