25 research outputs found

    Time and memory consumption.


    Scatter plot of the mean allele frequencies per genome (n = 50) vs. age of the genomes (calibrated years before present).

    The red line represents the linear model. We found no significant correlation between the age of the individuals and the mean allele frequency (Spearman’s rank correlation ρ = -0.12, P = 0.41). (TIF)
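
    The rank correlation reported in this caption can be illustrated with a minimal pure-Python sketch. It ranks both variables (ties receive the mean of their positions) and takes the Pearson correlation of the ranks. The function names and the toy ages and frequencies are hypothetical and are not taken from the study; no P value is computed here.

```python
from statistics import mean

def average_ranks(values):
    """Return 1-based ranks; tied values get the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical toy data: genome ages (years BP) and mean allele frequencies.
ages = [9000, 7500, 6000, 4500, 3000]
mean_freqs = [0.21, 0.25, 0.20, 0.24, 0.22]
rho = spearman_rho(ages, mean_freqs)
```

    A value of ρ near zero, as in the caption, indicates no monotonic trend between age and mean allele frequency.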

    Multivariate analysis of deletion frequencies reveals outlier genomes.

    Left panels: Multidimensional scaling plots (MDS) calculated with k = 2 using the R “cmdscale” function on a Euclidean distance matrix of deletion frequencies. Middle panels: Principal component analysis plots (PCA) summarizing deletion frequencies after removing any NAs. Right panels: Hierarchical clustering trees summarizing Manhattan distance matrices, calculated using the R “dist” and “hclust” functions. The color codes indicate the laboratory-of-origin of each genome, shown in the legend of the top right panel. (A) Results based on the full dataset with 10,002 human-derived deletions (n = 8,780 genotyped in any state in at least one genome) and n = 71 genomes. In the PCA we use nD = 580 deletions after removing loci with at least one missing value. (B) Results based on n = 60 genomes after removing 11 outlier genomes (and nD = 3,460 deletions in the PCA). (C) Results based on n = 50 genomes after removing 21 outlier genomes (and nD = 3,472 deletions in the PCA). We note that the MDS here differs from that shown in Fig 5, in that the latter is calculated using outgroup-f3 statistics. (TIF)

    Overall workflow of CONGA.

    The first step involves initialization, where we create the input (reference) CNV file using deletions and duplications identified in high-quality genome sets. We apply CONGA-genotyping in the second step and create the initial CNV call set. We then perform filtering and refining steps, and thus generate the final CNV call set.

    TPR vs FDR curves for deletion and duplication predictions of CONGA.

    Here, we use Mota, Saqqaq and Yamnaya genomes down-sampled to various depths from their original coverages of 9.6×, 13.1× and 23.3×, respectively. The numbers inside boxes show the down-sampled coverage values. We calculated TPR and FDR for down-sampled genomes assuming that our CONGA-based predictions with the original genomes (full data) reflect the ground truth. These predictions, in turn, were made using modern-day CNVs as the candidate CNV list. The purpose of the experiment was to evaluate accuracy at lower coverage relative to the full data, as well as to compare performance across different real genomes (Methods).
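
    The evaluation described in this caption can be sketched as follows, treating the calls made on the full-coverage genome as ground truth. This is a minimal illustration, assuming each call is represented by a CNV identifier and that a prediction counts as correct only on an exact identifier match; the caption does not state the matching criterion, and the function name and identifiers are hypothetical.

```python
def tpr_fdr(truth_calls, downsampled_calls):
    """Return (TPR, FDR) of down-sampled calls against full-coverage calls."""
    truth = set(truth_calls)
    predicted = set(downsampled_calls)
    tp = len(truth & predicted)   # called in both full and down-sampled data
    fp = len(predicted - truth)   # called only in the down-sampled data
    fn = len(truth - predicted)   # missed in the down-sampled data
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fdr = fp / (tp + fp) if (tp + fp) else 0.0
    return tpr, fdr

# Hypothetical call sets for one genome at one down-sampled depth.
full_coverage = {"del_1", "del_2", "del_3", "del_4"}
low_coverage = {"del_1", "del_2", "del_9"}
tpr, fdr = tpr_fdr(full_coverage, low_coverage)
```

    Repeating this at each down-sampled depth gives one (FDR, TPR) point per coverage value, which is the kind of curve the figure plots.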

    Site-frequency spectra of deletions genotyped in low and high coverage genomes.

    The left panel shows the SFS of n = 25 below-median coverage genomes and the right panel shows the SFS of n = 25 above-median coverage genomes. The median coverage value was 3.98×. We found no significant difference between the two SFS distributions (Kolmogorov-Smirnov test P = 0.27). (TIF)
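
    A site-frequency spectrum of this kind can be sketched from a genotype matrix as below. This is a minimal illustration, assuming genotypes are coded 0, 1 or 2 deletion alleles with None for missing calls, and assuming that loci with any missing genotype are skipped; the caption does not state how missing data were handled, and the function name and toy matrix are hypothetical.

```python
from collections import Counter

def site_frequency_spectrum(genotype_matrix):
    """Map deletion-allele count -> number of deletions with that count.

    Rows are deletions, columns are genomes; entries are 0, 1, 2 or None.
    """
    sfs = Counter()
    for row in genotype_matrix:
        if any(g is None for g in row):
            continue  # assumption: drop loci with missing genotypes
        allele_count = sum(row)
        if allele_count > 0:  # ignore deletions absent from every genome
            sfs[allele_count] += 1
    return dict(sorted(sfs.items()))

# Hypothetical toy matrix: five deletions genotyped in three genomes.
toy = [
    [0, 1, 0],
    [2, 0, 0],
    [1, 1, None],
    [0, 0, 0],
    [1, 0, 0],
]
sfs = site_frequency_spectrum(toy)
```

    Computing one spectrum for the below-median genomes and one for the above-median genomes gives the two distributions that the caption compares.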

    IGV visualization of two high-scoring (i.e., high-likelihood) deletions and duplications predicted by CONGA.

    The events displayed in the upper panels were detected in a modern-day human genome (NA07051: an ∼8 kbp deletion within chr7:16,169,440-16,177,556 and a ∼4 kbp duplication within chr7:22,496-26,553) and those in the lower panels in an ancient genome (RISE98: a ∼17 kbp deletion within chr6:32,506,809-32,524,264 and a ∼6 kbp duplication within chr1:1,520,604-1,526,959). The candidate CNV list used for genotyping was the long-read CNV dataset described in Methods. Deducing the CNVs is straightforward with the modern-day genome data; however, it is less straightforward to distinguish these variations in ancient read data, especially for duplications. Note that this is only one example scenario; we emphasize that a large number of CNVs identified in ancient genomes suffer from the same issue. (TIF)

    Deleterious load estimates among 50 ancient genomes.

    In all three panels, the x-axis represents a deleterious load-related statistic and the y-axis shows the ancient individuals. (A) Deleterious load based on SIFT-estimated SNP effects per individual. The x-axis represents the number of “deleterious” SNPs over the number of “tolerated” SNPs. (B) CONGA-estimated total deletion length in kb per individual, using the final CNV call set. (C) The number of genes that overlap with CONGA-estimated deletions. In panels B and C, heterozygous and homozygous calls were each counted once. In panel C, the most affected individuals in terms of the number of gene overlaps are RISE497 (Russia, 2nd millennium BCE), DA380 (Turkmenistan, 4th millennium BCE), RISE675 (Russia, 3rd millennium BCE), and Chan (Iberia, 8th millennium BCE). We observed that these individuals were around 50% more affected than the rest. (TIF)
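
    The per-individual statistics in panels B and C can be sketched as below, with heterozygous and homozygous deletions each counted once as the caption states. This is a minimal illustration under assumptions: the data layout, the function name and the toy values are hypothetical, and counting each overlapped gene only once per individual is an assumption that the caption does not confirm.

```python
def deletion_load(genotypes, deletions):
    """Return (total deleted length in kb, number of overlapped genes).

    genotypes: {deletion_id: 0, 1, 2 or None} for one individual.
    deletions: {deletion_id: (length_bp, set_of_overlapping_gene_names)}.
    """
    total_bp = 0
    genes = set()
    for del_id, genotype in genotypes.items():
        if genotype in (1, 2):  # het and hom calls each contribute once
            length_bp, overlapped = deletions[del_id]
            total_bp += length_bp
            genes |= overlapped  # assumption: each gene is counted once
    return total_bp / 1000, len(genes)

# Hypothetical deletions and one individual's genotypes.
deletions = {
    "d1": (8000, {"GENE_A"}),
    "d2": (4000, {"GENE_A", "GENE_B"}),
    "d3": (17000, set()),
}
individual = {"d1": 1, "d2": 2, "d3": 0}
total_kb, n_genes = deletion_load(individual, deletions)
```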

    Heatmaps of CONGA-genotyped n = 10,002 human-derived deletions across ancient genomes.

    The color key includes 0 (gray) for reference allele, 1 (green) for heterozygous, 2 (magenta) for homozygous state and NA (white) for missing value. (A) Heatmap of deletions per genome on the raw dataset (n = 71 genomes). (B) Heatmap of deletions per genome on the refined dataset (with n = 50 genomes after removing divergent genomes). (TIF)

    Precision-Recall plots for simulations.

    Precision-Recall curves for deletion (A) and duplication (B) predictions of CONGA, GenomeSTRiP, FREEC, and CNVnator using coverages of 0.05×, 0.1×, 0.5×, 1× and 5×. mrCaNaVaR was used only in the analysis of large variants. (TIF)