39 research outputs found
Data_Sheet_1_FRAGTE2: An Enhanced Algorithm to Pre-Select Closely Related Genomes for Bacterial Species Demarcation.PDF
We previously reported on FRAGTE (hereafter termed FRAGTE1), a promising algorithm for sieving (pre-selecting genome pairs for whole-genome species demarcation). However, the number of pairs sieved by FRAGTE1 is still large, making the downstream computing cost prohibitive, especially for large datasets. Here, we present FRAGTE2. Tests on simulated genomes, real genomes, and metagenome-assembled genomes revealed that (i) FRAGTE2 reduces the overall number of pairs sieved by FRAGTE1 by ~50–60.10%, substantially decreasing the computing cost of the subsequent whole-genome species demarcation; (ii) FRAGTE2 shows higher sensitivity than FRAGTE1; (iii) FRAGTE2 shows higher specificity than FRAGTE1; and (iv) FRAGTE2 is faster than or comparable with FRAGTE1. In addition, FRAGTE2, like FRAGTE1, is independent of genome completeness. We therefore recommend FRAGTE2 for sieving to facilitate species demarcation in prokaryotes.
Additional file 7 of Constructing metagenome-assembled genomes for almost all components in a real bacterial consortium for binning benchmarking
Sequencing coverage for scaffolds uniquely assembled by metaSPAdes
Table_1_FRAGTE2: An Enhanced Algorithm to Pre-Select Closely Related Genomes for Bacterial Species Demarcation.XLS
We previously reported on FRAGTE (hereafter termed FRAGTE1), a promising algorithm for sieving (pre-selecting genome pairs for whole-genome species demarcation). However, the number of pairs sieved by FRAGTE1 is still large, making the downstream computing cost prohibitive, especially for large datasets. Here, we present FRAGTE2. Tests on simulated genomes, real genomes, and metagenome-assembled genomes revealed that (i) FRAGTE2 reduces the overall number of pairs sieved by FRAGTE1 by ~50–60.10%, substantially decreasing the computing cost of the subsequent whole-genome species demarcation; (ii) FRAGTE2 shows higher sensitivity than FRAGTE1; (iii) FRAGTE2 shows higher specificity than FRAGTE1; and (iv) FRAGTE2 is faster than or comparable with FRAGTE1. In addition, FRAGTE2, like FRAGTE1, is independent of genome completeness. We therefore recommend FRAGTE2 for sieving to facilitate species demarcation in prokaryotes.
Additional file 2 of Constructing metagenome-assembled genomes for almost all components in a real bacterial consortium for binning benchmarking
Figure S1. A schematic flow chart for untangling MAGs for almost all F1RT components. Steps including genome-wide alignment by NUCmer (step 1), determination of primary MAGs for isolates (step 2) and for uncultivated components (step 3), assignment of contigs with 1 copies are shown here. The posterior probability was calculated using the Naïve Bayesian Classifier (4-nt motif).
Figure S5. Reconstruction statistics for isolates. A and B, for FC2 in terms of scaffold number and base pairs (bp), respectively; C and D, for FC3 in terms of scaffold number and bp, respectively; E and F, for FC5 in terms of scaffold number and bp, respectively; G and H, for FC7 in terms of scaffold number and bp, respectively.
Figure S6. Reconstruction statistics for uncultured components. A and B, for FC1 in terms of scaffold number and base pairs (bp), respectively; C and D, for FC4 in terms of scaffold number and bp, respectively; E and F, for FC6 in terms of scaffold number and bp, respectively; G and H, for FC8 in terms of scaffold number and bp, respectively; I and J, for FC9 in terms of scaffold number and bp, respectively.
Figure S7. The average amino-acid identity distance between the closest relative and the reference with the second highest SCG AAI for each component. The solid line indicates the distance. For the closest relatives, please refer to Additional file 2: Table S3.
Figure S8. The heatmap showing the amino-acid identities for SCGs of all components. The reference in red, the closest relative for all components or an additional second closest relative for FC1. For the closest relatives, please refer to Additional file 2: Table S3.
Figure S9. Statistics of all Firmicutes SCGs. A, percentage of genomes with >1 copies; B, assembly status for all Firmicutes genomes; C, number distribution of SCGs for all Firmicutes genomes. The data were from all 27,565 Firmicutes genomes. Red, SCG with >1 copies in at least one F1RT MAG; dashed line, the median for percentage of genomes with >1 copies at 0.52.
Figure S10. The sequencing coverage distribution for isolates. A, for FC2; B, for FC3; C, for FC5; D, for FC7. Sequencing coverage is calculated using a fixed window of 500 bp with 250 bp overlap.
Figure S11. The sequencing coverage distribution for uncultivated components. A, for FC1; B, for FC4; C, for FC6; D, for FC8; E, for FC9. Sequencing coverage is calculated using a fixed window of 500 bp with 250 bp overlap.
Figure S12. Classification statistics of 10-kb fragments for all F1RT MAGs or their reference genomes. A, for F1RT MAGs; B, for F1RT reference genomes. The number in a cell is the fraction (%) of fragments classified into their corresponding organism and is used as the basis for color intensity. The 10-kb fragments are produced by dividing (pre-concatenated) MAGs or reference genomes. For references, please refer to Additional file 2: Table S4.
Figure S13. Phylogenetic relationships for F1RT MAGs and their reference genomes. A, for F1RT MAGs; B, for reference genomes. Phylogenetic relationships were determined on the basis of the average amino acid identity for SCGs. For references, please refer to Additional file 2: Table S4.
Figure S14. Mapping qualities for scaffolds with abnormally high or normal sequencing coverage. A, for FC1; B, for FC2; C, for FC3; D, for FC4; E, for FC5; F, for FC6; G, for FC7; H, for FC8; I, for FC9. MAPQ, mapping quality.
Figure S15. Statistics of duplicated reads mapped to scaffolds with abnormally high or normal sequencing coverage. A, for FC1; B, for FC2; C, for FC3; D, for FC4; E, for FC5; F, for FC6; G, for FC7; H, for FC8; I, for FC9. Duplication is calculated as the total alignments divided by the alignments after removing duplication by using “samtools markdup -r”.
Figure S16. Size statistics of scaffolds with abnormally high or normal sequencing coverage. A, for FC1; B, for FC2; C, for FC3; D, for FC4; E, for FC5; F, for FC6; G, for FC7; H, for FC8; I, for FC9.
Table S1. Mapping summary and the read mapping ratios for the references of the two scaffolds.
Table S2. The genomic statistics of F1RT isolates.
Table S3. The close relatives of all F1RT components.
Table S4. Reference genomes of all F1RT components for TETRA analysis.
Table S5. Alignments between scaffolds with abnormally high sequencing coverage and plasmid sequences.
Additional file 1 of Constructing metagenome-assembled genomes for almost all components in a real bacterial consortium for binning benchmarking
Coordinates for nucleotide-based alignments between scaffolds assembled by unmapped reads and F1RT metagenome scaffolds (F1RT) reassembled by metaSPAdes
Additional file 5 of Constructing metagenome-assembled genomes for almost all components in a real bacterial consortium for binning benchmarking
Coordinates for nucleotide-based alignments between scaffolds from the FC5 isolate genome and scaffolds from the F1RT metagenome assembled by metaSPAdes
Additional file 9 of Constructing metagenome-assembled genomes for almost all components in a real bacterial consortium for binning benchmarking
SCG scaffolds, their origins and sequencing coverage. Number in brackets, sequencing coverage across a whole scaffold
Additional file 8 of Constructing metagenome-assembled genomes for almost all components in a real bacterial consortium for binning benchmarking
SCG scaffolds and their sequencing coverage for FC8-9
Additional file 4 of Constructing metagenome-assembled genomes for almost all components in a real bacterial consortium for binning benchmarking
Coordinates for nucleotide-based alignments between scaffolds from the FC3 isolate genome and scaffolds from the F1RT metagenome assembled by metaSPAdes
Additional file 3 of Constructing metagenome-assembled genomes for almost all components in a real bacterial consortium for binning benchmarking
Coordinates for nucleotide-based alignments between scaffolds from the FC2 isolate genome and scaffolds from the F1RT metagenome assembled by metaSPAdes
