Additional file 3: of Genomic differentiation among wild cyanophages despite widespread horizontal gene transfer
Testing genetic drift, selection, and recombination. Table S7. FST values for genetic differentiation between coastal and offshore populations within each cyanophage lineage. The FST value for lineage II is most likely high because of the low number of representatives and the lack of diversity among the offshore lineage II phages. For some of the clusters, FST could not be calculated because too few individuals were isolated from the offshore site. Figure S4. (A) Quantitative host range analyses of 15 Synechococcus host strains against 138 cyanophage isolates, testing the efficacy of infection. (B) Analysis of mean infectivity of coastal and upwelling phages in lineages I, II, IV, and VI reveals statistically different infectivity phenotypes at either site (t-test, p < 0.05; *). Statistical significance was not assessed for lineages III and V because of low sample size, nor on the original isolation host, WH7803 (†). Table S8. Corrected Rand Indices and Meilă’s VI (variation of information) values comparing hierarchical clustering between the original host range matrix and a randomized host range matrix. The hierarchical clusters were split into different numbers of clusters (5, 10, 20, and 50) for the analyses. The analyses revealed low correspondence between the clusterings, indicating that the clustering we observe in the original shared genes matrix is not random and that there is some correlation with a biological signal. Table S9. Fisher’s exact test p-values and phi coefficients (for effect size) for genes found under positive selection in comparisons between phylogenomic lineages using the non-polarized McDonald-Kreitman test. The table also gives the corresponding protein clusters in the GOV population dataset [19]. Table S10. Genic versus intergenic recombination breakpoints for each lineage. (DOC 250 kb)
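The statistics named for Table S9 can be illustrated with a minimal sketch. It assumes a 2x2 McDonald-Kreitman table laid out as fixed differences (nonsynonymous, synonymous) over polymorphisms (nonsynonymous, synonymous). The function names and the counts in the usage lines are hypothetical, and this is not the authors' pipeline.

```python
from math import comb, sqrt


def phi_coefficient(a, b, c, d):
    """Effect size for the 2x2 table [[a, b], [c, d]]."""
    denom = sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denom == 0:
        raise ValueError("phi is undefined when a row or column sums to zero")
    return (a * d - b * c) / denom


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(k):
        # Probability that the top-left cell equals k given fixed margins.
        return comb(row1, k) * comb(row2, col1 - k) / comb(total, col1)

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Small relative tolerance so ties with the observed table are included.
    return sum(prob(k) for k in range(lowest, highest + 1)
               if prob(k) <= p_observed * (1 + 1e-9))


# Hypothetical counts: Dn=3, Ds=1 (fixed); Pn=1, Ps=3 (polymorphic).
print(phi_coefficient(3, 1, 1, 3))
print(fisher_exact_two_sided(3, 1, 1, 3))
```

Reporting phi alongside the p-value separates the strength of the association from the sample size, which is the role the table description gives it.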
The Fast Changing Landscape of Sequencing Technologies and Their Impact on Microbial Genome Assemblies and Annotation
<div><h3>Background</h3><p>The emergence of next generation sequencing (NGS) has provided the means for rapid and high throughput sequencing and data generation at low cost, while concomitantly creating a new set of challenges. The number of available assembled microbial genomes continues to grow rapidly and their quality reflects the quality of the sequencing technology used, but also of the analysis software employed for assembly and annotation.</p> <h3>Methodology/Principal Findings</h3><p>In this work, we have explored the quality of the microbial draft genomes across various sequencing technologies. We have compared the draft and finished assemblies of 133 microbial genomes sequenced at the Department of Energy-Joint Genome Institute and finished at the Los Alamos National Laboratory using a variety of combinations of sequencing technologies, reflecting the transition of the institute from Sanger-based sequencing platforms to NGS platforms. The quality of the public assemblies and of the associated gene annotations was evaluated using various metrics. Results obtained with the different sequencing technologies, as well as their effects on downstream processes, were analyzed. Our results demonstrate that the Illumina HiSeq 2000 sequencing system, the primary sequencing technology currently used for de novo genome sequencing and assembly at JGI, has various advantages in terms of total sequence throughput and cost, but it also introduces challenges for the downstream analyses. 
In all cases, assembly results, although of high quality on average, need to be viewed critically, and possible sources of error in them should be considered prior to analysis.</p> <h3>Conclusion</h3><p>These data follow the evolution of microbial sequencing and downstream processing at the JGI from draft genome sequences with large gaps corresponding to missing genes of significant biological role to assemblies with multiple small gaps (Illumina) and finally to assemblies that generate almost complete genomes (Illumina+PacBio).</p> </div>
Additional file 2: of Genomic differentiation among wild cyanophages despite widespread horizontal gene transfer
Lineage information. Figure S2. TEM images of phage isolates from cyanophage (A) lineage I, (B) lineage II, (C) lineage III, (D) lineage IV, (E) lineage V, (F) lineage VI, and (G-J) singleton and duplicon populations confirm myovirus morphology. Table S2. List of 51 core protein clusters shared across all six phylogenetic lineages. Figure S3. Unrooted phylogenomic maximum likelihood tree of 27 concatenated protein sequences shared across published marine and non-marine T4-like phage genomes and the 142 cyanophage isolate genomes sequenced here. For simplicity, cyanophage isolate names in this tree were shortened from Syn7803* to just *. These analyses show that the 10 cyanophage populations observed here share similar evolutionary histories with other T4-like phages. Table S3. Average ANI of the 51 core genes within and between lineages. Table S4. AGDB groupings correspond with the phylogenetic lineages. Table S5. Average phylogenetic distances within and between lineages. Table S6. Corrected Rand Indices and Meilă’s VI (variation of information) values comparing the row and column hierarchical clustering between the original ANI and Shared Gene matrix and a randomized ANI and Shared Gene matrix, respectively. The hierarchical clusters were split into different numbers of clusters (5, 10, 20, and 50) for the analyses. The analyses revealed low correspondence between the clusterings, indicating that the clustering we observe in the original matrices is not random. (DOC 4227 kb)
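As a rough illustration of the two agreement measures named for Table S6, the sketch below computes a corrected (adjusted) Rand index and the variation of information for two flat cluster assignments, such as those obtained by cutting a hierarchical tree into k clusters. The function names and toy labels are hypothetical, and natural logarithms are an assumption, since the description does not state the log base.

```python
from collections import Counter
from math import comb, log


def adjusted_rand_index(labels_a, labels_b):
    """Chance-corrected Rand index between two partitions of the same items."""
    n = len(labels_a)
    if n != len(labels_b) or n < 2:
        raise ValueError("need two label lists of equal length >= 2")
    joint = Counter(zip(labels_a, labels_b))
    sum_joint = sum(comb(c, 2) for c in joint.values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        # Degenerate case: both partitions are all-singletons or one cluster.
        return 1.0
    return (sum_joint - expected) / (max_index - expected)


def variation_of_information(labels_a, labels_b):
    """VI = H(A|B) + H(B|A), in nats; 0 means identical partitions."""
    n = len(labels_a)
    if n != len(labels_b) or n == 0:
        raise ValueError("need two non-empty label lists of equal length")
    joint = Counter(zip(labels_a, labels_b))
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    vi = 0.0
    for (a, b), c in joint.items():
        p_ab = c / n
        vi -= p_ab * (log(p_ab / (count_a[a] / n)) + log(p_ab / (count_b[b] / n)))
    return vi


# Toy example: the same partition under different label names.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
print(variation_of_information([0, 0, 1, 1], [1, 1, 0, 0]))
```

An adjusted Rand index near 0 together with a large VI corresponds to the "low correspondence" outcome described for the comparison against randomized matrices.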
Additional file 1: Table S1. of High quality permanent draft genome sequence of Phaseolibacter flectens ATCC 12775T, a plant pathogen of French bean pods
Scaffolds and contigs of genomic DNA for Phaseolibacter flectens ATCC 12775T (Topology: linear; Read depth: 1.00). (DOCX 24 kb)
Additional file 1: of Genomic differentiation among wild cyanophages despite widespread horizontal gene transfer
Metadata. Table S1. Metadata from the coastal (H3) and mesotrophic (67-70) sites. Samples for the 16S rRNA amplicons were collected twice at 67-70, 8 days apart (the 10 Oct. sample was collected on the same day as the viral sample). The mesotrophic station is often subject to upwelling. Figure S1. Sea surface temperatures (SST) of the region of the California Cooperative Oceanic Fisheries Investigations (CalCOFI) Line 67 ocean transect on 5 October 2009 (contoured from a single synoptic image, Aqua MODIS, NOAA) with the locations of the nearshore (yellow, H3 - coastal) and offshore (red, 67-70 - offshore mesotrophic) stations marked with stars. Gene marker (16S rRNA gene amplicon) analyses using the reference alignments of Sudek et al. (2015) revealed different Synechococcus communities at the two sites (for additional sampling details, see Additional file 2: Table S1). The Synechococcus community was analyzed twice at 67-70, one day after the coastal sampling and on the same day as the viral 67-70 sample collection. Proportions of different clades varied in the 67-70 Synechococcus amplicon data, but the same clades were present on both dates. (DOC 721 kb)
The distribution of projects among the 12 sequencing methods used.
<p>Dark green indicates the sequencing methods with more than 5 sequenced projects, which were used in downstream analysis.</p>
Methods used in this comparison.
1<p>PE: paired-end reads.</p>2<p>LMP: Long Mate Paired reads.</p>
Correlation of the number of contigs with genome GC%, repeat content, and size.
<p>Data shown are the Kendall rank correlation coefficients.</p>*<p> = p-value < 0.05.</p>
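For readers unfamiliar with the statistic in this caption, here is a minimal sketch of a Kendall rank correlation between contig counts and a genome property. It assumes the tau-b variant, which corrects for ties; the caption does not say which variant was used. The function name and example values are hypothetical, and no p-value is computed.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b between two equal-length sequences of numbers."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = discordant = tied_x_only = tied_y_only = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied in both variables: excluded from both factors
        if dx == 0:
            tied_x_only += 1
        elif dy == 0:
            tied_y_only += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    untied = concordant + discordant
    denom = sqrt((untied + tied_x_only) * (untied + tied_y_only))
    if denom == 0:
        raise ValueError("tau-b is undefined when one variable is constant")
    return (concordant - discordant) / denom


# Hypothetical data: contigs per draft assembly versus repeat content (%).
contigs = [12, 45, 30, 88, 61]
repeat_pct = [0.8, 2.1, 2.5, 4.0, 3.2]
print(kendall_tau_b(contigs, repeat_pct))
```

A rank-based coefficient suits this kind of comparison because it makes no assumption that contig count scales linearly with GC%, repeat content, or genome size.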
Misassemblies as detected by low gene quality.
<p>Low-quality genes are genes present in the finished genome that had similarity (tBLASTn) to the draft genome but for which the alignment was short (<50% of the gene length) or the identity was <90%. Data is shown for the six sequencing methods with more than 5 projects.</p>
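The rule in this caption can be written as a small predicate. This is a minimal sketch: the function and parameter names are hypothetical, and it assumes the aligned length and the gene length are expressed in the same units (for example, both in amino acids for a tBLASTn hit).

```python
def is_low_quality_gene(gene_length, aligned_length, percent_identity,
                        min_coverage=0.5, min_identity=90.0):
    """Flag a finished-genome gene whose best draft hit is short or divergent.

    A gene is low quality if the alignment covers less than `min_coverage`
    of the gene length OR the identity is below `min_identity` percent.
    """
    if gene_length <= 0:
        raise ValueError("gene_length must be positive")
    coverage = aligned_length / gene_length
    return coverage < min_coverage or percent_identity < min_identity


# Hypothetical hits: (gene length, aligned length, % identity).
print(is_low_quality_gene(300, 120, 98.0))  # short alignment
print(is_low_quality_gene(300, 300, 85.0))  # low identity
print(is_low_quality_gene(300, 295, 99.0))  # passes both thresholds
```

Keeping the two thresholds as parameters makes it easy to check how sensitive the misassembly counts are to the 50% and 90% cutoffs.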
Genes missed in draft assemblies.
<p>Data is shown for the sequencing methods with more than 5 projects. (a) Missed gene sequences, i.e., the number of genes in the finished genome whose nucleotide sequence is absent from the draft assembly. (b) Unrecognized genes, i.e., the number of genes whose nucleotide sequence is present in the draft assembly but that were not predicted by Prodigal (v2.5).</p>