23 research outputs found
The Promises and Pitfalls of Machine Learning for Detecting Viruses in Aquatic Metagenomes
Tools allowing for the identification of viral sequences in host-associated and environmental metagenomes allow for a better understanding of the genetics and ecology of viruses and their hosts. Recently, new approaches using machine learning methods to distinguish viral from bacterial signal using k-mer sequence signatures were published for identifying viral contigs in metagenomes. The promise of these content-based approaches is the ability to discover new viruses, with no or few known relatives. In this perspective paper, we examine the use of the content-based machine learning tool VirFinder for the identification of viral sequences in aquatic metagenomes and explore the possibility of using ecosystem-focused models targeted to marine metagenomes. We discuss the impact of the training set composition on the tool performance and the current limitation for the retrieval of low abundance viral sequences in metagenomes. We identify potential biases that could arise from machine learning approaches for viral hunting in real-world datasets and suggest possible avenues to overcome them. National Science Foundation [1640775]. Open access journal. This item from the UA Faculty Publications collection is made available by the University of Arizona with support from the University of Arizona Libraries. If you have questions, please contact us at [email protected]
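The "k-mer sequence signature" mentioned in this abstract can be illustrated with a minimal sketch. It only shows how a contig could be turned into a normalized k-mer frequency vector that a classifier might consume; it is not VirFinder's implementation, and the function name `kmer_profile` and the choice of k are assumptions made for illustration.

```python
from collections import Counter
from itertools import product
from typing import Dict


def kmer_profile(seq: str, k: int = 4) -> Dict[str, float]:
    """Return normalized k-mer frequencies for a DNA sequence.

    Hypothetical helper: windows containing characters other than
    A, C, G or T (for example N) are skipped.
    """
    seq = seq.upper()
    alphabet = set("ACGT")
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= alphabet
    )
    total = sum(counts.values())
    # Enumerate all 4**k possible k-mers so every contig yields a
    # vector with the same fixed set of features.
    all_kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    return {km: (counts[km] / total if total else 0.0) for km in all_kmers}


# Toy contig, for illustration only.
profile = kmer_profile("ATGCGATACGCTTGA", k=2)
```

Because every profile has the same 4^k features regardless of contig length, vectors from known viral and bacterial sequences could in principle be used to train a classifier, which is also why the composition of the training set matters as much as the abstract suggests.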
Complete Genome Sequence of Cyanobacterial Siphovirus KBS2A
Abstract: We present the genome of a cyanosiphovirus (KBS2A) that infects a marine Synechococcus sp. (strain WH7803). Unique to this genome, relative to other sequenced cyanosiphoviruses, is the absence of elements associated with integration into the host chromosome, suggesting this virus may not be able to establish a lysogenic relationship. Genome announcement: As obligate parasites, viruses can regulate their host population dynamics but also influence the structure and productivity of microbial communities (1, 2). Synechococcus species are an abundant and ecologically important group of Cyanobacteria found in freshwater and marine ecosystems worldwide. Virus-cyanobacterium interactions may have important implications for global biogeochemical cycles. The most commonly isolated cyanophages are myoviruses and podoviruses (3, 4). Siphoviruses are a third group of viruses that infect cyanobacteria, but they have received less attention (5). The genomes of 5 cyanosiphoviruses have recently become available: that of P-SS2, a siphovirus infecting Prochlorococcus (MIT9313) (6), followed by the cyanosiphoviruses S-CBS1, S-CBS2, S-CBS3, and S-CBS4, isolated from the Chesapeake Bay Estuary, all infecting Synechococcus populations (5). Here, we present the complete genome of a cyanosiphophage (KBS2A, originally named KBS-S-2A), a virus that infects Synechococcus sp. strain WH7803. The virus was isolated by plaque assay from the Chesapeake Bay by plating on Synechococcus sp. WH7803. Purified virus DNA was submitted to the Broad Institute as part of the Marine Phage Sequencing Project, where it was sequenced to ~30-fold coverage using 454 pyrosequencing. Translated open reading frames (ORFs) were compared with known protein sequences using the BLASTp program. ORF annotation was aided by the use of PSI-BLAST, HHpred, gene size, and domain conservation. The genome size of KBS2A is 40,658 bp.
In total, 64 ORFs have been predicted in this genome; of these, 43 have homologues in databases, and among them, 33 have been assigned to a putative function. For most (88%) predicted ORFs with homologues, homology has been found with the other cyanosiphovirus genomes. We compared the genomic arrangements of the 6 sequenced cyanosiphoviruses using dot plot and global gene homology and found no common genomic organization, suggesting strong mosaicism in the cyanosiphoviruses. In cyanophages, cyanobacterium-related proteins can be found and are often associated with photosynthesis and transcriptional regulation (6). In previously sequenced cyanosiphovirus genomes (5, 6), numerous viral genes (6 to 40 per genome) possess homology with host genes. In the case of the KBS2A genome, only 3 ORFs (coding for RNA polymerase sigma factor RpoD, HNH endonuclease, and a putative DNA polymerase) show such homology, implying less exchange (and potentially interaction) with the host genome. The first annotated cyanosiphovirus genome (that of P-SS2) showed the presence of genes identified as encoding an integrase and excisionase, which are enzymes that allow for phage integration into the host’s genome (6). Moreover, the annotation of cyanosiphoviruses S-CBS1 and S-CBS3 led to the discovery of a prophage-like structure in two sequenced Synechococcus elongatus strains (5). In phage genomes, tRNA genes serve as indicators of potential phage integration by site-specific recombination (7, 8), although recent models have offered alternative suggestions for the role of these genes (9). Sequences of this nature can, however, be found in the P-SS2 and S-CBS4 genomes. No such features (tRNAs, integrases, etc.) were found in the genome of KBS2A, suggesting that this siphovirus might be an exclusively lytic phage rather than a temperate phage. Nucleotide sequence accession number: The complete sequence of the Synechococcus phage KBS2A genome can be accessed under the GenBank accession no. HQ634187.
doi: 10.1128/genomeA.00472-1
Libra: scalable k-mer-based tool for massive all-vs-all metagenome comparisons
Background: Shotgun metagenomics provides powerful insights into microbial community biodiversity and function. Yet, inferences from metagenomic studies are often limited by dataset size and complexity and are restricted by the availability and completeness of existing databases. De novo comparative metagenomics enables the comparison of metagenomes based on their total genetic content. Results: We developed a tool called Libra that performs an all-vs-all comparison of metagenomes for precise clustering based on their k-mer content. Libra uses a scalable Hadoop framework for massive metagenome comparisons, Cosine Similarity for calculating the distance using sequence composition and abundance while normalizing for sequencing depth, and a web-based implementation in iMicrobe (http://imicrobe.us) that uses the CyVerse advanced cyberinfrastructure to promote broad use of the tool by the scientific community. Conclusions: A comparison of Libra to equivalent tools using both simulated and real metagenomic datasets, ranging from 80 million to 4.2 billion reads, reveals that methods commonly implemented to reduce compute time for large datasets, such as data reduction, read count normalization, and presence/absence distance metrics, greatly diminish the resolution of large-scale comparative analyses. In contrast, Libra uses all of the reads to calculate k-mer abundance in a Hadoop architecture that can scale to any size dataset to enable global-scale analyses and link microbial signatures to biological processes. National Science Foundation [1640775]. Open access journal.
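The cosine similarity between k-mer abundance vectors described in this abstract can be sketched in a few lines. This is a single-machine illustration under stated assumptions, not Libra's Hadoop implementation: the function names `count_kmers` and `cosine_similarity`, the toy reads, and the value of k are all hypothetical.

```python
import math
from collections import Counter
from typing import Iterable


def count_kmers(reads: Iterable[str], k: int) -> Counter:
    """Count every k-mer across all reads of one sample (hypothetical helper)."""
    counts: Counter = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two k-mer abundance vectors."""
    # Only k-mers present in both samples contribute to the dot product.
    dot = sum(a[km] * b[km] for km in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy samples, for illustration only: sample_b is sample_a sequenced twice as deep.
sample_a = count_kmers(["ACGTACGT"], k=3)
sample_b = count_kmers(["ACGTACGT", "ACGTACGT"], k=3)
similarity = cosine_similarity(sample_a, sample_b)
```

Cosine similarity depends only on the direction of the abundance vector, not its length, so scaling every count by the same factor leaves the score unchanged. That is one way to read the abstract's claim that the measure uses abundance while normalizing for sequencing depth.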
Bacteroides abundance drives birth mode dependent infant gut microbiota developmental trajectories
Background and aims: Birth mode and other early life factors affect a newborn's microbial colonization with potential long-term health effects. Individual variations in early life gut microbiota development, especially their effects on the functional repertoire of microbiota, are still poorly characterized. This study aims to provide new insights into the gut microbiome developmental trajectories during the first year of life. Methods: Our study comprised 78 term infants sampled at 3 weeks, 3 months, 6 months, and 12 months (n = 280 total samples), and their mothers were sampled in late pregnancy (n = 50). Fecal DNA was subjected to shotgun metagenomic sequencing. Infant samples were studied for taxonomic and functional maturation, and maternal microbiota was used as a reference. Hierarchical clustering on taxonomic profiles was used to identify the main microbiota developmental trajectories in the infants, and their associations with perinatal and postnatal factors were assessed. Results: In line with previous studies, infant microbiota composition showed increased alpha diversity and decreased beta diversity by age, converging toward an adult-like profile. However, we did not observe an increase in functional alpha diversity, which was stable and comparable with the mother samples throughout all the sampling points. Using a de novo clustering approach, two main infant microbiota clusters driven by Bacteroidaceae and Clostridiaceae emerged at each time point. The clusters were associated with birth mode and their functions differed mainly in terms of biosynthetic and carbohydrate degradation pathways, some of which consistently differed between the clusters for all the time points. The longitudinal analysis indicated three main microbiota developmental trajectories, with the majority of the infants retaining their characteristic cluster until 1 year.
As many as 40% of vaginally delivered infants were grouped with infants delivered by C-section due to their clear and persistent depletion in Bacteroides. Intrapartum antibiotics, any perinatal or postnatal factors, maternal microbiota composition, or other maternal factors did not explain the depletion in Bacteroides in the subset of vaginally born infants. Conclusion: Our study provides an enhanced understanding of the compositional and functional early life gut microbiota trajectories, opening avenues for investigating elusive causes that influence non-typical microbiota development. Peer reviewed.
An extended catalog of integrated prophages in the infant and adult fecal microbiome shows high prevalence of lysogeny
Background and aims: The acquisition and gradual maturation of gut microbial communities during early childhood is central to an individual’s healthy development. Bacteriophages have the potential to shape the gut bacterial communities. However, the complex ecological interactions between phages and their bacterial host are still poorly characterized. In this study, we investigated the abundance and diversity of integrated prophages in infant and adult gut bacteria by detecting integrated prophages in metagenome assembled genomes (MAGs) of commensal bacteria. Methods: Our study included 88 infants sampled at 3 weeks, 3 months, 6 months, and 12 months (n = 323 total samples), and their parents around delivery time (n = 138 total samples). Fecal DNA was extracted and characterized by using shotgun metagenomic sequencing, and a collection of prokaryotic MAGs was generated. The MAG collection was screened for the presence of integrated bacteriophage sequences, allowing their taxonomic and functional characterization. Results: A large collection of 6,186 MAGs from infant and adult gut microbiota was obtained and screened for integrated prophages, allowing the identification of 7,165 prophage sequences longer than 10 kb. Strikingly, more than 70% of the near-complete MAGs were identified as lysogens. The prevalence of prophages in MAGs varied across bacterial families, with a lower prevalence observed among Coriobacteriaceae, Eggerthellaceae, Veillonellaceae and Burkholderiaceae, while a very high prevalence of lysogen MAGs was observed in Oscillospiraceae, Enterococcaceae, and Enterobacteriaceae. Interestingly, for several bacterial families such as Bifidobacteriaceae and Bacteroidaceae, the prevalence of prophages in MAGs was higher at early infant time points (3 weeks and 3 months) than at later sampling points (6 and 12 months) and in adults. The prophage sequences were clustered into 5,616 species-like vOTUs, 77% of which were novel.
Finally, we explored the functional repertoire of the potential auxiliary metabolic genes carried by these prophages, encoding functions involved in carbohydrate metabolism and degradation, amino acid metabolism and carbon metabolism. Conclusion: Our study provides an enhanced understanding of the diversity and prevalence of lysogens in infant and adult gut microbiota and suggests a complex interplay between prophages and their bacterial hosts. Peer reviewed.
Cyanolichen microbiome contains novel viruses that encode genes to promote microbial metabolism
Lichen thalli are formed through the symbiotic association of a filamentous fungus and a photosynthetic green alga and/or cyanobacterium. Recent studies have revealed lichens also host highly diverse communities of secondary fungal and bacterial symbionts, yet few studies have examined the viral component within these complex symbioses. Here, we describe viral biodiversity and functions in cyanolichens collected from across North America and Europe. As current machine-learning viral-detection tools are not trained on complex eukaryotic metagenomes, we first developed efficient methods to remove eukaryotic reads prior to viral detection and a custom pipeline to validate viral contigs predicted with three machine-learning methods. Our resulting high-quality viral data illustrate that every cyanolichen thallus contains diverse viruses that are distinct from viruses in other terrestrial ecosystems. In addition to cyanobacteria, predicted viral hosts include other lichen-associated bacterial lineages and algae, although a large fraction of viral contigs had no host prediction. Functional annotation of cyanolichen viral sequences predicts numerous viral-encoded auxiliary metabolic genes (AMGs) involved in amino acid, nucleotide, and carbohydrate metabolism, including AMGs for secondary metabolism (antibiotics and antimicrobials) and fatty acid biosynthesis. Overall, the diversity of cyanolichen AMGs suggests that viruses may alter microbial interactions within these complex symbiotic assemblages.