90 research outputs found
The Promises and Pitfalls of Machine Learning for Detecting Viruses in Aquatic Metagenomes
Tools for identifying viral sequences in host-associated and environmental metagenomes allow for a better understanding of the genetics and ecology of viruses and their hosts. Recently, new approaches were published that use machine learning methods to distinguish viral from bacterial signal based on k-mer sequence signatures, with the aim of identifying viral contigs in metagenomes. The promise of these content-based approaches is the ability to discover new viruses with few or no known relatives. In this perspective paper, we examine the use of the content-based machine learning tool VirFinder for the identification of viral sequences in aquatic metagenomes and explore the possibility of using ecosystem-focused models targeted to marine metagenomes. We discuss the impact of training set composition on tool performance and the current limitations in retrieving low-abundance viral sequences from metagenomes. We identify potential biases that could arise from machine learning approaches for viral hunting in real-world datasets and suggest possible avenues to overcome them. National Science Foundation [1640775]. Open access journal.
Metabolic reprogramming by viruses in the sunlit and dark ocean
BACKGROUND: Marine ecosystem function is largely determined by matter and energy transformations mediated by microbial community interaction networks. Viral infection modulates network properties through mortality, gene transfer and metabolic reprogramming. RESULTS: Here we explore the nature and extent of viral metabolic reprogramming throughout the Pacific Ocean depth continuum. We describe 35 marine viral gene families with potential to reprogram metabolic flux through central metabolic pathways recovered from Pacific Ocean waters. Four of these families have been previously reported but 31 are novel. These known and new carbon pathway auxiliary metabolic genes were recovered from a total of 22 viral metagenomes in which viral auxiliary metabolic genes were differentiated from low-level cellular DNA inputs based on small subunit ribosomal RNA gene content, taxonomy, fragment recruitment and genomic context information. Auxiliary metabolic gene distribution patterns reveal that marine viruses target overlapping, but relatively distinct pathways in sunlit and dark ocean waters to redirect host carbon flux towards energy production and viral genome replication under low nutrient, niche-differentiated conditions throughout the depth continuum. CONCLUSIONS: Given half of ocean microbes are infected by viruses at any given time, these findings of broad viral metabolic reprogramming suggest the need for renewed consideration of viruses in global ocean carbon models.
Libra: scalable k-mer-based tool for massive all-vs-all metagenome comparisons
Background: Shotgun metagenomics provides powerful insights into microbial community biodiversity and function. Yet, inferences from metagenomic studies are often limited by dataset size and complexity and are restricted by the availability and completeness of existing databases. De novo comparative metagenomics enables the comparison of metagenomes based on their total genetic content. Results: We developed a tool called Libra that performs an all-vs-all comparison of metagenomes for precise clustering based on their k-mer content. Libra uses a scalable Hadoop framework for massive metagenome comparisons, Cosine Similarity for calculating the distance using sequence composition and abundance while normalizing for sequencing depth, and a web-based implementation in iMicrobe (http://imicrobe.us) that uses the CyVerse advanced cyberinfrastructure to promote broad use of the tool by the scientific community. Conclusions: A comparison of Libra to equivalent tools using both simulated and real metagenomic datasets, ranging from 80 million to 4.2 billion reads, reveals that methods commonly implemented to reduce compute time for large datasets, such as data reduction, read count normalization, and presence/absence distance metrics, greatly diminish the resolution of large-scale comparative analyses. In contrast, Libra uses all of the reads to calculate k-mer abundance in a Hadoop architecture that can scale to any size dataset to enable global-scale analyses and link microbial signatures to biological processes. National Science Foundation [1640775]. Open access journal.
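To illustrate the idea behind the abstract above, here is a minimal sketch of a cosine comparison between k-mer abundance profiles. It is not Libra's implementation: the function names `kmer_counts` and `cosine_similarity`, the toy reads, and the choice of k = 4 are all assumptions made for this example, and the Hadoop-based scaling described in the abstract is not shown. The sketch only demonstrates why a cosine measure is insensitive to sequencing depth: multiplying every k-mer count by the same factor leaves the angle between the two vectors unchanged.

```python
from collections import Counter
from math import sqrt


def kmer_counts(reads, k):
    """Count every overlapping k-mer across a collection of reads."""
    counts = Counter()
    for read in reads:
        seq = read.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse k-mer count vectors."""
    # Only k-mers present in both profiles contribute to the dot product.
    dot = sum(count * b[kmer] for kmer, count in a.items() if kmer in b)
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty profile is treated as sharing nothing
    return dot / (norm_a * norm_b)


# Hypothetical toy samples, not real metagenomic data.
sample_a = ["ACGTACGT", "TTGACGTA"]
sample_b = sample_a * 3  # same composition, three times the sequencing depth
sample_c = ["GGGGCCCC", "CCCCGGGG"]  # shares no 4-mers with sample_a

profile_a = kmer_counts(sample_a, k=4)
profile_b = kmer_counts(sample_b, k=4)
profile_c = kmer_counts(sample_c, k=4)

print(cosine_similarity(profile_a, profile_b))  # close to 1: depth does not matter
print(cosine_similarity(profile_a, profile_c))  # 0.0: no shared k-mers
```

A presence/absence metric would also score `sample_a` and `sample_b` as identical, but it would discard the abundance information that the abstract says is needed for fine-resolution comparisons.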
Roadmap Towards Communitywide Intercalibration and Standardization of Ocean Nucleic Acids ‘Omics Measurements
In January 2020, the US Ocean Carbon & Biogeochemistry (OCB) Project Office funded the Ocean Nucleic Acids ‘Omics Intercalibration and Standardization workshop held at the University of North Carolina in Chapel Hill. Thirty-two participants from across the US, along with guests from Canada and France, met to develop a framework for standardization and intercalibration (S&I) of ocean nucleic acid ‘omics (na’omics) approaches (i.e., amplicon sequencing, metagenomics and metatranscriptomics). During the three-day workshop, participants discussed numerous topics, including: a) sample biomass collection and nucleic acid preservation for downstream analysis, b) extraction protocols for nucleic acids, c) addition of standard reference material to nucleic acid isolation protocols, d) isolation methods unique to RNA, e) sequence library construction, and f) integration of bioinformatic considerations. This report provides a summary of these and other topics covered during the workshop and a series of recommendations for future S&I activities for na’omics approaches. The Ocean Nucleic Acids ‘Omics Intercalibration and Standardization Workshop was supported by grants from the Ocean Carbon & Biogeochemistry Program (OCB), with funding provided by the National Science Foundation (NSF) and the National Aeronautics and Space Administration (NASA), and the Simons Foundation. This report was developed with federal support of NSF (OCE-1558412) and NASA (NNX17AB17G).
Construction, alignment and analysis of twelve framework physical maps that represent the ten genome types of the genus Oryza
Bacterial artificial chromosome (BAC) fingerprint and end-sequenced physical maps representing the ten genome types of Oryza are presente
Integration of hybridization-based markers (overgos) into physical maps for comparative and evolutionary explorations in the genus Oryza and in Sorghum
BACKGROUND: With the completion of the genome sequence for rice (Oryza sativa L.), the focus of rice genomics research has shifted to the comparison of the rice genome with genomes of other species for gene cloning, breeding, and evolutionary studies. The genus Oryza includes 23 species that shared a common ancestor 8–10 million years ago making this an ideal model for investigations into the processes underlying domestication, as many of the Oryza species are still undergoing domestication. This study integrates high-throughput, hybridization-based markers with BAC end sequence and fingerprint data to construct physical maps of rice chromosome 1 orthologues in two wild Oryza species. Similar studies were undertaken in Sorghum bicolor, a species which diverged from cultivated rice 40–50 million years ago. RESULTS: Overgo markers, in conjunction with fingerprint and BAC end sequence data, were used to build sequence-ready BAC contigs for two wild Oryza species. The markers drove contig merges to construct physical maps syntenic to rice chromosome 1 in the wild species and provided evidence for at least one rearrangement on chromosome 1 of the O. sativa versus Oryza officinalis comparative map. When rice overgos were aligned to available S. bicolor sequence, 29% of the overgos aligned with three or fewer mismatches; of these, 41% gave positive hybridization signals. Overgo hybridization patterns supported colinearity of loci in regions of sorghum chromosome 3 and rice chromosome 1 and suggested that a possible genomic inversion occurred in this syntenic region in one of the two genomes after the divergence of S. bicolor and O. sativa. CONCLUSION: The results of this study emphasize the importance of identifying conserved sequences in the reference sequence when designing overgo probes in order for those probes to hybridize successfully in distantly related species. 
As interspecific markers, overgos can be used successfully to construct physical maps in species which diverged less than 8 million years ago, and can be used in a more limited fashion to examine colinearity among species which diverged as much as 40 million years ago. Additionally, overgos are able to provide evidence of genomic rearrangements in comparative physical mapping studies.
Gramene: a growing plant comparative genomics resource
Gramene (www.gramene.org) is a curated resource for genetic, genomic and comparative genomics data for the major crop species, including rice, maize, wheat and many other plant (mainly grass) species. Gramene is an open-source project. All data and software are freely downloadable through the ftp site (ftp.gramene.org/pub/gramene) and available for use without restriction. Gramene’s core data types include genome assembly and annotations, other DNA/mRNA sequences, genetic and physical maps/markers, genes, quantitative trait loci (QTLs), proteins, ontologies, literature and comparative mappings. Since our last NAR publication 2 years ago, we have updated these data types to include new datasets and new connections among them. Completely new features include rice pathways for functional annotation of rice genes; genetic diversity data from rice, maize and wheat to show genetic variations among different germplasms; large-scale genome comparisons among Oryza sativa and its wild relatives for evolutionary studies; and the creation of orthologous gene sets and phylogenetic trees among rice, Arabidopsis thaliana, maize, poplar and several animal species (for reference purpose). We have significantly improved the web interface in order to provide a more user-friendly browsing experience, including a dropdown navigation menu system, unified web page for markers, genes, QTLs and proteins, and enhanced quick search functions. This is the publisher’s final pdf. The published article is copyrighted by the author(s) and published by Oxford University Press. The published article can be found at: http://nar.oxfordjournals.org/