24 research outputs found

Domain-centric database to uncover structure of minimally characterized viral genomes

Author: Bramley John C.
Buchser William J.
DiAntonio Aaron
Milbrandt Jeffrey D.
Yenkin Alex L.
Zaydman Mark A.
Publication venue: Digital Commons@Becker
Publication date: 01/01/2020

Efam: an expanded, metaproteome-supported HMM profile database of viral protein families

Author: Adkins Joshua N
Bolduc Ben
Cronin Dylan
Gregory Ann C
Hargreaves Katherine R
Huang Eric L
Lücking Dominik
Mohssen Mohamed
Moraru Cristina
Piehowski Paul D
Roux Simon
Sullivan Matthew B
White Richard A
Zayed Ahmed A
Publication venue: Oxford University Press (OUP)
Publication date: 15/11/2021

Motivation: Viruses infect, reprogram and kill microbes, leading to profound ecosystem consequences, from elemental cycling in oceans and soils to microbiome-modulated diseases in plants and animals. Although metagenomic datasets are increasingly available, identifying viruses in them is challenging due to poor representation and annotation of viral sequences in databases. Results: Here, we establish efam, an expanded collection of Hidden Markov Model (HMM) profiles that represent viral protein families conservatively identified from the Global Ocean Virome 2.0 dataset. This resulted in 240 311 HMM profiles, each with at least 2 protein sequences, making efam >7-fold larger than the next largest, pan-ecosystem viral HMM profile database. Adjusting the criteria for viral contig confidence from 'conservative' to 'eXtremely Conservative' resulted in 37 841 HMM profiles in our efam-XC database. To assess the value of this resource, we integrated efam-XC into VirSorter viral discovery software to discover viruses from less-studied, ecologically distinct oxygen minimum zone (OMZ) marine habitats. This expanded database led to an increase in viruses recovered from every tested OMZ virome by ∼24% on average (up to ∼42%) and especially improved the recovery of often-missed shorter contigs (<5 kb). Additionally, to help elucidate lesser-known viral protein functions, we annotated the profiles using multiple databases from the DRAM pipeline and virion-associated metaproteomic data, which doubled the number of annotations obtainable by standard, single-database annotation approaches. Together, these marine resources (efam and efam-XC) are provided as searchable, compressed HMM databases that will be updated bi-annually to help maximize viral sequence discovery and study from any ecosystem

E-space: Manchester Metropolitan University's Research Repository

efam: an expanded, metaproteome-supported HMM profile database of viral protein families

Author: Adkins J.
Bolduc B.
Cronin D.
Dominik L.
Gregory A.
Hargreaves K.
Huang E.
Luecking D.
Mohssen M.
Moraru C.
Piehowski P.
Roux S.
Sullivan M.
White R.
Zayed A.
Publication venue: Oxford University Press (OUP)
Publication date: 01/01/2021

Motivation: Viruses infect, reprogram and kill microbes, leading to profound ecosystem consequences, from elemental cycling in oceans and soils to microbiome-modulated diseases in plants and animals. Although metagenomic datasets are increasingly available, identifying viruses in them is challenging due to poor representation and annotation of viral sequences in databases. Results: Here, we establish efam, an expanded collection of Hidden Markov Model (HMM) profiles that represent viral protein families conservatively identified from the Global Ocean Virome 2.0 dataset. This resulted in 240 311 HMM profiles, each with at least 2 protein sequences, making efam >7-fold larger than the next largest, pan-ecosystem viral HMM profile database. Adjusting the criteria for viral contig confidence from 'conservative' to 'eXtremely Conservative' resulted in 37 841 HMM profiles in our efam-XC database. To assess the value of this resource, we integrated efam-XC into VirSorter viral discovery software to discover viruses from less-studied, ecologically distinct oxygen minimum zone (OMZ) marine habitats. This expanded database led to an increase in viruses recovered from every tested OMZ virome by ∼24% on average (up to ∼42%) and especially improved the recovery of often-missed shorter contigs (<5 kb). Additionally, to help elucidate lesser-known viral protein functions, we annotated the profiles using multiple databases from the DRAM pipeline and virion-associated metaproteomic data, which doubled the number of annotations obtainable by standard, single-database annotation approaches. Together, these marine resources (efam and efam-XC) are provided as searchable, compressed HMM databases that will be updated bi-annually to help maximize viral sequence discovery and study from any ecosystem

MPG.PuRe

MARVEL, a Tool for Prediction of Bacteriophage Sequences in Metagenomic Bins

Author: Aline M. da Silva
Deyvid Amgarten
João C. Setubal
Lucas P. P. Braga
Publication venue: Frontiers Media SA
Publication date: 01/08/2018

Here we present MARVEL, a tool for prediction of double-stranded DNA bacteriophage sequences in metagenomic bins. MARVEL uses a random forest machine learning approach. We trained the program on a dataset with 1,247 phage and 1,029 bacterial genomes, and tested it on a dataset with 335 bacterial and 177 phage genomes. We show that three simple genomic features extracted from contig sequences were sufficient to achieve a good performance in separating bacterial from phage sequences: gene density, strand shifts, and fraction of significant hits to a viral protein database. We compared the performance of MARVEL to that of VirSorter and VirFinder, two popular programs for predicting viral sequences. Our results show that all three programs have comparable specificity, but MARVEL achieves much better performance on the recall (sensitivity) measure. This means that MARVEL should be able to identify many more phage sequences in metagenomic bins than heretofore has been possible. In a simple test with real data, containing mostly bacterial sequences, MARVEL classified 58 out of 209 bins as phage genomes; other evidence suggests that 57 of these 58 bins are novel phage sequences. MARVEL is freely available at https://github.com/LaboratorioBioinformatica/MARVEL
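
The three features named in the abstract can be illustrated with a minimal Python sketch. The abstract does not define them, so every definition below is an assumption made for illustration: gene density is taken as genes per kilobase, strand shifts as the number of strand changes between neighbouring genes, and the hit fraction as the share of genes with a significant viral database hit. The function names and the input layout are hypothetical and are not taken from the MARVEL code base.

```python
from typing import List, Sequence, Tuple

# One predicted gene as (start, end, strand); strand is "+" or "-".
# This layout is a hypothetical choice for the sketch.
Gene = Tuple[int, int, str]


def gene_density(genes: Sequence[Gene], contig_length: int) -> float:
    """Genes per kilobase of contig (assumed definition)."""
    if contig_length <= 0:
        raise ValueError("contig_length must be positive")
    return len(genes) / (contig_length / 1000.0)


def strand_shifts(genes: Sequence[Gene]) -> int:
    """Count strand changes between genes adjacent on the contig (assumed definition)."""
    ordered = sorted(genes, key=lambda gene: gene[0])
    return sum(1 for prev, cur in zip(ordered, ordered[1:]) if prev[2] != cur[2])


def viral_hit_fraction(has_viral_hit: Sequence[bool]) -> float:
    """Share of genes flagged as having a significant viral database hit."""
    if not has_viral_hit:
        return 0.0
    return sum(has_viral_hit) / len(has_viral_hit)


def feature_vector(
    genes: Sequence[Gene], contig_length: int, has_viral_hit: Sequence[bool]
) -> List[float]:
    """Assemble the three features in a fixed order for a downstream classifier."""
    return [
        gene_density(genes, contig_length),
        float(strand_shifts(genes)),
        viral_hit_fraction(has_viral_hit),
    ]


if __name__ == "__main__":
    toy_genes = [(1, 900, "+"), (950, 1800, "+"), (1850, 2700, "-"), (2750, 3900, "+")]
    print(feature_vector(toy_genes, 4000, [True, False, True, True]))
```

A vector like this could be passed to any random forest implementation; the sketch stops at feature extraction because the abstract gives no training details.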

Directory of Open Access Journals

SPARSE FORWARD-BACKWARD ALIGNMENT FOR SENSITIVE DATABASE SEARCH WITH SMALL MEMORY AND TIME REQUIREMENTS

Author: Rich David H.
Publication venue: University of Montana, Maureen and Mike Mansfield Library
Publication date: 01/01/2021

Sequence annotation is typically performed by aligning an unlabeled sequence to a collection of known sequences, with the aim of identifying non-random similarities. Given the broad diversity of new sequences and the considerable scale of modern sequence databases, there is significant tension between the competing needs for sensitivity and speed, with multiple tools displacing the venerable BLAST software suite on one axis or another. In recent years, alignment based on profile hidden Markov models (pHMMs) and associated probabilistic inference methods have demonstrated increased sensitivity due in part to consideration of the ensemble of all possible alignments between a query and target using the Forward/Backward algorithm, rather than simply relying on the single highest-probability (Viterbi) alignment. Modern implementations of pHMM search achieve their speed by avoiding computation of the expensive Forward/Backward algorithm for most (HMMER3) or all (MMseqs2) candidate sequence alignments. Here, we describe a heuristic Forward/Backward algorithm that avoids filling in the entire quadratic dynamic programming (DP) matrix, by identifying a sparse cloud of DP cells containing most of the probability mass. The method produces an accurate approximation of the Forward/Backward alignment with high speed and small memory requirements. We demonstrate the utility of this sparse Forward/Backward approach in a tool that we call MMOREseqs; the name is a reference to the fact that our tool utilizes the MMseqs2 software suite to rapidly identify promising seed alignments to serve as a basis for sparse Forward/Backward. MMOREseqs demonstrates improved annotation sensitivity with modest increase in run time over MMseqs2 and is released under the open BSD-3-clause license. Source code and Docker image are available for download at https://github.com/TravisWheelerLab/MMOREseqs
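
The contrast the abstract draws between the Forward algorithm (summing over all paths) and Viterbi (keeping only the single best path) can be sketched on a toy HMM. This is a dense, textbook-style illustration in Python under assumed toy parameters. It is not the sparse heuristic described above and is unrelated to the MMOREseqs, HMMER3 or MMseqs2 implementations; the function name and the dictionary-based model layout are hypothetical.

```python
from typing import Dict, Sequence, Tuple

Prob = Dict[str, float]


def forward_and_viterbi(
    obs: Sequence[str],
    states: Sequence[str],
    start: Prob,
    trans: Dict[str, Prob],
    emit: Dict[str, Prob],
) -> Tuple[float, float]:
    """Return (total probability over all state paths, probability of the best path)."""
    if not obs:
        raise ValueError("obs must contain at least one symbol")
    # Initialisation is identical for both recursions.
    fwd = {s: start[s] * emit[s][obs[0]] for s in states}
    vit = dict(fwd)
    for symbol in obs[1:]:
        # Forward: sum the contributions of every predecessor state.
        fwd = {
            s: emit[s][symbol] * sum(fwd[p] * trans[p][s] for p in states)
            for s in states
        }
        # Viterbi: keep only the most probable predecessor state.
        vit = {
            s: emit[s][symbol] * max(vit[p] * trans[p][s] for p in states)
            for s in states
        }
    return sum(fwd.values()), max(vit.values())


if __name__ == "__main__":
    # Toy two-state model with made-up parameters (assumption for illustration).
    states = ["match", "insert"]
    start = {"match": 0.8, "insert": 0.2}
    trans = {
        "match": {"match": 0.9, "insert": 0.1},
        "insert": {"match": 0.5, "insert": 0.5},
    }
    emit = {
        "match": {"A": 0.7, "C": 0.3},
        "insert": {"A": 0.4, "C": 0.6},
    }
    total, best = forward_and_viterbi("ACA", states, start, trans, emit)
    print(f"forward total = {total:.6f}, best single path = {best:.6f}")
```

Because the Forward value sums over every path, it is never smaller than the Viterbi value. Both recursions here visit every cell, which is the quadratic cost a sparse approach aims to avoid.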

University of Montana

Clades of huge phages from across Earth's ecosystems.

Bacteriophages typically have small genomes and depend on their bacterial hosts for replication. Here we sequenced DNA from diverse ecosystems and found hundreds of phage genomes with lengths of more than 200 kilobases (kb), including a genome of 735 kb, which is, to our knowledge, the largest phage genome to be described to date. Thirty-five genomes were manually curated to completion (circular and no gaps). Expanded genetic repertoires include diverse and previously undescribed CRISPR-Cas systems, transfer RNAs (tRNAs), tRNA synthetases, tRNA-modification enzymes, translation-initiation and elongation factors, and ribosomal proteins. The CRISPR-Cas systems of phages have the capacity to silence host transcription factors and translational genes, potentially as part of a larger interaction network that intercepts translation to redirect biosynthesis to phage-encoded functions. In addition, some phages may repurpose bacterial CRISPR-Cas systems to eliminate competing phages. We phylogenetically define the major clades of huge phages from human and other animal microbiomes, as well as from oceans, lakes, sediments, soils and the built environment. We conclude that the large gene inventories of huge phages reflect a conserved biological strategy, and that the phages are distributed across a broad bacterial host range and across Earth's ecosystems

eScholarship - University of California

Online Research Database In Technology

Clades of huge phages from across Earth's ecosystems

Author: Al-Shayeb B
Amano Y
Amundson R
Anantharaman K
Banfield JF
Borton MA
Bouma-Gregson K
Brooks B
Castelle CJ
Cate JHD
Chen L-X
Devoto A
Doudna JA
Farag IF
Finstad K
Goltsman DSA
Harrison S
He C
Jaffe AL
Kantor R
Keren R
Lane KR
Lavy A
Lehours A-C
Lei S
Li W-J
Matheus-Carnevali P
Morowitz M
Munk P
Méheust R
Nelson TC
Olm MR
Power ME
Probst AJ
Relman DA
Sachdeva R
Santini JM
Sharrar A
Sun C
Thomas A
Tringe SG
Ward F
Warren L
Wrighton K
Zhou J
Publication date: 20/02/2020

Bacteriophages typically have small genomes and depend on their bacterial hosts for replication. Here we sequenced DNA from diverse ecosystems and found hundreds of phage genomes with lengths of more than 200 kilobases (kb), including a genome of 735 kb, which is, to our knowledge, the largest phage genome to be described to date. Thirty-five genomes were manually curated to completion (circular and no gaps). Expanded genetic repertoires include diverse and previously undescribed CRISPR-Cas systems, transfer RNAs (tRNAs), tRNA synthetases, tRNA-modification enzymes, translation-initiation and elongation factors, and ribosomal proteins. The CRISPR-Cas systems of phages have the capacity to silence host transcription factors and translational genes, potentially as part of a larger interaction network that intercepts translation to redirect biosynthesis to phage-encoded functions. In addition, some phages may repurpose bacterial CRISPR-Cas systems to eliminate competing phages. We phylogenetically define the major clades of huge phages from human and other animal microbiomes, as well as from oceans, lakes, sediments, soils and the built environment. We conclude that the large gene inventories of huge phages reflect a conserved biological strategy, and that the phages are distributed across a broad bacterial host range and across Earth's ecosystems

UCL Discovery

Functional Phage Genomics of selected Taxa

Author: Chibani Cynthia Maria
Publication date: 21/05/2019

Georg-August-University Göttingen

Application of Machine Learning in Microbiology

Author: Fei Guo
Kaiyang Qu
Quan Zou
Xiangrong Liu
Yuan Lin
Publication venue: Frontiers Media SA
Publication date: 01/04/2019

Microorganisms are ubiquitous and closely tied to people’s daily lives. Since they were first discovered in the 19th century, researchers have shown great interest in microorganisms. Microorganisms have traditionally been studied through cultivation, but this method is expensive and time-consuming, and it cannot keep pace with the development of high-throughput sequencing technology. To deal with this problem, machine learning (ML) methods have been widely applied to the field of microbiology. Literature reviews have shown that ML can be used in many aspects of microbiology research, especially classification problems, and for exploring the interaction between microorganisms and the surrounding environment. In this study, we summarize the application of ML in microbiology

Directory of Open Access Journals