19 research outputs found

    MARVEL, a Tool for Prediction of Bacteriophage Sequences in Metagenomic Bins

    Here we present MARVEL, a tool for prediction of double-stranded DNA bacteriophage sequences in metagenomic bins. MARVEL uses a random forest machine learning approach. We trained the program on a dataset with 1,247 phage and 1,029 bacterial genomes, and tested it on a dataset with 335 bacterial and 177 phage genomes. We show that three simple genomic features extracted from contig sequences were sufficient to achieve a good performance in separating bacterial from phage sequences: gene density, strand shifts, and fraction of significant hits to a viral protein database. We compared the performance of MARVEL to that of VirSorter and VirFinder, two popular programs for predicting viral sequences. Our results show that all three programs have comparable specificity, but MARVEL achieves much better performance on the recall (sensitivity) measure. This means that MARVEL should be able to identify many more phage sequences in metagenomic bins than heretofore has been possible. In a simple test with real data, containing mostly bacterial sequences, MARVEL classified 58 out of 209 bins as phage genomes; other evidence suggests that 57 of these 58 bins are novel phage sequences. MARVEL is freely available at https://github.com/LaboratorioBioinformatica/MARVEL
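The three features named in the abstract above (gene density, strand shifts, and fraction of significant hits to a viral protein database) can be illustrated with a minimal sketch. The definitions used here are assumptions chosen for illustration (genes per kilobase, fraction of adjacent gene pairs on opposite strands, fraction of genes with a viral hit); the `Gene` class and `contig_features` function are hypothetical names and are not taken from the MARVEL code base, whose exact normalizations may differ.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Gene:
    """One predicted gene on a contig (hypothetical structure for illustration)."""
    start: int        # 1-based start coordinate on the contig
    end: int          # inclusive end coordinate
    strand: str       # "+" or "-"
    viral_hit: bool   # True if the protein has a significant hit in a viral protein database


def contig_features(genes: List[Gene], contig_length: int) -> Dict[str, float]:
    """Compute three per-contig features under the assumed definitions."""
    if contig_length <= 0:
        raise ValueError("contig_length must be positive")

    ordered = sorted(genes, key=lambda g: g.start)
    n = len(ordered)

    # Assumed definition: number of genes per kilobase of contig.
    gene_density = n / (contig_length / 1000)

    # Assumed definition: fraction of adjacent gene pairs that switch strand.
    shifts = sum(1 for a, b in zip(ordered, ordered[1:]) if a.strand != b.strand)
    strand_shift_rate = shifts / (n - 1) if n > 1 else 0.0

    # Assumed definition: fraction of genes with a significant viral hit.
    viral_hit_fraction = sum(g.viral_hit for g in ordered) / n if n else 0.0

    return {
        "gene_density": gene_density,
        "strand_shift_rate": strand_shift_rate,
        "viral_hit_fraction": viral_hit_fraction,
    }


# Toy contig of 4,000 bp with four genes (made-up coordinates).
toy_genes = [
    Gene(1, 900, "+", True),
    Gene(950, 1800, "+", True),
    Gene(1900, 2700, "-", False),
    Gene(2800, 3900, "-", True),
]
toy_features = contig_features(toy_genes, contig_length=4000)
```

A feature vector of this kind, computed per contig and aggregated per bin, is the sort of input a random forest classifier could be trained on; the training step itself is omitted here.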

    Longer-term effectiveness of a heterologous coronavirus disease 2019 (COVID-19) vaccine booster in healthcare workers in Brazil

    Abstract Objective: To compare long-term vaccine effectiveness between healthcare workers (HCWs) who received only a 2-dose primary series of a viral vector vaccine [Oxford-AstraZeneca (ChAdOx1)] or an inactivated virus vaccine (CoronaVac) and those who also received an mRNA booster (Pfizer/BioNTech) as a third dose. Methods: We conducted a retrospective cohort study among HCWs (aged ≥18 years) in Brazil from January 2021 to July 2022. To assess the variation in the effectiveness of the booster dose over time, we estimated the effectiveness rate by taking the log risk ratio as a function of time. Results: Of 14,532 HCWs, coronavirus disease 2019 (COVID-19) was confirmed in 56.3% of HCWs receiving 2 doses of CoronaVac vaccine versus 23.2% of HCWs receiving 2 doses of CoronaVac vaccine with mRNA booster (P < .001), and in 37.1% of HCWs receiving 2 doses of ChAdOx1 vaccine versus 22.7% of HCWs receiving 2 doses of ChAdOx1 vaccine with mRNA booster (P < .001). The highest vaccine effectiveness with mRNA booster was observed 30 days after vaccination: 91% for the CoronaVac vaccine group and 97% for the ChAdOx1 vaccine group. Vaccine effectiveness declined to 55% and 67%, respectively, at 180 days. Of 430 samples screened for mutations, 49.5% were SARS-CoV-2 delta variants and 34.2% were SARS-CoV-2 omicron variants. Conclusions: Heterologous COVID-19 vaccines were effective for up to 180 days in preventing COVID-19 in the SARS-CoV-2 delta and omicron variant eras, which suggests the need for a second booster.
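The relationship between a risk ratio and vaccine effectiveness can be shown with a short worked sketch. It applies the standard definition VE = 1 - RR to the cumulative proportions reported in the abstract above. This is a crude, unadjusted illustration only: it ignores follow-up time and is not the study's time-varying estimate (the 30-day and 180-day figures come from the authors' model, not from this arithmetic). The function names are hypothetical.

```python
import math


def risk_ratio(risk_boosted: float, risk_unboosted: float) -> float:
    """Risk in the boosted group divided by risk in the primary-series-only group."""
    if not 0 < risk_unboosted <= 1 or not 0 <= risk_boosted <= 1:
        raise ValueError("risks must be proportions, with a non-zero reference risk")
    return risk_boosted / risk_unboosted


def effectiveness_from_rr(rr: float) -> float:
    """Standard definition: effectiveness = 1 - risk ratio."""
    return 1.0 - rr


# Cumulative proportions of confirmed COVID-19 as reported in the abstract.
rr_coronavac = risk_ratio(risk_boosted=0.232, risk_unboosted=0.563)
rr_chadox1 = risk_ratio(risk_boosted=0.227, risk_unboosted=0.371)

# Log risk ratios, the quantity the abstract says was modeled over time.
log_rr_coronavac = math.log(rr_coronavac)
log_rr_chadox1 = math.log(rr_chadox1)

# Crude relative effectiveness of the booster over the whole study period.
ve_coronavac = effectiveness_from_rr(rr_coronavac)
ve_chadox1 = effectiveness_from_rr(rr_chadox1)

print(f"CoronaVac + booster, crude: {ve_coronavac:.1%}")  # 58.8%
print(f"ChAdOx1 + booster, crude: {ve_chadox1:.1%}")      # 38.8%
```

Working on the log scale is a common choice because a log risk ratio is unbounded and roughly symmetric around zero, which makes it easier to model as a function of time before converting back with VE = 1 - exp(log RR).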

    Computational analysis of the viral diversity in the Sao Paulo Zoo composting microbial community

    The study of viral diversity in environmental samples has become increasingly important because of the key roles that viruses play in ecosystems. Recent studies provide evidence that viruses of bacteria (bacteriophages) may be major determinants of biogeochemical cycles in large ecosystems, such as oceans and forests. They may also drive gene flow among environmental communities and contribute to the functional plasticity of those communities under environmental stress. In this work, we investigate and characterize the viral diversity in composting samples through culture-independent and culture-dependent approaches. In the first approach, we collected serial samples from a composting unit at the Sao Paulo Zoo at different time points and performed metagenomic sequencing. The resulting dataset was extensively mined to estimate the diversity and abundance of viral taxa throughout the composting process. Additionally, we assembled and retrieved candidate sequences representing partial and/or complete genomes of new environmental viruses. The two computational protocols used for data mining were automated as pipelines and can be applied to any metagenomic or metatranscriptomic dataset of Illumina reads. The second approach was the isolation and characterization of new Pseudomonas phages obtained from composting samples. Three new phages were identified and their genomes were sequenced. A detailed characterization of these genomes revealed a high degree of novelty, insights into the evolution of tailed phages (Caudovirales), and the presence of tRNA genes, which may be related to a phage mechanism for bypassing the translational bias of the host bacterium. The experimental characterization of the new phages demonstrated great potential to lyse cells and to degrade biofilms of Pseudomonas aeruginosa strain PA14, known as a causative agent of hospital-acquired infections in immunocompromised patients. In short, the data presented in this dissertation characterize the diversity of the composting virome and contribute to the understanding of the taxonomic, functional, and ecological profiles of viruses in the composting environment.

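    Predição em sequências de vírus de procariotos através da aplicação de técnicas de aprendizado de máquina em dados metagenômicos [Prediction of prokaryotic virus sequences by applying machine learning techniques to metagenomic data]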

    Environmental viruses are extremely diverse and abundant in the biosphere. Several studies have shown that prokaryotic viruses (or simply phages) are major players in determining biogeochemical cycles in oceans and in driving microbial diversification. Besides this ecological role, phages may also be used for clinical purposes, since they can kill bacterial cells and terminate infections. A crucial step in this application is the isolation of new phages that target a specific bacterial pathogen. Researchers therefore employ screening techniques to find and isolate pathogen-specific phages from environmental samples, which are a rich source of new phages. However, this task remains mostly exploratory and laborious if the researcher has no detailed information about the sample and its potential viral diversity. With this problem in mind, we propose a bioinformatic workflow to identify phage genomic sequences in environmental datasets and to predict the hosts of the identified phages from their genomic sequences. To achieve this goal, we implemented a random forest classifier and created the tool MARVEL (Metagenomic Analyses and Retrieval of Viral Elements), which efficiently predicts phage genomic sequences in bins generated from whole-community metagenomic short reads. We also developed a toolkit, named vHULK (Viral Host Unveiling Kit), which predicts phage hosts given only the phage genome as input. vHULK presents higher accuracy than available tools and can predict both host species and genus in a multiclass prediction setting. Data generated by applying both tools to public and private composting metagenomic datasets are used for the recovery, annotation, and characterization of phage diversity in composting environments. Both tools are publicly available through a GitHub repository: https://github.com/LaboratorioBioinformatica/

    Novel virocell metabolic potential revealed in agricultural soils by virus‐enriched soil metagenome analysis

    Viruses are now recognized as important players in microbial dynamics and biogeochemical cycles in the oceans. Yet, compared with aquatic ecosystems, virus discovery in terrestrial ecosystems has been challenging, partly due to the inherent complexity of soils. To expand our understanding of soil viruses and their putative contributions to soil microbial processes, we analysed metagenomes of community-level virus-enriched suspensions obtained by tangential flow filtration from two French agricultural soils. We found viral sequences representing a total of 239 viral operational taxonomic units that corresponded to 29.5% of the mapping reads in the metagenomic datasets. The analysis of their genomic sequences revealed novel virocell metabolic potential with implications for virus-host interactions, carbon cycling, plant-beneficial functions in the rhizosphere, horizontal gene transfer, and other relevant microbial strategies for survival in soils.

    Image_1_MARVEL, a Tool for Prediction of Bacteriophage Sequences in Metagenomic Bins.pdf


    Table_1_MARVEL, a Tool for Prediction of Bacteriophage Sequences in Metagenomic Bins.XLSX


    Table_2_MARVEL, a Tool for Prediction of Bacteriophage Sequences in Metagenomic Bins.XLSX


    Bacterial Diversification in the Light of the Interactions with Phages: The Genetic Symbionts and Their Role in Ecological Speciation

    Phages have a major impact on microbial populations. In this work, we discuss how predation, transduction, lysogeny, and phage domestication lead to symbio-centric genomic interactions between bacteria and phages, ranging from antagonistic to mutualistic. Furthermore, these interactions influence bacterial diversification and ecotype formation. We then propose an additional consideration in the form of a symbio-centric ecological speciation framework for bacteria. Our framework builds upon classical morphological and molecular taxonomy by also considering bacteria and their phages as a unit of evolutionary selection. This framework acknowledges the considerable effect that phage interaction has on bacterial genomic content, regulation, and evolution, and will advance our understanding of bacterial evolution

    Nucleocapsid single point-mutation associated with drop-out on RT-PCR assay for SARS-CoV-2 detection

    Abstract Background: Since its beginning, the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic has been a challenge for clinical and molecular diagnostics because it is caused by a novel viral agent. Whole-genome sequencing assisted in the characterization and classification of SARS-CoV-2, and it is an essential tool for genomic surveillance aimed at identifying potential hot spots that could affect vaccine immune response and virus diagnosis. We describe two cases of failure at the N2 target of the RT-PCR test Xpert® Xpress SARS-CoV-2. Methods: Total nucleic acid was obtained from nasopharyngeal (NP) and oropharyngeal (OP) swab samples and from cell supernatant isolates. RNA samples were submitted to random amplification. Raw sequencing data were subjected to sequence quality control, removal of human contaminants by alignment against the HG19 reference genome, taxonomic identification of other pathogens, and genome recovery through assembly and manual curation. The RT-PCR test Xpert® Xpress SARS-CoV-2 was used for molecular diagnosis of SARS-CoV-2 infection; samples were tested in duplicate. Results: We identified 27 samples positive for SARS-CoV-2 with a nucleocapsid (N) gene drop-out on the Cepheid Xpert® Xpress SARS-CoV-2 assay. Sequencing of 2 of the 27 samples revealed a single common mutation in the N gene, C29197T, potentially involved in the failed detection of the N target. Conclusions: This study highlights the importance of genomic data for updating molecular tests and vaccines.
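The notation used for the mutation above, C29197T, encodes a reference base, a 1-based genome position, and the observed base. The following minimal sketch shows how such labels can be parsed and how substitutions can be listed from two already-aligned sequences of equal length. It is an illustration only, not the authors' pipeline: the function names are hypothetical, the toy sequences are made up, and a real analysis would start from a proper alignment or variant caller.

```python
import re
from typing import List, NamedTuple


class Substitution(NamedTuple):
    ref: str   # base in the reference genome
    pos: int   # 1-based position in the reference genome
    alt: str   # base observed in the sample


def parse_substitution(label: str) -> Substitution:
    """Parse a label such as 'C29197T' into its three parts."""
    match = re.fullmatch(r"([ACGT])(\d+)([ACGT])", label.strip().upper())
    if match is None:
        raise ValueError(f"not a single-nucleotide substitution label: {label!r}")
    return Substitution(match.group(1), int(match.group(2)), match.group(3))


def find_substitutions(reference: str, sample: str) -> List[str]:
    """List substitutions between two aligned, equal-length sequences.

    Positions where either sequence has an ambiguous base or a gap
    (anything other than A, C, G, T) are skipped rather than reported.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    labels = []
    for pos, (ref, alt) in enumerate(zip(reference.upper(), sample.upper()), start=1):
        if ref != alt and ref in "ACGT" and alt in "ACGT":
            labels.append(f"{ref}{pos}{alt}")
    return labels


# The mutation reported in the abstract, parsed into its components.
reported = parse_substitution("C29197T")

# Toy aligned sequences (made up) with one C-to-T change at position 5.
toy_calls = find_substitutions("ACGTCGA", "ACGTTGA")
```

A substitution under a primer or probe binding site is one way a single point mutation could cause target drop-out in an RT-PCR assay, which is why listing variants against the assay's target region is a useful check.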