8 research outputs found

    NCBI's Virus Discovery Hackathon: Engaging Research Communities to Identify Cloud Infrastructure Requirements

    A wealth of viral data sits untapped in publicly available metagenomic data sets when it might be extracted to create a usable index for the virological research community. We hypothesized that work of this complexity and scale could be done in a hackathon setting. Ten teams comprising over 40 participants from six countries assembled to create a crowd-sourced set of analysis and processing pipelines for a complex biological data set in a three-day event on the San Diego State University campus starting 9 January 2019. Prior to the hackathon, 141,676 metagenomic data sets from the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) were pre-assembled into contiguous assemblies (contigs) by NCBI staff. During the hackathon, a subset consisting of 2953 SRA data sets (approximately 55 million contigs) was selected, and its contigs were further filtered for a minimal length of 1 kb. This resulted in 4.2 million (Mio) contigs, which were aligned using BLAST against all known virus genomes, phylogenetically clustered, and assigned metadata. Out of the 4.2 Mio contigs, 360,000 contigs were labeled with domains and an additional subset containing 4400 contigs was screened for virus or virus-like genes. The work yielded valuable insights into both SRA data and the cloud infrastructure required to support such efforts, revealing analysis bottlenecks and possible workarounds. The main lessons were: (i) conservative assemblies of SRA data improve initial analysis steps; (ii) existing bioinformatic software with weak multithreading/multicore support can be elevated by wrapper scripts to use all cores within a computing node; (iii) redesigning existing bioinformatic algorithms for a cloud infrastructure facilitates their use by a wider audience; and (iv) a cloud infrastructure allows a diverse group of researchers to collaborate effectively. The scientific findings will be extended during a follow-up event. Here, we present the applied workflows, initial results, and lessons learned from the hackathon.
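    The 1 kb length filter mentioned in the abstract can be illustrated with a minimal sketch. This is not the hackathon's actual pipeline: the function names `parse_fasta` and `filter_by_length` are hypothetical, and the sketch assumes contigs arrive as plain FASTA text.

```python
from typing import Iterable, Iterator, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(
    records: Iterable[Tuple[str, str]], min_length: int = 1000
) -> Iterator[Tuple[str, str]]:
    """Keep only contigs of at least min_length bases (1 kb by default)."""
    for name, seq in records:
        if len(seq) >= min_length:
            yield name, seq


if __name__ == "__main__":
    # Hypothetical toy input: one 1100-base contig and one 999-base contig.
    demo = [">contig_1", "A" * 600, "C" * 500, ">contig_2", "G" * 999]
    for name, seq in filter_by_length(parse_fasta(demo)):
        print(name, len(seq))
```

    Both functions are generators, so a large contig file can be streamed record by record rather than loaded into memory at once.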

    Pipeline design for user-friendly viral metagenomic analysis

    "More and more bioinformatics tools and computational resources are required to make sense of the large data sets produced by NGS technologies. The learning curve for bioinformatics tools comes with feelings of frustration, leaving users with limited options for advancing the analysis of their data. The aim of this project was therefore to create easy-to-use software for the analysis of viral metagenomic data. This tool, Mosaic, is designed for the analysis of virus-enriched samples. It relies on Snakemake as the workflow management system, together with conda for installing the required dependencies. The software takes raw sequencing data and generates the viral abundance (vOTU) table. In addition, this open-source software supports long reads, in keeping with their promising future in metagenomic analysis." -- Taken from the degree document format. "Biodiversity is a key determinant of community and ecosystem dynamics and functioning. In particular, microbial communities are important players in biogeochemical cycling and influence the development, growth, and health of their hosts. During the last decade, viruses, most notably bacteriophages, received recognition for their contribution to intestinal microbial ecology and their role in marine ecosystems and food webs. However, their study in a community context is still challenging. There are no known shared genes and only a small fraction of viral hosts can be cultivated in the laboratory, which makes Whole Genome Shotgun sequencing coupled with bioinformatics the most suitable approach to gain insights into uncultured viral communities. High Throughput Sequencing technologies started to be widely used by 2008, and the data in public databases have started to grow exponentially. Only a decade has passed and nowadays there is a strong requirement for bioinformatics skills." -- Taken from the degree document format. Master's in Computational Biology. Maestrí

    Exploring the Remarkable Diversity of Culturable Escherichia coli Phages in the Danish Wastewater Environment

    Phages drive bacterial diversity, profoundly influencing microbial communities, from microbiomes to the drivers of global biogeochemical cycling. Aiming to broaden our understanding of Escherichia coli (MG1655, K-12) phages, we screened 188 Danish wastewater samples and isolated 136 phages. Ninety-two of these have genomic sequences with less than 95% similarity to known phages, and while most map to existing genera, several represent novel lineages. The isolated phages are highly diverse, estimated to represent roughly one-third of the true diversity of culturable virulent dsDNA Escherichia phages in Danish wastewater, yet almost half (40%) are not represented in metagenomic databases, emphasising the importance of isolating phages to uncover diversity. Seven viral families, Myoviridae, Siphoviridae, Podoviridae, Drexlerviridae, Chaseviridae, Autographiviridae, and Microviridae, are represented in the dataset. Their genomes vary drastically in length from 5.3 kb to 170.8 kb, with a guanine and cytosine (GC) content ranging from 35.3% to 60.0%. Hence, even for a model host bacterium, substantial diversity remains to be uncovered. These results expand and underline the range of coliphage diversity and demonstrate how far we are from fully disclosing phage diversity and ecology.

    Bacteriophages Roam the Wheat Phyllosphere

    The phyllosphere microbiome plays an important role in plant fitness. Recently, bacteriophages have been shown to play a role in shaping the bacterial community composition of the phyllosphere. However, no studies on the diversity and abundance of phyllosphere bacteriophage communities have been carried out until now. In this study, we extracted, sequenced, and characterized the dsDNA and ssDNA viral community from a phyllosphere for the first time. We sampled leaves from winter wheat (Triticum aestivum), where we identified a total of 876 virus operational taxonomic units (vOTUs), mostly predicted to be bacteriophages with a lytic lifestyle. Remarkably, 848 of these vOTUs corresponded to new viral species, and we estimated a minimum of 2.0 × 10^6 viral particles per leaf. These results suggest that the wheat phyllosphere harbors a large and active community of novel bacterial viruses. Phylloviruses have potential applications as biocontrol agents against phytopathogenic bacteria or as microbiome modulators to increase plant growth-promoting bacteria.

    Metaviromes Reveal the Dynamics of Pseudomonas Host-Specific Phages Cultured and Uncultured by Plaque Assay

    Isolating single phages using plaque assays is a laborious and time-consuming process. Whether single isolated phages are the most effective at lysis, the most abundant in viromes, or those with the highest ability to make plaques in solid media is not well known. With the increasing accessibility of high-throughput sequencing, metaviromics is often used to describe viruses in environmental samples. By extracting and sequencing metaviromes from organic waste with and without exposure to a host of interest, we show a host-related shift in the phage community and identify the most enriched phages. Moreover, we isolated plaque-forming single phages using the same virome–host matrix to observe how enrichments in liquid media correspond to the metaviromic data. In this study, we observed a significant shift (p = 0.015) of the 47 identified putative Pseudomonas phages with a minimum twofold change above zero in read abundance when adding a Pseudomonas syringae DC3000 host. Surprisingly, it appears that only two out of five plaque-forming phages from the same organic waste sample, targeting the Pseudomonas strain, were highly abundant in the metavirome, while the other three were almost absent despite host exposure. Lastly, our sequencing results highlight how long reads from Oxford Nanopore improve the assembly quality of metaviromes compared to short reads alone.
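    The idea of flagging phages by a twofold change in read abundance can be illustrated with a minimal sketch. This is not the study's statistical method: the pseudocount, the function names, and the toy counts are all assumptions, and real data would first need normalisation for sequencing depth and contig length.

```python
import math
from typing import Dict, List, Tuple


def log2_fold_change(exposed: float, control: float, pseudocount: float = 1.0) -> float:
    """log2 ratio of host-exposed to control abundance.

    The pseudocount (an assumption of this sketch) avoids division by zero
    for phages with no reads in the control sample.
    """
    return math.log2((exposed + pseudocount) / (control + pseudocount))


def enriched_phages(
    counts: Dict[str, Tuple[float, float]],
    min_fold: float = 2.0,
    pseudocount: float = 1.0,
) -> List[str]:
    """Return names whose abundance rose at least min_fold after host exposure.

    counts maps a phage name to (control_abundance, exposed_abundance).
    """
    threshold = math.log2(min_fold)
    return [
        name
        for name, (control, exposed) in counts.items()
        if log2_fold_change(exposed, control, pseudocount) >= threshold
    ]


if __name__ == "__main__":
    # Hypothetical abundances, not taken from the study.
    toy = {"phage_A": (10, 50), "phage_B": (20, 25), "phage_C": (0, 3)}
    print(enriched_phages(toy))
```

    Working on the log2 scale makes increases and decreases symmetric around zero, so the same helper could flag depleted phages by testing against a negative threshold.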