7 research outputs found

    Editorial: Curriculum Applications in Microbiology: Bioinformatics in the Classroom

    We would like to thank all of the authors who submitted to this special topic, committed to the furthering of academic creativity, excellence, and rigor in the challenging and virtual instructional world of SARS-CoV-2 (COVID-19). To you and all of our educators globally, you are indispensable. Non peer reviewed. Publisher PD

    Sex-Specific Linkages Between Taxonomic and Functional Profiles of Tick Gut Microbiomes

    Ticks transmit the most diverse array of disease agents and harbor one of the most diverse microbial communities. Major progress has been made in the characterization of the taxonomic profiles of tick microbiota. However, the functional profiles of tick microbiomes have been comparatively less studied. In this proof of concept, we used state-of-the-art functional metagenomics analytical tools to explore previously reported datasets of bacteria found in male and female Ixodes ovatus, Ixodes persulcatus, and Amblyomma variegatum. Results showed that both taxonomic and functional profiles differ between sexes of the same species. KEGG pathway analysis revealed that males and females of the same species had major differences in the abundance of genes involved in different metabolic pathways, including those for vitamin B, amino acids, carbohydrates, nucleotides, and antibiotics, among others. Partial reconstruction of metabolic pathways using KEGG enzymes suggests that tick microbiomes form a complex metabolic network that may increase microbial community resilience and adaptability. Linkage analysis between taxonomic and functional profiles showed that, among the KEGG enzymes with differential abundance in male and female ticks, only 12% were present in single bacterial genera. The rest of these enzymes were found in more than two bacterial genera, and 27% of them were found in five up to ten bacterial genera. Comparison of bacterial genera contributing to the differences in the taxonomic and functional profiles of males and females revealed that, while a small group of bacteria has a dual role, most of the bacteria contribute only to functional or taxonomic differentiation between sexes. Results suggest that the different lifestyles of male and female ticks exert sex-specific evolutionary pressures that act independently on the phenomes (set of phenotypes) and genomes of bacteria in tick gut microbiota. We conclude that functional redundancy is a fundamental property of male and female tick microbiota and propose that functional metagenomics should be combined with taxonomic profiling of microbiota because both analyses are complementary.

    Unraveling the functional dark matter through global metagenomics

    30 pages, 4 figures, 1 table, supplementary information https://doi.org/10.1038/s41586-023-06583-7.
    Data availability: All of the analysed datasets along with their corresponding sequences are available from the IMG system (http://img.jgi.doe.gov/). A list of the datasets used in this study is provided in Supplementary Data 8. All data from the protein clusters, including sequences, multiple alignments, HMM profiles, 3D structure models, and taxonomic and ecosystem annotation, are available through NMPFamsDB, publicly accessible at www.nmpfamsdb.org. The 3D models are also available at ModelArchive under accession code ma-nmpfamsdb.
    Code availability: Sequence analysis was performed using Tantan (https://gitlab.com/mcfrith/tantan), BLAST (https://blast.ncbi.nlm.nih.gov/Blast.cgi), LAST (https://gitlab.com/mcfrith/last), HMMER (http://hmmer.org/) and HH-suite3 (https://github.com/soedinglab/hh-suite). Clustering was performed using HipMCL (https://bitbucket.org/azadcse/hipmcl/src/master/). Additional taxonomic annotation was performed using Whokaryote (https://github.com/LottePronk/whokaryote), EukRep (https://github.com/patrickwest/EukRep), DeepVirFinder (https://github.com/jessieren/DeepVirFinder) and MMseqs2 (https://github.com/soedinglab/MMseqs2). 3D modelling was performed using AlphaFold2 (https://github.com/deepmind/alphafold) and TrRosetta2 (https://github.com/RosettaCommons/trRosetta2). Structural alignments were performed using TMalign (https://zhanggroup.org/TM-align/) and MMalign (https://zhanggroup.org/MM-align/). All custom scripts used for the generation and analysis of the data are available at Zenodo (https://doi.org/10.5281/zenodo.8097349).
    Metagenomes encode an enormous diversity of proteins, reflecting a multiplicity of functions and activities [1,2]. Exploration of this vast sequence space has been limited to a comparative analysis against reference microbial genomes and protein families derived from those genomes. Here, to examine the scale of yet untapped functional diversity beyond what is currently possible through the lens of reference genomes, we develop a computational approach to generate reference-free protein families from the sequence space in metagenomes. We analyse 26,931 metagenomes and identify 1.17 billion protein sequences longer than 35 amino acids with no similarity to any sequences from 102,491 reference genomes or the Pfam database [3]. Using massively parallel graph-based clustering, we group these proteins into 106,198 novel sequence clusters with more than 100 members, doubling the number of protein families obtained from the reference genomes clustered using the same approach. We annotate these families on the basis of their taxonomic, habitat, geographical and gene neighbourhood distributions and, where sufficient sequence diversity is available, predict protein three-dimensional models, revealing novel structures. Overall, our results uncover an enormously diverse functional space, highlighting the importance of further exploring the microbial functional dark matter.
    With the institutional support of the ‘Severo Ochoa Centre of Excellence’ accreditation (CEX2019-000928-S). Peer reviewed

    Creación de un pipeline para análisis de datos metagenómicos basado en Nextflow

    Metagenomics is a discipline that arose from the impossibility of isolating and cultivating most of the microbial organisms that live in natural ecosystems. The term refers to the study of the genomes of the organisms that inhabit a natural sample in order to describe which microorganisms they are and what their genetic potentialities are. 
Because of the considerable advances in the field of molecular biology, Next Generation Sequencing (NGS) technologies and bioinformatics, metagenomics is currently one of the most advanced disciplines in the field of microbial ecology. The analysis of sequencing results is performed by bioinformatic tools that are an essential part of metagenomic analysis; although this discipline may be considered quite recent, a wide variety of databases, pipelines (workflows) and programs have been developed. Pipelines are used for the integration of different and complementary software packages, and for the automation of processes. However, as the complexity of the analysis increases, obstacles that hinder the use of high-performance pipelines appear: incompatibility between different software packages, conflicting upgrade requirements, management of a large number of intermediate and temporary files, and optimization of computing resources. In order to overcome all these problems, a domain-specific language (DSL) called Nextflow has recently emerged. Nextflow allows the adaptation of pipelines written in any programming language. The choice of Nextflow for the development of a metagenomic analysis pipeline is justified by features such as the use of multi-scale containerization, its integration with software repositories, parallelization and the definition of input and output channels for the automatic start of each sub-process. In addition, the data flow model improves on alternative tools, because top-down processing does not require large storage space. In conclusion, Nextflow stands as a flexible and robust solution owing to the simplification, data flow control and management of the results collected from metagenomic analysis. In this thesis, a Nextflow-based analysis pipeline that captures the main steps of metagenomic analysis is presented. 
The workflow provides the user with a tool that avoids the process of installing the programs needed for the analysis and executes the most complex and communicatively problematic stages in metagenomic analysis. The pipeline receives as input the raw data to be analyzed from one or more samples and continues with the steps of quality control, assembly, annotation, and quantification of each annotation. Finally, it provides, in a clear and summarized way, the results needed for the subsequent descriptive and/or differential statistical analysis, acting in a transparent, robust and highly reproducible way, speeding up execution times, saving storage space for the analysis, and optimizing the available computing resources.
    Lomas Redondo, A. (2021). Creación de un pipeline para análisis de datos metagenómicos basado en Nextflow. Universitat Politècnica de València. http://hdl.handle.net/10251/171195 TFG