Editorial: Curriculum Applications in Microbiology: Bioinformatics in the Classroom
We would like to thank all of the authors who submitted to this special topic, committed to the furthering of academic creativity, excellence, and rigor in the challenging and virtual instructional world of SARS-CoV-2 (COVID-19). To you and all of our educators globally, you are indispensable.
Non peer reviewed. Publisher PDF.
Sex-Specific Linkages Between Taxonomic and Functional Profiles of Tick Gut Microbiomes
Ticks transmit the most diverse array of disease agents and harbor one of the most diverse microbial communities. Major progress has been made in the characterization of the taxonomic profiles of tick microbiota. However, the functional profiles of the tick microbiome have been comparatively less studied. In this proof of concept, we used state-of-the-art functional metagenomics analytical tools to explore previously reported datasets of bacteria found in male and female Ixodes ovatus, Ixodes persulcatus, and Amblyomma variegatum. Results showed that both taxonomic and functional profiles differ between sexes of the same species. KEGG pathway analysis revealed that males and females of the same species had major differences in the abundance of genes involved in different metabolic pathways, including those for vitamin B, amino acids, carbohydrates, nucleotides, and antibiotics, among others. Partial reconstruction of metabolic pathways using KEGG enzymes suggests that tick microbiomes form a complex metabolic network that may increase microbial community resilience and adaptability. Linkage analysis between taxonomic and functional profiles showed that, among the KEGG enzymes with differential abundance in male and female ticks, only 12% were present in single bacterial genera. The rest of these enzymes were found in more than two bacterial genera, and 27% of them were found in five to ten bacterial genera. Comparison of bacterial genera contributing to the differences in the taxonomic and functional profiles of males and females revealed that, while a small group of bacteria has a dual role, most of the bacteria contribute only to functional or taxonomic differentiation between sexes. Results suggest that the different lifestyles of male and female ticks exert sex-specific evolutionary pressures that act independently on the phenomes (set of phenotypes) and genomes of bacteria in tick gut microbiota.
We conclude that functional redundancy is a fundamental property of male and female tick microbiota and propose that functional metagenomics should be combined with taxonomic profiling of microbiota because both analyses are complementary.
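The linkage between enzymes and genera described in this abstract can be illustrated with a small sketch. It assumes the input is a list of (enzyme, genus) pairs. The function names and the toy identifiers are hypothetical and are not taken from the study.

```python
from collections import defaultdict


def genera_per_enzyme(pairs):
    """Map each enzyme identifier to the set of genera that carry it."""
    carriers = defaultdict(set)
    for enzyme, genus in pairs:
        carriers[enzyme].add(genus)
    return dict(carriers)


def redundancy_summary(pairs):
    """Return the fraction of enzymes found in one genus vs. several genera."""
    carriers = genera_per_enzyme(pairs)
    total = len(carriers)
    if total == 0:
        return {"single_genus": 0.0, "multi_genus": 0.0}
    single = sum(1 for genera in carriers.values() if len(genera) == 1)
    return {
        "single_genus": single / total,
        "multi_genus": (total - single) / total,
    }


# Toy input (hypothetical identifiers): two enzymes, one shared by two genera.
toy_pairs = [
    ("enzyme_1", "GenusA"),
    ("enzyme_1", "GenusB"),
    ("enzyme_2", "GenusA"),
    ("enzyme_2", "GenusA"),  # repeated observations collapse in the set
]
summary = redundancy_summary(toy_pairs)
```

An enzyme carried by several genera is one simple reading of functional redundancy. The study's actual linkage analysis may use different inputs and criteria.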
Unraveling the functional dark matter through global metagenomics
30 pages, 4 figures, 1 table, supplementary information https://doi.org/10.1038/s41586-023-06583-7
Data availability: All of the analysed datasets along with their corresponding sequences are available from the IMG system (http://img.jgi.doe.gov/). A list of the datasets used in this study is provided in Supplementary Data 8. All data from the protein clusters, including sequences, multiple alignments, HMM profiles, 3D structure models, and taxonomic and ecosystem annotation, are available through NMPFamsDB, publicly accessible at www.nmpfamsdb.org. The 3D models are also available at ModelArchive under accession code ma-nmpfamsdb.
Code availability: Sequence analysis was performed using Tantan (https://gitlab.com/mcfrith/tantan), BLAST (https://blast.ncbi.nlm.nih.gov/Blast.cgi), LAST (https://gitlab.com/mcfrith/last), HMMER (http://hmmer.org/) and HH-suite3 (https://github.com/soedinglab/hh-suite). Clustering was performed using HipMCL (https://bitbucket.org/azadcse/hipmcl/src/master/). Additional taxonomic annotation was performed using Whokaryote (https://github.com/LottePronk/whokaryote), EukRep (https://github.com/patrickwest/EukRep), DeepVirFinder (https://github.com/jessieren/DeepVirFinder) and MMseqs2 (https://github.com/soedinglab/MMseqs2). 3D modelling was performed using AlphaFold2 (https://github.com/deepmind/alphafold) and TrRosetta2 (https://github.com/RosettaCommons/trRosetta2). Structural alignments were performed using TMalign (https://zhanggroup.org/TM-align/) and MMalign (https://zhanggroup.org/MM-align/). All custom scripts used for the generation and analysis of the data are available at Zenodo (https://doi.org/10.5281/zenodo.8097349).
Metagenomes encode an enormous diversity of proteins, reflecting a multiplicity of functions and activities1,2. Exploration of this vast sequence space has been limited to a comparative analysis against reference microbial genomes and protein families derived from those genomes.
Here, to examine the scale of yet untapped functional diversity beyond what is currently possible through the lens of reference genomes, we develop a computational approach to generate reference-free protein families from the sequence space in metagenomes. We analyse 26,931 metagenomes and identify 1.17 billion protein sequences longer than 35 amino acids with no similarity to any sequences from 102,491 reference genomes or the Pfam database3. Using massively parallel graph-based clustering, we group these proteins into 106,198 novel sequence clusters with more than 100 members, doubling the number of protein families obtained from the reference genomes clustered using the same approach. We annotate these families on the basis of their taxonomic, habitat, geographical and gene neighbourhood distributions and, where sufficient sequence diversity is available, predict protein three-dimensional models, revealing novel structures. Overall, our results uncover an enormously diverse functional space, highlighting the importance of further exploring the microbial functional dark matter.
With the institutional support of the ‘Severo Ochoa Centre of Excellence’ accreditation (CEX2019-000928-S). Peer reviewed.
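The two size thresholds mentioned in the abstract above (sequences longer than 35 amino acids, clusters with more than 100 members) can be shown in a minimal sketch. It does not reproduce the similarity searches or the HipMCL clustering. The function names, the dictionary-based inputs, and the toy data are assumptions made for illustration.

```python
from collections import Counter


def keep_long_proteins(proteins, min_length=35):
    """Keep sequences strictly longer than min_length residues."""
    return {pid: seq for pid, seq in proteins.items() if len(seq) > min_length}


def large_clusters(cluster_of, min_members=100):
    """Return sizes of clusters with strictly more than min_members members.

    cluster_of maps a protein identifier to its cluster identifier.
    """
    sizes = Counter(cluster_of.values())
    return {cid: n for cid, n in sizes.items() if n > min_members}


# Toy data (hypothetical): one sequence above the length cutoff, one at it.
toy_proteins = {"p_long": "M" * 36, "p_short": "M" * 35}
kept = keep_long_proteins(toy_proteins)

# Toy cluster assignment: cluster "c1" has 101 members, "c2" has 100.
toy_assignment = {f"a{i}": "c1" for i in range(101)}
toy_assignment.update({f"b{i}": "c2" for i in range(100)})
big = large_clusters(toy_assignment)
```

Both comparisons are strict, following the wording "longer than" and "more than" in the abstract.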
Creación de un pipeline para análisis de datos metagenómicos basado en Nextflow [Creation of a Nextflow-based pipeline for metagenomic data analysis]
Metagenomics is a discipline that arose from the impossibility of isolating and cultivating most of the microbial organisms that live in natural ecosystems. The term refers to the study of the genomes of the organisms that inhabit a natural sample in order to describe which microorganisms they are and what their genetic potential is. Because of the considerable advances in molecular biology, Next Generation Sequencing (NGS) technologies and bioinformatics, metagenomics is currently one of the fastest-advancing disciplines in the field of microbial ecology.
Bioinformatic tools for analyzing sequencing results are an essential part of metagenomic analysis; although the discipline is quite recent, a wide variety of databases, pipelines (workflows) and programs already exist. Pipelines are used to integrate different, complementary software packages and to automate processes. However, as the complexity of the analysis increases, obstacles appear that hinder the use of high-performance pipelines: incompatibility between software packages, conflicting upgrade requirements, management of a large number of intermediate and temporary files, and optimization of computing resources. To overcome these problems, a domain-specific language (DSL) called Nextflow has recently emerged. Nextflow allows the adaptation of pipelines written in any programming language. The choice of Nextflow for the development of a metagenomic analysis pipeline is justified by features such as the use of multi-scale containerization, its integration with software repositories, parallelization, and the definition of input and output channels for the automatic start of each sub-process. In addition, the data flow model improves on alternative tools, because top-down processing does not require large storage space. In conclusion, Nextflow stands out as a flexible and robust solution owing to its simplification, data flow control and management of the results collected from metagenomic analysis.
In this thesis, a Nextflow-based analysis pipeline that captures the main steps of metagenomic analysis is presented. The workflow provides the user with a tool that avoids the process of installing the programs needed for the analysis and executes the most complex and communicatively problematic stages of metagenomic analysis. The pipeline receives as input the raw data to be analyzed, from one or more samples, and continues with the steps of quality control, assembly, annotation, and quantification of each annotation. Finally, it provides, in a clear and summarized way, the results needed for subsequent descriptive and/or differential statistical analyses, acting in a transparent, robust and highly reproducible way, speeding up execution times, saving storage space for the analysis, and optimizing the available computing resources.
Lomas Redondo, A. (2021). Creación de un pipeline para análisis de datos metagenómicos basado en Nextflow. Universitat Politècnica de València. http://hdl.handle.net/10251/171195 (TFG)
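The stage order named in the abstract above (quality control, assembly, annotation, quantification) can be sketched as a chain of functions applied to each sample. This is plain Python, not Nextflow, and it is not the thesis pipeline. Every function body is a toy stand-in, and all names, thresholds and data are hypothetical.

```python
from collections import Counter


def quality_control(reads, min_length=4):
    """Toy QC: drop short reads and reads containing an ambiguous base (N)."""
    return [r for r in reads if len(r) >= min_length and "N" not in r]


def assemble(reads):
    """Stand-in for assembly: deduplicate reads while keeping their order."""
    return list(dict.fromkeys(reads))


def annotate(contigs, lookup):
    """Stand-in for annotation: look each contig up in a toy table."""
    return [lookup.get(c, "unknown") for c in contigs]


def quantify(annotations):
    """Count how often each annotation occurs."""
    return dict(Counter(annotations))


def run_pipeline(samples, lookup):
    """Apply the four stages, in order, to every sample independently."""
    results = {}
    for name, reads in samples.items():
        contigs = assemble(quality_control(reads))
        results[name] = quantify(annotate(contigs, lookup))
    return results


# Toy input (hypothetical): one sample with a duplicate, an N read, a short read.
toy_samples = {"sample_1": ["ACGT", "ACGT", "TTGA", "ANGT", "AC"]}
toy_lookup = {"ACGT": "gene_x"}
toy_results = run_pipeline(toy_samples, toy_lookup)
```

Because each sample is processed independently, a workflow engine can run the samples in parallel. That is the kind of scheduling the abstract attributes to Nextflow channels.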
Microbial Life in Challenging Environments
Microorganisms are nearly ubiquitous on Earth, but the identity and function of microbial communities are inherently dependent on the properties of the specific environment in question. Here, I have studied soils around the world to answer questions about how the functional attributes of microorganisms allow them to respond to challenging environmental conditions. First, I explore how microbial communities in soils change across environmental gradients in Antarctica. I show that microbes in Antarctic surface soils are most restricted by low temperatures, low water availability, and high concentrations of salt. Microbial communities near the polar plateau, the most challenging environment, are dominated by Actinobacteria and Chloroflexi, and are enriched in genes associated with the oxidation of hydrogen gas as an energy source. Second, I show that the earliest microbial colonizers of a newly formed volcanic island in the Kingdom of Tonga are chemolithotrophs that appear to have come from nearby geothermal systems. While many of these microbes utilize sulfur as an energy source, the most abundant organisms have genes that indicate they can oxidize trace gases including carbon monoxide and hydrogen. Finally, I show that organisms associated with carbon-limited subsurface soils tend to have smaller genomes, grow more slowly, and have more gene pathways associated with metabolism and the storage of carbon. Taken together, these studies shed light on microbial survival in challenging soil environments and show the varied ways in which microbial communities interact with and are affected by their surroundings.