    Development of an integrated omics in silico workflow and its application for studying bacteria-phage interactions in a model microbial community

    Microbial communities are ubiquitous and dynamic systems that inhabit a multitude of environments. They underpin natural as well as biotechnological processes, and are also implicated in human health. The elucidation and understanding of these structurally and functionally complex microbial systems using a broad spectrum of toolkits ranging from in situ sampling, high-throughput data generation ("omics"), bioinformatic analyses, computational modelling and laboratory experiments is the aim of the emerging discipline of Eco-Systems Biology. Integrated workflows which allow the systematic investigation of microbial consortia are being developed. However, in silico methods for analysing multi-omic data sets are so far typically lab-specific, applied ad hoc, limited in terms of their reproducibility by different research groups and suboptimal in the amount of data actually being exploited. To address these limitations, the present work initially focused on the development of the Integrated Meta-omic Pipeline (IMP), a large-scale reference-independent bioinformatic analyses pipeline for the integrated analysis of coupled metagenomic and metatranscriptomic data. IMP is an elaborate pipeline that incorporates robust read preprocessing, iterative co-assembly, analyses of microbial community structure and function, automated binning as well as genomic signature-based visualizations. The IMP-based data integration strategy greatly enhances overall data usage, output volume and quality as demonstrated using relevant use-cases. Finally, IMP is encapsulated within a user-friendly implementation using Python while relying on Docker for reproducibility. The IMP pipeline was then applied to a longitudinal multi-omic dataset derived from a model microbial community from an activated sludge biological wastewater treatment plant with the explicit aim of following bacteria-phage interaction dynamics using information from the CRISPR-Cas system. 
This work provides a multi-omic perspective of community-level CRISPR dynamics, namely changes in CRISPR repeat and spacer complements over time, demonstrating that these are heterogeneous, dynamic and transcribed genomic regions. Population-level analysis of two lipid-accumulating bacterial species associated with 158 putative bacteriophage sequences enabled the observation of phage-host population dynamics. Several putatively identified bacteriophages were found to occur at much higher abundances than other phages, and their abundance peaks usually did not overlap with those of other putative phages. In addition, several RNA-based CRISPR targets were found to occur at high abundances. In summary, the present work describes the development of a new bioinformatic pipeline for the analysis of coupled metagenomic and metatranscriptomic datasets derived from microbial communities and its application to a study focused on the dynamics of bacteria-virus interactions. Finally, this work demonstrates the power of integrated multi-omic investigation of microbial consortia towards the conversion of high-throughput next-generation sequencing data into new insights.

    Genomic sequencing capacity, data retention, and personal access to raw data in Europe

    Whole genome/exome sequencing (WGS/WES) has become widely adopted in research and, more recently, in clinical settings. Many hope that the information obtained from the interpretation of these data will have medical benefits for patients and, in some cases, also their biological relatives. Because of the manifold possibilities to reuse genomic data, enabling sequenced individuals to access their own raw (uninterpreted) genomic data is a highly debated issue. This paper reports some of the first empirical findings on personal genome access policies and practices. We interviewed 39 respondents, working at 33 institutions in 21 countries across Europe. These sequencing institutions generate massive amounts of WGS/WES data and represent varying organisational structures and operational models. In total, these institutions have sequenced ∼317,259 genomes and exomes to date. Most of the sequencing institutions reported that they are able to store raw genomic data in compliance with various national regulations, although there was a lack of standardisation of storage formats. Interviewees from 12 of the 33 institutions included in our study reported that they had received requests for personal access to raw genomic data from sequenced individuals. In the absence of policies on how to process such requests, these were decided on an ad hoc basis; in the end, at least 28 requests were granted, while there were no reports of requests being rejected. Given the rights, interests, and liabilities at stake, it is essential that sequencing institutions adopt clear policies and processes for raw genomic data retention and personal access.

    Integrated meta-omic analyses of the gastrointestinal tract microbiome in patients undergoing allogeneic hematopoietic stem cell transplantation.

    In patients undergoing allogeneic hematopoietic stem cell transplantation (allo-HSCT), treatment-induced changes to the gastrointestinal tract (GIT) microbiome have been linked to adverse outcomes, most notably graft-versus-host disease (GvHD). However, it is presently unknown whether this relationship is causal or consequential. Here, we performed an integrated meta-omic analysis to probe deeper into the GIT microbiome changes during allo-HSCT and its accompanying treatments. We used 16S and 18S rRNA gene amplicon sequencing to resolve archaea, bacteria, and eukaryotes within the GIT microbiomes of 16 patients undergoing allo-HSCT for the treatment of hematologic malignancies. These results revealed a major shift in the GIT microbiome after allo-HSCT including a marked reduction in bacterial diversity, accompanied by only limited changes in eukaryotes and archaea. An integrated analysis of metagenomic and metatranscriptomic data was performed on samples collected from a patient before and after allo-HSCT for acute myeloid leukemia. This patient developed severe GvHD, leading to death 9 months after allo-HSCT. In addition to drastically decreased bacterial diversity, the post-treatment microbiome showed a higher overall number and higher expression levels of antibiotic resistance genes (ARGs). One specific Escherichia coli strain causing a paravertebral abscess was linked to GIT dysbiosis, suggesting loss of intestinal barrier integrity. The apparent selection for bacteria expressing ARGs suggests that prophylactic antibiotic administration may adversely affect the overall treatment outcome. We therefore assert that such analyses including information about the selection of pathogenic bacteria expressing ARGs may assist clinicians in "personalizing" regimens for individual patients to improve overall outcomes.

    Birth mode is associated with earliest strain-conferred gut microbiome functions and immunostimulatory potential

    The effects of caesarean section delivery on mother-to-neonate transmission of microbiota are unclear. Here the authors show that caesarean section delivery can affect the transmission of specific microbial strains and the immunomodulatory potential of the microbiota.

    Integration of time-series meta-omics data reveals how microbial ecosystems respond to disturbance.

    The development of reliable, mixed-culture biotechnological processes hinges on understanding how microbial ecosystems respond to disturbances. Here we reveal extensive phenotypic plasticity and niche complementarity in oleaginous microbial populations from a biological wastewater treatment plant. We perform meta-omics analyses (metagenomics, metatranscriptomics, metaproteomics and metabolomics) on in situ samples over 14 months at weekly intervals. Based on 1,364 de novo metagenome-assembled genomes, we uncover four distinct fundamental niche types. Throughout the time-series, we observe a major, transient shift in community structure, coinciding with substrate availability changes. Functional omics data reveal extensive variation in gene expression and substrate usage amongst community members. Ex situ bioreactor experiments confirm that responses occur within five hours of a pulse disturbance, demonstrating rapid adaptation by specific populations. Our results show that community resistance and resilience are a function of phenotypic plasticity and niche complementarity, and set the foundation for future ecological engineering efforts.

    De-novo assembly and finishing of the genome of neuro-toxin (anatoxin-a) producing cyanobacterium, Anabaena sp. strain 37

    Cyanobacteria are ancient photosynthetic microorganisms found in both fresh and saline water bodies all over the world. Anabaena is a genus of filamentous heterocystous diazotrophic cyanobacteria that are common in freshwater lakes and often implicated in the formation of blooms. They are known to play a vital role in the nitrogen cycle and to produce harmful toxins. The reason for this toxin production is still unknown. Anabaena sp. strain 37, isolated from Lake Sääksjärvi in western Finland, was found to produce the neurotoxin anatoxin-a, which affects the nervous systems of humans and animals and can cause paralysis. During the past decade, genome sequencing has aided in the understanding of genetic information in many organisms including cyanobacteria. A whole genome sequencing project was carried out to understand the mechanism of anatoxin-a production in Anabaena sp. strain 37. 454 pyrosequencing produced 258,430 reads with a coverage of approximately 22X. The data were subjected to a de novo assembly which produced a draft genome made up of 828 contigs above 500 bp, with a contig N50 of 10,548 bp and a longest contig of 47,660 bp. The draft assembly underwent a finishing procedure which included scaffolding, gap closure and error correction. Two types of mate pair libraries, 3 Kb and 8 Kb, were constructed and sequenced for scaffolding. Scaffolding using 196,221 3 Kb mate pair reads yielded 31 major scaffolds with a scaffold N50 of 344,872 bp. A second scaffolding round using 34,498 8 Kb mate pair reads resulted in 16 scaffolds and a scaffold N50 of 1,085,340 bp. Three automated gap closure rounds were carried out using consed autofinish. Primers were used to amplify the genomic DNA by PCR, and the products were sequenced using Sanger sequencing. A total of 1,406 Sanger reads were used to close more than 800 gaps in the draft assembly.
In addition, the 454-based draft assembly contained many sequencing errors among single nucleotide homopolymeric regions of three-mers and above. Moreover, these errors were found in coding regions, namely the anatoxin-a synthetase gene cluster, and were further confirmed with additional PCR and Sanger sequencing. There were 370,648 single nucleotide homopolymer sites of three-mers and above that accounted for 38.18% of the genome length and a density of 668.1 per 10 Kb. A correction procedure was carried out by incorporating 100X coverage Illumina/Solexa data into the assembly. The high depth data corrected an estimated 1,888 single nucleotide homopolymer error sites of three-mers and above, which translates to a 454 single nucleotide homopolymer error rate of 0.51% or 3.37 per 10 Kb. The correction also increased the overall proportion of Q20-quality bases. The current assembly is made up of 14 scaffolds, of which six are major scaffolds. The assembly has a scaffold N50 of 1,085,340 bp, 99.7% of the consensus bases are of phred Q20 quality, and the overall error rate is 8.21 per 10 Kb. Finally, the genome has a GC-content of 38.3%, with four ribosomal RNA operons and the anatoxin-a synthetase gene cluster confirmed.
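
The abstract above quotes N50 values and a homopolymer error rate without defining them. The following is a minimal Python sketch of how such statistics are commonly computed; it is not the code used in the work itself. The function names (`n50`, `count_homopolymer_sites`, `rate_percent`) and the toy inputs are hypothetical. Only the two counts passed to `rate_percent` in the usage note come from the abstract.

```python
from itertools import groupby


def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of all assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must not be empty")


def count_homopolymer_sites(sequence, min_run=3):
    """Count runs of one repeated nucleotide that are at least min_run long."""
    return sum(
        1 for _, run in groupby(sequence.upper()) if len(list(run)) >= min_run
    )


def rate_percent(errors, sites):
    """Express an error count as a percentage of the sites examined."""
    return 100.0 * errors / sites
```

With the figures reported in the abstract, `rate_percent(1888, 370648)` evaluates to roughly 0.51, matching the stated 454 homopolymer error rate. The N50 and run-counting functions are shown on toy data only, since the contig lengths and genome sequence are not part of this listing.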

    IMP HTML reports

    This file contains all the HTML reports generated by IMP for the analysis of datasets reported in the article.

    Integrated omics for the identification of key functionalities in biological wastewater treatment microbial communities

    Biological wastewater treatment plants harbour diverse and complex microbial communities which prominently serve as models for microbial ecology and mixed culture biotechnological processes. Integrated omic analyses (combined metagenomics, metatranscriptomics, metaproteomics and metabolomics) are currently gaining momentum towards providing enhanced understanding of community structure, function and dynamics in situ as well as offering the potential to discover novel biological functionalities within the framework of Eco-Systems Biology. The integration of information from genome to metabolome allows the establishment of associations between genetic potential and final phenotype, a feature not realizable by only considering single ‘omes’. Therefore, in our opinion, integrated omics will become the future standard for large-scale characterization of microbial consortia including those underpinning biological wastewater treatment processes. Systematically obtained time and space-resolved omic datasets will allow deconvolution of structure–function relationships by identifying key members and functions. Such knowledge will form the foundation for discovering novel genes on a much larger scale compared with previous efforts. In general, these insights will allow us to optimize microbial biotechnological processes either through better control of mixed culture processes or by use of more efficient enzymes in bioengineering applications.