
    Metagenomics: tools and insights for analyzing next-generation sequencing data derived from biodiversity studies

    Advances in next-generation sequencing (NGS) have allowed significant breakthroughs in microbial ecology studies. This has led to the rapid expansion of research in the field and the establishment of “metagenomics”, often defined as the analysis of DNA from microbial communities in environmental samples without prior need for culturing. Many statistical and computational metagenomics tools and databases have been developed to exploit the huge influx of data. In this review article, we provide an overview of the sequencing technologies and how they are uniquely suited to various types of metagenomic studies. We focus on the currently available bioinformatics techniques, tools, and methodologies for performing each individual step of a typical metagenomic dataset analysis. We also outline future trends in the field with respect to tools and technologies currently under development. Moreover, we discuss data management, distribution, and integration tools that are capable of performing comparative metagenomic analyses of multiple datasets using well-established databases, as well as commonly used annotation standards.

    Metagenomics - a guide from sampling to data analysis

    Metagenomics applies a suite of genomic technologies and bioinformatics tools to directly access the genetic content of entire communities of organisms. The field of metagenomics has been responsible for substantial advances in microbial ecology, evolution, and diversity over the past 5 to 10 years, and many research laboratories are now actively engaged in it. With the growing number of activities comes a wealth of methodological knowledge and expertise that should guide future developments in the field. This review summarizes the current opinions in metagenomics, and provides practical guidance and advice on sample processing, sequencing technology, assembly, binning, annotation, experimental design, statistical analysis, data storage, and data sharing. As more metagenomic datasets are generated, the availability of standardized procedures and shared data storage and analysis becomes increasingly important to ensure that the output of individual projects can be assessed and compared.

    Analysing Microbial Community Composition through Amplicon Sequencing: From Sampling to Hypothesis Testing

    Microbial ecology as a scientific field is fundamentally driven by technological advance. The past decade's revolution in DNA sequencing cost and throughput has made it possible for most research groups to map microbial community composition in environments of interest. However, the computational and statistical methodology required to analyse this kind of data is often not part of a biologist's training. In this review, we give a historical perspective on the use of sequencing data in microbial ecology and restate the current need for this approach, but we also highlight the major caveats of standard practices for handling these data, from sample collection and library preparation to statistical analysis. Further, we outline the main new analytical tools that have been developed in the past few years to bypass these caveats, and we highlight the major requirements of common statistical practices and the extent to which they are applicable to microbial data. Besides delving into the meaning of selected alpha- and beta-diversity measures, we give special consideration to techniques for finding the main drivers of community dissimilarity and for constructing interaction networks. While every project design has specific needs, this review should serve as a starting point for considering what options are available.
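The abstract refers to alpha- and beta-diversity measures without defining them. As a minimal sketch, assuming one common choice for each (the Shannon index for alpha diversity within a sample and Bray-Curtis dissimilarity for beta diversity between two samples; the review itself may emphasize other measures), the functions below compute both from raw per-taxon read counts. The function names are hypothetical, and no normalization for sequencing depth is applied, which in practice is itself a methodological decision.

```python
import math
from typing import Sequence


def shannon_index(counts: Sequence[int]) -> float:
    """Alpha diversity: Shannon index H' = -sum(p_i * ln p_i) over taxa with nonzero counts."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads")
    h = 0.0
    for c in counts:
        if c > 0:  # taxa with zero counts contribute nothing (0 * ln 0 is taken as 0)
            p = c / total
            h -= p * math.log(p)
    return h


def bray_curtis(a: Sequence[int], b: Sequence[int]) -> float:
    """Beta diversity: Bray-Curtis dissimilarity sum|a_i - b_i| / sum(a_i + b_i).

    Both vectors must list counts for the same taxa in the same order.
    Returns 0 for identical compositions and 1 when no taxa are shared.
    """
    if len(a) != len(b):
        raise ValueError("samples must cover the same taxa")
    denom = sum(a) + sum(b)
    if denom == 0:
        raise ValueError("both samples are empty")
    numer = sum(abs(x - y) for x, y in zip(a, b))
    return numer / denom
```

For example, two equally abundant taxa give a Shannon index of ln 2, and two samples with no taxa in common have a Bray-Curtis dissimilarity of 1.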

    Microbial community functioning at hypoxic sediments revealed by targeted metagenomics and RNA stable isotope probing

    Microorganisms are instrumental to the structure and functioning of marine ecosystems and to the chemistry of the ocean because of their essential role in element cycling and in the recycling of organic matter. Two of the most critical ocean biogeochemical cycles are those of nitrogen and sulfur, since they can influence the synthesis of nucleic acids and proteins, primary productivity and microbial community structure. Oxygen concentration in marine environments is one of the environmental variables that have been strongly affected by anthropogenic activities; its decline induces hypoxic events that affect benthic organisms and fisheries. Hypoxia has traditionally been defined as the oxygen level below which most animal life cannot be sustained. Hypoxic conditions affect microbial composition and activity because anaerobic reactions and pathways are favoured at the expense of aerobic ones. Naturally occurring hypoxia can be found in areas where water circulation is restricted, such as coastal lagoons, and in areas where oxygen-depleted water is driven onto the continental shelf, i.e. coastal upwelling regions. Coastal lagoons are highly dynamic aquatic systems, particularly vulnerable to human activities and susceptible to changes induced by natural events. For this PhD project, the lagoonal complex of Amvrakikos Gulf, one of the largest semi-enclosed gulfs in the Mediterranean Sea, was chosen as a study site. Coastal upwelling regions are another type of oxygen-limited environment, where the formation of oxygen minimum zones (OMZs) has also been reported. Sediment in upwelling regions is rich in organic matter, and bottom water is often depleted of oxygen because of intense heterotrophic respiration. The coastal upwelling system chosen for this PhD project was the Benguela system off Namibia, along the coast of southwestern Africa.
The aim of this PhD project was to study the microbial community assemblages of hypoxic ecosystems and to identify a potential link between their identity and function, with particular emphasis on the microorganisms involved in the nitrogen and sulfur cycles. The methodology applied included targeted metagenomics and RNA stable isotope probing (SIP). Microbial community diversity patterns were shown to differ by habitat type, i.e. between riverine, lagoonal and marine environments, and the studied habitats were also functionally distinct. Apart from salinity, which was the abiotic variable best correlated with the microbial community pattern, oxygen concentration was highly correlated with the predicted metabolic pattern of the microbial communities. In addition, the total number of Operational Taxonomic Units (OTUs) showed a negative linear relationship with salinity (see Chapter 2). Microbial community diversity patterns also differed between the lagoons under study, since each lagoon hosts a different sulfate-reducing microbial (SRM) community, again highly correlated with salinity. Moreover, most of the environmental terms that characterized the SRM communities were classified to the marine biome, but terms belonging to the freshwater or brackish biomes were also found at stations where a freshwater effect was more evident (see Chapter 3). Taxonomic groups expected to thrive in the sediments of the Benguela coastal upwelling system were absent or present only at very low abundances. Epsilonproteobacteria dominated the anaerobic assimilation of acetate, as confirmed by their isotopic enrichment in the SIP experiments.
Known sulfate-reducers were not enhanced by sulfate addition, possibly because of competition for electron donors between nitrate-reducers and sulfate-reducers, the inability of certain sulfate-reducing bacteria to use acetate as an electron donor, or the short duration of the incubations (see Chapter 4). Future research should focus more on the community functioning of such habitats; a better understanding of the biogeochemical cycles that characterize these hypoxic ecosystems may allow predictions of the intensity and direction of element cycling, especially of nitrogen and sulfur given their biological importance. Regulating hypoxic episodes may help the end-users of these ecosystems achieve higher productivity in terms of fish catches, which is otherwise largely compromised by elevated hydrogen sulfide concentrations.
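The abstract reports a negative linear relationship between the total number of OTUs and salinity but does not say how it was fitted. As a minimal sketch, assuming observed richness (the count of OTUs with at least one read per sample) regressed on salinity by ordinary least squares, the hypothetical helpers below show what "negative linear relationship" means operationally: a fitted slope below zero. They are illustrative only and are not the thesis's actual analysis.

```python
from typing import Sequence


def otu_richness(counts: Sequence[int]) -> int:
    """Observed richness: the number of OTUs with at least one read in a sample."""
    return sum(1 for c in counts if c > 0)


def ols_fit(x: Sequence[float], y: Sequence[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of y = intercept + slope * x.

    Returns (slope, intercept). A negative slope means richness
    decreases as the predictor (here assumed to be salinity) increases.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("predictor has zero variance")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

In use, one would compute `otu_richness` for each station's OTU count vector and pass the per-station salinities and richness values to `ols_fit`; a significance test on the slope would still be needed before interpreting it.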

    American Gut: An Open Platform For Citizen Science Microbiome Research

    Copyright © 2018 McDonald et al. Although much work has linked the human microbiome to specific phenotypes and lifestyle variables, data from different projects have been challenging to integrate and the extent of microbial and molecular diversity in human stool remains unknown. Using standardized protocols from the Earth Microbiome Project and sample contributions from over 10,000 citizen-scientists, together with an open research network, we compare human microbiome specimens primarily from the United States, United Kingdom, and Australia to one another and to environmental samples. Our results show an unexpected range of beta-diversity in human stool microbiomes compared to environmental samples; demonstrate the utility of procedures for removing the effects of overgrowth during room-temperature shipping for revealing phenotype correlations; uncover new molecules and kinds of molecular communities in the human stool metabolome; and examine emergent associations among the microbiome, metabolome, and the diversity of plants that are consumed (rather than relying on reductive categorical variables such as veganism, which have little or no explanatory power). We also demonstrate the utility of the living data resource and cross-cohort comparison to confirm existing associations between the microbiome and psychiatric illness and to reveal the extent of microbiome change within one individual during surgery, providing a paradigm for open microbiome research and education. IMPORTANCE We show that a citizen science, self-selected cohort shipping samples through the mail at room temperature recaptures many known microbiome results from clinically collected cohorts and reveals new ones. 
Of particular interest is integrating n = 1 study data with the population data, showing that the extent of microbiome change after events such as surgery can exceed differences between distinct environmental biomes, and the effect of diverse plants in the diet, which we confirm with untargeted metabolomics on hundreds of samples.

    Interpretable Machine Learning Methods for Prediction and Analysis of Genome Regulation in 3D

    With the development of chromosome conformation capture-based techniques, we now know that chromatin is packed in three-dimensional (3D) space inside the cell nucleus. Changes in the 3D chromatin architecture have already been implicated in diseases such as cancer. A better understanding of this 3D conformation is therefore of interest, as it can deepen our comprehension of the complex, multipronged regulatory mechanisms of the genome. The work described in this dissertation focuses largely on the development and application of interpretable machine learning methods for the prediction and analysis of long-range genomic interactions identified by chromatin interaction experiments. In the first part, we demonstrate that the genetic sequence information at genomic loci is predictive of the long-range interactions of a particular locus of interest (LoI). For example, the genetic sequence information at and around an enhancer can help predict whether it interacts with a promoter region of interest. This is achieved by building string kernel-based support vector classifiers together with two novel, intuitive visualization methods. These models suggest a potential general role of short tandem repeat motifs in 3D genome organization. However, the insights gained from these models are still coarse-grained. To obtain more fine-grained insights, we devised a machine learning method called CoMIK (Conformal Multi-Instance Kernels). When comparing sequences of variable length in the supervised learning setting, CoMIK can not only identify the features important for classification but also locate them within the sequence. Such precise identification of important segments of the whole sequence can help in gaining de novo insights into any role played by the intervening chromatin in long-range interactions.
Although CoMIK primarily uses only genetic sequence information, it can also simultaneously use other information modalities, such as the numerous functional genomics datasets, when available. The second part describes our pipeline, pHDee, for easy manipulation of large amounts of 3D genomics data. We used the pipeline to analyze HiChIP experimental data to study the 3D architectural changes in Ewing sarcoma (EWS), a rare cancer affecting adolescents. In particular, HiChIP data for two experimental conditions, doxycycline-treated and untreated, and for primary tumor samples are analyzed. We demonstrate that pHDee facilitates the processing of large amounts of 3D genomics data and their easy integration with other data-intensive bioinformatics analyses.
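The abstract mentions string kernel-based support vector classifiers without naming the kernel. As a minimal sketch, assuming the k-spectrum kernel (a widely used string kernel chosen here purely for illustration; the dissertation's kernel and CoMIK's multi-instance kernels may differ), the hypothetical functions below compare two DNA sequences by the inner product of their overlapping k-mer counts and build the pairwise kernel matrix that a precomputed-kernel SVM consumes.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer in an upper-cased DNA sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(s: str, t: str, k: int = 3) -> int:
    """k-spectrum kernel: inner product of the k-mer count vectors of s and t."""
    cs, ct = kmer_counts(s, k), kmer_counts(t, k)
    # Only k-mers present in both sequences contribute to the sum.
    return sum(n * ct[kmer] for kmer, n in cs.items() if kmer in ct)


def gram_matrix(seqs: list[str], k: int = 3) -> list[list[int]]:
    """Pairwise kernel matrix over all sequences (symmetric by construction)."""
    return [[spectrum_kernel(a, b, k) for b in seqs] for a in seqs]
```

Such a matrix could, for instance, be passed to scikit-learn's `SVC(kernel="precomputed")` together with interaction labels. Because the spectrum kernel discards k-mer positions, it cannot locate important segments within a sequence, which is the kind of limitation the abstract says CoMIK was designed to address.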