162 research outputs found

    Development of Self-Compressing BLSOM for Comprehensive Analysis of Big Sequence Data

    Metagenomics : tools and insights for analyzing next-generation sequencing data derived from biodiversity studies

    Advances in next-generation sequencing (NGS) have allowed significant breakthroughs in microbial ecology studies. This has led to the rapid expansion of research in the field and the establishment of “metagenomics”, often defined as the analysis of DNA from microbial communities in environmental samples without prior need for culturing. Many metagenomics statistical/computational tools and databases have been developed in order to allow the exploitation of the huge influx of data. In this review article, we provide an overview of the sequencing technologies and how they are uniquely suited to various types of metagenomic studies. We focus on the currently available bioinformatics techniques, tools, and methodologies for performing each individual step of a typical metagenomic dataset analysis. We also provide future trends in the field with respect to tools and technologies currently under development. Moreover, we discuss data management, distribution, and integration tools that are capable of performing comparative metagenomic analyses of multiple datasets using well-established databases, as well as commonly used annotation standards

    K-MER ANALYSIS PIPELINE FOR CLASSIFICATION OF DNA SEQUENCES FROM METAGENOMIC SAMPLES

    Biological sequence datasets are increasing at a prodigious rate. The volume of data in these datasets surpasses what is observed in many other fields of science. New developments wherein metagenomic DNA from complex bacterial communities is recovered and sequenced are producing a new kind of data known as metagenomic data, which is comprised of DNA fragments from many genomes. Developing a utility to analyze such metagenomic data and predict the sample class from which it originated has many possible implications for ecological and medical applications. Within this document is a description of a series of analytical techniques used to process metagenomic data in such a way that it is transformed from the raw sequence information into a reusable data structure that can be processed by feature selection techniques and machine learning algorithms. Analysis and transformation of the data from the raw sequences to a reusable structure is done using k length substrings of DNA, known as k-mers, and storing the count of these observed strings in a Numeric Summarization Vector (NSV). The technique described herein is offered as a proof of concept for research into analyzing metagenomic data without identifying individual organisms contained within the sample. It is tested using leave-one-out and Monte Carlo cross-validation, while varying numerous parameters and verifying the results by using a large pool of independent experiments initiated with the same starting parameters. The pipeline is validated against multiple data sets using two- and three-class problems. Results are presented showing the accuracy as a function of multiple parameters that can be selected by a user of the pipeline. This work shows that there may be a way to process metagenomic data in near real time to analyze and predict the environmental class of a sample with reasonable accuracy. 
Consider the difficulty of distinguishing a healthy gut microbiome from a diseased one: this approach can classify sample data as belonging to one of those states
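
The k-mer counting step described in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names `kmer_count_vector` and `sample_vector`, the default k = 3, the lexicographic slot ordering, and the choice to skip windows containing ambiguous bases are all assumptions made for illustration. The sketch only shows how raw reads can be turned into a fixed-length count vector of the kind the abstract calls a Numeric Summarization Vector (NSV).

```python
from itertools import product


def kmer_count_vector(sequence, k=3, alphabet="ACGT"):
    """Count every k-mer of `sequence` into a fixed-order vector.

    Hypothetical helper: the vector has one slot per possible k-mer
    (len(alphabet) ** k slots), ordered lexicographically.
    """
    index = {"".join(p): i for i, p in enumerate(product(alphabet, repeat=k))}
    counts = [0] * len(index)
    seq = sequence.upper()
    # Slide a window of width k over the sequence, one base at a time.
    for start in range(len(seq) - k + 1):
        slot = index.get(seq[start:start + k])
        if slot is not None:  # assumption: skip windows with bases such as N
            counts[slot] += 1
    return counts


def sample_vector(reads, k=3):
    """Sum per-read k-mer counts into one vector for a whole sample."""
    total = [0] * (4 ** k)
    for read in reads:
        for i, count in enumerate(kmer_count_vector(read, k)):
            total[i] += count
    return total
```

A vector built this way has the same length for every sample, so it can be passed directly to feature selection or a classifier without identifying the organisms in the sample.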

    A Primer on Metagenomics

    Metagenomics is a discipline that enables the genomic study of uncultured microorganisms. Faster, cheaper sequencing technologies and the ability to sequence uncultured microbes sampled directly from their habitats are expanding and transforming our view of the microbial world. Distilling meaningful information from the millions of new genomic sequences presents a serious challenge to bioinformaticians. In cultured microbes, the genomic data come from a single clone, making sequence assembly and annotation tractable. In metagenomics, the data come from heterogeneous microbial communities, sometimes containing more than 10,000 species, with the sequence data being noisy and partial. From sampling, to assembly, to gene calling and function prediction, bioinformatics faces new demands in interpreting voluminous, noisy, and often partial sequence data. Although metagenomics is a relative newcomer to science, the past few years have seen an explosion in computational methods applied to metagenomic-based research. It is therefore not within the scope of this article to provide an exhaustive review. Rather, we provide here a concise yet comprehensive introduction to the current computational requirements presented by metagenomics, and review the recent progress made. We also note whether there is software that implements any of the methods presented here, and briefly review its utility. Nevertheless, it would be useful if readers of this article would avail themselves of the comment section provided by this journal, and relate their own experiences. Finally, the last section of this article provides a few representative studies illustrating different facets of recent scientific discoveries made using metagenomics

    3rd EGEE User Forum

    We have organized this book in a sequence of chapters, each chapter associated with an application or technical theme introduced by an overview of the contents, and a summary of the main conclusions coming from the Forum for the chapter topic. The first chapter gathers all the plenary session keynote addresses, and following this there is a sequence of chapters covering the application flavoured sessions. These are followed by chapters with the flavour of Computer Science and Grid Technology. The final chapter covers the important number of practical demonstrations and posters exhibited at the Forum. Much of the work presented has a direct link to specific areas of Science, and so we have created a Science Index, presented below. In addition, at the end of this book, we provide a complete list of the institutes and countries involved in the User Forum

    Microbial community functioning at hypoxic sediments revealed by targeted metagenomics and RNA stable isotope probing

    Microorganisms are instrumental to the structure and functioning of marine ecosystems and to the chemistry of the ocean due to their essential part in the cycling of the elements and in the recycling of the organic matter. Two of the most critical ocean biogeochemical cycles are those of nitrogen and sulfur, since they can influence the synthesis of nucleic acids and proteins, primary productivity and microbial community structure. Oxygen concentration in marine environments is one of the environmental variables that have been largely affected by anthropogenic activities; its decline induces hypoxic events which affect benthic organisms and fisheries. Hypoxia has been traditionally defined based on the level of oxygen below which most animal life cannot be sustained. Hypoxic conditions impact microbial composition and activity since anaerobic reactions and pathways are favoured, at the expense of the aerobic ones. Naturally occurring hypoxia can be found in areas where water circulation is restricted, such as coastal lagoons, and in areas where oxygen-depleted water is driven into the continental shelf, i.e. coastal upwelling regions. Coastal lagoons are highly dynamic aquatic systems, particularly vulnerable to human activities and susceptible to changes induced by natural events. For the purpose of this PhD project, the lagoonal complex of Amvrakikos Gulf, one of the largest semi-enclosed gulfs in the Mediterranean Sea, was chosen as a study site. Coastal upwelling regions are another type of environment limited in oxygen, where also formation of oxygen minimum zones (OMZs) has been reported. Sediment in upwelling regions is rich in organic matter and bottom water is often depleted of oxygen because of intense heterotrophic respiration. For the purpose of this PhD project, the chosen coastal upwelling system was the Benguela system off Namibia, situated along the coast of south western Africa. 
The aim of this PhD project was to study the microbial community assemblages of hypoxic ecosystems and to identify a potential link between their identity and function, with a particular emphasis on the microorganisms involved in the nitrogen and sulfur cycles. The methodology that was applied included targeted metagenomics and RNA stable isotope probing (SIP). It has been shown that the microbial community diversity pattern can be differentiated based on habitat type, i.e. between riverine, lagoonal and marine environments. Moreover, the studied habitats were functionally distinctive. Apart from salinity, which was the abiotic variable best correlated with the microbial community pattern, oxygen concentration was highly correlated with the predicted metabolic pattern of the microbial communities. In addition, when the total number of Operational Taxonomic Units (OTUs) was taken into consideration, a negative linear relationship with salinity was identified (see Chapter 2). Microbial community diversity patterns can also be differentiated based on the lagoon under study since each lagoon hosts a different sulfate-reducing microbial (SRM) community, again highly correlated with salinity. Moreover, the majority of environmental terms that characterized the SRM communities were classified to the marine biome, but terms belonging to the freshwater or brackish biomes were also found in stations where a freshwater effect was more evident (see Chapter 3). Taxonomic groups that were expected to be thriving in the sediments of the Benguela coastal upwelling system were absent or present but in very low abundances. Epsilonproteobacteria dominated the anaerobic assimilation of acetate as confirmed by their isotopic enrichment in the SIP experiments. 
Enhancement of known sulfate-reducers was not achieved under sulfate addition, possibly due to competition for electron donors among nitrate-reducers and sulfate-reducers, to the inability of certain sulfate-reducing bacteria to use acetate as electron donor or to the short duration of the incubations (see Chapter 4). Future research should focus more on the community functioning of such habitats; an increased understanding of the biogeochemical cycles that characterize these hypoxic ecosystems will perhaps allow for predictions regarding the intensity and direction of the cycling of elements, especially of nitrogen and sulfur given their biological importance. Regulation of hypoxic episodes will aid the end-users of these ecosystems to possibly achieve higher productivity, in terms of fish catches, which otherwise is largely compromised by the elevated hydrogen sulfide concentrations

    Biological investigation and predictive modelling of foaming in anaerobic digester

    Anaerobic digestion (AD) of waste has been identified as a leading technology for greener renewable energy generation as an alternative to fossil fuel. AD reduces waste through biochemical processes, converting it to biogas, which could be used as a source of renewable energy, while the residual biosolids could be used to enrich the soil. A problem with AD, though, is foaming and the associated biogas loss. Tackling this problem effectively requires identifying and controlling the factors that trigger and promote foaming. In this research, laboratory experiments were initially carried out to differentiate foaming causal and exacerbating factors. Then the impact of the identified causal factors (organic loading rate, OLR, and volatile fatty acid, VFA) on foaming occurrence was monitored and recorded. Further analysis of foaming and non-foaming sludge samples by metabolomics techniques confirmed that the OLR and VFA are the prime causes of foaming occurrence in AD. In addition, the metagenomics analysis showed that the phyla Bacteroidetes and Proteobacteria were predominant, with relative abundances of 30% and 29% respectively, while the phylum Actinobacteria, which includes the most prominent filamentous foam-causing bacteria such as Nocardia amarae and Microthrix parvicella, had a very low and consistent relative abundance of 0.9%, indicating that the foaming occurrence in the AD studied was not triggered by the presence of filamentous bacteria. Consequently, data-driven models to predict foam formation were developed based on experimental data with inputs (OLR and VFA in the feed) and output (foaming occurrence). The models were extensively validated and assessed based on the mean squared error (MSE), root mean squared error (RMSE), R² and mean absolute error (MAE). The Levenberg-Marquardt neural network model proved to be the best model for foaming prediction in AD, with RMSE = 5.49, MSE = 30.19 and R² = 0.9435. 
The significance of this study is the development of a parsimonious and effective modelling tool that enables AD operators to proactively avert foaming occurrence, as the two model input variables (OLR and VFA) can be easily adjusted through a simple programmable logic controller
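
The four assessment metrics named in this abstract are related by simple formulas, which the following sketch computes from observed and predicted values. It illustrates the standard definitions only and is not the study's validation code: the function name `regression_metrics` and the sample numbers in the usage note are assumptions. RMSE is the square root of MSE, which agrees with the reported pair up to rounding (5.49² ≈ 30.14).

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE and R^2 for paired observations and predictions.

    Hypothetical helper using the standard textbook definitions.
    """
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)          # residual sum of squares
    mse = ss_res / n
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {
        "mse": mse,
        "rmse": math.sqrt(mse),
        "mae": mae,
        "r2": 1.0 - ss_res / ss_tot,  # undefined if all y_true are equal
    }
```

For example, `regression_metrics([1, 2, 3, 4], [1, 2, 3, 6])` has a single error of 2 on four points, so the squared-error metrics weight that one large miss more heavily than MAE does.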