12 research outputs found

    Clustering 16S rRNA for OTU prediction: A similarity based method

    Next-generation sequencing (NGS)-based 16S rRNA sequencing, which combines PCR amplification with NGS technology, has been successfully used to study the phylogeny and taxonomy of samples from complex environments. The first step for many downstream analyses is clustering 16S rRNA sequences into operational taxonomic units (OTUs). Heuristic clustering, in which one or more seed sequences are selected to represent each cluster, is one of the most widely employed approaches for generating OTUs. In this work, we chose five random seeds for each cluster from a gene library, and we present a novel distance measure to cluster bacteria in the sample. Artificially created sets of 16S rRNA genes selected from databases are successfully clustered with more than 98% accuracy, sensitivity, and specificity.

    Optimization and performance testing of a sequence processing pipeline applied to detection of nonindigenous species

    Genetic taxonomic assignment can be more sensitive than morphological taxonomic assignment, particularly for small, cryptic or rare species. Sequence processing is essential to taxonomic assignment, but can also produce errors because optimal parameters are not known a priori. Here, we explored how sequence processing parameters influence taxonomic assignment of 18S sequences from bulk zooplankton samples produced by 454 pyrosequencing. We optimized a sequence processing pipeline for two common research goals, estimation of species richness and early detection of aquatic invasive species (AIS), and then tested the performance of the optimal models through simulations. We tested 1,050 parameter sets on 18S sequences from 20 AIS to determine optimal parameters for each research goal. We tested the performance of the optimized pipelines (detectability and sensitivity) by computationally inoculating sequences of 20 AIS into ten bulk zooplankton samples from ports across Canada. We found that optimal parameter selection generally depends on the research goal. However, regardless of research goal, we found that metazoan 18S sequences produced by 454 pyrosequencing should be trimmed to 375–400 bp and sequence quality filtering should be relaxed (1.5 ≤ maximum expected error ≤ 3.0, Phred score = 10). Clustering and denoising were only viable for estimating species richness, because these processing steps made some species undetectable at low sequence abundances, which would not be useful for early detection of AIS. With parameter sets optimized for early detection of AIS, 90% of AIS were detected with fewer than 11 target sequences, regardless of whether clustering or denoising was used. Despite developments in next-generation sequencing, sequence processing remains an important issue owing to difficulties in balancing false-positive and false-negative errors in metabarcoding data.
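
The trimming and quality-filtering step described in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the default values, and the choice to discard reads shorter than the trim length are assumptions made for illustration. The only fixed ingredient is the standard Phred relation, in which a quality score Q corresponds to an error probability of 10^(-Q/10); the expected number of errors in a read is the sum of those probabilities. The abstract's "Phred score = 10" setting is not modelled here because the abstract does not say how it is applied.

```python
def expected_errors(quality_scores):
    """Sum of per-base error probabilities, using p = 10 ** (-Q / 10)."""
    return sum(10 ** (-q / 10) for q in quality_scores)


def trim_and_filter(seq, quals, trim_len=400, max_ee=3.0):
    """Trim a read to trim_len bases and apply a maximum-expected-error filter.

    Returns the trimmed sequence, or None if the read is rejected.
    Assumption: reads shorter than trim_len are discarded rather than kept.
    """
    if len(seq) < trim_len:
        return None
    seq, quals = seq[:trim_len], quals[:trim_len]
    if expected_errors(quals) > max_ee:
        return None
    return seq
```

A read of 400 bases at a uniform Phred score of 30 has about 0.4 expected errors and passes a threshold of 3.0, whereas the same read at a uniform score of 10 has about 40 expected errors and is rejected.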

    DMSC: A Dynamic Multi-Seeds Method for Clustering 16S rRNA Sequences Into OTUs

    Next-generation sequencing (NGS)-based 16S rRNA sequencing, which combines PCR amplification with NGS technology, is a cost-effective technique that has been successfully used to study the phylogeny and taxonomy of samples from complex microbiomes or environments. Clustering 16S rRNA sequences into operational taxonomic units (OTUs) is often the first step for many downstream analyses. Heuristic clustering is one of the most widely employed approaches for generating OTUs. However, most heuristic OTU clustering methods select only a single seed sequence to represent each cluster, so their outcomes suffer from either overestimation of the number of OTUs or sensitivity to sequencing errors. In this paper, we present a novel dynamic multi-seeds clustering method (namely DMSC) to pick OTUs. DMSC first heuristically generates clusters according to the distance threshold. When the size of a cluster reaches the pre-defined minimum size, DMSC selects the multi-core sequences (MCS) as the seeds; these are defined as n-core sequences (n ≥ 3) in which the distance between any two sequences is less than the distance threshold. A new sequence is assigned to the corresponding cluster depending on its average distance to the MCS and the standard deviation of distances within the MCS. If a new sequence is added to the cluster, the MCS are dynamically updated until no further sequence is merged into the cluster. DMSC was tested on several simulated and real-life sequence datasets and compared with traditional heuristic methods such as CD-HIT, UCLUST, and DBH. Experimental results in terms of the inferred number of OTUs, normalized mutual information (NMI) and Matthews correlation coefficient (MCC) metrics demonstrate that DMSC can produce higher quality clusters with low memory usage and reduce OTU overestimation. Additionally, DMSC is robust to sequencing errors. The DMSC software can be freely downloaded from https://github.com/NWPU-903PR/DMSC.
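
The multi-seed idea described in this abstract can be sketched roughly as follows. This is not the DMSC software or its actual algorithm; it is a toy illustration under several assumptions that the abstract does not confirm: distances are mismatch fractions between equal-length, pre-aligned sequences; the core sequences are simply the first n members found to be pairwise within the threshold; and a sequence joins a cluster when its mean distance to the cores is below the threshold plus the standard deviation of the core-to-core distances. All function names and default values are hypothetical.

```python
from itertools import combinations
from statistics import mean, pstdev


def distance(a, b):
    """Fraction of mismatched positions (assumes equal-length, aligned sequences)."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def pick_cores(members, threshold, n=3):
    """Return the first n members that are pairwise closer than threshold, else None."""
    for combo in combinations(members, n):
        if all(distance(a, b) < threshold for a, b in combinations(combo, 2)):
            return list(combo)
    return None


def cluster(seqs, threshold=0.03, min_size=3, n=3):
    """Greedy clustering that switches from one seed to n core seeds per cluster."""
    clusters = []  # each cluster: {"members": [...], "cores": list or None}
    for s in seqs:
        placed = False
        for c in clusters:
            if c["cores"]:
                # Assumed rule: mean distance to cores, relaxed by the core spread.
                core_dists = [distance(a, b) for a, b in combinations(c["cores"], 2)]
                ok = mean(distance(s, k) for k in c["cores"]) < threshold + pstdev(core_dists)
            else:
                # Before cores exist, compare against the first member only.
                ok = distance(s, c["members"][0]) < threshold
            if ok:
                c["members"].append(s)
                if len(c["members"]) >= min_size:
                    # Re-select the cores after every addition ("dynamic" update).
                    c["cores"] = pick_cores(c["members"], threshold, n)
                placed = True
                break
        if not placed:
            clusters.append({"members": [s], "cores": None})
    return [c["members"] for c in clusters]
```

For example, `cluster(["AAAAAAAAAA", "AAAAAAAAAT", "AAAAAAAATA", "CCCCCCCCCC", "TAAAAAAAAA"], threshold=0.3)` groups the four A-rich sequences together and leaves the C-only sequence in its own cluster. The exhaustive search in `pick_cores` is only practical for small clusters.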

    Metagenomics : tools and insights for analyzing next-generation sequencing data derived from biodiversity studies

    Advances in next-generation sequencing (NGS) have allowed significant breakthroughs in microbial ecology studies. This has led to the rapid expansion of research in the field and the establishment of “metagenomics”, often defined as the analysis of DNA from microbial communities in environmental samples without prior need for culturing. Many statistical and computational metagenomics tools and databases have been developed to allow the exploitation of the huge influx of data. In this review article, we provide an overview of the sequencing technologies and how they are uniquely suited to various types of metagenomic studies. We focus on the currently available bioinformatics techniques, tools, and methodologies for performing each individual step of a typical metagenomic dataset analysis. We also outline future trends in the field with respect to tools and technologies currently under development. Moreover, we discuss data management, distribution, and integration tools that are capable of performing comparative metagenomic analyses of multiple datasets using well-established databases, as well as commonly used annotation standards.

    Host-parasite dynamics in a natural system

    The Red Queen Hypothesis postulates that reciprocal selection arising from host-parasite interactions should accelerate evolutionary rates through the need for continual adaptation and counter-adaptation. A process driving such rapid reciprocal adaptation is referred to as negative frequency-dependent selection, in which the most common genotypes decrease over time because they have a higher probability of becoming infected by coevolving parasites. This proposed mechanism of host-parasite coevolution has commonly been tested in laboratory experiments under controlled conditions. In field investigations of natural populations, temporal changes in relative frequencies of genotypes were mostly tested for the host only, because tracking parasite dynamics over time remained difficult. As parasite population dynamics are highly sensitive to environmental changes, studies under natural conditions are essential to understand host-parasite coevolution. A commonly explored model system for addressing coevolutionary questions comprises the water fleas of the genus Daphnia and their microparasites. In this PhD thesis, I analysed the population structure of two major microparasites of Daphnia: Caullerya mesnili (Chapters 2 and 3) and microsporidia (Chapter 4). First, in Chapter 2, I developed a new bioinformatic pipeline to analyse molecular data generated by next-generation sequencing (NGS) platforms. C. mesnili populations from different water reservoirs in the Czech Republic were sequenced at the first internal transcribed spacer (ITS1) of the ribosomal gene cluster, analysed with this new pipeline and compared with published results from the same populations obtained by cloning and Sanger sequencing. I found that relative frequencies of C. mesnili ITS1 sequence types were similar to those obtained with the other sequencing methods, thereby validating the bioinformatic pipeline and showing the suitability of the 454 platform for population biology analyses.
After this validation, in Chapter 3, I analysed the population dynamics and host-genotype specificity of C. mesnili in long-term samples collected from a single lake, based on the sequence variations in the ITS1 region. I found that the most abundant C. mesnili ITS1 sequence type decreased, while rare sequences increased over the course of the study (4 years). The observed pattern is consistent with negative frequency-dependent selection. However, only a weak signal of host-genotype specificity between C. mesnili and Daphnia genotypes was detected, which supports the lack of host-genotype specificity in this system. Finally, in Chapter 4, I described the patterns of geographical population structure, intraspecific genetic variation, and recombination of two Daphnia-infecting microsporidia: Berwaldia schaefernai and the unknown microsporidium MIC1. These patterns were used to predict the existence of secondary hosts in the life cycle of these microsporidia. I observed little variation among B. schaefernai parasite strains infecting different host populations; in contrast, there was significant genetic variation among populations of MIC1. Additionally, ITS genetic diversity was lower in B. schaefernai than in MIC1. These findings suggest that the presumed secondary host for B. schaefernai is expected to be mobile, while in MIC1 the secondary host (if it exists) does not appear to facilitate dispersal to the same degree. Finally, recombination analyses indicated cryptic sex in B. schaefernai and pure asexuality in MIC1. All these findings enable a more comprehensive understanding of the biology of Daphnia-infecting microparasites and the genetic basis of Daphnia-microparasite coevolution in natural populations.

    Characterization of Urinary Microbiome and Their Association with Health and Disease

    There has been a growing interest in human microbiome studies in the past decade, with the development of high-throughput sequencing techniques. These microorganisms interact with and respond to the host as an entity, and are involved in various homeostatic functions including nutrient digestion, immune response, metabolism and endocrine regulation. The urinary microbiome, however, remains relatively under-investigated. One of the technical challenges of urinary microbiome studies is that the samples usually contain a large number of host cells and low microbial biomass. These samples with high host and low microbial abundance (“high-low” samples) are associated with an increased risk of compromised quality of 16S rRNA gene sequencing results. An analysis with mock samples showed that the mechanisms by which host materials interfere with microbiome analysis include reducing microbial DNA extract yield by competitively binding to the filter of the DNA extraction column, inhibiting PCR amplification of 16S rRNA gene regions as non-target DNA, and consuming sequencing depth through unspecific amplification during PCR. To counter these issues, a refined processing protocol and a quality checking tool were developed for handling “high-low” samples. With these methods, a combination of sequencing-based methods and enhanced culture-based methods showed evidence of bacteria in renal tissue samples. On the other hand, the optimal urine sample collection and storage methods for microbiome study have not been reported. An optimisation experiment showed that urine samples with a volume higher than 20 mL and stored as centrifuged pellets generated the best sequencing results. The urinary microbiomes of healthy subjects and urinary stone patients were characterised using 16S rRNA gene sequencing and enhanced quantitative urine culture (EQUC) techniques.
Although no clear distinction was observed between the urinary microbiome profiles of healthy subjects and urinary stone patients, male and female individuals do have distinct urinary microbiome profiles. The urinary microbiome profile of an individual remained stable throughout three months. Investigation of urine samples of metabolic stone patients before and after lithotripsy showed fluctuations in their urinary microbiome profiles, with newly emerged microbes in sequencing results correlated with microbes cultured from stone samples. These results suggested that bacteria were liberated from metabolic stones during lithotripsy.

    Simulations and Modelling for Biological Invasions

    Biological invasions are characterized by the movement of organisms from their native geographic region to new, distinct regions in which they may have significant impacts. Biological invasions pose one of the most serious threats to global biodiversity, and hence significant resources are invested in predicting, preventing, and managing them. Biological systems and processes are typically large, complex, and inherently difficult to study naturally because of their immense scale and complexity. Hence, computational modelling and simulation approaches can be taken to study them. In this dissertation, I applied computer simulations to address two important problems in invasion biology. First, in invasion biology, the impact of genetic diversity of introduced populations on their establishment success is unknown. We took an individual-based modelling approach to explore this, leveraging an ecosystem simulation called EcoSim to simulate biological invasions. We conducted reciprocal transplants of prey individuals across two simulated environments, over a gradient of genetic diversity. Our simulation results demonstrated that a harsh environment with low and spatially-varying resource abundance mediated a relationship between genetic diversity and short-term establishment success of introduced populations rather than the degree of difference between native and introduced ranges. We also found that reducing Allee effects by maintaining compactness, a measure of spatial density, was key to the establishment success of prey individuals in EcoSim, which were sexually reproducing. Further, we found evidence of a more complex relationship between genetic diversity and long-term establishment success, assuming multiple introductions were occurring. Low-diversity populations seemed to benefit more strongly from multiple introductions than high-diversity populations. 
Our results also corroborated the evolutionary imbalance hypothesis: the environment that yielded greater diversity produced better invaders and itself was less invasible. Finally, our study corroborated a mechanical explanation for the evolutionary imbalance hypothesis: the populations that evolved in a more intense competitive environment produced better invaders. Secondly, an important advancement in invasion biology is the use of genetic barcoding or metabarcoding, in conjunction with next-generation sequencing, as a potential means of early detection of aquatic introduced species. Barcoding and metabarcoding invariably require some amount of computational DNA sequence processing. Unfortunately, optimal processing parameters are not known in advance and the consequences of suboptimal parameter selection are poorly understood. We aimed to determine the optimal parameterization of a common sequence processing pipeline for both early detection of aquatic nonindigenous species and conducting species richness assessments. We then aimed to determine the performance of optimized pipelines in a simulated inoculation of sequences into community samples. We found that early detection requires relatively lenient processing parameters. Further, optimality depended on the research goal: what was optimal for early detection was suboptimal for estimating species richness and vice versa. Finally, with optimal parameter selection, fewer than 11 target sequences were required in order to detect 90% of nonindigenous species.

    Microbial community functioning at hypoxic sediments revealed by targeted metagenomics and RNA stable isotope probing

    Microorganisms are instrumental to the structure and functioning of marine ecosystems and to the chemistry of the ocean due to their essential part in the cycling of the elements and in the recycling of the organic matter. Two of the most critical ocean biogeochemical cycles are those of nitrogen and sulfur, since they can influence the synthesis of nucleic acids and proteins, primary productivity and microbial community structure. Oxygen concentration in marine environments is one of the environmental variables that have been largely affected by anthropogenic activities; its decline induces hypoxic events which affect benthic organisms and fisheries. Hypoxia has been traditionally defined based on the level of oxygen below which most animal life cannot be sustained. Hypoxic conditions impact microbial composition and activity since anaerobic reactions and pathways are favoured, at the expense of the aerobic ones. Naturally occurring hypoxia can be found in areas where water circulation is restricted, such as coastal lagoons, and in areas where oxygen-depleted water is driven into the continental shelf, i.e. coastal upwelling regions. Coastal lagoons are highly dynamic aquatic systems, particularly vulnerable to human activities and susceptible to changes induced by natural events. For the purpose of this PhD project, the lagoonal complex of Amvrakikos Gulf, one of the largest semi-enclosed gulfs in the Mediterranean Sea, was chosen as a study site. Coastal upwelling regions are another type of environment limited in oxygen, where also formation of oxygen minimum zones (OMZs) has been reported. Sediment in upwelling regions is rich in organic matter and bottom water is often depleted of oxygen because of intense heterotrophic respiration. For the purpose of this PhD project, the chosen coastal upwelling system was the Benguela system off Namibia, situated along the coast of south western Africa. 
The aim of this PhD project was to study the microbial community assemblages of hypoxic ecosystems and to identify a potential link between their identity and function, with a particular emphasis on the microorganisms involved in the nitrogen and sulfur cycles. The methodology that was applied included targeted metagenomics and RNA stable isotope probing (SIP). It has been shown that the microbial community diversity pattern can be differentiated based on habitat type, i.e. between riverine, lagoonal and marine environments. Moreover, the studied habitats were functionally distinctive. Apart from salinity, which was the abiotic variable best correlated with the microbial community pattern, oxygen concentration was highly correlated with the predicted metabolic pattern of the microbial communities. In addition, when the total number of Operational Taxonomic Units (OTUs) was taken into consideration, a negative linear relationship with salinity was identified (see Chapter 2). Microbial community diversity patterns can also be differentiated based on the lagoon under study since each lagoon hosts a different sulfate-reducing microbial (SRM) community, again highly correlated with salinity. Moreover, the majority of environmental terms that characterized the SRM communities were classified to the marine biome, but terms belonging to the freshwater or brackish biomes were also found in stations where a freshwater effect was more evident (see Chapter 3). Taxonomic groups that were expected to be thriving in the sediments of the Benguela coastal upwelling system were absent or present only in very low abundances. Epsilonproteobacteria dominated the anaerobic assimilation of acetate, as confirmed by their isotopic enrichment in the SIP experiments.
Enhancement of known sulfate-reducers was not achieved under sulfate addition, possibly due to competition for electron donors among nitrate-reducers and sulfate-reducers, to the inability of certain sulfate-reducing bacteria to use acetate as an electron donor or to the short duration of the incubations (see Chapter 4). Future research should focus more on the community functioning of such habitats; an increased understanding of the biogeochemical cycles that characterize these hypoxic ecosystems will perhaps allow for predictions regarding the intensity and direction of the cycling of elements, especially of nitrogen and sulfur given their biological importance. Regulation of hypoxic episodes will aid the end-users of these ecosystems to possibly achieve higher productivity, in terms of fish catches, which otherwise is largely compromised by the elevated hydrogen sulfide concentrations.
