997 research outputs found

    Day-length is central to maintaining consistent seasonal diversity in marine bacterioplankton

    Marine bacterial diversity is vast, but seasonal variation in diversity is poorly understood. Here we present the longest bacterial diversity time series, consisting of 72 monthly samples from the western English Channel over a 6-year period (2003-2008) using 747,494 16S rDNA V6 amplicon pyrosequences. Although there were characteristic cycles for each phylum, the overall community cycle was remarkably stable year after year. The majority of taxa were not abundant, although on occasion these rare bacteria could dominate the assemblage. Bacterial diversity peaked at the winter solstice and showed remarkable synchronicity with day-length, which had the best explanatory power compared to a combination of other variables (including temperature and nutrient concentrations). Day-length has not previously been recognised as a major force in structuring microbial communities.

    Bioinformatics and Data Management Support for Environmental Genomics

    The UK Natural Environment Research Council has funded the creation of a dedicated bioinformatics centre as part of a £26m Environmental Genomics initiative.

    Bias in culture-independent assessments of microbial biodiversity in the global ocean

    On the basis of 16S rRNA gene sequencing, the SAR11 clade of marine bacteria has almost universal distribution, being detected as abundant sequences in all marine provinces. Yet SAR11 sequences are rarely detected in fosmid libraries, suggesting that the widespread abundance may be an artefact of PCR cloning and that SAR11 has a relatively low abundance. Here the relative abundance of SAR11 is explored in both a fosmid library and a metagenomic sequence data set from the same biological community, taken from fjord surface water from Bergen, Norway. Pyrosequenced data and 16S clone data confirmed an 11-15% relative abundance of SAR11 within the community. In contrast, not a single SAR11 fosmid was identified in a pooled shotgun-sequenced data set of 100 fosmid clones. This under-representation was evidenced by comparative abundances of SAR11 sequences assessed by taxonomic annotation, functional metabolic profiling and fragment recruitment. Analysis revealed a similar under-representation of low-GC Flavobacteriaceae. We speculate that the fosmid bias may result from DNA fragmentation during preparation, caused by the low GC content of SAR11 sequences and other under-represented taxa. This study suggests that while fosmid libraries can be extremely useful, caution must be used when directly inferring community composition from metagenomic fosmid libraries.

    A Statistical Model of Protein Sequence Similarity and Function Similarity Reveals Overly-Specific Function Predictions

    BACKGROUND: Predicting protein function from primary sequence is an important open problem in modern biology. Not only are there many thousands of proteins of unknown function, but current approaches for predicting function must also be improved. One problem in particular is overly-specific function predictions, which we address here with a new statistical model of the relationship between protein sequence similarity and protein function similarity. METHODOLOGY: Our statistical model is based on sets of proteins with experimentally validated functions and numeric measures of function specificity and function similarity derived from the Gene Ontology. The model predicts the similarity of function between two proteins given their amino acid sequence similarity measured by statistics from the BLAST sequence alignment algorithm. A novel aspect of our model is that it predicts the degree of function similarity shared between two proteins over a continuous range of sequence similarity, facilitating prediction of function with an appropriate level of specificity. SIGNIFICANCE: Our model shows nearly exact function similarity for proteins with high sequence similarity (bit score >244.7, e-value <1e-62, non-redundant NCBI protein database (NRDB)) and only a small likelihood of specific function match for proteins with low sequence similarity (bit score <54.6, e-value >1e-05, NRDB). For sequence similarity ranges in between, our annotation model shows an increasing relationship between function similarity and sequence similarity, but with considerable variability. We applied the model to a large set of proteins of unknown function, and predicted functions for thousands of these proteins ranging from general to very specific. We also applied the model to a data set of proteins with previously assigned, specific functions that were electronically based. We show that, on average, these prior function predictions are more specific (quite possibly overly-specific) compared to predictions from our model, which is based on proteins with experimentally determined function.

    Development of FuGO: An ontology for functional genomics investigations

    The development of the Functional Genomics Investigation Ontology (FuGO) is a collaborative, international effort that will provide a resource for annotating functional genomics investigations, including the study design, protocols and instrumentation used, the data generated and the types of analysis performed on the data. FuGO will contain both terms that are universal to all functional genomics investigations and those that are domain specific. In this way, the ontology will serve as the “semantic glue” to provide a common understanding of data from across these disparate data sources. In addition, FuGO will reference out to existing mature ontologies to avoid the need to duplicate these resources, and will do so in a way that keeps them easy to use in annotation. This project is in the early stages of development; the paper will describe efforts to initiate the project, the scope and organization of the project, the work accomplished to date, and the challenges encountered, as well as future plans.

    16S rRNA assessment of the influence of shading on early-successional biofilms in experimental streams

    Elevated nutrient levels can lead to excessive biofilm growth, but reducing nutrient pollution is often challenging. There is therefore interest in developing control measures for biofilm growth in nutrient-rich rivers that could act as a complement to direct reductions in nutrient load. Shading of rivers is one option that can mitigate blooms, but few studies have experimentally examined the differences in biofilm communities grown under shaded and unshaded conditions. We investigated the assembly and diversity of biofilm communities using in situ mesocosms within the River Thames (UK). Biofilm composition was surveyed by 454 sequencing of 16S amplicons (∼400 bp length covering regions V6/V7). The results confirm the importance of sunlight for biofilm community assembly, a resource that was utilized by a relatively small number of dominant taxa, leading to significantly less diversity than in shaded communities. These differences between unshaded and shaded treatments were either because of differences in resource utilization or loss of diatom structures as habitats for bacteria. We observed more co-occurrence patterns and network interactions in the shaded communities. This lends further support to the proposal that increased river shading can help mitigate the effects of macronutrient pollution in rivers.

    How do we compare hundreds of bacterial genomes?

    The genomic revolution is fully upon us in 2006 and the pace of discovery is set to accelerate with the emergence of ultra-high-throughput sequencing technologies. Our complete genome collection of bacteria and archaea continues to grow in number and diversity, as genome sequencing is applied to an array of new problems, from the characterization of the pan-genome to the detection of mutation after experimentation and the exploration of microbial communities in unprecedented detail. The benefits of large-scale comparative genomic analyses are driving the community to think about how to manage our public collections of genomes in novel ways.