192 research outputs found

    How the other half lives: CRISPR-Cas's influence on bacteriophages

    CRISPR-Cas is a genetic adaptive immune system unique to prokaryotic cells used to combat phage and plasmid threats. The host cell adapts by incorporating DNA sequences from invading phages or plasmids into its CRISPR locus as spacers. These spacers are expressed as mobile surveillance RNAs that direct CRISPR-associated (Cas) proteins to protect against subsequent attack by the same phages or plasmids. The threat from mobile genetic elements inevitably shapes the CRISPR loci of archaea and bacteria, and simultaneously the CRISPR-Cas immune system drives evolution of these invaders. Here we highlight our recent work, as well as that of others, that seeks to understand phage mechanisms of CRISPR-Cas evasion and conditions for population coexistence of phages with CRISPR-protected prokaryotes. Comment: 24 pages, 8 figures

    A statistical toolbox for metagenomics: assessing functional diversity in microbial communities

    Background: The 99% of bacteria in the environment that are recalcitrant to culturing have spurred the development of metagenomics, a culture-independent approach to sample and characterize microbial genomes. Massive datasets of metagenomic sequences have been accumulated, but analysis of these sequences has focused primarily on the descriptive comparison of the relative abundance of proteins that belong to specific functional categories. More robust statistical methods are needed to make inferences from metagenomic data. In this study, we developed and applied a suite of tools to describe and compare the richness, membership, and structure of microbial communities using peptide fragment sequences extracted from metagenomic sequence data. Results: Application of these tools to acid mine drainage, soil, and whale fall metagenomic sequence collections revealed groups of peptide fragments with a relatively high abundance and no known function. When combined with analysis of 16S rRNA gene fragments from the same communities these tools enabled us to demonstrate that although there was no overlap in the types of 16S rRNA gene sequence observed, there was a core collection of operational protein families that was shared among the three environments. Conclusion: The results of comparisons between the three habitats were surprising considering the relatively low overlap of membership and the distinctively different characteristics of the three habitats. These tools will facilitate the use of metagenomics to pursue statistically sound genome-based ecological analyses.

    Analysis and comparison of very large metagenomes with fast clustering and functional annotation

    Background: The remarkable advance of metagenomics presents significant new challenges in data analysis. Metagenomic datasets (metagenomes) are large collections of sequencing reads from anonymous species within particular environments. Computational analyses for very large metagenomes are extremely time-consuming, and there are often many novel sequences in these metagenomes that are not fully utilized. The number of available metagenomes is rapidly increasing, so fast and efficient metagenome comparison methods are in great demand. Results: The new metagenomic data analysis method Rapid Analysis of Multiple Metagenomes with a Clustering and Annotation Pipeline (RAMMCAP) was developed using an ultra-fast sequence clustering algorithm, fast protein family annotation tools, and a novel statistical metagenome comparison method that employs a unique graphic interface. RAMMCAP processes extremely large datasets with only moderate computational effort. It identifies raw read clusters and protein clusters that may include novel gene families, and compares metagenomes using clusters or functional annotations calculated by RAMMCAP. In this study, RAMMCAP was applied to the two largest available metagenomic collections, the "Global Ocean Sampling" and the "Metagenomic Profiling of Nine Biomes". Conclusion: RAMMCAP is a very fast method that can cluster and annotate one million metagenomic reads in only hundreds of CPU hours. It is available from http://tools.camera.calit2.net/camera/rammcap/.

    WebCARMA: a web application for the functional and taxonomic classification of unassembled metagenomic reads

    Gerlach W, Jünemann S, Tille F, Goesmann A, Stoye J. WebCARMA: a web application for the functional and taxonomic classification of unassembled metagenomic reads. BMC Bioinformatics. 2009;10(1):430. Background: Metagenomics is a new field of research on natural microbial communities. High-throughput sequencing techniques like 454 or Solexa-Illumina promise new possibilities as they are able to produce huge amounts of data in much shorter time and with less effort and cost than the traditional Sanger technique. But the data produced comes in even shorter reads (35-100 base pairs with Illumina, 100-500 base pairs with 454 sequencing). CARMA is a new software pipeline for the characterisation of species composition and the genetic potential of microbial samples using short, unassembled reads. Results: In this paper, we introduce WebCARMA, a refined version of CARMA available as a web application for the taxonomic and functional classification of unassembled (ultra-)short reads from metagenomic communities. In addition, we have analysed the applicability of ultra-short reads in metagenomics. Conclusions: We show that unassembled reads as short as 35 bp can be used for the taxonomic classification of a metagenome. The web application is freely available at http://webcarma.cebitec.uni-bielefeld.d

    CLOTU: An online pipeline for processing and clustering of 454 amplicon reads into OTUs followed by taxonomic annotation

    Background: The implementation of high throughput sequencing for exploring biodiversity poses high demands on bioinformatics applications for automated data processing. Here we introduce CLOTU, an online and open access pipeline for processing 454 amplicon reads. CLOTU has been constructed to be highly user-friendly and flexible, since different types of analyses are needed for different datasets. Results: In CLOTU, the user can filter out low quality sequences, trim tags, primers, and adaptors, perform clustering of sequence reads, and run BLAST against NCBInr or a customized database in a high performance computing environment. The resulting data may be browsed in a user-friendly manner and easily forwarded to downstream analyses. Although CLOTU is specifically designed for analyzing 454 amplicon reads, other types of DNA sequence data can also be processed. A fungal ITS sequence dataset generated by 454 sequencing of environmental samples is used to demonstrate the utility of CLOTU. Conclusions: CLOTU is a flexible and easy to use bioinformatics pipeline that includes different options for filtering, trimming, clustering and taxonomic annotation of high throughput sequence reads. Some of these options are not included in comparable pipelines. CLOTU is implemented in a Linux computer cluster and is freely accessible to academic users through the Bioportal web-based bioinformatics service (http://www.bioportal.uio.no).

    Staphylococcal Toxic Shock Syndrome 2000–2006: Epidemiology, Clinical Features, and Molecular Characteristics

    Circulating strains of Staphylococcus aureus (SA) have changed in the last 30 years including the emergence of community-associated methicillin-resistant SA (MRSA). A report suggested staphylococcal toxic shock syndrome (TSS) was increasing over 2000-2003. The last population-based assessment of TSS was 1986. Population-based active surveillance for TSS meeting the CDC definition using ICD-9 codes was conducted in the Minneapolis-St. Paul area (population 2,642,056) from 2000-2006. Medical records of potential cases were reviewed for case criteria, antimicrobial susceptibility, risk factors, and outcome. Superantigen PCR testing and PFGE were performed on available isolates from probable and confirmed cases. Of 7,491 hospitalizations that received one of the ICD-9 study codes, 61 TSS cases (33 menstrual, 28 non-menstrual) were identified. The average annual incidence per 100,000 of all, menstrual, and non-menstrual TSS was 0.52 (95% CI, 0.32-0.77), 0.69 (0.39-1.16), and 0.32 (0.12-0.67), respectively. Women 13-24 years had the highest incidence at 1.41 (0.63-2.61). No increase in incidence was observed from 2000-2006. MRSA was isolated in 1 menstrual and 3 non-menstrual cases (7% of TSS cases); 1 isolate was USA400. The superantigen gene tst-1 was identified in 20 (80%) of isolates and was more common in menstrual compared to non-menstrual isolates (89% vs. 50%, p = 0.07). Superantigen genes sea, seb and sec were found more frequently among non-menstrual compared to menstrual isolates [100% vs 25% (p = 0.4), 60% vs 0% (p<0.01), and 25% vs 13% (p = 0.5), respectively]. TSS incidence remained stable across our surveillance period of 2000-2006 and compared to past population-based estimates in the 1980s. MRSA accounted for a small percentage of TSS cases. tst-1 continues to be the superantigen associated with the majority of menstrual cases. The CDC case definition identifies the most severe cases and has been consistently used but likely results in a substantial underestimation of the total TSS disease burden.

    Probing Metagenomics by Rapid Cluster Analysis of Very Large Datasets

    BACKGROUND: The scale and diversity of metagenomic sequencing projects challenge both our technical and conceptual approaches in gene and genome annotations. The recent Sorcerer II Global Ocean Sampling (GOS) expedition yielded millions of predicted protein sequences, which significantly altered the landscape of known protein space by more than doubling its size and adding thousands of new families (Yooseph et al., 2007 PLoS Biol 5, e16). Such datasets, not only by their sheer size, but also by many other features, defy conventional analysis and annotation methods. METHODOLOGY/PRINCIPAL FINDINGS: In this study, we describe an approach for rapid analysis of the sequence diversity and the internal structure of such very large datasets by advanced clustering strategies using the newly modified CD-HIT algorithm. We performed a hierarchical clustering analysis on the 17.4 million Open Reading Frames (ORFs) identified from the GOS study and found over 33 thousand large predicted protein clusters comprising nearly 6 million sequences. Twenty percent of these clusters did not match known protein families by sequence similarity search and might represent novel protein families. Distributions of the large clusters were illustrated on organism composition, functional class, and sample locations. CONCLUSION/SIGNIFICANCE: Our clustering took about two orders of magnitude less computational effort than the similar protein family analysis of the original GOS study. This approach will help to analyze other large metagenomic datasets in the future. A Web server with our clustering results and annotations of predicted protein clusters is available online at http://tools.camera.calit2.net/gos under the CAMERA project.

    A review of elliptical and disc galaxy structure, and modern scaling laws

    A century ago, in 1911 and 1913, Plummer and then Reynolds introduced their models to describe the radial distribution of stars in 'nebulae'. This article reviews the progress since then, providing both an historical perspective and a contemporary review of the stellar structure of bulges, discs and elliptical galaxies. The quantification of galaxy nuclei, such as central mass deficits and excess nuclear light, plus the structure of dark matter halos and cD galaxy envelopes, are discussed. Issues pertaining to spiral galaxies including dust, bulge-to-disc ratios, bulgeless galaxies, bars and the identification of pseudobulges are also reviewed. An array of modern scaling relations involving sizes, luminosities, surface brightnesses and stellar concentrations is presented, many of which are shown to be curved. These 'redshift zero' relations not only quantify the behavior and nature of galaxies in the Universe today, but are the modern benchmark for evolutionary studies of galaxies, whether based on observations, N-body simulations or semi-analytical modelling. For example, it is shown that some of the recently discovered compact elliptical galaxies at 1.5 < z < 2.5 may be the bulges of modern disc galaxies. Comment: Condensed version (due to Contract) of an invited review article to appear in "Planets, Stars and Stellar Systems" (www.springer.com/astronomy/book/978-90-481-8818-5). 500+ references incl. many somewhat forgotten, pioneer papers. Original submission to Springer: 07-June-201

    Confining Domains Lead to Reaction Bursts: Reaction Kinetics in the Plasma Membrane

    Confinement of molecules in specific small volumes and areas within a cell is likely to be a general strategy that developed during evolution for regulating the interactions and functions of biomolecules. The cellular plasma membrane, which is the outermost membrane that surrounds the entire cell, was considered to be a continuous two-dimensional liquid, but it is becoming clear that it consists of numerous nano-meso-scale domains with various lifetimes, such as raft domains and cytoskeleton-induced compartments, and membrane molecules are dynamically trapped in these domains. In this article, we give a theoretical account of the effects of molecular confinement on reversible bimolecular reactions in a partitioned surface such as the plasma membrane. By performing simulations based on a lattice-based model of diffusion and reaction, we found that in the presence of membrane partitioning, bimolecular reactions that occur in each compartment proceed in bursts during which the reaction rate is sharply and briefly increased even though the asymptotic reaction rate remains the same. We characterized the time between reaction bursts and the burst amplitude as a function of the model parameters, and discussed the biological significance of the reaction bursts in the presence of strong inhibitor activity.