86 research outputs found

    Probing Metagenomics by Rapid Cluster Analysis of Very Large Datasets

    BACKGROUND: The scale and diversity of metagenomic sequencing projects challenge both our technical and conceptual approaches in gene and genome annotations. The recent Sorcerer II Global Ocean Sampling (GOS) expedition yielded millions of predicted protein sequences, which significantly altered the landscape of known protein space by more than doubling its size and adding thousands of new families (Yooseph et al., 2007 PLoS Biol 5, e16). Such datasets defy conventional analysis and annotation methods, not only because of their sheer size but also because of many other features. METHODOLOGY/PRINCIPAL FINDINGS: In this study, we describe an approach for rapid analysis of the sequence diversity and the internal structure of such very large datasets by advanced clustering strategies using the newly modified CD-HIT algorithm. We performed a hierarchical clustering analysis on the 17.4 million Open Reading Frames (ORFs) identified from the GOS study and found over 33 thousand large predicted protein clusters comprising nearly 6 million sequences. Twenty percent of these clusters did not match known protein families by sequence similarity search and might represent novel protein families. The distributions of the large clusters were illustrated by organism composition, functional class, and sample location. CONCLUSION/SIGNIFICANCE: Our clustering took about two orders of magnitude less computational effort than the similar protein family analysis in the original GOS study. This approach will help to analyze other large metagenomic datasets in the future. A Web server with our clustering results and annotations of predicted protein clusters is available online at http://tools.camera.calit2.net/gos under the CAMERA project.
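CD-HIT is built on greedy incremental clustering. Sequences are sorted longest first. Each sequence either joins an existing representative it matches at or above an identity threshold, or it becomes a new representative. The Python below is a minimal sketch of that idea under stated assumptions.

- difflib's `SequenceMatcher.ratio()` stands in for the alignment-based identity and short-word filtering that CD-HIT actually uses.
- The helper names `greedy_cluster` and `hierarchical_cluster` are hypothetical.
- The hierarchical step re-clusters representatives at successively lower thresholds. This is one plausible reading of the hierarchical analysis described above, not the authors' exact pipeline.

```python
from difflib import SequenceMatcher


def identity(a: str, b: str) -> float:
    # ASSUMPTION: difflib's ratio (2 * matched chars / total length) is only a
    # stand-in for the alignment identity CD-HIT really computes.
    return SequenceMatcher(None, a, b).ratio()


def greedy_cluster(seqs: dict[str, str], threshold: float) -> dict[str, list[str]]:
    """Greedy incremental clustering in the spirit of CD-HIT (hypothetical helper).

    Sequences are visited longest first. Each one joins the first existing
    representative it matches at >= threshold, else it becomes a new representative.
    """
    order = sorted(seqs, key=lambda sid: len(seqs[sid]), reverse=True)
    clusters: dict[str, list[str]] = {}
    for sid in order:
        for rep in clusters:
            if identity(seqs[sid], seqs[rep]) >= threshold:
                clusters[rep].append(sid)
                break
        else:
            clusters[sid] = [sid]  # no match: sid starts its own cluster
    return clusters


def hierarchical_cluster(seqs: dict[str, str], thresholds: list[float]) -> dict[str, list[str]]:
    """Re-cluster representatives at successively lower thresholds (hypothetical helper).

    Member lists are carried upward so each final cluster lists all original IDs.
    """
    members = {sid: [sid] for sid in seqs}
    current = dict(seqs)
    for t in thresholds:
        clusters = greedy_cluster(current, t)
        members = {rep: [m for sid in ids for m in members[sid]] for rep, ids in clusters.items()}
        current = {rep: current[rep] for rep in clusters}  # only representatives go up a level
    return members


if __name__ == "__main__":
    demo = {"a": "MKTAYIAKQRQISFVKSHFSRQ", "b": "MKTAYIAKQRQISFVKSHFSRE", "c": "GGGLLLPPPWWWAAA"}
    print(greedy_cluster(demo, 0.9))  # {'a': ['a', 'b'], 'c': ['c']}
```

Visiting the longest sequences first means every representative is at least as long as the members assigned to it. Only representatives are compared against each new sequence, which is what keeps this strategy far cheaper than all-against-all comparison.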

    Stratification of co-evolving genomic groups using ranked phylogenetic profiles

    BACKGROUND: Previous methods of detecting the taxonomic origins of arbitrary sequence collections, with significant impact on genome analysis and in particular metagenomics, have primarily focused on compositional features of genomes. The evolutionary patterns of phylogenetic distribution of genes or proteins, represented by phylogenetic profiles, provide an alternative approach for the detection of taxonomic origins, but typically suffer from low accuracy. Herein, we present rank-BLAST, a novel approach for the assignment of protein sequences into genomic groups of the same taxonomic origin, based on the ranking order of phylogenetic profiles of target genes or proteins across the reference database. RESULTS: The rank-BLAST approach is validated by computing the phylogenetic profiles of all sequences for five distinct microbial species of varying degrees of phylogenetic proximity, against a reference database of 243 fully sequenced genomes. The approach, a combination of sequence searches, statistical estimation and clustering, analyses the degree of sequence divergence between sets of protein sequences and classifies protein sequences according to their species of origin with high accuracy, allowing taxonomic classification of 64% of the proteins studied. In most cases, a main cluster is detected, representing the corresponding species. Secondary, functionally distinct and species-specific clusters exhibit different patterns of phylogenetic distribution, thus flagging gene groups of interest. Detailed analyses of such cases are provided as examples. CONCLUSION: Our results indicate that the rank-BLAST approach can capture the taxonomic origins of sequence collections accurately and efficiently. The approach can be useful both for the analysis of genome evolution and for the detection of species groups in metagenomics samples.
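One way to make "ranking order of phylogenetic profiles" concrete is sketched below. This is a hypothetical illustration, not the authors' rank-BLAST procedure. It rests on three assumptions:

- Each protein's profile is a list of best-hit scores (for example, BLAST bit scores) against the same ordered set of reference genomes.
- Each profile is converted to ranks.
- Proteins are compared by Spearman rank correlation.

The function names and the greedy grouping rule with cutoff `min_rho` are also assumptions made for illustration.

```python
import math
from statistics import mean


def ranks(values: list[float]) -> list[float]:
    """Rank values from 1 (smallest) upward; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = shared
        i = j + 1
    return result


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman rank correlation: the Pearson correlation of the two rank vectors."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        return 0.0  # a constant profile carries no ranking information
    return cov / (sx * sy)


def group_by_profile(profiles: dict[str, list[float]], min_rho: float) -> dict[str, list[str]]:
    """HYPOTHETICAL greedy grouping: a protein joins the first group whose seed
    profile has rank correlation >= min_rho, otherwise it seeds a new group."""
    groups: dict[str, list[str]] = {}
    for pid, profile in profiles.items():
        for seed in groups:
            if spearman(profile, profiles[seed]) >= min_rho:
                groups[seed].append(pid)
                break
        else:
            groups[pid] = [pid]
    return groups


if __name__ == "__main__":
    # Toy scores of three proteins against four reference genomes (invented for illustration).
    demo = {"p1": [50, 10, 30, 0], "p2": [60, 12, 35, 1], "p3": [0, 40, 5, 70]}
    print(group_by_profile(demo, 0.8))
```

In the toy data, `p1` and `p2` rank the four reference genomes in the same order even though their raw scores differ, so they fall into one group. `p3` ranks the genomes in reverse and seeds its own group. Comparing ranks rather than raw scores is what makes the grouping insensitive to differences in overall score scale between proteins.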

    Approaches to vaccination against Theileria parva and Theileria annulata

    Despite having different cell tropism, the pathogenesis and immunobiology of the diseases caused by Theileria parva and Theileria annulata are remarkably similar. Live vaccines have been available for both parasites for over 40 years, but although they provide strong protection, practical disadvantages have limited their widespread application. Efforts to develop alternative vaccines using defined parasite antigens have focused on the sporozoite and intracellular schizont stages of the parasites. Experimental vaccination studies using viral vectors expressing T. parva schizont antigens and T. parva and T. annulata sporozoite antigens incorporated in adjuvant have, in each case, demonstrated protection against parasite challenge in a proportion of vaccinated animals. Current work is investigating alternative antigen delivery systems in an attempt to improve the levels of protection. The genome architecture and protein-coding capacity of T. parva and T. annulata are remarkably similar. The major sporozoite surface antigen in both species and most of the schizont antigens are encoded by orthologous genes. The sporozoite surface antigens have been shown to induce species cross-reactive neutralizing antibodies, and comparison of the schizont antigen orthologues has demonstrated that some of them display high levels of sequence conservation. Hence, advances in development of subunit vaccines against one parasite species are likely to be readily applicable to the other.

    Modeling the accretion history of supermassive black holes

    There is overwhelming evidence for the presence of supermassive black holes (SMBHs) in the centers of most nearby galaxies. The mass estimates for these remnant black holes from the stellar kinematics of local galaxies, together with the quasar phenomenon at high redshifts, point to the presence of assembled SMBHs. The accretion history of SMBHs can be reconstructed using observations at high and low redshifts as model constraints. Observations of galaxies and quasars in the submillimeter, infrared, optical, and X-ray wavebands are used as constraints, along with data from the demography of local black holes. Theoretical modeling of the growth of black hole mass with cosmic time has so far been pursued in two distinct directions: a phenomenological approach that utilizes observations in various wavebands, and a semi-analytic approach that starts with a theoretical framework and a set of assumptions with a view to matching observations. Both techniques have been pursued in the context of the standard paradigm for structure formation in a Cold Dark Matter dominated universe. Here, we examine the key issues and uncertainties in the theoretical understanding of the growth of SMBHs. Comment: 19 pages, 4 figures, to appear as Chapter 4 in "Supermassive Black Holes in the Distant Universe" (2004), ed. A. J. Barger, Kluwer Academic Publishers, in press.

    Defining seasonal marine microbial community dynamics

    Here we describe the longest microbial time series analyzed to date, using high-resolution 16S rRNA tag pyrosequencing of samples taken monthly over 6 years at a temperate marine coastal site off Plymouth, UK. Data treatment affected the estimation of community richness over the 6-year period: 8794 operational taxonomic units (OTUs) were identified using single-linkage preclustering, whereas 21 130 OTUs were identified by denoising the data. The Alphaproteobacteria were the most abundant class, and the most frequently recorded OTUs were members of the Rickettsiales (SAR11) and Rhodobacterales. This near-surface ocean bacterial community showed strong, repeatable seasonal patterns, defined by winter peaks in diversity across all years. Environmental variables explained far more variation in seasonally predictable bacteria than did data on protists or metazoan biomass. Change in day length alone explains >65% of the variance in community diversity. The results suggest that seasonal changes in environmental variables are more important than trophic interactions. Interestingly, microbial association network analysis showed that correlations in abundance were stronger within bacterial taxa than between bacteria and eukaryotes, or between bacteria and environmental variables.
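Single-linkage clustering, the preclustering step named above, links any two sequences within a distance cutoff and then takes connected components. As a result, clusters can grow by chaining. The Python sketch below illustrates that behavior under simplifying assumptions:

- The tags are toy equal-length strings.
- Hamming distance stands in for an alignment-based distance.
- `single_linkage` and its `max_diff` cutoff are a hypothetical helper, not the study's actual pipeline.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatched positions; the toy example assumes equal-length tags."""
    if len(a) != len(b):
        raise ValueError("toy example assumes equal-length tags")
    return sum(x != y for x, y in zip(a, b))


def single_linkage(tags: list[str], max_diff: int) -> list[list[str]]:
    """HYPOTHETICAL helper: connected components of the graph whose edges join
    tags differing at <= max_diff positions (single linkage via union-find)."""
    parent = list(range(len(tags)))

    def find(i: int) -> int:
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(tags)):
        for j in range(i + 1, len(tags)):
            if hamming(tags[i], tags[j]) <= max_diff:
                parent[find(i)] = find(j)  # merge the two components

    groups: dict[int, list[str]] = {}
    for i, tag in enumerate(tags):
        groups.setdefault(find(i), []).append(tag)
    return list(groups.values())


if __name__ == "__main__":
    print(single_linkage(["AAAA", "AAAT", "AATT", "GGGG"], 1))
    # [['AAAA', 'AAAT', 'AATT'], ['GGGG']]
```

In this toy run, `AAAA` and `AATT` differ at two positions, more than the cutoff. They still end up in one cluster because each is a single substitution away from `AAAT`. This chaining is characteristic of single linkage.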

    Compton Thick AGN: the dark side of the X-ray background

    The spectrum of the hard X-ray background records the history of accretion processes integrated over cosmic time. Several pieces of observational and theoretical evidence indicate that a significant fraction of the energy density is obscured by large columns of gas and dust. The absorbing matter is often very thick, with column densities N_H > 1.5 × 10^24 cm^-2, the value corresponding to unity optical depth for Compton scattering. These sources are called "Compton thick" and appear to be very numerous, at least in the nearby universe. Although Compton thick Active Galactic Nuclei (AGN) are thought to provide an important contribution to the overall cosmic energy budget, their space density and cosmological evolution are poorly known. The properties of Compton thick AGN are reviewed here, with particular emphasis on their contribution to the extragalactic background light in the hard X-ray and infrared bands. Comment: 28 pages, 10 figures. Review for "Supermassive Black Holes in the Distant Universe", Ed. A. J. Barger, Kluwer Academic Publishers.
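The Compton-thick column density threshold follows from setting the Thomson (Compton) optical depth to one. The worked version below assumes roughly one electron per hydrogen atom, so that the electron column N_e is approximated by N_H. It uses the Thomson cross-section σ_T ≈ 6.652 × 10^-25 cm^2.

```latex
\tau_{\mathrm{T}} = N_e\,\sigma_{\mathrm{T}} \approx N_{\mathrm{H}}\,\sigma_{\mathrm{T}},
\qquad
\tau_{\mathrm{T}} = 1 \;\Longrightarrow\;
N_{\mathrm{H}} = \frac{1}{\sigma_{\mathrm{T}}}
= \frac{1}{6.652\times10^{-25}\ \mathrm{cm^{2}}}
\approx 1.5\times10^{24}\ \mathrm{cm^{-2}}
```

Sources with N_H above this value have τ_T > 1, which is the sense in which they are called Compton thick.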