119 research outputs found

    Gene identification and protein classification in microbial metagenomic sequence data via incremental clustering

    Background: The identification and study of proteins from metagenomic datasets can shed light on the roles and interactions of the source organisms in their communities. However, metagenomic datasets are characterized by the presence of organisms with varying GC composition, codon usage biases, etc., and consequently gene identification is challenging. The vast amount of sequence data also requires faster protein family classification tools. Results: We present a computational improvement to a sequence clustering approach that we developed previously to identify and classify protein coding genes in large microbial metagenomic datasets. The clustering approach can be used to identify protein coding genes in prokaryotes, viruses, and intron-less eukaryotes. The computational improvement is based on an incremental clustering method that does not require the expensive all-against-all compute that was required by the original approach, while still preserving the remote homology detection capabilities. We present evaluations of the clustering approach in protein-coding gene identification and classification, and also present the results of updating the protein clusters from our previous work with recent genomic and metagenomic sequences. The clustering results are available via CAMERA (http://camera.calit2.net). Conclusion: The clustering paradigm is shown to be a very useful tool in the analysis of microbial metagenomic data. The incremental clustering method is shown to be much faster than the original approach in identifying genes, grouping sequences into existing protein families, and also identifying novel families that have multiple members in a metagenomic dataset. These clusters provide a basis for further studies of protein families.

    WebCARMA: a web application for the functional and taxonomic classification of unassembled metagenomic reads

    Gerlach W, Jünemann S, Tille F, Goesmann A, Stoye J. WebCARMA: a web application for the functional and taxonomic classification of unassembled metagenomic reads. BMC Bioinformatics. 2009;10(1):430. Background: Metagenomics is a new field of research on natural microbial communities. High-throughput sequencing techniques like 454 or Solexa-Illumina promise new possibilities as they are able to produce huge amounts of data in much shorter time and with less effort and cost than the traditional Sanger technique. But the data produced comes in even shorter reads (35-100 basepairs with Illumina, 100-500 basepairs with 454-sequencing). CARMA is a new software pipeline for the characterisation of species composition and the genetic potential of microbial samples using short, unassembled reads. Results: In this paper, we introduce WebCARMA, a refined version of CARMA available as a web application for the taxonomic and functional classification of unassembled (ultra-)short reads from metagenomic communities. In addition, we have analysed the applicability of ultra-short reads in metagenomics. Conclusions: We show that unassembled reads as short as 35 bp can be used for the taxonomic classification of a metagenome. The web application is freely available at http://webcarma.cebitec.uni-bielefeld.d

    Blueprint for a minimal photoautotrophic cell: conserved and variable genes in Synechococcus elongatus PCC 7942

    Background: Simpler biological systems should be easier to understand and to engineer towards pre-defined goals. One way to achieve biological simplicity is through genome minimization. Here we looked for genomic islands in the freshwater cyanobacterium Synechococcus elongatus PCC 7942 (genome size 2.7 Mb) that could be used as targets for deletion. We also looked for conserved genes that might be essential for cell survival. Results: By using a combination of methods we identified 170 xenologs, 136 ORFans and 1401 core genes in the genome of S. elongatus PCC 7942. These represent 6.5%, 5.2% and 53.6% of the annotated genes respectively. We considered that genes in genomic islands could be found if they showed a combination of: a) unusual G+C content; b) unusual phylogenetic similarity; and/or c) a small number of the highly iterated palindrome 1 (HIP1) motif plus an unusual codon usage. The origin of the largest genomic island by horizontal gene transfer (HGT) could be corroborated by lack of coverage among metagenomic sequences from a freshwater microbialite. Evidence is also presented that xenologous genes tend to cluster in operons. Interestingly, most genes coding for proteins with a diguanylate cyclase domain are predicted to be xenologs, suggesting a role for horizontal gene transfer in the evolution of Synechococcus sensory systems. Conclusions: Our estimates of genomic islands in PCC 7942 are larger than those predicted by other published methods like SIGI-HMM. Our results provide a guide to non-essential genes in S. elongatus PCC 7942, indicating a path towards the engineering of a model photoautotrophic bacterial cell. Financial support was provided by grants BFU2009-12895-C02-01/BMC (Ministerio de Ciencia e Innovación, Spain), the European Community’s Seventh Framework Programme (FP7/2007-2013) under grant agreement number 212894 and Prometeo/2009/092 (Conselleria d’Educació, Generalitat Valenciana, Spain) to A. Moya.
Work in the FdlC laboratory was supported by grants BFU2008-00995/BMC (Spanish Ministry of Education), RD06/0008/1012 (RETICS research network, Instituto de Salud Carlos III, Spanish Ministry of Health) and LSHM-CT-2005_019023 (European VI Framework Program). Dr. González-Domenech was supported by a grant from the University of Granada. LD acknowledges financial support from the Facultad de Ciencias, Universidad Nacional Autónoma de México

    Phylogenomic Analysis of Odyssella thessalonicensis Fortifies the Common Origin of Rickettsiales, Pelagibacter ubique and Reclimonas americana Mitochondrion

    Background: The evolution of the Alphaproteobacteria and the origin of the mitochondria are topics of considerable debate. Most studies have placed the mitochondrial ancestor within the Rickettsiales order. Ten years ago, the bacterium Odyssella thessalonicensis was isolated from Acanthamoeba spp., and the 16S rDNA phylogeny placed it within the Rickettsiales. Recently, the whole genome of O. thessalonicensis has been sequenced, and 16S rDNA phylogeny and more robust and accurate phylogenomic analyses have been performed with 65 highly conserved proteins. Methodology/Principal Findings: The results suggested that O. thessalonicensis emerged between the Rickettsiales and other Alphaproteobacteria. The mitochondrial proteins of Reclinomonas americana have been used to locate the phylogenetic position of the mitochondrial ancestor within the Alphaproteobacteria tree. Using the K tree score method, nine mitochondrion-encoded proteins, whose phylogenies were congruent with the Alphaproteobacteria phylogenomic tree, have been selected and concatenated for Bayesian and Maximum Likelihood phylogenies. The Reclinomonas americana mitochondrion is a sister taxon to the free-living bacterium Candidatus Pelagibacter ubique, and together, they form a clade that is deeply rooted in the Rickettsiales clade. Conclusions/Significance: The Reclinomonas americana mitochondrion phylogenomic study confirmed that mitochondri

    Analysis of expressed sequence tags generated from full-length enriched cDNA libraries of melon

    Background: Melon (Cucumis melo), an economically important vegetable crop, belongs to the Cucurbitaceae family which includes several other important crops such as watermelon, cucumber, and pumpkin. It has served as a model system for sex determination and vascular biology studies. However, genomic resources currently available for melon are limited. Results: We constructed eleven full-length enriched and four standard cDNA libraries from fruits, flowers, leaves, roots, cotyledons, and calluses of four different melon genotypes, and generated 71,577 and 22,179 ESTs from full-length enriched and standard cDNA libraries, respectively. These ESTs, together with ~35,000 ESTs available in public domains, were assembled into 24,444 unigenes, which were extensively annotated by comparing their sequences to different protein and functional domain databases, assigning them Gene Ontology (GO) terms, and mapping them onto metabolic pathways. Comparative analysis of melon unigenes and other plant genomes revealed that 75% to 85% of melon unigenes had homologs in other dicot plants, while approximately 70% had homologs in monocot plants. The analysis also identified 6,972 gene families that were conserved across dicot and monocot plants, and 181, 1,192, and 220 gene families specific to fleshy fruit-bearing plants, the Cucurbitaceae family, and melon, respectively. Digital expression analysis identified a total of 175 tissue-specific genes, which provides a valuable gene sequence resource for future genomics and functional studies. Furthermore, we identified 4,068 simple sequence repeats (SSRs) and 3,073 single nucleotide polymorphisms (SNPs) in the melon EST collection. Finally, we obtained a total of 1,382 melon full-length transcripts through the analysis of full-length enriched cDNA clones that were sequenced from both ends.
Analysis of these full-length transcripts indicated that sizes of melon 5' and 3' UTRs were similar to those of tomato, but longer than many other dicot plants. Codon usages of melon full-length transcripts were largely similar to those of Arabidopsis coding sequences. Conclusion: The collection of melon ESTs generated from full-length enriched and standard cDNA libraries is expected to play significant roles in annotating the melon genome. The ESTs and associated analysis results will be useful resources for gene discovery, functional analysis, marker-assisted breeding of melon and closely related species, comparative genomic studies and for gaining insights into gene expression patterns. This work was supported by Research Grant Award No. IS-4223-09C from BARD, the United States-Israel Binational Agricultural Research and Development Fund, and by SNC Laboratoire ASL, de Ruiter Seeds B.V., Enza Zaden B.V., Gautier Semences S.A., Nunhems B.V., Rijk Zwaan B.V., Sakata Seed Inc, Semillas Fitó S.A., Seminis Vegetable Seeds Inc, Syngenta Seeds B.V., Takii and Company Ltd, Vilmorin and Cie S.A. and Zeraim Gedera Ltd (all of them as part of the support to ICuGI). CC was supported by CNRS ERL 8196.

    A Phylometagenomic Exploration of Oceanic Alphaproteobacteria Reveals Mitochondrial Relatives Unrelated to the SAR11 Clade

    BACKGROUND: According to the endosymbiont hypothesis, the mitochondrial system for aerobic respiration was derived from an ancestral Alphaproteobacterium. Phylogenetic studies indicate that the mitochondrial ancestor is most closely related to the Rickettsiales. Recently, it was suggested that Candidatus Pelagibacter ubique, a member of the SAR11 clade that is highly abundant in the oceans, is a sister taxon to the mitochondrial-Rickettsiales clade. The availability of ocean metagenome data substantially increases the sampling of Alphaproteobacteria inhabiting the oxygen-containing waters of the oceans that likely resemble the originating environment of mitochondria. METHODOLOGY/PRINCIPAL FINDINGS: We present a phylogenetic study of the origin of mitochondria that incorporates metagenome data from the Global Ocean Sampling (GOS) expedition. We identify mitochondrially related sequences in the GOS dataset that represent a rare group of Alphaproteobacteria, designated OMAC (Oceanic Mitochondria Affiliated Clade) as the closest free-living relatives to mitochondria in the oceans. In addition, our analyses reject the hypothesis that the mitochondrial system for aerobic respiration is affiliated with that of the SAR11 clade. CONCLUSIONS/SIGNIFICANCE: Our results allude to the existence of an alphaproteobacterial clade in the oxygen-rich surface waters of the oceans that represents the closest free-living relative to mitochondria identified thus far. In addition, our findings underscore the importance of expanding the taxonomic diversity in phylogenetic analyses beyond that represented by cultivated bacteria to study the origin of mitochondria

    Localized Plasticity in the Streamlined Genomes of Vinyl Chloride Respiring Dehalococcoides

    Vinyl chloride (VC) is a human carcinogen and widespread priority pollutant. Here we report the first, to our knowledge, complete genome sequences of microorganisms able to respire VC, Dehalococcoides sp. strains VS and BAV1. Notably, the respective VC reductase encoding genes, vcrAB and bvcAB, were found embedded in distinct genomic islands (GEIs) with different predicted integration sites, suggesting that these genes were acquired horizontally and independently by distinct mechanisms. A comparative analysis that included two previously sequenced Dehalococcoides genomes revealed a contextually conserved core that is interrupted by two high plasticity regions (HPRs) near the Ori. These HPRs contain the majority of GEIs and strain-specific genes identified in the four Dehalococcoides genomes, an elevated number of repeated elements including insertion sequences (IS), as well as 91 of 96 rdhAB, genes that putatively encode terminal reductases in organohalide respiration. Only three core rdhA orthologous groups were identified, and only one of these groups is supported by synteny. The low number of core rdhAB, contrasted with the high rdhAB numbers per genome (up to 36 in strain VS), as well as their colocalization with GEIs and other signatures for horizontal transfer, suggests that niche adaptation via organohalide respiration is a fundamental ecological strategy in Dehalococcoides. This adaptation has been exacted through multiple mechanisms of recombination that are mainly confined within HPRs of an otherwise remarkably stable, syntenic, streamlined genome among the smallest of any free-living microorganism

    Expression of auxin-binding protein1 during plum fruit ontogeny supports the potential role of auxin in initiating and enhancing climacteric ripening

    Auxin-binding protein1 (ABP1) is an active element involved in auxin signaling and plays critical roles in auxin-mediated plant development. Here, we report the isolation and characterization of a putative sequence from Prunus salicina L., designated PslABP1. The expected protein exhibits a similar molecular structure to that of well-characterized maize-ABP1; however, PslABP1 displays more sequence polarity in the active-binding site due to substitution of some crucial amino-acid residues predicted to be involved in auxin-binding. Further, PslABP1 expression was assessed throughout fruit ontogeny to determine its role in fruit development. Comparing the expression data with the physiological aspects that characterize fruit-development stages indicates that PslABP1 up-regulation is usually associated with the signature events that are triggered in an auxin-dependent manner such as floral induction, fruit initiation, embryogenesis, and cell division and elongation. However, the diversity in the PslABP1 expression profile during the ripening process of early and late plum cultivars seems to be due to the variability of endogenous auxin levels between the two cultivars, which consequently can change the levels of autocatalytic ethylene available for the fruit to co-ordinate ripening. The effect of auxin on stimulating ethylene production and in regulating PslABP1 was investigated. Our data suggest that auxin is involved in the transition of the mature green fruit into the ripening phase and in enhancing the ripening process in both auxin- and ethylene-dependent manners thereafter