
    Genomic Inference of the Metabolism and Evolution of the Archaeal Phylum Aigarchaeota

    Microbes of the phylum Aigarchaeota are widely distributed in geothermal environments, but their physiological and ecological roles are poorly understood. Here we analyze six Aigarchaeota metagenomic bins from two circumneutral hot springs in Tengchong, China, and show that they are either strict or facultative anaerobes and that most are chemolithotrophs capable of sulfide oxidation. Applying comparative genomics to the Thaumarchaeota and Aigarchaeota, we find that both originated in thermal habitats and share 1154 genes with their common ancestor. Horizontal gene transfer played a crucial role in shaping the genetic diversity of Aigarchaeota and led to functional partitioning and ecological divergence among sympatric microbes, with several key functional innovations acquired from Bacteria, including dissimilatory sulfite reduction and possibly carbon monoxide oxidation. Our study expands our knowledge of the possible ecological roles of the Aigarchaeota and clarifies their evolutionary relationship to their sister lineage, the Thaumarchaeota.

    ProTISA: a comprehensive resource for translation initiation site annotation in prokaryotic genomes

    Correct annotation of translation initiation sites (TISs) is essential both for experimental and bioinformatic studies of the prokaryotic translation initiation mechanism and for understanding gene regulation and gene structure. Here we describe ProTISA, a comprehensive database that collects TISs confirmed by a variety of available evidence for prokaryotic genomes, including Swiss-Prot experimental records, the literature, conserved domain hits and sequence alignments between orthologous genes. Moreover, by incorporating the predictions of our recently developed TIS post-processor, ProTISA provides a refined annotation for the public RefSeq database. The database also annotates potential regulatory signals associated with translation initiation in the region upstream of the TIS. As of July 2007, ProTISA includes 440 microbial genomes with more than 390 000 confirmed TISs. The database is available at http://mech.ctb.pku.edu.cn/protis
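Annotating upstream regulatory signals can be illustrated with a minimal sketch that scans a fixed window upstream of a TIS for a Shine-Dalgarno-like motif. The consensus `AGGAGG`, the 20-nt window, the one-mismatch tolerance and the function names are assumptions chosen for illustration; this is not ProTISA's actual annotation procedure.

```python
# Hypothetical sketch: find a Shine-Dalgarno-like motif upstream of a TIS.
# The consensus, window length and mismatch tolerance are assumptions.
SD_CONSENSUS = "AGGAGG"


def upstream_window(seq, tis, length=20):
    """Return up to `length` nt immediately upstream of the 0-based TIS index."""
    start = max(0, tis - length)
    return seq[start:tis].upper()


def best_sd_match(window, motif=SD_CONSENSUS, max_mismatches=1):
    """Return (spacer, mismatches) for the best motif hit, or None.

    `spacer` is the number of nucleotides between the end of the motif and
    the start codon; among equally good hits the most upstream one is kept.
    """
    best = None
    k = len(motif)
    for i in range(len(window) - k + 1):
        mismatches = sum(1 for a, b in zip(window[i:i + k], motif) if a != b)
        if mismatches <= max_mismatches and (best is None or mismatches < best[1]):
            spacer = len(window) - (i + k)
            best = (spacer, mismatches)
    return best


# Toy sequence: motif, a 7-nt spacer, then the ATG start codon at index 17.
seq = "CCCC" + "AGGAGG" + "TTTTTTT" + "ATG" + "AAA"
print(best_sd_match(upstream_window(seq, tis=17)))  # → (7, 0)
```

Real ribosome-binding signals vary by organism, so a position weight matrix or a free-energy model would usually replace the fixed consensus.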

    Going Deeper: Metagenome of a Hadopelagic Microbial Community

    The paucity of sequence data from pelagic deep-ocean microbial assemblages has severely restricted molecular exploration of the largest biome on Earth. This study presents an analysis of a large-scale 454-pyrosequencing metagenomic dataset from a hadopelagic environment at 6,000 m depth within the Puerto Rico Trench (PRT). A total of 145 Mbp of assembled sequence data was generated and compared with two pelagic deep-ocean metagenomes and two representative surface seawater datasets from the Sargasso Sea. In a number of instances, all three deep metagenomes displayed similar trends that were most pronounced in the PRT, including enrichment in functions for two-component signal transduction and transcriptional regulation. Overrepresented transporters in the PRT metagenome included outer membrane porins, diverse cation transporters, and di- and tri-carboxylate transporters that matched well with the prevailing catabolic processes, such as butanoate, glyoxylate and dicarboxylate metabolism. A surprisingly high abundance of sulfatases for the degradation of sulfated polysaccharides was also present in the PRT. The most dramatic adaptive feature of the PRT microbes appears to be heavy-metal resistance, reflected in the large number of transporters for metal removal. As a complement to the metagenomic approach, single-cell genomic techniques were used to generate partial whole-genome sequence data from four uncultivated cells belonging to the dominant phyla within the PRT: Alphaproteobacteria, Gammaproteobacteria, Bacteroidetes and Planctomycetes. The single-cell sequence data provided genomic context for many of the highly abundant functional attributes identified in the PRT metagenome, and recruited PRT metagenomic sequence data far more heavily than 172 available reference marine genomes.
Through these complementary sequencing approaches, new insights have been gained into the unique functional attributes of microbes residing in a deep layer of the ocean far removed from the more productive, sunlit zones above.

    QuartetS: a fast and accurate algorithm for large-scale orthology detection

    The unparalleled growth in the availability of genomic data offers both a challenge to develop orthology detection methods that are simultaneously accurate and high-throughput and an opportunity to improve orthology detection by leveraging evolutionary evidence in the accumulated sequenced genomes. Here, we report a novel orthology detection method, termed QuartetS, that exploits evolutionary evidence in a computationally efficient manner. Based on the well-established evolutionary concept that gene duplication events can be used to discriminate homologous genes, QuartetS uses an approximate phylogenetic analysis of quartet gene trees to infer the occurrence of duplication events and to discriminate paralogous from orthologous genes. We used function- and phylogeny-based metrics to perform a large-scale, systematic comparison of the orthology predictions of QuartetS with those of four other methods [bi-directional best hit (BBH), outgroup, OMA and QuartetS-C (QuartetS followed by clustering)], involving 624 bacterial genomes and >2 million genes. We found that QuartetS slightly but consistently outperformed the highly specific OMA method and that, while consuming only 0.5% additional computational time, QuartetS predicted 50% more orthologs with a 50% lower false positive rate than the widely used BBH method. We conclude that, for large-scale phylogenetic and functional analysis, QuartetS and QuartetS-C should be preferred in applications where high accuracy and high throughput, respectively, are required.
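The abstract benchmarks QuartetS against the bi-directional best hit (BBH) baseline. For reference, a minimal sketch of BBH itself (not QuartetS) follows, assuming pairwise similarity scores such as BLAST bit scores have already been computed in both directions; the function names and the dictionary input layout are hypothetical.

```python
# Sketch of the bi-directional best hit (BBH) baseline, not QuartetS.
# Input layout is an assumption: {(query_gene, subject_gene): score}.


def best_hits(scores):
    """Map each query gene to its highest-scoring subject gene.

    Ties are broken in favour of the first hit encountered.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Return sorted (gene_a, gene_b) pairs that are each other's best hit."""
    ab = best_hits(a_vs_b)
    ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in ab.items() if ba.get(b) == a)


# Toy scores: a1 and a2 (genome A) both hit b1 (genome B) best.
a_vs_b = {("a1", "b1"): 90, ("a1", "b2"): 40, ("a2", "b1"): 85, ("a2", "b2"): 30}
b_vs_a = {("b1", "a1"): 90, ("b1", "a2"): 85, ("b2", "a1"): 40, ("b2", "a2"): 30}
print(bidirectional_best_hits(a_vs_b, b_vs_a))  # → [('a1', 'b1')]
```

Because b1's reciprocal best hit is a1 only, BBH reports a single pair and leaves a2 unpaired; BBH assigns at most one partner per gene in each direction.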

    Massively parallel single-cell genomics of microbiomes in rice paddies

    World's first comprehensive single-cell genomic analysis of the rice rhizosphere microbiome: toward a database of the problems facing rice production. Kyoto University press release, 2022-11-09. Plant growth-promoting microbes (PGPMs) have attracted increasing attention because they may help increase crop yields in a low-input, sustainable manner that supports food security. Previous studies have attempted to understand the principles underlying rhizosphere ecology and the interactions between plants and PGPMs using ribosomal RNA sequencing, metagenomic sequencing and genome-resolved metagenomics; however, these approaches do not provide comprehensive genomic information for individual species and do not facilitate detailed analyses of plant–microbe interactions. In the present study, we developed a pipeline to analyze the genomic diversity of the rice rhizosphere microbiome at single-cell resolution. We isolated microbial cells from paddy soil and determined their genomic sequences using massively parallel whole-genome amplification in microfluidic-generated gel capsules. We obtained 3,237 single-amplified genomes in a single experiment, and these genomic sequences provided insights into microbial functions in the paddy ecosystem. Our approach offers a promising platform for gaining novel insights into the roles of microbes in the rice rhizomicrobiome and for developing microbial technologies for improved and sustainable rice production.

    INVESTIGATION OF SOME POSSIBLE ORIGINS OF PROTEIN FAMILIES

    Nuttinee Teerakulkittipong, Ph.D., 2013. Directed by: Professor John Moult, Institute for Bioscience and Biotechnology Research, Department of Cell Biology and Molecular Genetics. The prevailing view of the evolutionary history of proteins has been that all protein domains are descendants of distinct evolutionary lines, and that these lines are all relatively ancient families. The primary basis for that view was that known protein structures could be grouped by similarity of topology into a small number of folds. However, two lines of evidence challenge that view of protein evolution. First, analysis of sequence relationships within and between sets of complete genomes has established that a large proportion of protein sequence families are narrowly distributed in phylogenetic space and so appear to be relatively recent in origin. Second, analysis of the relationships between known protein structures shows that there are many more than 1,000 distinct folds, which appears to imply many more evolutionary lines. There are four hypotheses for the discrepancy between the traditional view and the observed structural and sequence distributions within protein families. Specifically, apparently young protein families may arise from (1) previously non-coding DNA, or by frame-shifting of existing coding sequence, (2) recombination of structural fragments between proteins or recombination with non-coding DNA, (3) older families whose rapid rate of sequence change makes relatives hard to detect, or (4) lateral gene transfer (LGT) from other organisms. In investigating these hypotheses, phylogenetic analysis provides a means of estimating the relative age of protein families and of detecting lateral gene transfer effects.
Phylogeny-based investigation of prokaryotic species divergence has generally been performed using a small number of families, resulting in significant bias that affects age analysis. We therefore used information from many protein families to construct a species tree, using a new procedure for combining these diverse sources. The resulting tree for 66 prokaryotic species incorporates information from 1,379 protein families. The families were selected on the basis of consistent family evolutionary rates obtained using three different methods. Noise-resistant methods were used to combat the effects of lateral gene transfer and some inevitable errors in protein sequence alignment and identification of orthologous families. Most topological features of the tree are robust as assessed by bootstrap testing, and previous distortions of inter-kingdom distances and poor determination of short branch lengths have been corrected. The tree is used to estimate the age of all protein families, which is key to the investigation of all four hypotheses. Proteins affected by LGT events were detected using a previously developed method and removed before the age calculation. We used the estimated family ages obtained from the phylogenetic analysis to examine five properties of proteins as a function of the age of the corresponding families. The goal is to ascertain whether the age dependence of these properties supports hypotheses (1) and (2) for the origin of apparently young families; that is, whether these are truly new open reading frames. The five properties are mRNA expression level, relative evolutionary rate, predicted percentage of structural disorder, number of protein interaction partners and codon composition bias.
The results are consistent with the new open reading frame model. Expression increases substantially as a function of family age, suggesting that young proteins are not yet sufficiently adapted to tolerate high-concentration conditions. The rate of amino acid change is faster for young proteins, consistent with overall positive selection for improved structural and functional properties. The fraction of predicted disorder is highest in the youngest proteins, consistent with immature structural properties. The number of known protein-protein interactions increases steadily with age, with low levels for young proteins, suggesting an ongoing process of increasing functional complexity. Analysis of these four factors is reported in Chapter 3. Results for the final factor, codon composition bias, are reported in Chapter 4. There we found that the codon composition of young proteins is markedly different from that of old proteins and similar to that of proteins constructed with random codon assignment. Thus the results are consistent with a model in which many young proteins have newly formed open reading frames, and in which their codon composition is gradually optimized during subsequent evolution to fit the specific genomic conditions of the organism concerned. Overall, the results for all five properties lend statistical support to the new open reading frame hypothesis. Further investigation is, however, needed. In particular, examination of the structural properties of young proteins, such as super-secondary structure composition and the distribution of use of rare and common structural fragments, should be useful.
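One simple way to quantify a codon-composition difference like the one described above is to compare 64-codon frequency profiles. The sketch below uses total variation distance between two profiles; the distance measure, the helper names and the toy sequences are assumptions for illustration, not the measure used in the thesis.

```python
# Hypothetical sketch: compare codon-usage profiles of two in-frame ORFs.
from collections import Counter
from itertools import product

# All 64 codons over the DNA alphabet (sense and stop codons).
CODONS = ["".join(p) for p in product("ACGT", repeat=3)]


def codon_frequencies(orf):
    """Relative frequency of each codon in an in-frame ORF.

    Trailing bases that do not form a full codon, and codons containing
    non-ACGT characters, are ignored.
    """
    orf = orf.upper()
    usable = len(orf) - len(orf) % 3
    counts = Counter(orf[i:i + 3] for i in range(0, usable, 3))
    total = sum(counts[c] for c in CODONS)
    return {c: (counts[c] / total if total else 0.0) for c in CODONS}


def total_variation(p, q):
    """Total variation distance between two codon profiles (0 = identical, 1 = disjoint)."""
    return 0.5 * sum(abs(p[c] - q[c]) for c in CODONS)


young = codon_frequencies("ATGAAAAAATAA")  # toy sequence using AAA for Lys
old = codon_frequencies("ATGAAGAAGTAA")    # toy sequence using AAG for Lys
print(total_variation(young, old))  # → 0.5
```

In practice a profile would be pooled over all members of an age class, and a synonymous-codon measure that controls for amino acid composition may be more appropriate than raw codon frequencies.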

    Biology of archaea from a novel family Cuniculiplasmataceae (Thermoplasmata) ubiquitous in hyperacidic environments

    The order Thermoplasmatales (Euryarchaeota) includes the most acidophilic organisms known so far, which are poorly amenable to cultivation. Earlier culture-independent studies at Iron Mountain (California) pointed to an abundant archaeal group dubbed 'G-plasma'. We examined the genomes and physiology of two cultured representatives of the family Cuniculiplasmataceae, recently isolated from acidic (pH 1-1.5) sites in Spain and the UK, that are 16S rRNA gene sequence-identical with 'G-plasma'. These organisms had the largest genomes among the Thermoplasmatales (1.87-1.94 Mbp), shared 98.7-98.8% average nucleotide identity with each other and with 'G-plasma', and exhibited high genome conservation even within their genomic islands, despite their geographically remote locations. Facultatively anaerobic heterotrophs, they possess an ancestral form of A-type terminal oxygen reductase from a distinct parental clade. The lack of complete pathways for the biosynthesis of histidine, valine, leucine, isoleucine, lysine and proline predetermines their reliance on external sources of amino acids and hence their lifestyle as scavengers of proteinaceous compounds from surrounding microbial community members. In contrast to earlier metagenomics-based assumptions, the isolates were S-layer-deficient, non-motile, non-methylotrophic and incapable of iron oxidation, despite the abundance of methylotrophy substrates and ferrous iron in situ, which underlines the importance of experimentally validating bioinformatic predictions.

    CoBaltDB: Complete bacterial and archaeal orfeomes subcellular localization database and associated resources

    BACKGROUND: The functions of proteins are strongly related to their localization in cell compartments (for example the cytoplasm or membranes), but experimental determination of the subcellular localization of proteomes is laborious and expensive. A fast and low-cost alternative is in silico prediction based on features of the protein primary sequences. However, biologists are confronted with a very large number of computational tools that use different methods and address various localization features with diverse specificities and sensitivities. As a result, exploiting these resources to predict protein localization accurately involves querying all tools and comparing every prediction output, which is a painstaking task. We therefore developed a comprehensive database, called CoBaltDB, that gathers all prediction outputs for complete prokaryotic proteomes. DESCRIPTION: The current version of CoBaltDB integrates the results of 43 localization predictors for 784 complete bacterial and archaeal proteomes (2,548,292 proteins in total). CoBaltDB supplies a simple, user-friendly interface for retrieving and exploring relevant information about predicted features (such as signal peptide cleavage sites and transmembrane segments). Data are organized into three work-sets ("specialized tools", "meta-tools" and "additional tools"). The database can be queried using the organism name, a locus tag or a list of locus tags, and may be browsed using numerous graphical and text displays. CONCLUSIONS: With its new functionalities, CoBaltDB is a powerful new platform that provides easy access to the results of multiple localization tools and supports predicting prokaryotic protein localization with higher confidence than previously possible. CoBaltDB is available at http://www.umr6026.univ-rennes1.fr/english/home/research/basic/software/cobalten
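To illustrate how pooling the outputs of many predictors can support a higher-confidence call, the sketch below combines per-tool localization predictions by simple majority vote and reports the fraction of agreeing tools. The voting rule, tool names and compartment labels are hypothetical; CoBaltDB is described as gathering predictor outputs, not as applying this particular rule.

```python
# Hypothetical sketch: majority-vote consensus over localization predictors.
from collections import Counter


def consensus_localization(predictions):
    """Combine {tool_name: compartment or None} into (consensus, agreement).

    Tools that made no call (None) are ignored. `agreement` is the fraction
    of calling tools that support the top compartment; a tie for the top
    spot yields no consensus (None).
    """
    votes = Counter(loc for loc in predictions.values() if loc is not None)
    if not votes:
        return None, 0.0
    ranked = votes.most_common()
    top_label, top_count = ranked[0]
    agreement = top_count / sum(votes.values())
    if len(ranked) > 1 and ranked[1][1] == top_count:
        return None, agreement  # tied vote: no consensus
    return top_label, agreement


calls = {"toolA": "cytoplasm", "toolB": "cytoplasm", "toolC": "membrane", "toolD": None}
print(consensus_localization(calls))  # → ('cytoplasm', 0.6666666666666666)
```

Simple majority voting treats all predictors as equally reliable; weighting tools by their known specificity and sensitivity for each compartment would be a natural refinement.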