415 research outputs found

    Evolution of proteomes: fundamental signatures and global trends in amino acid compositions

    BACKGROUND: The evolutionary characterization of species and lifestyles at global levels is nowadays a subject of considerable interest, particularly with the availability of many complete genomes. Are there specific properties associated with lifestyles and phylogenies? What are the underlying evolutionary trends? One of the simplest analyses to address such questions concerns characterization of proteomes at the amino acid composition level. RESULTS: In this work, amino acid compositions of a large set of 208 proteomes, with a significant number of representatives from the three phylogenetic domains and different lifestyles, are analyzed, resorting to an appropriate multidimensional method: correspondence analysis. The analysis reveals striking discrimination between eukaryotes, prokaryotic mesophiles and hyperthermophiles-thermophiles, following amino acid usage. In sharp contrast, no similar discrimination is observed for psychrophiles. The observed distributional properties are compared with various inferred chronologies for the recruitment of amino acids into the genetic code. Such comparisons reveal correlations between the observed segregations of species following amino acid usage, and the separation of amino acids following early or late recruitment. CONCLUSION: A simple description of proteomes according to amino acid compositions reveals striking signatures, with sharp segregations or, on the contrary, non-discriminations following phylogenies and lifestyles. The distribution of species, following amino acid usage, exhibits a discrimination between [high GC]-[high optimal growth temperatures] and [low GC]-[moderate temperatures] characteristics. This discrimination appears to coincide closely with the separation of amino acids following their inferred early or late recruitment into the genetic code. Taken together, the various results provide a consistent picture for the evolution of proteomes, in terms of amino acid usage
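The input to an analysis of this kind is one amino acid composition vector per proteome. The following is a minimal sketch of how such a vector might be computed; it is not the authors' code, the function name `composition` and the toy sequences are hypothetical, and the correspondence analysis step itself is not shown.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(proteins):
    """Return the relative frequency of each standard amino acid.

    `proteins` is an iterable of protein sequences (strings). Characters
    outside the 20 standard residues (e.g. X or *) are ignored.
    """
    counts = Counter()
    for seq in proteins:
        counts.update(ch for ch in seq.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard amino acids found")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


# Toy "proteome" of two short sequences (illustrative only).
toy_proteome = ["MKV", "MAAK"]
freqs = composition(toy_proteome)
```

Stacking one such 20-element vector per proteome gives the proteome-by-amino-acid table that a multidimensional method can then be applied to.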

    A novel design of whole-genome microarray probes for Saccharomyces cerevisiae which minimizes cross-hybridization

    BACKGROUND: Numerous DNA microarray hybridization experiments have been performed in yeast in recent years using either synthetic oligonucleotides or PCR-amplified coding sequences as probes. The design and quality of the microarray probes are of critical importance for hybridization experiments as well as subsequent analysis of the data. RESULTS: We present here a novel design of Saccharomyces cerevisiae microarrays based on a refined annotation of the genome and with the aim of reducing cross-hybridization between related sequences. An effort was made to design probes of similar lengths, preferably located in the 3'-end of reading frames. The sequence of each gene was compared against the entire yeast genome and optimal sub-segments giving no predicted cross-hybridization were selected. A total of 5660 novel probes (more than 97% of the yeast genes) were designed. For the remaining 143 genes, cross-hybridization was unavoidable. Using a set of 18 deletant strains, we have experimentally validated our cross-hybridization procedure. Sensitivity, reproducibility and dynamic range of these new microarrays have been measured. Based on this experience, we have written a novel program to design long oligonucleotides for microarray hybridizations of complete genome sequences. CONCLUSIONS: A validated procedure to predict cross-hybridization in microarray probe design was defined in this work. Subsequently, a novel Saccharomyces cerevisiae microarray (which minimizes cross-hybridization) was designed and constructed. Arrays are available at Eurogentec S. A. Finally, we propose a novel design program, OliD, which allows automatic oligonucleotide design for microarrays. The OliD program is available from the authors

    Importance of chirality and reduced flexibility of protein side chains: A study with square and tetrahedral lattice models

    In simple models, side chains are often represented implicitly (e.g., by spin-states) or simplified as one atom. We study side chain effects using square lattice and tetrahedral lattice models, with explicit side chains of two atoms. We distinguish effects due to chirality and effects due to side chain flexibilities, since residues in proteins are L-residues, and their side chains adopt different rotameric states. Short chains are enumerated exhaustively. For long chains, we effectively sample rare events (e.g., compact conformations) and obtain complete pictures of ensemble properties of these models across all compactness regions. We find that both chirality and reduced side chain flexibility lower the folding entropy significantly for globally compact conformations, suggesting that they are important properties of residues to ensure fast folding and stable native structure. This corresponds well with our finding that natural amino acid residues have reduced effective flexibility, as evidenced by analysis of rotamer libraries and side chain rotatable bonds. We further develop a method for calculating the exact side chain entropy for a given backbone structure. We show that simple rotamer counting often underestimates side chain entropy significantly, and side chain entropy does not always correlate well with main chain packing. Among compact backbones with maximum side chain entropy, helical structures emerge as the dominant configurations. Our results suggest that side chain entropy may be an important factor contributing to the formation of alpha helices for compact conformations. Comment: 16 pages, 15 figures, 2 tables. Accepted by J. Chem. Phy

    Transcript profiling in Candida albicans reveals new cellular functions for the transcriptional repressors CaTup1, CaMig1 and CaNrg1.

    The pathogenic fungus Candida albicans contains homologues of the transcriptional repressors ScTup1, ScMig1 and ScNrg1 found in budding yeast. In Saccharomyces cerevisiae, ScMig1 targets the ScTup1/ScSsn6 complex to the promoters of glucose-repressed genes to repress their transcription. ScNrg1 is thought to act in a similar manner at other promoters. We have examined the roles of their homologues in C. albicans by transcript profiling with an array containing 2002 genes, representing about one quarter of the predicted number of open reading frames (ORFs) in C. albicans. The data revealed that CaNrg1 and CaTup1 regulate a different set of C. albicans genes from CaMig1 and CaTup1. This is consistent with the idea that CaMig1 and CaNrg1 target the CaTup1 repressor to specific subsets of C. albicans genes. However, CaMig1 and CaNrg1 repress other C. albicans genes in a CaTup1-independent fashion. The targets of CaMig1 and CaNrg1 repression, and phenotypic analyses of nrg1/nrg1 and mig1/mig1 mutants, indicate that these factors play differential roles in the regulation of metabolism, cellular morphogenesis and stress responses. Hence, the data provide important information both about the modes of action of these transcriptional regulators and their cellular roles. The transcript profiling data are available at http://www.pasteur.fr/recherche/unites/RIF/transcriptdata/

    A probabilistic model for gene content evolution with duplication, loss, and horizontal transfer

    We introduce a Markov model for the evolution of a gene family along a phylogeny. The model includes parameters for the rates of horizontal gene transfer, gene duplication, and gene loss, in addition to branch lengths in the phylogeny. The likelihood for the changes in the size of a gene family across different organisms can be calculated in O(N+hM^2) time and O(N+M^2) space, where N is the number of organisms, h is the height of the phylogeny, and M is the sum of family sizes. We apply the model to the evolution of gene content in Proteobacteria using the gene families in the COG (Clusters of Orthologous Groups) database

    Proteome sequence features carry signatures of the environmental niche of prokaryotes

    BACKGROUND: Prokaryotic environmental adaptations occur at different levels within cells to ensure the preservation of genome integrity, proper protein folding and function as well as membrane fluidity. Although specific composition and structure of cellular components suitable for the variety of extreme conditions has already been postulated, a systematic study describing such adaptations has not yet been performed. We therefore explored whether the environmental niche of a prokaryote could be deduced from the sequence of its proteome. Finally, we aimed at finding the precise differences between proteome sequences of prokaryotes from different environments. RESULTS: We analyzed the proteomes of 192 prokaryotes from different habitats. We collected detailed information about the optimal growth conditions of each microorganism. Furthermore, we selected 42 physico-chemical properties of amino acids and computed their values for each proteome. Further, on the same set of features we applied two fundamentally different machine learning methods, Support Vector Machines and Random Forests, to successfully classify between bacteria and archaea, halophiles and non-halophiles, as well as mesophiles, thermophiles and mesothermophiles. Finally, we performed feature selection by using Random Forests. CONCLUSIONS: To our knowledge, this is the first time that three different classification cases (domain of life, halophilicity and thermophilicity) of proteome adaptation are successfully performed with the same set of 42 features. The characteristic features of a specific adaptation constitute a signature that may help understanding the mechanisms of adaptation to extreme environments.

    Genome Trees from Conservation Profiles

    The concept of the genome tree depends on the potential evolutionary significance in the clustering of species according to similarities in the gene content of their genomes. In this respect, genome trees have often been identified with species trees. With the rapid expansion of genome sequence data it becomes of increasing importance to develop accurate methods for grasping global trends for the phylogenetic signals that mutually link the various genomes. We therefore derive here the methodological concept of genome trees based on protein conservation profiles in multiple species. The basic idea in this derivation is that the multi-component “presence-absence” protein conservation profiles permit tracking of common evolutionary histories of genes across multiple genomes. We show that a significant reduction in informational redundancy is achieved by considering only the subset of distinct conservation profiles. Beyond these basic ideas, we point out various pitfalls and limitations associated with the data handling, paving the way for further improvements. As an illustration for the methods, we analyze a genome tree based on the above principles, along with a series of other trees derived from the same data and based on pair-wise comparisons (ancestral duplication-conservation and shared orthologs). In all trees we observe a sharp discrimination between the three primary domains of life: Bacteria, Archaea, and Eukarya. The new genome tree, based on conservation profiles, displays a significant correspondence with classically recognized taxonomical groupings, along with a series of departures from such conventional clusterings
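The idea of "presence-absence" conservation profiles, and of keeping only the distinct ones, can be illustrated with a small sketch. This is a sketch under stated assumptions, not the paper's procedure: the function names, the toy data, and the choice of a Jaccard distance between genomes are all hypothetical and chosen only for illustration.

```python
def conservation_profiles(presence, genomes):
    """Map each protein to a tuple of 0/1 flags, one flag per genome.

    `presence` maps a protein identifier to the set of genomes in which
    a homologue was detected; `genomes` fixes the column order.
    """
    return {
        protein: tuple(int(g in found) for g in genomes)
        for protein, found in presence.items()
    }


def distinct_profiles(profiles):
    """Collapse identical profiles, reducing informational redundancy."""
    return set(profiles.values())


def jaccard_distance(profiles, i, j):
    """Distance between genome columns i and j over a set of profiles.

    Assumption: 1 - (profiles present in both) / (profiles present in
    either). This is one plausible gene-content distance, not necessarily
    the one used in the paper.
    """
    both = sum(1 for p in profiles if p[i] and p[j])
    either = sum(1 for p in profiles if p[i] or p[j])
    return 1.0 - both / either if either else 0.0


# Toy data: four proteins across three hypothetical genomes.
genomes = ["gA", "gB", "gC"]
presence = {
    "p1": {"gA", "gB"},
    "p2": {"gA", "gB"},        # same profile as p1
    "p3": {"gC"},
    "p4": {"gA", "gB", "gC"},
}
profiles = conservation_profiles(presence, genomes)
unique = distinct_profiles(profiles)
d_ab = jaccard_distance(unique, 0, 1)
d_ac = jaccard_distance(unique, 0, 2)
```

A matrix of such pairwise distances could then be passed to any standard distance-based tree-building method.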

    Phylogeny of Prokaryotes and Chloroplasts Revealed by a Simple Composition Approach on All Protein Sequences from Complete Genomes Without Sequence Alignment

    The complete genomes of living organisms have provided much information on their phylogenetic relationships. Similarly, the complete genomes of chloroplasts have helped to resolve the evolution of this organelle in photosynthetic eukaryotes. In this paper we propose an alternative method of phylogenetic analysis using compositional statistics for all protein sequences from complete genomes. This new method is conceptually simpler than and computationally as fast as the one proposed by Qi et al. (2004b) and Chu et al. (2004). The same data sets used in Qi et al. (2004b) and Chu et al. (2004) are analyzed using the new method. Our distance-based phylogenetic tree of the 109 prokaryotes and eukaryotes agrees with the biologists' tree of life based on 16S rRNA comparison in a predominant majority of basic branching and most lower taxa. Our phylogenetic analysis also shows that the chloroplast genomes are separated into two major clades corresponding to chlorophytes s.l. and rhodophytes s.l. The interrelationships among the chloroplasts are largely in agreement with the current understanding of chloroplast evolution
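Alignment-free composition approaches generally reduce each proteome to a vector of short-string frequencies and then compare the vectors. The sketch below shows that general idea only; it is not the method of the paper or of Qi et al. The function names, the default string length `k=2`, and the cosine-based distance are assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def kmer_frequencies(proteins, k=2):
    """Relative frequencies of all length-k strings in a set of proteins."""
    counts = Counter()
    for seq in proteins:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}


def cosine_distance(u, v):
    """(1 - cosine similarity) / 2 between two sparse frequency vectors.

    Assumption: a cosine-based distance is used here purely as an example
    of a composition distance. For non-negative frequency vectors it lies
    between 0 (same direction) and 0.5 (no shared strings).
    """
    dot = sum(x * v.get(key, 0.0) for key, x in u.items())
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("empty composition vector")
    return (1.0 - dot / (norm_u * norm_v)) / 2.0


# Toy sequences standing in for whole proteomes (illustrative only).
vec_a = kmer_frequencies(["MKVMKV"])
vec_b = kmer_frequencies(["GGGG"])
d_same = cosine_distance(vec_a, vec_a)
d_diff = cosine_distance(vec_a, vec_b)
```

Computing this distance for every pair of genomes yields a distance matrix from which a tree can be built without any sequence alignment.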

    Tuberculous Granuloma Formation Is Enhanced by a Mycobacterium Virulence Determinant

    Granulomas are organized host immune structures composed of tightly interposed macrophages and other cells that form in response to a variety of persistent stimuli, both infectious and noninfectious. The tuberculous granuloma is essential for host containment of mycobacterial infection, although it does not always eradicate it. Therefore, it is considered a host-beneficial, if incompletely efficacious, immune response. The Mycobacterium RD1 locus encodes a specialized secretion system that promotes mycobacterial virulence by an unknown mechanism. Using transparent zebrafish embryos to monitor the infection process in real time, we found that RD1-deficient bacteria fail to elicit efficient granuloma formation despite their ability to grow inside of infected macrophages. We showed that macrophages infected with virulent mycobacteria produce an RD1-dependent signal that directs macrophages to aggregate into granulomas. This Mycobacterium-induced macrophage aggregation in turn is tightly linked to intercellular bacterial dissemination and increased bacterial numbers. Thus, mycobacteria co-opt host granulomas for their virulence