61 research outputs found

    Evolution of proteomes: fundamental signatures and global trends in amino acid compositions

    BACKGROUND: The evolutionary characterization of species and lifestyles at global levels is nowadays a subject of considerable interest, particularly with the availability of many complete genomes. Are there specific properties associated with lifestyles and phylogenies? What are the underlying evolutionary trends? One of the simplest analyses to address such questions is the characterization of proteomes at the level of amino acid composition. RESULTS: In this work, the amino acid compositions of a large set of 208 proteomes, with a significant number of representatives from the three phylogenetic domains and from different lifestyles, are analyzed using an appropriate multidimensional method: correspondence analysis. The analysis reveals a striking discrimination, based on amino acid usage, between eukaryotes, prokaryotic mesophiles and hyperthermophiles-thermophiles. In sharp contrast, no similar discrimination is observed for psychrophiles. The observed distributional properties are compared with various inferred chronologies for the recruitment of amino acids into the genetic code. Such comparisons reveal correlations between the observed segregation of species by amino acid usage and the separation of amino acids into early and late recruits. CONCLUSION: A simple description of proteomes according to amino acid composition reveals striking signatures. Depending on phylogeny and lifestyle, these signatures show either sharp segregations or, on the contrary, no discrimination. The distribution of species by amino acid usage exhibits a discrimination between [high GC]-[high optimal growth temperatures] and [low GC]-[moderate temperatures] characteristics. This discrimination appears to coincide closely with the separation of amino acids according to their inferred early or late recruitment into the genetic code. Taken together, the various results provide a consistent picture of the evolution of proteomes in terms of amino acid usage.
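
The per-proteome amino acid composition on which such an analysis rests can be computed as shown in the sketch below. It is a minimal Python example and not the authors' pipeline. It assumes that each proteome is given as a list of protein sequences in one-letter code, pools all residues, and skips non-standard symbols. The function name `amino_acid_composition` and the toy sequences are hypothetical. Stacking one such row per species yields a species-by-amino-acid table of the kind that correspondence analysis takes as input.

```python
from collections import Counter

# One-letter codes of the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(proteins):
    """Relative frequency of each standard amino acid, pooled over all
    protein sequences of one proteome. Symbols outside the 20 standard
    residues (X, B, Z, U, stop '*', gaps) are skipped."""
    counts = Counter()
    for seq in proteins:
        counts.update(ch for ch in seq.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard amino acids in input")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


# Hypothetical toy proteome (not real data): two short sequences.
toy_proteome = ["MKKLLE", "MAGE*"]
composition = amino_acid_composition(toy_proteome)
print({aa: f for aa, f in composition.items() if f > 0})
```

Frequencies are used here for readability. Correspondence analysis works equally well on the raw counts, because it rescales each row by its total.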

    A novel design of whole-genome microarray probes for Saccharomyces cerevisiae which minimizes cross-hybridization

    BACKGROUND: Numerous DNA microarray hybridization experiments have been performed in yeast in recent years, using either synthetic oligonucleotides or PCR-amplified coding sequences as probes. The design and quality of the microarray probes are of critical importance for hybridization experiments as well as for subsequent analysis of the data. RESULTS: We present here a novel design of Saccharomyces cerevisiae microarrays based on a refined annotation of the genome and aimed at reducing cross-hybridization between related sequences. An effort was made to design probes of similar lengths, preferably located at the 3' end of reading frames. The sequence of each gene was compared against the entire yeast genome, and optimal sub-segments giving no predicted cross-hybridization were selected. A total of 5660 novel probes (covering more than 97% of the yeast genes) were designed. For the remaining 143 genes, cross-hybridization was unavoidable. Using a set of 18 deletant strains, we experimentally validated our cross-hybridization prediction procedure. The sensitivity, reproducibility and dynamic range of these new microarrays were measured. Based on this experience, we have written a novel program to design long oligonucleotides for microarray hybridization of complete genome sequences. CONCLUSIONS: A validated procedure to predict cross-hybridization in microarray probe design was defined in this work. Subsequently, a novel Saccharomyces cerevisiae microarray that minimizes cross-hybridization was designed and constructed. The arrays are available from Eurogentec S.A. Finally, we propose a novel design program, OliD, which allows automatic oligonucleotide design for microarrays. The OliD program is available from the authors.
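
The selection of sub-segments with no predicted cross-hybridization can be illustrated with a deliberately crude proxy. This hypothetical Python sketch is not the authors' procedure and not the OliD program. A candidate window of `probe_len` bases is accepted only if it shares no exact `k`-mer with any other gene. Windows are scanned from the 3' end, so 3'-proximal probes are preferred. The defaults `probe_len=50` and `k=15` are illustrative assumptions. A real design would also have to consider near matches containing mismatches, the reverse complement, melting temperature and intergenic sequence.

```python
def kmers(seq, k):
    """Set of all overlapping substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def pick_probe(gene_id, genes, probe_len=50, k=15):
    """Return (start, window) for the 3'-most window of length probe_len in
    genes[gene_id] that shares no exact k-mer with any other gene, else None.

    Crude stand-in for a cross-hybridization check: it ignores near matches
    with mismatches, the reverse complement, melting temperature and
    intergenic sequence.
    """
    # Pool the k-mers of every other gene: the potential off-targets.
    background = set()
    for other_id, other_seq in genes.items():
        if other_id != gene_id:
            background |= kmers(other_seq, k)
    seq = genes[gene_id]
    # Scan window starts from the 3' end toward the 5' end.
    for start in range(len(seq) - probe_len, -1, -1):
        window = seq[start:start + probe_len]
        if kmers(window, k).isdisjoint(background):
            return start, window
    return None


# Hypothetical toy "genome" of two genes sharing their 5' halves.
toy_genes = {"g1": "ACGTACGTTTGCCA", "g2": "ACGTACGTAAAAAA"}
print(pick_probe("g1", toy_genes, probe_len=6, k=4))  # (8, 'TTGCCA')
```

A gene for which every window collides with another gene returns `None`. This corresponds loosely to the genes for which cross-hybridization was unavoidable.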

    Genome Trees from Conservation Profiles

    The concept of the genome tree rests on the potential evolutionary significance of clustering species according to similarities in the gene content of their genomes. In this respect, genome trees have often been identified with species trees. With the rapid expansion of genome sequence data, it is increasingly important to develop accurate methods for grasping global trends in the phylogenetic signals that link the various genomes. We therefore derive here the methodological concept of genome trees based on protein conservation profiles in multiple species. The basic idea is that multi-component “presence-absence” protein conservation profiles make it possible to track the common evolutionary histories of genes across multiple genomes. We show that a significant reduction in informational redundancy is achieved by considering only the subset of distinct conservation profiles. Beyond these basic ideas, we point out various pitfalls and limitations associated with data handling, paving the way for further improvements. As an illustration of the methods, we analyze a genome tree based on the above principles. We also analyze a series of other trees derived from the same data and based on pairwise comparisons (ancestral duplication-conservation and shared orthologs). In all trees we observe a sharp discrimination between the three primary domains of life: Bacteria, Archaea, and Eukarya. The new genome tree, based on conservation profiles, displays a significant correspondence with classically recognized taxonomic groupings, along with a series of departures from such conventional clusterings.
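
The reduction to distinct conservation profiles, together with one possible genome-to-genome distance built on them, can be sketched as shown below. The Jaccard distance used here is an illustrative assumption and not necessarily the measure used in this work. It is the share of distinct profiles present in either genome that are not present in both. The resulting distance matrix could then be passed to a standard tree-building method such as neighbour joining or UPGMA. All names and toy data are hypothetical.

```python
from itertools import combinations


def distinct_profiles(profiles):
    """Collapse identical presence/absence profiles (one tuple of 0/1 per
    protein, one position per genome) into the set of distinct profiles."""
    return set(profiles.values())


def genome_distances(profiles, genomes):
    """Jaccard distance between every pair of genomes, computed over the
    distinct profiles: 1 - |present in both| / |present in either|."""
    distinct = distinct_profiles(profiles)
    dist = {}
    for i, j in combinations(range(len(genomes)), 2):
        both = sum(1 for p in distinct if p[i] and p[j])
        either = sum(1 for p in distinct if p[i] or p[j])
        dist[(genomes[i], genomes[j])] = 1.0 - both / either if either else 0.0
    return dist


# Hypothetical toy data: 4 proteins across 3 genomes A, B, C.
genomes = ["A", "B", "C"]
profiles = {
    "p1": (1, 1, 0),
    "p2": (1, 1, 0),  # same profile as p1: counted once
    "p3": (1, 0, 1),
    "p4": (1, 1, 1),
}
print(len(distinct_profiles(profiles)))  # 3
print(genome_distances(profiles, genomes))
```

Collapsing `p1` and `p2` into one profile is the redundancy reduction described above. Proteins with identical presence-absence patterns carry the same phylogenetic signal and are counted only once.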

    Expressed sequence tag analysis of the human pathogen Paracoccidioides brasiliensis yeast phase: Identification of putative homologues of Candida albicans virulence and pathogenicity genes

    Paracoccidioides brasiliensis, a thermodimorphic fungus, is the causative agent of paracoccidioidomycosis, a prevalent systemic mycosis in Latin America. We present here a survey of expressed genes in the yeast pathogenic phase of P. brasiliensis. We obtained 13,490 expressed sequence tags from both 5' and 3' ends. Clustering analysis yielded the partial sequences of 4,692 expressed genes, which were functionally classified by similarity to known genes. We have identified several homologues of Candida albicans virulence and pathogenicity genes in P. brasiliensis. Furthermore, we have analyzed the expression of some of these genes during the dimorphic yeast-mycelium-yeast transition by real-time quantitative reverse transcription-PCR. Clustering analysis of the mycelium-yeast transition revealed three groups: (i) RBT, hydrophobin, and isocitrate lyase; (ii) malate dehydrogenase, contigs Pb1067 and Pb1145, GPI, and alternative oxidase; and (iii) ubiquitin, delta-9-desaturase, HSP70, HSP82, and HSP104. The first two groups displayed high mRNA expression in the mycelial phase, whereas the third group showed higher mRNA expression in the yeast phase. Our results suggest the possible conservation of pathogenicity and virulence mechanisms among fungi. They also considerably expand gene identification in P. brasiliensis and provide a broader basis for further progress in understanding its biological peculiarities.
    Univ São Paulo, Dept Ciencias Farmaceut, Fac Ciencias Farmaceut Ribeirao Preto, BR-14040903 Ribeirao Preto, SP, Brazil; Univ São Paulo, Fac Filosofia Ciencias & Letras Ribeirao Pret, BR-14040903 Ribeirao Preto, SP, Brazil; Inst Pasteur, Unite Genet Mol Levures, Paris, France; Univ Vale do Paraiba, UNIVAP, Vale Do Paraiba, Brazil; Univ Mogi das Cruzes, Nucleo Integrado Biotecnol, Mogi Das Cruzes, Brazil; Universidade Federal de São Paulo, Dept Microbiol Imunol & Parasitol, São Paulo, Brazil

    Genomic Exploration of the Hemiascomycetous Yeasts: 1. A set of yeast species for molecular evolution studies
    Sequences and annotations are accessible at Génoscope (http://www.genoscope.cns.fr), the FEBS Letters website (http://www.elsevier.nl/febs/show/) and Bordeaux (http://cbi.genopole-bordeaux.fr/Genolevures). They were deposited in the EMBL database (accession numbers AL392203 to AL441602).

    The identification of molecular evolutionary mechanisms in eukaryotes is approached through a comparative genomics study of a homogeneous group of species classified as Hemiascomycetes. This group includes Saccharomyces cerevisiae, the first eukaryote to have its genome entirely sequenced, in 1996. A random sequencing analysis has been performed on 13 different species sharing a small genome size and a low frequency of introns. Detailed information is provided in the 20 papers that follow. Additional tables available on websites describe the ca. 20 000 newly identified genes. This wealth of data, so far unique among eukaryotes, allowed us to examine the conservation of chromosome maps, to identify the ‘yeast-specific’ genes, and to review the distribution of gene families into functional classes. This project, conducted by a network of seven French laboratories, has been designated ‘Génolevures’.

    Protection against Mycobacterium ulcerans Lesion Development by Exposure to Aquatic Insect Saliva

    BACKGROUND: Buruli ulcer is a severe human skin disease caused by Mycobacterium ulcerans. The disease is diagnosed primarily in West Africa, where its incidence is increasing. Antimycobacterial drug therapy is relatively effective during the preulcerative stage of the disease, but surgical excision of lesions with skin grafting is often the ultimate treatment. The mode of transmission of this Mycobacterium species remains a matter of debate. The design of relevant interventions to prevent the disease is hampered by the lack of (i) a proper understanding of M. ulcerans life history traits in its natural aquatic ecosystem and (ii) immune signatures that could serve as correlates of protection. We previously set up a laboratory ecosystem with predatory aquatic insects of the family Naucoridae and laboratory mice. Using this system, we showed that (i) M. ulcerans-carrying aquatic insects can transmit the mycobacterium through bites and (ii) their salivary glands are the only tissues hosting replicative M. ulcerans. Further investigation in natural settings revealed that 5%–10% of these aquatic insects captured in endemic areas have M. ulcerans–loaded salivary glands. In search of novel epidemiological features, we noticed that individuals working close to aquatic environments inhabited by insect predators were less prone to developing Buruli ulcers than their relatives. We therefore set out to investigate whether those individuals might display immune signatures of exposure to bites from M. ulcerans-free insect predators, and whether those signatures could correlate with protection. METHODS AND FINDINGS: We took a two-pronged approach in this study. We first investigated whether the insect bites are protective in a mouse model, and then looked for possibly protective immune signatures in humans. We found that, in contrast to control BALB/c mice, BALB/c mice exposed to Naucoris aquatic insect bites or sensitized to Naucoris salivary gland homogenates (SGHs) displayed no lesion at the site of inoculation of M. ulcerans coated with Naucoris SGH components. We then used human serum samples collected in a Buruli ulcer–endemic area (in the Republic of Benin, West Africa). Sera from either ulcer-free individuals or patients with Buruli ulcers were assayed for the titre of IgGs that bind to insect predator SGH, focusing on those molecules otherwise shown to be retained by M. ulcerans colonies. IgG titres were lower in the Buruli ulcer patient group than in the ulcer-free group. CONCLUSIONS: These data will help structure future investigations in Buruli ulcer–endemic areas. They provide a rationale for research into human immune signatures of exposure to predatory aquatic insects, with special attention to those insect saliva molecules that bind to M. ulcerans.

    Genomic Exploration of the Hemiascomycetous Yeasts: 19. Ascomycetes-specific genes

    Comparisons of the 6213 predicted Saccharomyces cerevisiae open reading frame (ORF) products with sequences from organisms of other biological phyla distinguish genes commonly conserved in evolution from ‘maverick’ genes. Maverick genes have no homologue in phyla other than the Ascomycetes. We show that a majority of the ‘maverick’ genes have homologues among other yeast species. We thus define a set of 1892 genes that, from sequence comparisons, appear ‘Ascomycetes-specific’. We estimate, retrospectively, that the S. cerevisiae genome contains 5651 actual protein-coding genes, 50 of which were identified for the first time in this work. We also estimate that the present public databases contain 612 predicted ORFs that are not real genes. Interestingly, the sequences of the ‘Ascomycetes-specific’ genes tend to diverge more rapidly in evolution than those of other genes. Half of the ‘Ascomycetes-specific’ genes are functionally characterized in S. cerevisiae, and a few functional categories are over-represented among them.
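
The gene counts above are mutually consistent under a particular reading of the abstract. That reading assumes the 612 spurious ORFs are removed from the 6213 predicted ones and the 50 newly identified genes were not among the predictions:

```latex
\underbrace{6213}_{\text{predicted ORFs}}
- \underbrace{612}_{\text{not real genes}}
+ \underbrace{50}_{\text{newly identified}}
= 5601 + 50 = 5651 \ \text{protein-coding genes}
```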

    A Human-Curated Annotation of the Candida albicans Genome

    Recent sequencing and assembly of the genome of the fungal pathogen Candida albicans used simple automated procedures for the identification of putative genes. We have reviewed the entire assembly, both by hand and with additional bioinformatic resources. This allowed us to accurately map and describe 6,354 genes and to identify 246 genes whose original database entries contained sequencing errors (or possibly mutations) that affect their reading frame. Comparison with other fungal genomes permitted the identification of numerous fungus-specific genes that might be targeted for antifungal therapy. We also observed that, compared to other fungi, the protein-coding sequences in the C. albicans genome are especially rich in short sequence repeats. Finally, our improved annotation permitted a detailed analysis of several multigene families. Comparative genomic studies showed that C. albicans has a far greater catabolic range, encoding respiratory Complex 1, several novel oxidoreductases and ketone body degrading enzymes, malonyl-CoA and enoyl-CoA carriers, several novel amino acid degrading enzymes, a variety of secreted catabolic lipases and proteases, and numerous transporters to assimilate the resulting nutrients. The results of these efforts will ensure that the Candida research community has uniform and comprehensive genomic information for medical research as well as for future diagnostic and therapeutic applications.

    Inferring Orthologs: Open Questions and Perspectives

    With the increasing number of sequenced genomes and their comparisons, the detection of orthologs is crucial for reliable functional annotation and evolutionary analyses of genes and species. Yet the dynamic remodeling of genome content hinders reliable orthology detection. This remodeling occurs through gene gain, loss and transfer, and through segmental and whole-genome duplication. Moreover, the lack of direct functional evidence and the questionable quality of some available genome sequences and annotations make orthology even harder to assess. This article reviews the existing computational methods and their potential accuracy in the high-throughput era of genome sequencing. It also anticipates open questions in terms of methodology, reliability, and computation. The most accurate strategy appears to be appropriate taxon sampling together with a combination of methods based on similarity, phylogeny, synteny, and evolutionary knowledge that may help detect speciation events. This review also raises perspectives on the potential determination of orthology throughout the whole species phylogeny.
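
The simplest similarity-based approach is the reciprocal best hit (RBH) criterion. The Python sketch below illustrates it but is not a method prescribed by the review. It assumes precomputed pairwise similarity scores between the genes of two genomes, such as BLAST bit scores, keyed by `(gene_in_A, gene_in_B)`. Ties are resolved by keeping the first pair seen. A strictly one-to-one criterion like this handles duplications and losses poorly, which is one reason to combine it with phylogeny- and synteny-based evidence.

```python
def reciprocal_best_hits(scores):
    """scores maps (gene_in_A, gene_in_B) -> similarity score (e.g. a bit
    score). Returns the set of (a, b) pairs that are each other's best hit.
    Ties are broken by keeping the first pair encountered."""
    best_for_a = {}  # gene in A -> (best score, gene in B)
    best_for_b = {}  # gene in B -> (best score, gene in A)
    for (a, b), s in scores.items():
        if a not in best_for_a or s > best_for_a[a][0]:
            best_for_a[a] = (s, b)
        if b not in best_for_b or s > best_for_b[b][0]:
            best_for_b[b] = (s, a)
    # Keep a -> b only if b's own best hit points back to a.
    return {(a, b) for a, (_, b) in best_for_a.items() if best_for_b[b][1] == a}


# Hypothetical scores between genes of genomes A and B.
toy_scores = {
    ("a1", "b1"): 100, ("a1", "b2"): 80,
    ("a2", "b1"): 90, ("a2", "b2"): 50,
    ("a3", "b2"): 70,
}
print(reciprocal_best_hits(toy_scores))  # {('a1', 'b1')}
```

In the toy data, `a2` and `a3` find no reciprocal partner even though both have hits. This is the kind of case (for example, a duplicated gene family) where RBH alone under-reports orthology.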

    Genome Data Exploration Using Correspondence Analysis

    Recent developments in sequencing technologies allow the production of massive amounts of genomic and genotyping data. These developments have highlighted the need for synthetic data representation and pattern recognition methods that can mine such large data sets and help discover the biologically meaningful knowledge they contain. Correspondence analysis (CA) is an exploratory descriptive method designed to analyze two-way data tables that contain some measure of association between rows and columns. It constructs linear combinations of variables, known as factors. CA has been used for decades to study high-dimensional data. Remarkable inferences have been drawn from large data tables by reducing their dimensionality to a few orthogonal factors that account for the largest amount of variability in the data. Herein, I review CA and highlight its use through examples of high-dimensional data constructed from genomic and genetic studies. The examples include the amino acid compositions of large sets of species (viruses, phages, yeasts, and fungi). They also include the pairwise shared orthologs in a set of yeast and fungal species, as obtained from their proteome comparisons. For the first time, results show striking segregations between yeasts and fungi, as well as between viruses and phages. Distributions obtained from shared orthologs show clusters of yeast and fungal species corresponding to their phylogenetic relationships. A direct comparison with principal component analysis is also discussed. It uses a recently published example of genotyping data related to newly discovered traces of an ancient hominid, which was compared to modern human populations in the search for ancestral similarities. CA offers more detailed results, highlighting links between modern humans and the ancient hominid and their characterizations. Compared to the popular principal component analysis method, CA allows easier and more effective interpretation of results, particularly through its ability to relate individual patterns to their corresponding characteristic variables.
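
The core computation of correspondence analysis fits in a few lines of NumPy. This is a minimal sketch of the textbook formulation, the singular value decomposition of the matrix of standardized residuals. It is not the author's software. It assumes a table of non-negative counts with no all-zero row or column. The function returns principal coordinates for rows and columns and the principal inertias of the first `n_factors` factors. It also returns the total inertia, which equals the table's chi-square statistic divided by its grand total. The function name and the toy table are hypothetical.

```python
import numpy as np


def correspondence_analysis(table, n_factors=2):
    """Correspondence analysis of a two-way table of non-negative counts
    (no all-zero row or column). Returns row and column principal
    coordinates, the principal inertias of the first n_factors factors,
    and the total inertia (chi-square statistic / grand total)."""
    N = np.asarray(table, dtype=float)
    if (N < 0).any():
        raise ValueError("CA expects non-negative counts")
    P = N / N.sum()                 # correspondence matrix
    r = P.sum(axis=1)               # row masses
    c = P.sum(axis=0)               # column masses
    expected = np.outer(r, c)       # independence model
    S = (P - expected) / np.sqrt(expected)  # standardized residuals
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    inertia = sv ** 2
    rows = (U / np.sqrt(r)[:, None]) * sv     # row principal coordinates
    cols = (Vt.T / np.sqrt(c)[:, None]) * sv  # column principal coordinates
    k = n_factors
    return rows[:, :k], cols[:, :k], inertia[:k], inertia.sum()


# Hypothetical 3 x 4 count table (e.g. species x residues), not real data.
counts = [[30, 10, 25, 35], [12, 40, 20, 28], [25, 22, 30, 23]]
row_xy, col_xy, inertias, total = correspondence_analysis(counts)
print(inertias / total)  # share of total inertia carried by each factor
```

For an amino acid usage study, rows would be species and columns the 20 amino acids. Plotting the first two columns of the row and column coordinates on the same axes gives the usual CA factor map. Relating species to the amino acids that drive their positions is what makes CA easier to interpret than PCA.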