FUNGIpath: a tool to assess fungal metabolic pathways predicted by orthology

Author: Grossetête Sandrine
Labedan Bernard
Lespinet Olivier
Publication venue: BioMed Central
Publication date: 01/01/2010

Abstract. Background: More and more completely sequenced fungal genomes are becoming available and many more sequencing projects are in progress. This deluge of data should improve our knowledge of the various primary and secondary metabolisms of Fungi, including their synthesis of useful compounds such as antibiotics or toxic molecules such as mycotoxins. Functional annotation of many fungal genomes is imperfect, especially of genes encoding enzymes, so we need dedicated tools to analyze their metabolic pathways in depth. Description: FUNGIpath is a new tool built using a two-stage approach. Groups of orthologous proteins predicted using complementary methods of detection were collected in a relational database. Each group was further mapped onto steps in the metabolic pathways published in the public databases KEGG and MetaCyc. As a result, FUNGIpath allows the primary and secondary metabolisms of the different fungal species represented in the database to be compared easily, making it possible to assess the level of specificity of various pathways at different taxonomic distances. It is freely accessible at http://www.fungipath.u-psud.fr. Conclusions: As more and more fungal genomes are expected to be sequenced during the coming years, FUNGIpath should progressively help to reconstruct the ancestral primary and secondary metabolisms of the main branches of the fungal tree of life and to elucidate the evolution of these ancestral fungal metabolisms into various specific derived metabolisms.
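The second stage described in this abstract, mapping groups of orthologs onto pathway steps and comparing species, can be illustrated with a minimal sketch. It is not FUNGIpath's implementation: the function name `pathway_coverage`, the step labels, and the species labels are all hypothetical, and the input is assumed to be a plain list of (step, species set) pairs rather than a relational database.

```python
def pathway_coverage(pathway_steps, ortholog_groups):
    """Fraction of a pathway's steps covered in each species.

    pathway_steps   -- list of step labels (hypothetical; e.g. EC numbers)
    ortholog_groups -- list of (step_label, set_of_species) pairs, one per
                       ortholog group already mapped to a pathway step
    """
    wanted = set(pathway_steps)
    if not wanted:
        return {}

    # Collect, for every species, the steps for which it has an ortholog.
    steps_by_species = {}
    for step, species in ortholog_groups:
        for sp in species:
            steps_by_species.setdefault(sp, set()).add(step)

    # Keep only steps that belong to the pathway of interest.
    return {
        sp: len(steps & wanted) / len(wanted)
        for sp, steps in sorted(steps_by_species.items())
    }


# Illustrative data only: two species, a two-step pathway.
groups = [("s1", {"A", "B"}), ("s2", {"A"}), ("s3", {"B"})]
coverage = pathway_coverage(["s1", "s2"], groups)
```

Comparing such coverage values across species at increasing taxonomic distance is one simple way to express the "level of specificity" of a pathway mentioned above.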

Assessing the evolutionary rate of positional orthologous genes in prokaryotes using synteny data

Author: Labedan Bernard
Lemoine Frédéric
Lespinet Olivier
Publication venue: BioMed Central
Publication date: 01/11/2007

Abstract. Background: Comparison of completely sequenced microbial genomes has revealed how fluid these genomes are. Detecting synteny blocks requires reliable methods to determine the orthologs among the whole set of homologs detected by exhaustive comparisons between each pair of completely sequenced genomes. This is a complex and difficult problem in the field of comparative genomics, but solving it will help to better understand the way prokaryotic genomes are evolving. Results: We have developed a suite of programs that automate three essential steps in the study of gene order conservation, and validated them with a set of 107 bacteria and archaea that cover the majority of the prokaryotic taxonomic space. We identified the whole set of shared homologs between two or more species and computed the evolutionary distance separating each pair of homologs. We applied two strategies to extract from the set of homologs a collection of valid orthologs shared by at least two genomes. The first computes the Reciprocal Smallest Distance (RSD) using the PAM distances separating pairs of homologs. The second method groups homologs in families and reconstructs each family's evolutionary tree, distinguishing bona fide orthologs as well as paralogs created after the last speciation event. Although the phylogenetic tree method often succeeds where RSD fails, the reverse can occasionally be true. Accordingly, we used the data obtained with either method or their intersection to count the orthologs that are adjacent in each pair of genomes, the Positional Orthologous Genes (POGs), and to further study their properties. Once all these synteny blocks had been detected, we showed that POGs are subject to more evolutionary constraints than orthologs outside synteny groups, whatever the taxonomic distance separating the compared organisms. Conclusion: The suite of programs described in this paper allows a reliable detection of orthologs and is useful for evaluating gene order conservation in prokaryotes whatever their taxonomic distance. Thus, our approach will ease the rapid identification of POGs in the next few years, as we expect to be inundated with thousands of completely sequenced microbial genomes.
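The RSD criterion and the notion of positional orthologs can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' suite of programs: distances are assumed to be precomputed and given as a dictionary, ties keep the first pair encountered, genomes are treated as linear gene lists, and the function names and gene labels are hypothetical.

```python
def reciprocal_smallest_distance(distances):
    """Return pairs (a, b) where each gene is the other's closest homolog.

    distances -- dict mapping (gene_in_A, gene_in_B) to an evolutionary
                 distance (assumed precomputed, e.g. a PAM distance)
    """
    best_for_a, best_for_b = {}, {}
    for (a, b), d in distances.items():
        if a not in best_for_a or d < best_for_a[a][1]:
            best_for_a[a] = (b, d)
        if b not in best_for_b or d < best_for_b[b][1]:
            best_for_b[b] = (a, d)
    # Keep a pair only if the choice is reciprocal.
    return sorted(
        (a, b) for a, (b, _) in best_for_a.items() if best_for_b[b][0] == a
    )


def positional_orthologs(orthologs, order_a, order_b):
    """Return ortholog pairs whose neighbours in genome A are also
    neighbours in genome B (a simplified stand-in for POG detection)."""
    pos_a = {g: i for i, g in enumerate(order_a)}
    pos_b = {g: i for i, g in enumerate(order_b)}
    partner = dict(orthologs)
    pogs = set()
    for a, b in orthologs:
        i = pos_a[a]
        if i + 1 < len(order_a):
            a_next = order_a[i + 1]
            b_next = partner.get(a_next)
            # Adjacent in A and adjacent (either orientation) in B.
            if b_next is not None and abs(pos_b[b] - pos_b[b_next]) == 1:
                pogs.update([(a, b), (a_next, b_next)])
    return sorted(pogs)


# Illustrative data only.
dist = {
    ("a1", "b1"): 0.1, ("a1", "b2"): 0.9,
    ("a2", "b2"): 0.2, ("a2", "b1"): 0.8,
    ("a3", "b2"): 0.5, ("a3", "b3"): 0.7,
}
pairs = reciprocal_smallest_distance(dist)
pogs = positional_orthologs(pairs, ["a1", "a2", "a3"], ["b3", "b2", "b1"])
```

In this toy example `a3` is left without an ortholog because its closest homolog, `b2`, is closer to `a2`; this is the kind of case where a tree-based method might reach a different conclusion than RSD.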

Bioinformatic analysis of an unusual gene-enzyme relationship in the arginine biosynthetic pathway among marine gamma proteobacteria: implications concerning the formation of N-acetylated intermediates in prokaryotes

Author: Glansdorff Nicolas
Labedan Bernard
Xu Ying
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: The N-acetylation of L-glutamate is regarded as a universal metabolic strategy to commit glutamate towards arginine biosynthesis. Until recently, this reaction was thought to be catalyzed by either of two enzymes: (i) the classical N-acetylglutamate synthase (NAGS, gene argA) first characterized in Escherichia coli and Pseudomonas aeruginosa several decades ago and also present in vertebrates, or (ii) the bifunctional version of ornithine acetyltransferase (OAT, gene argJ) present in Bacteria, Archaea and many Eukaryotes. This paper focuses on a new and surprising aspect of glutamate acetylation. We recently showed that in Moritella abyssi and M. profunda, two marine gamma proteobacteria, the gene for the last enzyme in arginine biosynthesis (argH) is fused to a short sequence that corresponds to the C-terminal, N-acetyltransferase-encoding domain of NAGS and is able to complement an argA mutant of E. coli. Very recently, other authors identified in Mycobacterium tuberculosis an independent gene corresponding to this short C-terminal domain and coding for a new type of NAGS. We have investigated the two prokaryotic Domains for patterns of gene-enzyme relationships in the first committed step of arginine biosynthesis. RESULTS: The argH-A fusion, designated argH(A) and discovered in Moritella, was found to be present in (and confined to) marine gamma proteobacteria of the Alteromonas- and Vibrio-like group. Most of them have a classical NAGS, with the exception of Idiomarina loihiensis and Pseudoalteromonas haloplanktis, which nevertheless can grow in the absence of arginine and therefore appear to rely on the arg(A) sequence for arginine biosynthesis. Screening prokaryotic genomes for virtual argH-X 'fusions', where X stands for a homologue of arg(A), we retrieved a large number of Bacteria and several Archaea, all of them devoid of a classical NAGS.
In the case of Thermus thermophilus and Deinococcus radiodurans, the arg(A)-like sequence clusters with argH in an operon-like fashion. In this group of sequences, we find the short novel NAGS of the type identified in M. tuberculosis. Among these organisms, at least Thermus, Mycobacterium and Streptomyces species appear to rely on this short NAGS version for arginine biosynthesis. CONCLUSION: The gene-enzyme relationship for the first committed step of arginine biosynthesis should now be considered in a new perspective. In addition to bifunctional OAT, nature appears to implement at least three alternatives for the acetylation of glutamate. It is possible to propose evolutionary relationships between them starting from the same ancestral N-acetyltransferase domain. In M. tuberculosis and many other bacteria, this domain evolved as an independent enzyme, whereas it fused either with a carbamate kinase fold to give the classical NAGS (as in E. coli) or with argH as in marine gamma proteobacteria. Moreover, there is an urgent need to clarify the current nomenclature, since the same gene name argA has been used to designate structurally different entities. Clarifying the confusion would help to prevent erroneous genomic annotation.

Open Marine Archive

SynteBase/SynteView: a tool to visualize gene order conservation in prokaryotic genomes

Author: Labedan Bernard
Lemoine Frédéric
Lespinet Olivier
Publication venue: BioMed Central
Publication date: 01/01/2008

Abstract. Background: It has been repeatedly observed that gene order is rapidly lost in prokaryotic genomes. However, persistent synteny blocks are found when comparing more or less distant species. These genes that remain consistently adjacent are appealing candidates for the study of genome evolution and for a more accurate definition of their functional role. Such studies require visualizing conserved synteny blocks in a large number of genomes at all taxonomic distances. Results: After comparing nearly 600 completely sequenced genomes encompassing the whole prokaryotic tree of life, the computed synteny data were assembled in a relational database, SynteBase. SynteView was designed to visualize conserved synteny blocks in a large number of genomes after choosing one of them as a reference. SynteView functions with data stored either in SynteBase or in a home-made relational database of personal data. In addition, this software can compute on the fly and display the distribution of synteny blocks that are conserved in pairs of genomes. This tool has been designed to provide a wealth of information on each positional orthologous gene and to be user-friendly and customizable. It is also possible to download sequences of genes belonging to these synteny blocks for further studies. SynteView is accessible through Java Web Start at http://www.synteview.u-psud.fr. Conclusion: SynteBase answers queries about gene order conservation and SynteView visualizes the obtained results in a flexible and powerful way, providing a comparative overview of the conserved synteny in a large number of genomes, whatever their taxonomic distances.

Matching curated genome databases: a non trivial task

Author: Barba Matthieu
Descorps-Declère Stéphane
Labedan Bernard
Publication venue: BioMed Central
Publication date: 01/01/2008

Abstract. Background: Curated databases of completely sequenced genomes have been designed independently at the NCBI (RefSeq) and EBI (Genome Reviews) to cope with the non-standard annotation found in the version of the sequenced genome published by the GenBank/EMBL/DDBJ databanks. These curation attempts were expected to review the annotations and to improve their pertinence when using them to annotate newly released genome sequences by homology to previously annotated genomes. However, we observed that such an uncoordinated effort has two unwanted consequences. First, it is not trivial to map the protein identifiers of the same sequence in both databases. Second, the two reannotated versions of the same genome differ at the level of their structural annotation. Results: Here, we propose CorBank, a program devised to cross-reference protein identifiers whatever the level of identity found between their matching sequences. Approximately 98% of the 1,983,258 amino acid sequences match, allowing instantaneous retrieval of their respective cross-references. CorBank further allows detecting any differences between the independently curated versions of the same genome. We found that the RefSeq and Genome Reviews versions match perfectly for only 50 of the 641 complete genomes we analyzed. In all other cases there are differences at the level of the coding sequence (CDS) and/or in the total number of CDS in the respective versions of the same genome. CorBank is freely accessible at http://www.corbank.u-psud.fr. The CorBank site also publishes updated, exhaustive results obtained by comparing the RefSeq and Genome Reviews versions of each genome. Accordingly, this web site allows easy search of cross-references between RefSeq, Genome Reviews, and UniProt, for either a single CDS or a whole replicon. Conclusion: CorBank is very efficient at rapidly detecting the numerous differences existing between the RefSeq and Genome Reviews versions of the same curated genome. Although such differences are acceptable as reflecting different views, we suggest that curators of both genome databases could help reduce further divergence by agreeing on a minimal dialogue and attempting to publish the point of view of the other database whenever it is technically possible.
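The cross-referencing problem described in this abstract can be illustrated with a minimal sketch that matches protein identifiers from two databases by exact sequence identity. This is an assumption-laden simplification, not CorBank itself: CorBank is described as handling any level of identity, whereas this sketch covers only identical sequences, and the function name and identifiers below are hypothetical.

```python
from collections import defaultdict


def cross_reference(db_a, db_b):
    """Match identifiers of two databases that carry the same sequence.

    db_a, db_b -- dicts mapping a protein identifier to its amino acid
                  sequence (identifiers here are made up for illustration)
    Returns (matches, only_in_a, only_in_b).
    """
    # Index the second database by sequence for constant-time lookup.
    by_sequence = defaultdict(list)
    for ident, seq in db_b.items():
        by_sequence[seq.upper()].append(ident)

    matches, only_a = {}, []
    for ident, seq in db_a.items():
        hits = by_sequence.get(seq.upper())
        if hits:
            matches[ident] = sorted(hits)
        else:
            only_a.append(ident)

    matched_b = {i for hits in matches.values() for i in hits}
    only_b = sorted(set(db_b) - matched_b)
    return matches, sorted(only_a), only_b


# Illustrative data only.
db_a = {"NP_1": "MKT", "NP_2": "MAAL"}
db_b = {"GR_1": "mkt", "GR_2": "MAAV"}
matches, only_a, only_b = cross_reference(db_a, db_b)
```

The two leftover lists are where differences between the curated versions would show up, for example a CDS whose start codon was reassigned in one database but not the other.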

ORENZA: a web resource for studying ORphan ENZyme activities

Author: A Bairoch
A Fleischmann
Bernard Labedan
DG Naumoff
DL Wheeler
HM Berman
I Schomburg
M Kanehisa
ML Green
N Hulo
O Lespinet
Olivier Lespinet
PD Karp
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: Despite the current availability of several hundreds of thousands of amino acid sequences, more than 36% of the enzyme activities (EC numbers) defined by the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology (NC-IUBMB) are not associated with any amino acid sequence in major public databases. This wide gap separating knowledge of biochemical function and sequence information is found for nearly all classes of enzymes. Thus, there is an urgent need to explore these sequence-less EC numbers, in order to progressively close this gap. DESCRIPTION: We designed ORENZA, a PostgreSQL database of ORphan ENZyme Activities, to collate information about the EC numbers defined by the NC-IUBMB with specific emphasis on orphan enzyme activities. Complete lists of all EC numbers and of orphan EC numbers are available and will be periodically updated. ORENZA allows one to browse the complete list of EC numbers or the subset associated with orphan enzymes or to query a specific EC number, an enzyme name or a species name for those interested in particular organisms. It is possible to search ORENZA for the different biochemical properties of the defined enzymes, the metabolic pathways in which they participate, the taxonomic data of the organisms whose genomes encode them, and many other features. The association of an enzyme activity with an amino acid sequence is clearly underlined, making it easy to identify at once the orphan enzyme activities. Interactive publishing of suggestions by the community would provide expert evidence for re-annotation of orphan EC numbers in public databases. CONCLUSION: ORENZA is a Web resource designed to progressively bridge the unwanted gap between function (enzyme activities) and sequence (dataset present in public databases). ORENZA should increase interactions between communities of biochemists and of genomicists. 
This is expected to reduce the number of orphan enzyme activities by allocating gene sequences to the relevant enzymes.
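The notion of an orphan enzyme activity, an EC number with no associated amino acid sequence, can be sketched in a few lines. This is only an illustration of the definition, not ORENZA's PostgreSQL schema or queries; the function name, the EC-like labels, and the sequence identifiers are hypothetical placeholders.

```python
def orphan_ec_numbers(all_ec, ec_to_sequences):
    """Return the EC numbers with no associated sequence, and their share.

    all_ec          -- iterable of every defined EC number
    ec_to_sequences -- dict mapping an EC number to a list of sequence
                       identifiers (missing or empty means orphan)
    """
    all_ec = list(all_ec)
    orphans = sorted(ec for ec in all_ec if not ec_to_sequences.get(ec))
    fraction = len(orphans) / len(all_ec) if all_ec else 0.0
    return orphans, fraction


# Placeholder labels, not real EC entries.
defined = ["0.0.0.1", "0.0.0.2", "0.0.0.3", "0.0.0.4"]
known = {"0.0.0.1": ["SEQ_1"], "0.0.0.2": []}
orphans, fraction = orphan_ec_numbers(defined, known)
```

Applied to the full NC-IUBMB list and the sequence databases, a computation of this kind is what yields a figure such as the "more than 36%" quoted in the abstract.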