72 research outputs found
Retrieving sequences of enzymes experimentally characterized but erroneously annotated: the case of the putrescine carbamoyltransferase
BACKGROUND: Annotating genomes remains a hazardous task. Mistakes or gaps in such a complex process may occur when relevant knowledge is ignored, whether lost, forgotten or overlooked. This paper exemplifies an approach which could help to resuscitate such meaningful data. RESULTS: We show that a set of closely related sequences which have been annotated as ornithine carbamoyltransferases are actually putrescine carbamoyltransferases. This demonstration is based on the following points: (i) use of enzymatic data which had been overlooked, (ii) rediscovery of a short NH2-terminal sequence that allows a wrongly annotated ornithine carbamoyltransferase to be reannotated as a putrescine carbamoyltransferase, (iii) identification of conserved motifs that distinguish unambiguously between the two kinds of carbamoyltransferases, and (iv) comparative study of the gene context of these different sequences. CONCLUSIONS: We explain why this specific case of misannotation had not yet been described and draw attention to the fact that analogous instances must be rather frequent. We urge special caution when high sequence similarity is coupled with an apparent lack of biochemical information. Moreover, from the point of view of genome annotation, proteins which have been studied experimentally but are not correlated with sequence data in current databases qualify as "orphans", just as unassigned genomic open reading frames do. The strategy we used in this paper to bridge such gaps in knowledge could work whenever it is possible to collect a body of facts about experimental data, homology, unnoticed sequence data, and accurate information about gene context.
GenoQuery: a new querying module for functional annotation in a genomic warehouse
Motivation: We have to cope with both a deluge of new genome sequences and a huge amount of data produced by high-throughput approaches used to exploit these genomic features. Crossing and comparing such heterogeneous and disparate data will help improve functional annotation of genomes. This requires designing elaborate integration systems such as warehouses for storing and querying these data.
Origination of the Split Structure of Spliceosomal Genes from Random Genetic Sequences
The mechanism by which protein-coding portions of eukaryotic genes came to be separated by long non-coding stretches of DNA, and the purpose of this perplexing arrangement, have remained unresolved fundamental biological problems for three decades. We report here a plausible solution to this problem based on analysis of open reading frame (ORF) length constraints in the genomes of nine diverse species. If primordial nucleic acid sequences were random, functional proteins that are innately long would not be encoded, because stop codons occur frequently. The best possible way that a long protein-coding sequence could have been derived was by evolving a split structure from the random DNA (or RNA) sequence. Results of the systematic analyses of nine complete genome sequences presented here suggest that perhaps the major underlying structural features of split genes have evolved due to the indigenous occurrence of split protein-coding genes in primordial random nucleotide sequence. The results also suggest that intron-rich genes containing short exons may have been the original form of genes intrinsically occurring in random DNA, and that intron-poor genes containing long exons were perhaps derived from the original intron-rich genes.
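The stop-codon argument in this abstract can be illustrated with a rough calculation. The following is a minimal sketch, not the authors' analysis: it assumes uniformly random, independent nucleotides and the standard genetic code, so that 3 of the 64 codons are stops. The function names are hypothetical.

```python
# Assumption: uniform, independent bases and the standard genetic code,
# in which 3 of the 64 codons (TAA, TAG, TGA) are stop codons.
STOP_CODONS = 3
TOTAL_CODONS = 64


def prob_no_stop(n_codons: int) -> float:
    """Probability that n_codons consecutive random codons contain no stop."""
    p_sense = (TOTAL_CODONS - STOP_CODONS) / TOTAL_CODONS
    return p_sense ** n_codons


def expected_sense_codons_before_stop() -> float:
    """Mean number of sense codons read before the first stop codon.

    The count of sense codons before the first stop follows a geometric
    distribution with success probability p_stop, whose mean is
    (1 - p_stop) / p_stop.
    """
    p_stop = STOP_CODONS / TOTAL_CODONS
    return (1 - p_stop) / p_stop


if __name__ == "__main__":
    print(f"mean sense codons before a stop: {expected_sense_codons_before_stop():.2f}")
    for n in (50, 100, 300):
        print(f"P(no stop in {n} codons) = {prob_no_stop(n):.3g}")
```

Under these assumptions a reading frame runs for about 20 sense codons on average before hitting a stop, and the chance of an uninterrupted frame falls off geometrically with length. Real genomes have non-uniform base composition, so the figures are only indicative.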
New Insight into the Transcarbamylase Family: The Structure of Putrescine Transcarbamylase, a Key Catalyst for Fermentative Utilization of Agmatine
Transcarbamylases reversibly transfer a carbamyl group from carbamylphosphate (CP) to an amine. Although aspartate transcarbamylase and ornithine transcarbamylase (OTC) are well characterized, little was known about putrescine transcarbamylase (PTC), the enzyme that generates CP for ATP production in the fermentative catabolism of agmatine. We demonstrate that PTC (from Enterococcus faecalis), in addition to using putrescine, can utilize L-ornithine as a poor substrate. Crystal structures at 2.5 Å and 2.0 Å resolutions of PTC bound to its respective bisubstrate analog inhibitors for putrescine and ornithine use, N-(phosphonoacetyl)-putrescine and δ-N-(phosphonoacetyl)-L-ornithine, shed light on PTC preference for putrescine. Except for a highly prominent C-terminal helix that projects away and embraces an adjacent subunit, PTC closely resembles OTCs, suggesting recent divergence of the two enzymes. Since differences between the respective 230 and SMG loops of PTC and OTC appeared to account for the differential preference of these enzymes for putrescine and ornithine, we engineered the 230-loop of PTC to make it resemble the SMG loop of OTCs, increasing the activity with ornithine and greatly decreasing the activity with putrescine. We also examined the role of the C-terminal helix, which appears to be a constant and exclusive PTC trait. The enzyme lacking this helix remained active, but the PTC trimer stability appeared decreased, since some of the enzyme eluted as monomers from a gel filtration column. In addition, truncated PTC tended to aggregate to hexamers, as shown both chromatographically and by X-ray crystallography. Therefore, the extra C-terminal helix plays a dual role: it stabilizes the PTC trimer and, by shielding helix 1 of an adjacent subunit, it prevents the supratrimeric oligomerizations of obscure significance observed with some OTCs. Guided by the structural data, we identify signature traits that permit easy and unambiguous annotation of PTC sequences.
Gene fusions and gene duplications: relevance to genomic annotation and functional analysis
BACKGROUND: Escherichia coli, a model organism, provides information for annotation of other genomes. Our analysis of its genome has shown that proteins encoded by fused genes need special attention. Such composite (multimodular) proteins consist of two or more components (modules) encoding distinct functions. Multimodular proteins have been found to complicate both annotation and generation of sequence-similar groups. Previous work overstated the number of multimodular proteins in E. coli. This work corrects the identification of modules by including sequence information from proteins in 50 sequenced microbial genomes. RESULTS: Multimodular E. coli K-12 proteins were identified from sequence similarities between their component modules and non-fused proteins in 50 genomes and from the literature. We found 109 multimodular proteins in E. coli containing either two or three modules. Most modules had standalone sequence relatives in other genomes. The separated modules together with all the single (un-fused) proteins constitute the sum of all unimodular proteins of E. coli. Pairwise sequence relationships among all E. coli unimodular proteins generated 490 sequence-similar, paralogous groups. Groups ranged in size from 92 to 2 members and had varying degrees of relatedness among their members. Some E. coli enzyme groups were compared to homologs in other bacterial genomes. CONCLUSION: The deleterious effects of multimodular proteins on annotation and on the formation of groups of paralogs are emphasized. To improve annotation results, all multimodular proteins in an organism should be detected and, when known, each function should be connected with its location in the sequence of the protein. When transferring functions by sequence similarity, alignment locations must be noted, particularly when alignments cover only part of the sequences, in order to enable transfer of the correct function. Separating multimodular proteins into module units makes it possible to generate protein groups related by both sequence and function, avoiding mixing of unrelated sequences. Organisms differ in sizes of groups of sequence-related proteins. A sample comparison of orthologs to selected E. coli paralogous groups correlates with known physiological and taxonomic relationships between the organisms.
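The recommendation to note alignment locations before transferring a function can be sketched as a simple coverage check. This is an illustrative sketch only, not the procedure used in the paper: the function names, the 1-based inclusive coordinates, and the 0.8 coverage threshold are all assumptions.

```python
from typing import Tuple

Span = Tuple[int, int]  # 1-based, inclusive (start, end) of the aligned region


def coverage(span: Span, length: int) -> float:
    """Fraction of a sequence of the given length covered by the aligned span."""
    start, end = span
    if not 1 <= start <= end <= length:
        raise ValueError("alignment span must lie within the sequence")
    return (end - start + 1) / length


def whole_protein_transfer_ok(
    query_span: Span,
    query_len: int,
    subject_span: Span,
    subject_len: int,
    min_cov: float = 0.8,  # hypothetical threshold, chosen for illustration
) -> bool:
    """True only if the alignment covers most of BOTH sequences.

    A partial alignment may still support annotating the aligned region
    (one module), but not the whole protein.
    """
    return (
        coverage(query_span, query_len) >= min_cov
        and coverage(subject_span, subject_len) >= min_cov
    )


if __name__ == "__main__":
    # A 600-residue fused protein whose first 280 residues align to a
    # 300-residue standalone enzyme: only the N-terminal module matches.
    print(whole_protein_transfer_ok((1, 280), 600, (1, 280), 300))
```

In the usage example the alignment covers most of the standalone enzyme but less than half of the fused protein, so the check returns `False` and the function would be attached to residues 1 to 280 rather than to the whole protein.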
- …