23 research outputs found

    Genome-wide computational prediction of tandem gene arrays: application in yeasts

    Background: This paper describes an efficient in silico method for detecting tandem gene arrays (TGAs) in fully sequenced and compact genomes such as those of prokaryotes or unicellular eukaryotes. The originality of this method lies in searching for protein sequence similarities in the vicinity of each coding sequence, which allows the prediction of tandemly duplicated gene copies independently of their functionality. Results: Applied to nine hemiascomycete yeast genomes, this method predicts that 2% of the genes are involved in TGAs and that gene relics are present in 11% of TGAs. The frequency of TGAs with degenerated gene copies means that a significant fraction of tandemly duplicated genes follows the birth-and-death model of evolution. A comparison of sequence identity distributions between sets of homologous gene pairs shows that the copies of tandemly arrayed paralogs are less divergent than copies of dispersed paralogs in yeast genomes. This suggests that paralogs included in tandem structures are more recent, or more subject to gene conversion, than other paralogs. Conclusion: The method reported here is a useful computational tool for building a database of TGAs composed of functional or nonfunctional gene copies. Such a database has obvious applications in the fields of structural and comparative genomics. Notably, a detailed study of the TGA catalog will make it possible to tackle the fundamental questions of the origin and evolution of tandem gene clusters.
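
The neighborhood search described in the abstract above can be illustrated with a minimal sketch. The abstract does not state the similarity measure, the size of the vicinity, or any threshold, so everything named here is an assumption: `difflib.SequenceMatcher` stands in for a real protein alignment, and `max_spacers` and `min_similarity` are hypothetical parameters. The sketch also looks only at annotated proteins, so unlike the published method it would not recover gene relics.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Crude stand-in for a protein alignment score (assumption, not the paper's measure)."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def find_tandem_arrays(proteins, max_spacers=1, min_similarity=0.6):
    """Group neighboring genes with similar proteins into candidate tandem arrays.

    proteins: protein sequences listed in chromosomal order.
    max_spacers: hypothetical limit on unrelated genes allowed between two copies.
    min_similarity: hypothetical similarity cutoff in [0, 1].
    Returns a list of arrays, each a list of gene indices.
    """
    n = len(proteins)
    parent = list(range(n))  # union-find forest over gene indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        # Compare gene i only with genes in its immediate vicinity.
        for j in range(i + 1, min(n, i + max_spacers + 2)):
            if similarity(proteins[i], proteins[j]) >= min_similarity:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return [members for members in groups.values() if len(members) >= 2]


if __name__ == "__main__":
    toy_chromosome = [
        "MKTAYIAKQR",  # copy 1
        "MKTAYIAKQK",  # copy 2, adjacent
        "GGSSPPLLWW",  # unrelated spacer gene
        "MKTAYLAKQR",  # copy 3, one spacer away
        "HHHHHHCCCC",  # unrelated gene
    ]
    print(find_tandem_arrays(toy_chromosome))
```

Linking pairs through a union-find structure lets an array grow across a spacer gene without requiring every copy to be adjacent to every other copy.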

    How Many Messenger RNAs Can Be Translated by the START Mechanism?

    Translation initiation is a key step in the protein synthesis stage of the gene expression pathway of all living cells. In this process, ribosomes have to accurately find the AUG start codon in order to ensure the integrity of the proteome. "Structure Assisted RNA Translation", or "START", has been proposed to use stable secondary structures located in the coding sequence to augment start site selection by steric hindrance of the progression of the pre-initiation complex on messenger RNA. This implies that such structures have to be located downstream of, and at an optimal distance from, the AUG start codon (i.e., at downstream nucleotide +16). In order to assess the importance of the START mechanism in the overall mRNA translation process, we developed a bioinformatic tool to screen coding sequences for such stable structures in a 50-nucleotide window spanning nucleotides +16 to +65. We screened eight bacterial genomes and six eukaryotic genomes. We found stable structures in 0.6–2.5% of eukaryotic coding sequences. Approximately half of these were predicted to form G-quadruplexes. In humans, we selected 747 structures. In bacteria, stable structures were found in 2.6–4.2% of the coding sequences of Gram-positive bacteria, whereas they were less abundant in Gram-negative bacteria (0.2–2.7%). In contrast to eukaryotes, putative G-quadruplex structures are very rare in the coding sequences of bacteria. Altogether, our study reveals that the START mechanism seems to be an ancient strategy to facilitate start codon recognition that is used in different kingdoms of life.
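
The window arithmetic in the abstract above (nucleotides +16 to +65, 50 nucleotides in total) can be made concrete with a short sketch. It assumes the common convention that the A of the AUG is position +1. The abstract does not name the structure predictor used, so the regular expression below is a swapped-in, widely used G-quadruplex pattern (four runs of at least three G separated by loops of 1 to 7 nucleotides), not the authors' tool; `start_window` and `has_putative_g4` are hypothetical helper names.

```python
import re

# Assumed motif: four G-runs (>= 3 G each) separated by loops of 1-7 nt.
# This is a generic pattern, not the predictor used in the study.
G4_MOTIF = re.compile(
    r"G{3,}[ACGU]{1,7}G{3,}[ACGU]{1,7}G{3,}[ACGU]{1,7}G{3,}"
)


def start_window(cds, first=16, last=65):
    """Return nucleotides +first..+last of a coding sequence.

    Positions are 1-based with the A of the AUG at +1 (assumed convention).
    Returns None when the sequence is too short to contain the full window.
    """
    rna = cds.upper().replace("T", "U")
    if not rna.startswith("AUG"):
        raise ValueError("coding sequence must begin with the start codon")
    if len(rna) < last:
        return None
    return rna[first - 1:last]


def has_putative_g4(cds):
    """True if the +16..+65 window contains the assumed G-quadruplex motif."""
    window = start_window(cds)
    return window is not None and G4_MOTIF.search(window) is not None


if __name__ == "__main__":
    # Toy sequence: the motif begins exactly at position +16.
    toy = "ATG" + "C" * 12 + "GGGAGGGAGGGAGGG" + "C" * 40
    print(len(start_window(toy)), has_putative_g4(toy))
```

A pattern match only flags sequences able to form a G-quadruplex; estimating the stability of other secondary structures would need a folding program, which this sketch does not attempt.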

    Expansion and contraction of the DUP240 multigene family in Saccharomyces cerevisiae populations.

    The influence of duplicated sequences on chromosomal stability is poorly understood. To characterize chromosomal rearrangements involving duplicated sequences, we compared the organization of tandem repeats of the DUP240 gene family in 15 Saccharomyces cerevisiae strains of various origins. The DUP240 gene family consists of 10 members of unknown function in the reference strain S288C. Five DUP240 paralogs on chromosome I and two on chromosome VII are arranged as tandem repeats that are highly polymorphic in copy number and sequence. We characterized DNA sequences that are likely involved in homologous or nonhomologous recombination events and are responsible for intra- and interchromosomal rearrangements that cause the creation and disappearance of DUP240 paralogs. The tandemly repeated DUP240 genes seem to be privileged sites of gene birth and death.

    Paleogenomics or the search for remnant duplicated copies of the yeast DUP240 gene family in intergenic areas.

    Duplication, resulting in gene redundancy, is well known to be a driving force of evolutionary change. Gene families are therefore useful targets for approaching genome evolution. To address the gene death process, we examined the fate of the 10-member S288C DUP240 family in 15 Saccharomyces cerevisiae strains. Using an original three-step method of analysis reported here, both slightly and highly degenerate DUP240 copies, called pseudo-open reading frames (ORFs) and relics, respectively, were detected in strain S288C. It was concluded that two previously annotated ORFs correspond, in fact, to pseudo-ORFs, and three additional relics were identified in intergenic areas. Comparative intraspecies analysis of these degenerate DUP240 loci revealed that the two pseudo-ORFs are present in a nondegenerate state in some other strains. This suggests that, within a given gene family, different loci are the target of the gene erasure process, which is therefore strain dependent. In addition, the variable positions observed indicate that the relic sequence may diverge faster than the flanking regions. All in all, this study shows that short conserved protein motifs provide a useful tool for detecting and accurately mapping degenerate gene remnants. The present results also highlight the strong contribution of comparative genomics to gene relic detection, because the possibility of finding short conserved protein motifs in intergenic regions (IRs) largely depends on the choice of the most closely related paralog or ortholog. By mapping new genetic components in previously annotated IRs, our study constitutes a further refinement step in the crucial stage of genome annotation and provides a strategy for retracing ancient chromosomal reshaping events and, hence, for deciphering genome history. [Publication types: historical article; journal article; research support, non-U.S. gov't. Dates: 2005 Sep; 2005 05 25.]