141 research outputs found

    TC-motifs at the TATA-box expected position in plant genes: a novel class of motifs involved in the transcription regulation

    Background: The TATA-box and TATA-variants are regulatory elements involved in the formation of a transcription initiation complex. Both have been conserved throughout evolution in a restricted region close to the Transcription Start Site (TSS). However, less than half of the genes in model organisms studied so far have been found to contain either one of these elements. Indeed, different core-promoter elements are involved in the recruitment of the TATA-box-binding protein. Here we assessed the possibility of identifying novel functional motifs in plant genes, sharing the TATA-box topological constraints. Results: We developed an ab-initio approach considering the preferential location of motifs relative to the TSS. We identified motifs observed at the TATA-box expected location and conserved in both Arabidopsis thaliana and Oryza sativa promoters. We identified TC-elements within non-TA-rich promoters 30 bases upstream of the TSS. As with the TATA-box and TATA-variant sequences, it was possible to construct a unique distance graph with the TC-element sequences. The structural and functional features of TC-element-containing genes were distinct from those of TATA-box- or TATA-variant-containing genes. Arabidopsis thaliana transcriptome analysis revealed that TATA-box-containing genes were generally those showing relatively high levels of expression and that TC-element-containing genes were generally those expressed in specific conditions. Conclusions: Our observations suggest that the TC-elements might constitute a class of novel regulatory elements participating in the complex modulation of gene expression in plants.
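
The abstract above describes searching for motifs by their preferential location relative to the TSS, but does not give the statistic used. The following is only a minimal sketch of that general idea, assuming aligned promoter sequences of equal length and a simple inside-versus-outside frequency ratio. The function names, the `tss_index` parameter, the window bounds and the `min_ratio` threshold are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def positional_kmer_counts(promoters, k, tss_index):
    """Count k-mers by start position relative to the TSS.

    promoters: sequences aligned so that index `tss_index` is the TSS.
    Returns {relative_position: Counter({kmer: count})}.
    """
    counts = {}
    for seq in promoters:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            pos = i - tss_index  # negative values are upstream of the TSS
            counts.setdefault(pos, Counter())[seq[i:i + k]] += 1
    return counts


def enriched_in_window(counts, window, min_ratio=3.0):
    """Return k-mers that are over-represented inside `window`.

    The score is an assumed, simplistic one: mean count per position
    inside the window divided by mean count per position outside it
    (with a pseudocount of 1 outside to avoid division by zero).
    """
    lo, hi = window
    inside, outside = Counter(), Counter()
    n_in = n_out = 0
    for pos, kmers in counts.items():
        if lo <= pos <= hi:
            inside.update(kmers)
            n_in += 1
        else:
            outside.update(kmers)
            n_out += 1
    if n_in == 0 or n_out == 0:
        return {}
    scores = {}
    for kmer, count in inside.items():
        ratio = (count / n_in) / ((outside[kmer] + 1) / n_out)
        if ratio >= min_ratio:
            scores[kmer] = ratio
    return scores
```

With real data, one would call `positional_kmer_counts` on promoters from each species separately and keep only the k-mers flagged in both, which loosely mirrors the cross-species conservation criterion mentioned in the abstract.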

    Unique genes in plants: specificities and conserved features throughout evolution

    Background: Plant genomes contain a high proportion of duplicated genes as a result of numerous whole, segmental and local duplications. These duplications lead to the formation of gene families, which are the usual material for many evolutionary studies. However, all characterized genomes include single-copy (unique) genes that have not received much attention. Unlike gene duplication, gene loss is not an unspecific mechanism but is rather influenced by a functional selection. In this context, we have established and used stringent criteria in order to identify suitable sets of unique genes present in plant proteomes. Comparisons of unique genes in the green phylum were used to characterize the gene and protein features exhibited by both conserved and species-specific unique genes. Results: We identified the unique genes within both A. thaliana and O. sativa genomes and classified them according to the number of homologs in the alternative species: none (U{1:0}), one (U{1:1}) or several (U{1:m}). Regardless of the species, all the genes in these groups present some conserved characteristics, such as small average protein size and abnormal intron number. In order to understand the origin and function of unique genes, we further characterized the U{1:1} gene pairs. The possible involvement of sequence convergence in the creation of U{1:1} pairs was discarded due to the frequent conservation of intron positions. Furthermore, an orthology relationship between the two members of each U{1:1} pair was strongly supported by a high conservation in the protein sizes and transcription levels. Within the promoter of the unique conserved genes, we found a number of TATA and TELO boxes that specifically differed from their mean number in the whole genome. Many unique genes have been conserved as unique through evolution from the green alga Ostreococcus lucimarinus to higher plants. Plant unique genes may also have homologs in bacteria and we showed a link between the targeting towards plastids of proteins encoded by plant nuclear unique genes and their homology with a bacterial protein. Conclusion: Many of the A. thaliana and O. sativa unique genes are conserved in plants for which the ancestor diverged at least 725 million years ago (MYA). Half of these genes are also present in other eukaryotic and/or prokaryotic species. Thus, our results indicate that (i) a strong negative selection pressure has conserved a number of genes as unique in genomes throughout evolution, (ii) most unique genes are subjected to a low divergence rate, (iii) they have some features observed in housekeeping genes but for most of them there is no functional annotation and (iv) they may have an ancient origin involving a possible gene transfer from ancestral chloroplasts or bacteria to the plant nucleus.
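
The U{1:0}, U{1:1} and U{1:m} classes described above depend only on how many homologs each unique gene has in the other species. As an illustration, and assuming the homolog lists have already been computed by some upstream similarity search (the abstract does not detail the criteria), the bookkeeping could be sketched as follows. The function name and the gene identifiers in the usage note are hypothetical.

```python
def classify_unique_genes(unique_genes, homologs):
    """Group unique genes by homolog count in the other species.

    unique_genes: iterable of gene identifiers from one species.
    homologs: dict mapping a gene identifier to the identifiers of its
              homologs in the other species (assumed precomputed).
    """
    classes = {"U{1:0}": [], "U{1:1}": [], "U{1:m}": []}
    for gene in unique_genes:
        n = len(set(homologs.get(gene, ())))  # ignore duplicate entries
        if n == 0:
            label = "U{1:0}"
        elif n == 1:
            label = "U{1:1}"
        else:
            label = "U{1:m}"
        classes[label].append(gene)
    return classes
```

For example, calling it with genes `"geneA"`, `"geneB"`, `"geneC"` and a mapping in which `"geneB"` has one homolog and `"geneC"` has two places them in U{1:1} and U{1:m} respectively, with `"geneA"` falling into U{1:0}.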

    Genes of the most conserved WOX clade in plants affect root and flower development in Arabidopsis

    Background: The Wuschel related homeobox (WOX) family proteins are key regulators implicated in the determination of cell fate in plants by preventing cell differentiation. A recent WOX phylogeny, based on WOX homeodomains, showed that all of the Physcomitrella patens and Selaginella moellendorffii WOX proteins clustered into a single orthologous group. We hypothesized that members of this group might preferentially share a significant part of their function in phylogenetically distant organisms. Hence, we first validated the limits of the WOX13 orthologous group (WOX13 OG) using the occurrence of other clade-specific signatures and conserved intron insertion sites. Secondly, a functional analysis using expression data and mutants was undertaken. Results: The WOX13 OG contained the most conserved plant WOX proteins, including the only WOX detected in the highly proliferating basal unicellular and photosynthetic organism Ostreococcus tauri. A large expansion of the WOX family was observed after the separation of mosses from other land plants and before monocots and dicots arose. In Arabidopsis thaliana, AtWOX13 was dynamically expressed during primary and lateral root initiation and development, in gynoecium and during embryo development. AtWOX13 appeared to affect the floral transition. An intriguing clade, represented by the functional AtWOX14 gene inside the WOX13 OG, was only found in the Brassicaceae. Compared to AtWOX13, the gene expression profile of AtWOX14 was restricted to the early stages of lateral root formation and specific to developing anthers. A mutational insertion upstream of the AtWOX14 homeodomain sequence led to abnormal root development, a delay in the floral transition and premature anther differentiation. Conclusion: Our data provide evidence in favor of the WOX13 OG as the clade containing the most conserved WOX genes and established a functional link to organ initiation and development in Arabidopsis, most likely by preventing premature differentiation. The future use of Ostreococcus tauri and Physcomitrella patens as biological models should allow us to obtain a better insight into the functional importance of WOX13 OG genes.

    Exploration of plant genomes in the FLAGdb++ environment

    Background: In the contexts of genomics, post-genomics and systems biology approaches, data integration presents a major concern. Databases provide crucial solutions: they store, organize and allow information to be queried, they enhance the visibility of newly produced data by comparing them with previously published results, and facilitate the exploration and development of both existing hypotheses and new ideas. Results: The FLAGdb++ information system was developed with the aim of using whole plant genomes as physical references in order to gather and merge available genomic data from in silico or experimental approaches. Available through a Java application, original interfaces and tools assist the functional study of plant genes by considering them in their specific context: chromosome, gene family, orthology group, co-expression cluster and functional network. FLAGdb++ is mainly dedicated to the exploration of large gene groups in order to decipher functional connections, to highlight shared or specific structural or functional features, and to facilitate translational tasks between plant species (Arabidopsis thaliana, Oryza sativa, Populus trichocarpa and Vitis vinifera). Conclusion: Combining original data with the output of experts and graphical displays that differ from classical plant genome browsers, FLAGdb++ presents a powerful complementary tool for exploring plant genomes and exploiting structural and functional resources, without the need for computer programming knowledge. First launched in 2002, a 15th version of FLAGdb++ is now available and comprises four model plant genomes and over eight million genomic features.

    Analysis of CATMA transcriptome data identifies hundreds of novel functional genes and improves gene models in the Arabidopsis genome

    Background: Since the completion of the sequencing of the Arabidopsis thaliana genome, the Arabidopsis community and the annotation centers have been working on the improvement of gene annotation at the structural and functional levels. In this context, we have used the large CATMA resource on the Arabidopsis transcriptome to search for genes missed by different annotation processes. Probes on the CATMA microarrays are specific gene sequence tags (GSTs) based on the CDS models predicted by the Eugene software. Among the 24 576 CATMA v2 GSTs, 677 are in regions considered as intergenic by the TAIR annotation. We analyzed the cognate transcriptome data in the CATMA resource and carried out data-mining to characterize novel genes and improve gene models. Results: The statistical analysis of the results of more than 500 hybridized samples distributed among 12 organs provides an experimental validation for 465 novel genes. The hybridization evidence was confirmed by RT-PCR approaches for 88% of the 465 novel genes. Comparisons with the current annotation show that these novel genes often encode small proteins, with an average size of 137 aa. Our approach has also led to the improvement of pre-existing gene models through both the extension of 16 CDS and the identification of 13 gene models erroneously constituted of two merged CDS. Conclusion: This work is a noticeable step forward in the improvement of the Arabidopsis genome annotation. We increased the number of Arabidopsis validated genes by 465 novel transcribed genes to which we associated several functional annotations such as expression profiles, sequence conservation in plants, cognate transcripts and protein motifs.
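
The starting point described above is the set of probes that fall in regions an annotation treats as intergenic. How the authors computed this is not stated; the sketch below only illustrates the underlying interval test, assuming probes and annotated genes are given as `(name, chromosome, start, end)` tuples with inclusive coordinates. The tuple layout and function names are assumptions made for this example.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two inclusive intervals share at least one position."""
    return a_start <= b_end and b_start <= a_end


def intergenic_probes(probes, genes):
    """Return names of probes that overlap no annotated gene.

    probes, genes: lists of (name, chromosome, start, end) tuples.
    This is a simple quadratic scan, adequate for a small example;
    a sorted or indexed structure would be preferable at genome scale.
    """
    result = []
    for name, chrom, start, end in probes:
        hit = any(
            chrom == g_chrom and overlaps(start, end, g_start, g_end)
            for _, g_chrom, g_start, g_end in genes
        )
        if not hit:
            result.append(name)
    return result
```

Probes returned by such a filter would then be candidates for the expression-based validation that the abstract describes.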

    Genome-Wide Survey and Expression Analysis Suggest Diverse Roles of Glutaredoxin Gene Family Members During Development and Response to Various Stimuli in Rice

    Glutaredoxins (GRXs) are glutathione-dependent oxidoreductase enzymes involved in a variety of cellular processes. In this study, our analysis revealed the presence of 48 genes encoding GRX proteins in the rice genome. GRX proteins could be classified into four classes, namely CC-, CGFS-, CPYC- and GRL-type, based on phylogenetic analysis. The classification was supported by the organization of predicted conserved putative motifs in GRX proteins. We found that expansion of this gene family has occurred largely via whole genome duplication events in a species-specific manner. We explored rice oligonucleotide array data to gain insights into the function of GRX gene family members during various stages of development and in response to environmental stimuli. The comprehensive expression analysis suggested diverse roles of GRX genes during growth and development in rice. Some of the GRX genes were expressed in specific organs/developmental stages only. The expression of many rice GRX genes was influenced by various phytohormones, abiotic and biotic stress conditions, suggesting an important role of GRX proteins in response to these stimuli. The identification of GRX genes showing differential expression in specific tissues or in response to environmental stimuli provides a new avenue for in-depth characterization of selected genes of importance.

    Sélection de variables pour la classification par mélanges gaussiens pour prédire la fonction des gènes orphelins [Variable selection for Gaussian mixture clustering to predict the function of orphan genes]

    Biologists are interested in predicting gene functions in organisms with sequenced genomes from microarray transcriptome data. The development of microarray technology allows the whole genome to be studied in different experimental conditions. This abundance of information may seem to be an advantage for gene clustering. However, the structure of interest can often be contained in a subset of the available variables. The currently available variable selection procedures in model-based clustering assume that the irrelevant clustering variables are all independent or are all linked with the relevant clustering variables. A more versatile variable selection model is proposed, taking into account three possible roles for each variable: the relevant clustering variables, the redundant variables and the independent variables. A model selection criterion and a variable selection algorithm are derived for this new variable role modelling. The value of this new modelling for discovering the function of orphan genes is illustrated on a transcriptome dataset for the plant Arabidopsis thaliana.

    CATdb: a public access to Arabidopsis transcriptome data from the URGV-CATMA platform

    CATdb is a free resource available at http://urgv.evry.inra.fr/CATdb that provides public access to a large collection of transcriptome data for Arabidopsis thaliana produced by a single Complete Arabidopsis Transcriptome Micro Array (CATMA) platform. CATMA probes consist of gene-specific sequence tags (GSTs) of 150–500 bp. The v2 version of CATMA contains 24 576 GST probes representing most of the predicted A. thaliana genes, and 615 probes tiling the chloroplastic and mitochondrial genomes. Data in CATdb are entirely processed with the same standardized protocol, from microarray printing to data analyses. CATdb contains the results of 53 projects including 1724 hybridized samples distributed between 13 different organs, 49 different developmental conditions, 45 mutants and 63 environmental conditions. All the data contained in CATdb can be downloaded from the web site, and subsets of data can be selected and displayed by keyword, by experiment, by gene or by list of up to 100 genes. CATdb gives easy access to the complete description of each experiment, with a picture of the experiment design.