43 research outputs found

    TC-motifs at the TATA-box expected position in plant genes: a novel class of motifs involved in the transcription regulation

    Background: The TATA-box and TATA-variants are regulatory elements involved in the formation of a transcription initiation complex. Both have been conserved throughout evolution in a restricted region close to the Transcription Start Site (TSS). However, less than half of the genes in model organisms studied so far have been found to contain either one of these elements. Indeed, different core-promoter elements are involved in the recruitment of the TATA-box-binding protein. Here we assessed the possibility of identifying novel functional motifs in plant genes that share the TATA-box topological constraints. Results: We developed an ab initio approach considering the preferential location of motifs relative to the TSS. We identified motifs observed at the TATA-box expected location and conserved in both Arabidopsis thaliana and Oryza sativa promoters. We identified TC-elements within non-TA-rich promoters 30 bases upstream of the TSS. As with the TATA-box and TATA-variant sequences, it was possible to construct a unique distance graph with the TC-element sequences. The structural and functional features of TC-element-containing genes were distinct from those of TATA-box- or TATA-variant-containing genes. Arabidopsis thaliana transcriptome analysis revealed that TATA-box-containing genes were generally those showing relatively high levels of expression and that TC-element-containing genes were generally those expressed in specific conditions. Conclusions: Our observations suggest that the TC-elements might constitute a class of novel regulatory elements participating in the complex modulation of gene expression in plants.

    Unique genes in plants: specificities and conserved features throughout evolution

    Background: Plant genomes contain a high proportion of duplicated genes as a result of numerous whole, segmental and local duplications. These duplications lead to the formation of gene families, which are the usual material for many evolutionary studies. However, all characterized genomes include single-copy (unique) genes that have not received much attention. Unlike gene duplication, gene loss is not an unspecific mechanism but is rather influenced by functional selection. In this context, we have established and used stringent criteria in order to identify suitable sets of unique genes present in plant proteomes. Comparisons of unique genes in the green phylum were used to characterize the gene and protein features exhibited by both conserved and species-specific unique genes. Results: We identified the unique genes within both A. thaliana and O. sativa genomes and classified them according to the number of homologs in the other species: none (U{1:0}), one (U{1:1}) or several (U{1:m}). Regardless of the species, all the genes in these groups present some conserved characteristics, such as small average protein size and abnormal intron number. In order to understand the origin and function of unique genes, we further characterized the U{1:1} gene pairs. The possible involvement of sequence convergence in the creation of U{1:1} pairs was ruled out because intron positions are frequently conserved. Furthermore, an orthology relationship between the two members of each U{1:1} pair was strongly supported by a high conservation in the protein sizes and transcription levels. Within the promoters of the unique conserved genes, we found numbers of TATA and TELO boxes that specifically differed from their mean number in the whole genome. Many unique genes have been conserved as unique through evolution from the green alga Ostreococcus lucimarinus to higher plants. Plant unique genes may also have homologs in bacteria, and we showed a link between the targeting towards plastids of proteins encoded by plant nuclear unique genes and their homology with a bacterial protein. Conclusion: Many of the A. thaliana and O. sativa unique genes are conserved in plants for which the ancestor diverged at least 725 million years ago (MYA). Half of these genes are also present in other eukaryotic and/or prokaryotic species. Thus, our results indicate that (i) a strong negative selection pressure has conserved a number of genes as unique in genomes throughout evolution, (ii) most unique genes are subjected to a low divergence rate, (iii) they have some features observed in housekeeping genes, but most of them have no functional annotation, and (iv) they may have an ancient origin involving a possible gene transfer from ancestral chloroplasts or bacteria to the plant nucleus.

    Genes of the most conserved WOX clade in plants affect root and flower development in Arabidopsis

    Background: The Wuschel-related homeobox (WOX) family proteins are key regulators implicated in the determination of cell fate in plants by preventing cell differentiation. A recent WOX phylogeny, based on WOX homeodomains, showed that all of the Physcomitrella patens and Selaginella moellendorffii WOX proteins clustered into a single orthologous group. We hypothesized that members of this group might preferentially share a significant part of their function in phylogenetically distant organisms. Hence, we first validated the limits of the WOX13 orthologous group (WOX13 OG) using the occurrence of other clade-specific signatures and conserved intron insertion sites. Second, a functional analysis using expression data and mutants was undertaken. Results: The WOX13 OG contained the most conserved plant WOX proteins, including the only WOX detected in the highly proliferating basal unicellular and photosynthetic organism Ostreococcus tauri. A large expansion of the WOX family was observed after the separation of mosses from other land plants and before monocots and dicots arose. In Arabidopsis thaliana, AtWOX13 was dynamically expressed during primary and lateral root initiation and development, in the gynoecium and during embryo development. AtWOX13 appeared to affect the floral transition. An intriguing clade, represented by the functional AtWOX14 gene inside the WOX13 OG, was only found in the Brassicaceae. Compared to AtWOX13, the gene expression profile of AtWOX14 was restricted to the early stages of lateral root formation and specific to developing anthers. A mutational insertion upstream of the AtWOX14 homeodomain sequence led to abnormal root development, a delay in the floral transition and premature anther differentiation. Conclusion: Our data provide evidence in favor of the WOX13 OG as the clade containing the most conserved WOX genes and establish a functional link to organ initiation and development in Arabidopsis, most likely by preventing premature differentiation. The future use of Ostreococcus tauri and Physcomitrella patens as biological models should allow us to obtain a better insight into the functional importance of WOX13 OG genes.

    Exploration of plant genomes in the FLAGdb++ environment

    Background: In the context of genomics, post-genomics and systems biology approaches, data integration is a major concern. Databases provide crucial solutions: they store and organize information and allow it to be queried, they enhance the visibility of newly produced data by comparing them with previously published results, and they facilitate the exploration and development of both existing hypotheses and new ideas. Results: The FLAGdb++ information system was developed with the aim of using whole plant genomes as physical references in order to gather and merge available genomic data from in silico or experimental approaches. Available through a Java application, original interfaces and tools assist the functional study of plant genes by considering them in their specific context: chromosome, gene family, orthology group, co-expression cluster and functional network. FLAGdb++ is mainly dedicated to the exploration of large gene groups in order to decipher functional connections, to highlight shared or specific structural or functional features, and to facilitate translational tasks between plant species (Arabidopsis thaliana, Oryza sativa, Populus trichocarpa and Vitis vinifera). Conclusion: Combining original data with the output of experts and graphical displays that differ from classical plant genome browsers, FLAGdb++ is a powerful complementary tool for exploring plant genomes and exploiting structural and functional resources, without the need for computer programming knowledge. First launched in 2002, a 15th version of FLAGdb++ is now available and comprises four model plant genomes and over eight million genomic features.

    Analysis of CATMA transcriptome data identifies hundreds of novel functional genes and improves gene models in the Arabidopsis genome

    Background: Since the completion of the sequencing of the Arabidopsis thaliana genome, the Arabidopsis community and the annotation centers have been working on the improvement of gene annotation at the structural and functional levels. In this context, we have used the large CATMA resource on the Arabidopsis transcriptome to search for genes missed by different annotation processes. Probes on the CATMA microarrays are specific gene sequence tags (GSTs) based on the CDS models predicted by the Eugene software. Among the 24,576 CATMA v2 GSTs, 677 are in regions considered as intergenic by the TAIR annotation. We analyzed the cognate transcriptome data in the CATMA resource and carried out data mining to characterize novel genes and improve gene models. Results: The statistical analysis of the results of more than 500 hybridized samples distributed among 12 organs provides an experimental validation for 465 novel genes. The hybridization evidence was confirmed by RT-PCR approaches for 88% of the 465 novel genes. Comparisons with the current annotation show that these novel genes often encode small proteins, with an average size of 137 aa. Our approach has also led to the improvement of pre-existing gene models through both the extension of 16 CDS and the identification of 13 gene models erroneously constituted of two merged CDS. Conclusion: This work is a notable step forward in the improvement of the Arabidopsis genome annotation. We increased the number of validated Arabidopsis genes by 465 novel transcribed genes, to which we associated several functional annotations such as expression profiles, sequence conservation in plants, cognate transcripts and protein motifs.

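    Sélection de variables pour la classification par mélanges gaussiens pour prédire la fonction des gènes orphelins [Variable selection for clustering with Gaussian mixture models to predict the function of orphan genes]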

    Biologists are interested in predicting gene functions in organisms with sequenced genomes from microarray transcriptome data. The development of microarray technology makes it possible to study the whole genome under different experimental conditions. This abundance of information may seem to be an advantage for gene clustering. However, the structure of interest is often contained in a subset of the available variables. The currently available variable selection procedures in model-based clustering assume that the variables irrelevant to the clustering are either all independent of, or all linked with, the relevant clustering variables. A more versatile variable selection model is proposed, taking into account three possible roles for each variable: the relevant clustering variables, the redundant variables and the independent variables. A model selection criterion and a variable selection algorithm are derived for this new modelling of variable roles. The value of this new modelling for discovering the function of orphan genes is illustrated on a transcriptome dataset for Arabidopsis thaliana.

    GeneFarm, structural and functional annotation of Arabidopsis gene and protein families by a network of experts

    Genomic projects depend heavily on genome annotations and are limited by the current deficiencies in the published predictions of gene structure and function. It follows that improved annotation will allow better data mining of genomes, and more secure planning and design of experiments. The purpose of the GeneFarm project is to obtain homogeneous, reliable, documented and traceable annotations for Arabidopsis nuclear genes and gene products, and to enter them into an added-value database. This re-annotation project is being performed exhaustively on every member of each gene family. Performing a family-wide annotation makes the task easier and more efficient than a gene-by-gene approach, since many features obtained for one gene can be extrapolated to some or all of the other genes of a family. A complete annotation procedure based on the most efficient prediction tools available is being used by 16 partner laboratories, each contributing annotated families from its field of expertise. A database, named GeneFarm, and an associated user-friendly interface to query the annotations have been developed. More than 3000 genes distributed over 300 families have been annotated and are available at http://genoplante-info.infobiogen.fr/Genefarm/. Furthermore, collaboration with the Swiss Institute of Bioinformatics is underway to integrate the GeneFarm data into the protein knowledgebase Swiss-Prot.

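    Relations entre l'organisation des sites de fixation des facteurs de transcription, la fonction des gènes et l'expression des gènes (vers une annotation des sites de fixation chez Arabidopsis thaliana) [Relationships between the organization of transcription factor binding sites, gene function and gene expression (towards an annotation of binding sites in Arabidopsis thaliana)]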

    Transcription factor binding sites are regulatory elements involved in the regulation of gene expression. Promoter architecture can now be studied thanks to genome annotation and transcriptomic data. Some regulatory elements are conserved at a preferential position in promoters. We developed an approach to characterize such motifs in A. thaliana. This work produced a promoter map by identifying 5105 motifs that are locally over-represented in proximal promoters. The core promoter, where the TATA-box (a regulatory element conserved among eukaryotes) is found, was analysed in depth. We identified a list of 15 functional variants of the TATA-box and a new class of regulatory elements that share the TATA-box topological constraints: the TC-motifs. They are conserved in both A. thaliana and O. sativa and have not been observed in mammalian genomes. The 18% of A. thaliana genes that contain a TC-motif tend to be expressed in specific experimental conditions. The TC-motifs might be involved in the regulation of gene expression. We observed that the 4 dinucleotides of the YR initiator element in A. thaliana extend into the 5' UTR. Associations between these regulatory elements may indicate functional cooperation. Searching for functional characteristics shared by genes with the same organization of regulatory elements could contribute to the functional annotation of these elements.
    EVRY-Bib. électronique (912289901) / Sudoc, France

    Analyse de l'évolution du génome d'Arabidopsis thaliana par l'étude de familles de gènes [Analysis of the evolution of the Arabidopsis thaliana genome through the study of gene families]

    PARIS7-Bibliothèque centrale (751132105) / Sudoc, France