1,820 research outputs found

    Transcriptional Regulation: a Genomic Overview

    The availability of the Arabidopsis thaliana genome sequence allows a comprehensive analysis of transcriptional regulation in plants using novel genomic approaches and methodologies. Such a genomic view of transcription first necessitates the compilation of lists of elements. Transcription factors are the most numerous of the different types of proteins involved in transcription in eukaryotes, and the Arabidopsis genome codes for more than 1,500 of them, or approximately 6% of its total number of genes. A genome-wide comparison of transcription factors across the three eukaryotic kingdoms reveals the evolutionary generation of diversity in the components of the regulatory machinery of transcription. However, as illustrated by Arabidopsis, transcription in plants follows similar basic principles and logic to those in animals and fungi. A global view and understanding of transcription at a cellular and organismal level requires the characterization of the Arabidopsis transcriptome and promoterome, as well as of the interactome, the localizome, and the phenome of the proteins involved in transcription

    Always read the introduction: integrating regulatory and coding sequence evolution in yeast

    We analyze duplicate genes in the yeast Saccharomyces cerevisiae with the aim of determining a gene's history and observing that gene in its genomic context. In Chapter 2 we show that the fate of a duplicate gene pair is in part determined by its genome location. Moreover, we show that for two classes of duplicate genes, resulting from either small-scale duplication or whole-genome duplication, this fate can often be assessed by measuring the patterns of asymmetry in the sequence divergence of the genes in question. In Chapter 3 we study duplicate genes in the context of their local environments by comparing the patterns of evolution in the coding sequences of duplicate genes for ribosomal proteins with their upstream non-coding sequences. We found that while the coding sequences show strong evidence of recent gene conversion events, similar patterns are not seen in the non-coding regulatory elements. These duplicated ribosomal proteins are not functionally redundant despite their very high degree of protein sequence identity. This analysis confirms that the duplicated proteins have diverged considerably in expression despite their similar protein sequences. In Chapter 4 we analyze the structure of the transcriptional regulation network and characterize the molecular evolution of both its transcriptional regulators and their regulated genes. We found that both subfunctionalization and neofunctionalization of transcription factor binding play a role in divergence.

    Coding limits on the number of transcription factors

    Transcription factor proteins bind specific DNA sequences to control the expression of genes. They contain DNA binding domains which belong to several super-families, each with a specific mechanism of DNA binding. The total number of transcription factors encoded in a genome increases with the number of genes in the genome. Here, we examined the number of transcription factors from each super-family in diverse organisms. We find that the number of transcription factors from most super-families appears to be bounded. For example, the number of winged helix factors does not generally exceed 300, even in very large genomes. The magnitude of the maximal number of transcription factors from each super-family seems to correlate with the number of DNA bases effectively recognized by the binding mechanism of that super-family. Coding theory predicts that such upper bounds on the number of transcription factors should exist, in order to minimize cross-binding errors between transcription factors. This theory further predicts that factors with similar binding sequences should tend to have similar biological effects, so that errors based on mis-recognition are minimal. We present evidence that transcription factors with similar binding sequences tend to regulate genes with similar biological functions, supporting this prediction. The present study suggests limits on the transcription factor repertoire of cells, and suggests coding constraints that might apply more generally to the mapping between binding sites and biological function.
    Comment: http://www.weizmann.ac.il/complex/tlusty/papers/BMCGenomics2006.pdf https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1590034/ http://www.biomedcentral.com/1471-2164/7/23

    Predicting genome-wide redundancy using machine learning

    BACKGROUND: Gene duplication can lead to genetic redundancy, which masks the function of mutated genes in genetic analyses. Methods to increase sensitivity in identifying genetic redundancy can improve the efficiency of reverse genetics and lend insights into the evolutionary outcomes of gene duplication. Machine learning techniques are well suited to classifying gene family members into redundant and non-redundant gene pairs in model species where sufficient genetic and genomic data is available, such as Arabidopsis thaliana, the test case used here. RESULTS: Machine learning techniques that combine multiple attributes led to a dramatic improvement in predicting genetic redundancy over single trait classifiers alone, such as BLAST E-values or expression correlation. In withholding analysis, one of the methods used here, Support Vector Machines, was two-fold more precise than single attribute classifiers, reaching a level where the majority of redundant calls were correctly labeled. Using this higher confidence in identifying redundancy, machine learning predicts that about half of all genes in Arabidopsis showed the signature of predicted redundancy with at least one but typically less than three other family members. Interestingly, a large proportion of predicted redundant gene pairs were relatively old duplications (e.g., Ks > 1), suggesting that redundancy is stable over long evolutionary periods. CONCLUSIONS: Machine learning predicts that most genes will have a functionally redundant paralog but will exhibit redundancy with relatively few genes within a family. The predictions and gene pair attributes for Arabidopsis provide a new resource for research in genetics and genome evolution. These techniques can now be applied to other organisms.

    Genome-Wide Analysis of Gene Expression during Early Arabidopsis Flower Development

    Detailed information about stage-specific changes in gene expression is crucial for the understanding of the gene regulatory networks underlying development. Here, we describe the global gene expression dynamics during early flower development, a key process in the life cycle of a plant, during which floral patterning and the specification of floral organs are established. We used a novel floral induction system in Arabidopsis, which allows the isolation of a large number of synchronized floral buds, in conjunction with whole-genome microarray analysis to identify genes with differential expression at distinct stages of flower development. We found that the onset of flower formation is characterized by a massive downregulation of genes in incipient floral primordia, which is followed by a predominance of gene activation during the differentiation of floral organs. Among the genes we identified as differentially expressed in the experiment, we detected a significant enrichment of closely related members of gene families. The expression profiles of these related genes were often highly correlated, indicating similar temporal expression patterns. Moreover, we found that the majority of these genes are specifically up-regulated during certain developmental stages. Because co-expressed members of gene families in Arabidopsis frequently act in a redundant manner, these results suggest a high degree of functional redundancy during early flower development, but also that its extent may vary in a stage-specific manner.

    Retention and integration of gene duplicates in eukaryotes


    Upstream plasticity and downstream robustness in evolution of molecular networks

    BACKGROUND: Gene duplication followed by the functional divergence of the resulting pair of paralogous proteins is a major force shaping molecular networks in living organisms. Recent species-wide data for protein-protein interactions and transcriptional regulation allow us to assess the effect of gene duplication on robustness and plasticity of these molecular networks. RESULTS: We demonstrate that the transcriptional regulation of duplicated genes in baker's yeast Saccharomyces cerevisiae diverges fast, so that on average they lose 3% of common transcription factors for every 1% divergence of their amino acid sequences. The set of protein-protein interaction partners of their protein products changes at a slower rate, exhibiting a broad plateau for amino acid sequence similarity above 70%. The stability of functional roles of duplicated genes at such relatively low sequence similarity is further corroborated by their ability to substitute for each other in single gene knockout experiments in yeast and RNAi experiments in a nematode worm Caenorhabditis elegans. We also quantified the divergence rate of physical interaction neighborhoods of paralogous proteins in a bacterium Helicobacter pylori and a fly Drosophila melanogaster. However, in the absence of system-wide data on transcription factors' binding in these organisms we could not compare this rate to that of transcriptional regulation of duplicated genes. CONCLUSIONS: For all molecular networks studied in this work we found that even the most distantly related paralogous proteins with amino acid sequence identities around 20% on average have more similar positions within a network than a randomly selected pair of proteins. For yeast we also found that the upstream regulation of genes evolves more rapidly than downstream functions of their protein products. This is in accordance with a view which puts regulatory changes as one of the main driving forces of evolution. In this context a very important open question is to what extent our results obtained for homologous genes within a single species (paralogs) carry over to homologous proteins in different species (orthologs).

    Fast-evolving homeobox genes in mammalian preimplantation development

    Transcription factor proteins containing the homeodomain motif orchestrate myriad key functions during embryonic development and are therefore often conserved to a spectacular degree over extensive evolutionary timescales. Given this paradigm, the recent discovery of several homeobox families with roles in embryogenesis but which appear to be rapidly evolving is a puzzling development. In this thesis I focus on one such group, the mammalian-specific Eutherian Totipotent Cell Homeobox (ETCHbox) genes, with the overarching aim of advancing our understanding of the function fast-evolving genes perform in the embryo and the forces that have combined to produce their unusual evolutionary trajectories. Analysis of single-cell RNA-sequencing data finds that ETCHbox genes are activated during the major wave of embryonic genome activation across various mammalian species. Ectopic expression experiments of ETCHbox proteins in cell culture suggest them to be multifunctional regulators of several key processes during preimplantation development, including blastocyst formation; inner cell mass development; the activity of transposable elements; and cellular potency. From an evolutionary perspective, dramatic changes in ETCHbox protein-coding sequences and copy number have occurred between lineages, and this has been driven at least in part by positive selection. Rapid protein-coding sequence evolution has resulted in small alterations to gene functions between closely related species (e.g. within primates) but highly divergent transcription factor functions between humans and cattle. Overall, I conclude that the lability of preimplantation development, functional redundancy of ETCHbox proteins, and positive selection acting on ETCHbox sequences have combined to produce the diverse repertoires we observe today.