77 research outputs found

    Tissue-Specificity of Gene Expression Diverges Slowly between Orthologs, and Rapidly between Paralogs.

    The ortholog conjecture implies that functional similarity between orthologous genes is higher than between paralogs. It has been supported using levels of expression and Gene Ontology term analysis, although the evidence was rather weak and there were also conflicting reports. In this study on 12 species we provide strong evidence of high conservation in tissue-specificity between orthologs, in contrast to low conservation between within-species paralogs. This allows us to shed new light on the evolution of gene expression patterns. While there have been several studies of the correlation of expression between species, little is known about the evolution of tissue-specificity itself. Ortholog tissue-specificity is strongly conserved between all tetrapod species, with the lowest Pearson correlation between mouse and frog at r = 0.66. Tissue-specificity correlation decreases strongly with divergence time. Paralogs in human show much lower conservation, even for recent primate-specific paralogs. When both paralogs from an ancient whole-genome duplication are tissue-specific, it is often to different tissues, while other tissue-specific paralogs are mostly specific to the same tissue. The same patterns are observed using human or mouse as focal species, and are robust to choices of datasets and of thresholds. Our results support the following model of evolution: in the absence of duplication, tissue-specificity evolves slowly, and tissue-specific genes do not change their main tissue of expression; after small-scale duplication the less expressed paralog loses the ancestral specificity, leading to an immediate difference between paralogs; over time, both paralogs become more broadly expressed, but remain poorly correlated. Finally, there is a small number of paralog pairs which stay tissue-specific with the same main tissue of expression, for at least 300 million years.
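The abstract above does not say which tissue-specificity score was used. As a minimal sketch only, the following assumes the widely used tau index, which is 0 for a uniformly expressed gene and 1 for a gene expressed in a single tissue. The function name `tau_index` and the example values are hypothetical. Per-gene scores computed this way in two species could then be compared with a Pearson correlation.

```python
def tau_index(expression):
    """Tau tissue-specificity score for one gene (assumed metric, not from the abstract).

    expression: non-negative expression values, one per tissue.
    Returns a value in [0, 1]: 0 = equally expressed everywhere, 1 = one tissue only.
    """
    n = len(expression)
    if n < 2:
        raise ValueError("need at least two tissues")
    x_max = max(expression)
    if x_max <= 0:
        raise ValueError("gene is not expressed in any tissue")
    # Sum over tissues of (1 - x_i / x_max), normalised by (n - 1).
    return sum(1 - x / x_max for x in expression) / (n - 1)


# Made-up expression profiles across four tissues.
broad_gene = [5.0, 5.0, 5.0, 5.0]      # same level everywhere
specific_gene = [10.0, 0.0, 0.0, 0.0]  # expressed in one tissue only
print(tau_index(broad_gene), tau_index(specific_gene))
```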

    Molecular Evolution and Gene Function

    One of the basic questions of phylogenomics is how gene function evolves, whether among species or inside gene families. In this chapter, we provide a brief overview of the problems associated with defining gene function in a manner which allows comparisons which are both large scale and evolutionarily relevant. The main source of functional data, despite its limitations, is transcriptomics. Functional data provides information on evolutionary mechanisms primarily by showing which functional classes of genes evolve under stronger or weaker purifying or adaptive selection, and on which classes of mutations (e.g., substitutions or duplications). However, the example of the "ortholog conjecture" shows that we are still not at a point where we can confidently study phylogenomically the evolution of gene function at a precise scale.

    Developmental Constraints on Genome Evolution in Four Bilaterian Model Species.

    Developmental constraints on genome evolution have been suggested to follow either an early conservation model or an "hourglass" model. Both models agree that late development strongly diverges between species, but they disagree on which developmental period is the most conserved. Here, based on a modified "Transcriptome Age Index" approach, that is, weighting trait measures by expression level, we analyzed the constraints acting on three evolutionary traits of protein coding genes (strength of purifying selection on protein sequences, phyletic age, and duplicability) in four species: nematode worm Caenorhabditis elegans, fly Drosophila melanogaster, zebrafish Danio rerio, and mouse Mus musculus. In general, we found that both models can be supported by different genomic properties. Sequence evolution follows an hourglass model, but the evolution of phyletic age and of duplicability follows an early conservation model. Further analyses indicate that stronger purifying selection on sequences in mid-development is driven by temporal pleiotropy of these genes. In addition, we report evidence that expression in late development is enriched with retrogenes, which usually lack efficient regulatory elements. This implies that expression in late development could facilitate transcription of new genes, and provide opportunities for acquisition of function. Finally, in C. elegans, we suggest that dosage imbalance could be one of the main factors that cause depleted expression of high duplicability genes in early development.
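The phrase "weighting trait measures by expression level" can be illustrated with a short sketch. It assumes the index at each developmental stage is the expression-weighted mean of a per-gene trait (for example phyletic age). The abstract does not spell out the authors' modification, so the function names and the toy data below are hypothetical.

```python
def expression_weighted_index(trait, expression):
    """Expression-weighted mean of a per-gene trait at one developmental stage.

    trait: per-gene trait values (e.g. phyletic age rank).
    expression: per-gene expression levels at this stage, same gene order.
    """
    if len(trait) != len(expression):
        raise ValueError("trait and expression must cover the same genes")
    total = sum(expression)
    if total <= 0:
        raise ValueError("no expression at this stage")
    return sum(t * e for t, e in zip(trait, expression)) / total


def index_by_stage(trait, expression_by_stage):
    """Apply the weighted mean to every stage in a {stage: expression} mapping."""
    return {
        stage: expression_weighted_index(trait, expr)
        for stage, expr in expression_by_stage.items()
    }


# Toy data: three genes with trait values 1, 2, 3 and two stages.
trait = [1.0, 2.0, 3.0]
stages = {"early": [10.0, 0.0, 0.0], "late": [0.0, 5.0, 5.0]}
print(index_by_stage(trait, stages))
```

In this toy case the early stage is dominated by the gene with the lowest trait value and the late stage by the two genes with higher values, so the index rises from early to late.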

    Expression Evolution of Mammalian Genes.

    Comparing the expression-profiles of over 10,000 genes from the human and mouse genomes, I address fundamental questions on mammalian gene expression. First, I demonstrate that over 80% of human-mouse orthologous genes are evolutionarily conserved in their expression-profiles. This result highlights the importance of proper gene expression to fitness. Second, I show that highly expressed and tissue-specific genes tend to evolve slowly in expression-profile, implying that the expression pattern is of particular importance to highly expressed and tissue-specific genes. I then investigate the potential roles that gene expression plays in protein sequence evolution, dynamics of genome organization, and evolutionary changes of gene essentiality in mammals. My results indicate that tissue-specificity is a stronger determinant of protein evolutionary rate than gene expression level, a factor that is known to be the most important rate determinant in yeasts. The result suggests a great variation in rate determinants of protein sequence evolution between unicellular and multicellular organisms. Subsequently, my analyses on the origin of co-expressed gene clusters indicate that co-expression of linked genes is a form of transcriptional interference that is disadvantageous to organisms, suggesting that transcriptional interference may promote recurrent relocations of genes in the genome. Lastly, I study underlying mechanisms of the evolution of gene essentiality. The results show that the changes of gene essentiality appear to be associated with adaptive evolution at the protein-sequence level, while gene duplication and gene expression evolution play a negligible role. Together, my studies help understand patterns, mechanisms and consequences of gene expression evolution.
    Ph.D., Ecology and Evolutionary Biology, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/60816/1/liaoby_1.pd

    Borders of Cis-Regulatory DNA Sequences Preferentially Harbor the Divergent Transcription Factor Binding Motifs in the Human Genome

    Changes in cis-regulatory DNA sequences and transcription factor (TF) repertoires provide major sources of phenotypic diversity that shape the evolution of gene regulation in eukaryotes. The DNA-binding specificities of TFs may be diversified or produce new variants in different eukaryotic species. However, it is currently unclear how various levels of divergence in TF DNA-binding specificities or motifs were introduced into the cis-regulatory DNA regions of the genome over evolutionary time. Here, we first estimated the evolutionary divergence levels of TF binding motifs and quantified their occurrence at DNase I-hypersensitive sites. Results from our in silico motif scan and experimentally derived chromatin immunoprecipitation (TF-ChIP) show that the divergent motifs tend to be introduced at the edges of cis-regulatory regions, which is probably accompanied by the expansion of the accessible core of promoter-associated regulatory elements during evolution. We also find that the genes neighboring the expanded cis-regulatory regions with the most divergent motifs are associated with functions like development and morphogenesis. Accordingly, we propose that the accumulation of divergent motifs at the edges of cis-regulatory regions provides a functional mechanism for the evolution of divergent regulatory circuits.

    Systems level expression correlation of Ras GTPase regulators

    Background: Proteins of the ubiquitously expressed core proteome are quantitatively correlated across multiple eukaryotic species. In addition, it was found that many protein paralogues exhibit expression anticorrelation, suggesting that the total level of protein with a given functionality must be kept constant. Methods: We performed Spearman’s rank correlation analyses of gene expression levels for the RAS GTPase subfamily and their regulatory GEF and GAP proteins across tissues and across individuals for each tissue. A large set of published data for normal tissues from a wide range of species, human cancer tissues and human cell lines was analysed. Results: We show that although the multidomain regulatory proteins of Ras GTPases exhibit considerable tissue and individual gene expression variability, their total amounts are balanced in normal tissues. In a given tissue, the sum of activating (GEFs) and deactivating (GAPs) domains of Ras GTPases can vary considerably, but each person has balanced GEF and GAP levels. This balance is impaired in cell lines and in cancer tissues for some individuals. Conclusions: Our results are relevant for critical considerations of knockout experiments, where functionally related homologs may compensate for the downregulation of a protein.
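As an illustration of the Spearman rank correlation named in the Methods, here is a minimal standard-library sketch: values are converted to ranks (ties receive their average rank) and a Pearson correlation is computed on the ranks. The GEF and GAP totals below are made-up numbers for demonstration, not data from the study.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y):
        raise ValueError("inputs must have the same length")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical summed GEF and GAP expression for four individuals.
gef_total = [12.0, 30.0, 18.0, 45.0]
gap_total = [10.0, 28.0, 20.0, 90.0]
print(spearman(gef_total, gap_total))
```

Because only ranks enter the calculation, the coefficient is insensitive to the outlying value in the last individual, which is one reason a rank-based measure suits expression data.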

    Specificity Determination by paralogous winged helix-turn-helix transcription factors

    Transcription factors (TFs) localize to regulatory regions throughout the genome, where they exert physical or enzymatic control over the transcriptional machinery and regulate expression of target genes. Despite the substantial diversity of TFs found across all kingdoms of life, most belong to a relatively small number of structural families characterized by homologous DNA-binding domains (DBDs). In homologous DBDs, highly-conserved DNA-contacting residues define a characteristic ‘recognition potential’, or the limited sequence space containing high-affinity binding sites. Specificity-determining residues (SDRs) alter DNA binding preferences to further delineate this sequence space between homologous TFs, enabling functional divergence through the recognition of distinct genomic binding sites. This thesis explores the divergent DNA-binding preferences among dimeric, winged helix-turn-helix (wHTH) TFs belonging to the OmpR sub-family. As the terminal effectors of orthogonal two-component signaling pathways in Escherichia coli, OmpR paralogs bind distinct genomic sequences and regulate the expression of largely non-overlapping gene networks. Using high-throughput SELEX, I discover multiple sources of variation in DNA-binding, including the spacing and orientation of monomer sites as well as a novel binding ‘mode’ with unique half-site preferences (but retaining dimeric architecture). Surprisingly, given the diversity of residues observed occupying positions in contact with DNA, there are only minor quantitative differences in sequence-specificity between OmpR paralogs. Combining phylogenetic, structural, and biological information, I then define a comprehensive set of putative SDRs, which, although distributed broadly across the protein:DNA interface, preferentially localize to the major groove of the DNA helix. 
    Direct specificity profiling of SDR variants reveals that individual SDRs impact local base preferences as well as global structural properties of the protein:DNA complex. This study demonstrates clearly that OmpR family TFs possess multiple ‘axes of divergence’, including base recognition, dimeric architecture, and structural attributes of the protein:DNA complex. It also provides evidence for a common structural ‘code’ for DNA-binding by OmpR homologues, and demonstrates that surprisingly modest residue changes can enable recognition of highly divergent sequence motifs. Importantly, well-characterized genomic binding sites for many of the TFs in this study diverge substantially from the presented de novo models, and it is unclear how mutations may affect binding in more complex environments. Further analysis using native sequences is required to build combined models of cis- and trans-evolution of two-component regulatory networks.

    Functional analysis of SMYD2 and SMYD3 lysine methyltransferases

    Includes bibliographical references. 2016 Fall.
    The proteins SMYD2 and SMYD3 are two of five members of a unique family of lysine methyltransferases defined by a catalytic SET domain that is split into two segments by a MYND protein interaction domain, followed by a cysteine-rich post-SET domain. The SMYD family members have been shown to be essential for cellular development, cell cycle progression, and when dysregulated, tumorigenesis. SMYD1 has been widely studied as a pivotal component of cardiac and skeletal muscle development. Although their three-dimensional structures have been solved, less is known about functional consequences of SMYD2 and SMYD3. Aberrant overexpression of SMYD2 and SMYD3 has been implicated in numerous malignancies, and both have been studied as potential therapeutic targets. The overriding aim of our research is to obtain a more thorough understanding of SMYD2 and SMYD3 function. In Chapters 1 and 2, we outline essential background regarding the SMYD family and the methods used in our studies. In Chapter 3, we address the consequences of the interaction of SMYD3 with the nuclear chaperone, HSP90. Each has been independently implicated as a proto-oncogene in several human malignancies. Loss of SMYD3-HSP90 interaction leads to SMYD3 mislocalization within the nucleus, thereby severing its association with chromatin. This results in reduction of SMYD3-mediated cell proliferation and, consequently, impairment of SMYD3’s oncogenic activity. We suggest a novel approach for blocking HSP90-driven malignancy which may have reduced toxicity over current HSP90 inhibitors. In Chapter 4, we turn our attention to SMYD2 and its putative role in hematopoietic carcinogenesis. In order to study the effect of SMYD2 in tumor initiation, we employed transforming oncogenes to study the consequences of SMYD2 loss in three hematopoietic models: B-Acute Lymphocytic Leukemia (B-ALL), Chronic Myeloid Leukemia (CML), and Mixed Lineage Leukemia (MLL).
    Loss of SMYD2 in CML and MLL, but not in B-ALL, models led to cell cycle block followed by widespread apoptosis and cell death. Tumorigenicity, as assessed in vitro by colony formation and in vivo by NOD/SCID transformation, was dependent upon SMYD2. Gene expression analyses indicated that, as previously determined in multiple studies, impairment included reduction in the level of the p53 tumor suppressor. Collectively, these studies establish SMYD2 as a putative proto-oncogene in CML and MLL. In Chapter 5, we report our efforts to extend the above findings to the living organism. SMYD2 was conditionally deleted via Cre/lox methodology from the germline of C57BL/6 mice exclusively in hematopoietic progenitors. SMYD2-deficient mice were born healthy and achieved normal lifespans. However, consistent with our findings of Chapter 4, we observed significant blocks in the progression of fetal and bone marrow hematopoietic stem cells to both B lymphocyte and myeloid lineages. While these blocks led to an overall reduction of mature peripheral B cells, SMYD2-deficient mice maintained a relatively normal immune response. These studies further support a model in which SMYD2 is required for normal hematopoiesis transformation.