
    Genomic and proteomic biases inform metabolic engineering strategies for anaerobic fungi.

    Anaerobic fungi (Neocallimastigomycota) are emerging non-model hosts for biotechnology due to their wealth of biomass-degrading enzymes, yet tools to engineer these fungi have not yet been established. Here, we show that the anaerobic gut fungi have the most GC depleted genomes among 443 sequenced organisms in the fungal kingdom, which has ramifications for heterologous expression of genes as well as for emerging CRISPR-based genome engineering approaches. Comparative genomic analyses suggest that anaerobic fungi may contain cellular machinery to aid in sexual reproduction, yet a complete mating pathway was not identified. Predicted proteomes of the anaerobic fungi also contain an unusually large fraction of proteins with homopolymeric amino acid runs consisting of five or more identical consecutive amino acids. In particular, threonine runs are especially enriched in anaerobic fungal carbohydrate active enzymes (CAZymes) and this, together with a high abundance of predicted N-glycosylation motifs, suggests that gut fungal CAZymes are heavily glycosylated, which may impact heterologous production of these biotechnologically useful enzymes. Finally, we present a codon optimization strategy to aid in the development of genetic engineering tools tailored to these early-branching anaerobic fungi
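
The two sequence statistics named in this abstract (GC depletion, and homopolymeric runs of five or more identical consecutive amino acids) can be illustrated with a minimal sketch. The function names and the toy sequences below are hypothetical and are not taken from the paper; the authors' actual pipeline is not described here.

```python
import re


def gc_fraction(dna: str) -> float:
    """Fraction of G or C among the unambiguous A/C/G/T bases of a sequence."""
    seq = dna.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def homopolymer_runs(protein: str, min_len: int = 5):
    """Return (residue, start_index, run_length) for each run of at least
    min_len identical consecutive residues. min_len=5 follows the abstract's
    definition of a homopolymeric run."""
    # One residue followed by at least (min_len - 1) copies of itself.
    pattern = re.compile(r"(.)\1{%d,}" % (min_len - 1))
    return [
        (m.group(1), m.start(), len(m.group(0)))
        for m in pattern.finditer(protein.upper())
    ]


if __name__ == "__main__":
    # Toy inputs, invented for illustration only.
    print(gc_fraction("ATATATGCAT"))
    print(homopolymer_runs("MKTTTTTTSAAAAGQQQQQ"))
```

In the toy protein, the four-alanine stretch falls below the five-residue threshold and is not reported, while the threonine and glutamine stretches are.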

    COMIT: identification of noncoding motifs under selection in coding sequences

    COMIT is an algorithm for detecting functional non-coding motifs in coding regions that separates nucleotide effects from amino acid effects

    Discovery of functional elements in 12 Drosophila genomes using evolutionary signatures

    Sequencing of multiple related species followed by comparative genomics analysis constitutes a powerful approach for the systematic understanding of any genome. Here, we use the genomes of 12 Drosophila species for the de novo discovery of functional elements in the fly. Each type of functional element shows characteristic patterns of change, or 'evolutionary signatures', dictated by its precise selective constraints. Such signatures enable recognition of new protein-coding genes and exons, spurious and incorrect gene annotations, and numerous unusual gene structures, including abundant stop-codon readthrough. Similarly, we predict non-protein-coding RNA genes and structures, and new microRNA (miRNA) genes. We provide evidence of miRNA processing and functionality from both hairpin arms and both DNA strands. We identify several classes of pre- and post-transcriptional regulatory motifs, and predict individual motif instances with high confidence. We also study how discovery power scales with the divergence and number of species compared, and we provide general guidelines for comparative studies

    Regulation of splicing factors by alternative splicing and NMD is conserved between kingdoms yet evolutionarily flexible.

    Ultraconserved elements, unusually long regions of perfect sequence identity, are found in genes encoding numerous RNA-binding proteins including arginine-serine rich (SR) splicing factors. Expression of these genes is regulated via alternative splicing of the ultraconserved regions to yield mRNAs that are degraded by nonsense-mediated mRNA decay (NMD), a process termed unproductive splicing (Lareau et al. 2007; Ni et al. 2007). As all human SR genes are affected by alternative splicing and NMD, one might expect this regulation to have originated in an early SR gene and persisted as duplications expanded the SR family. But in fact, unproductive splicing of most human SR genes arose independently (Lareau et al. 2007). This paradox led us to investigate the origin and proliferation of unproductive splicing in SR genes. We demonstrate that unproductive splicing of the splicing factor SRSF5 (SRp40) is conserved among all animals and even observed in fungi; this is a rare example of alternative splicing conserved between kingdoms, yet its effect is to trigger mRNA degradation. As the gene duplicated, the ancient unproductive splicing was lost in paralogs, and distinct unproductive splicing evolved rapidly and repeatedly to take its place. SR genes have consistently employed unproductive splicing, and while it is exceptionally conserved in some of these genes, turnover in specific events among paralogs shows flexible means to the same regulatory end

    Multi-species sequence comparison reveals dynamic evolution of the elastin gene that has involved purifying selection and lineage-specific insertions/deletions

    BACKGROUND: The elastin gene (ELN) is implicated as a factor in both supravalvular aortic stenosis (SVAS) and Williams Beuren Syndrome (WBS), two diseases involving pronounced complications in mental or physical development. Although the complete spectrum of functional roles of the processed gene product remains to be established, these roles are inferred to be analogous in human and mouse. This view is supported by genomic sequence comparison, in which there are no large-scale differences in the ~1.8 Mb sequence block encompassing the common region deleted in WBS, with the exception of an overall reversed physical orientation between human and mouse. RESULTS: Conserved synteny around ELN does not translate to a high level of conservation in the gene itself. In fact, ELN orthologs in mammals show more sequence divergence than expected for a gene with a critical role in development. The pattern of divergence is non-conventional due to an unusually high ratio of gaps to substitutions. Specifically, multi-sequence alignments of eight mammalian sequences reveal numerous non-aligning regions caused by species-specific insertions and deletions, in spite of the fact that the vast majority of aligning sites appear to be conserved and undergoing purifying selection. CONCLUSIONS: The pattern of lineage-specific, in-frame insertions/deletions in the coding exons of ELN orthologous genes is unusual and has led to unique features of the gene in each lineage. These differences may indicate that the gene has a slightly different functional mechanism in mammalian lineages, or that the corresponding regions are functionally inert. Identified regions that undergo purifying selection reflect a functional importance associated with evolutionary pressure to retain those features

    Modeling an Evolutionary Conserved Circadian Cis-Element

    Circadian oscillator networks rely on a transcriptional activator called CLOCK/CYCLE (CLK/CYC) in insects and CLOCK/BMAL1 or NPAS2/BMAL1 in mammals. Identifying the targets of this heterodimeric basic-helix-loop-helix (bHLH) transcription factor poses challenges and it has been difficult to decipher its specific sequence affinity beyond a canonical E-box motif, except perhaps for some flanking bases contributing weakly to the binding energy. Thus, no good computational model presently exists for predicting CLK/CYC, CLOCK/BMAL1, or NPAS2/BMAL1 targets. Here, we use a comparative genomics approach and first study the conservation properties of the best-known circadian enhancer: a 69-bp element upstream of the Drosophila melanogaster period gene. This fragment shows a signal involving the presence of two closely spaced E-box–like motifs, a configuration that we can also detect in the other four prominent CLK/CYC target genes in flies: timeless, vrille, Pdp1, and cwo. This allows for the training of a probabilistic sequence model that we test using functional genomics datasets. We find that the predicted sequences are overrepresented in promoters of genes induced in a recent study by a glucocorticoid receptor-CLK fusion protein. We then scanned the mouse genome with the fly model and found that many known CLOCK/BMAL1 targets harbor sequences matching our consensus. Moreover, the phase of predicted cyclers in liver agreed with known CLOCK/BMAL1 regulation. Taken together, we built a predictive model for CLK/CYC or CLOCK/BMAL1-bound cis-enhancers through the integration of comparative and functional genomics data. Finally, a deeper phylogenetic analysis reveals that the link between the CLOCK/BMAL1 complex and the circadian cis-element dates back to before insects and vertebrates diverged

    Automated Conserved Non-Coding Sequence (CNS) Discovery Reveals Differences in Gene Content and Promoter Evolution among Grasses

    Conserved non-coding sequences (CNS) are islands of non-coding sequence that, like protein coding exons, show less divergence in sequence between related species than functionless DNA. Several CNSs have been demonstrated experimentally to function as cis-regulatory regions. However, the specific functions of most CNSs remain unknown. Previous searches for CNS in plants have either anchored on exons and only identified nearby sequences or required years of painstaking manual annotation. Here we present an open source tool that can accurately identify CNSs between any two related species with sequenced genomes, including both those immediately adjacent to exons and distal sequences separated by >12 kb of non-coding sequence. We have used this tool to characterize new motifs, associate CNSs with additional functions, and identify previously undetected genes encoding RNA and protein in the genomes of five grass species. We provide a list of 15,363 orthologous CNSs conserved across all grasses tested. We were also able to identify regulatory sequences present in the common ancestor of grasses that have been lost in one or more extant grass lineages. Lists of orthologous gene pairs and associated CNSs are provided for reference inbred lines of arabidopsis, Japonica rice, foxtail millet, sorghum, brachypodium, and maize

    Divergent Evolution of Human p53 Binding Sites: Cell Cycle Versus Apoptosis

    The p53 tumor suppressor is a sequence-specific pleiotropic transcription factor that coordinates cellular responses to DNA damage and stress, initiating cell-cycle arrest or triggering apoptosis. Although the human p53 binding site sequence (or response element [RE]) is well characterized, some genes have consensus-poor REs that are nevertheless both necessary and sufficient for transactivation by p53. Identification of new functional gene regulatory elements under these conditions is problematic, and evolutionary conservation is often employed. We evaluated the comparative genomics approach for assessing evolutionary conservation of putative binding sites by examining conservation of 83 experimentally validated human p53 REs against mouse, rat, rabbit, and dog genomes and detected pronounced conservation differences among p53 REs and p53-regulated pathways. Bona fide NRF2 (nuclear factor [erythroid-derived 2]-like 2 nuclear factor) and NFκB (nuclear factor of kappa light chain gene enhancer in B cells) binding sites, which direct oxidative stress and innate immunity responses, were used as controls, and both exhibited high interspecific conservation. Surprisingly, the average p53 RE was not significantly more conserved than background genomic sequence, and p53 REs in apoptosis genes as a group showed very little conservation. The common bioinformatics practice of filtering RE predictions by 80% rodent sequence identity would not only give a false positive rate of ∼19%, but miss up to 57% of true p53 REs. Examination of interspecific DNA base substitutions as a function of position in the p53 consensus sequence reveals an unexpected excess of diversity in apoptosis-regulating REs versus cell-cycle controlling REs (rodent comparisons: p < 1.0 e−12). While some p53 REs show relatively high levels of conservation, REs in many genes such as BAX, FAS, PCNA, CASP6, SIVA1, and P53AIP1 show little if any homology to rodent sequences. 
This difference suggests that among mammalian species, evolutionary conservation differs among p53 REs, with some having ancient ancestry and others of more recent origin. Overall our results reveal divergent evolutionary pressure among the binding targets of p53 and emphasize that comparative genomics methods must be used judiciously and tailored to the evolutionary history of the targeted functional regulatory regions
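
The conservation filter discussed in this abstract (keeping a predicted response element only if it shows at least 80% sequence identity to the rodent ortholog) can be sketched as follows. This is an illustrative assumption about how such a filter is commonly written, not the authors' code; the function names and the two 20-bp sequences are invented for the example.

```python
def percent_identity(site_a: str, site_b: str) -> float:
    """Percent of aligned columns where both sequences carry the same base.
    Inputs must already be aligned to equal length; gap columns ('-') never
    count as matches."""
    if len(site_a) != len(site_b):
        raise ValueError("aligned sequences must have equal length")
    if not site_a:
        return 0.0
    matches = sum(
        x == y and x != "-"
        for x, y in zip(site_a.upper(), site_b.upper())
    )
    return 100.0 * matches / len(site_a)


def passes_conservation_filter(human: str, rodent: str,
                               threshold: float = 80.0) -> bool:
    """True if the site would survive an identity filter at the threshold."""
    return percent_identity(human, rodent) >= threshold


if __name__ == "__main__":
    # Hypothetical aligned 20-bp sites differing at 5 positions.
    human_site = "AGACATGCCTAGACATGCCT"
    rodent_site = "TGACAAGCCTTGACAAGCCA"
    print(percent_identity(human_site, rodent_site))
    print(passes_conservation_filter(human_site, rodent_site))
```

A site like the toy one above, with 15 of 20 positions matching, would be discarded by the filter regardless of whether it is functional in human, which is the kind of miss the abstract warns about.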

    Genomic analysis and examination of innate antiviral immunity in the Egyptian rousette bat

    Bats asymptomatically host a number of viruses that are the cause of recently emergent infectious diseases in humans. While the mechanisms underlying this asymptomatic infection are currently not known, studies of sequenced bat genomes help uncover genetic adaptations in bats that may have functional importance in the antiviral response of these animals. To identify differences between antiviral mechanisms in humans and bats, we sequenced, assembled, and analyzed the genome of the Egyptian rousette bat (ERB; Rousettus aegyptiacus), a natural reservoir of Marburg virus and the only known reservoir for any filovirus. We used this genome to understand the evolution of immune genes and gene families in bats, and describe several observations relevant to defense against viruses. We observed an unusual expansion of the NKG2/CD94 natural killer (NK) cell receptor gene families in Egyptian rousette bats relative to other species, and found genomic evidence of unique features and expression of these receptors that may result in a net inhibitory balance within bat NK cells. The expansion of NK cell receptors is matched by an expansion of potential major histocompatibility complex (MHC) class I ligands, which are distributed both within and, surprisingly, outside the canonical MHC loci. We also observed that the type I interferon (IFN) locus is considerably expanded and diversified in the ERB, and that the IFN-ω subfamily contributes most to this expansion. To understand the functional implications of this expansion, we synthesized multiple IFN-ω proteins and examined their antiviral effects. Members of this subfamily are not constitutively expressed but are induced after viral infection, and show antiviral activity in vitro, with different antiviral potencies observed for different IFN-ω proteins. 
Taken together, these results show that multiple bats, including the ERB, have expanded and diversified numerous antiviral loci, and potentially developed unique adaptations in NK cell receptor signaling, and type I IFN responses. The concerted evolution of so many key components of immunity in the ERB is strongly suggestive of novel modes of antiviral defense that may contribute to the ability of bats to asymptomatically host viruses that are pathogenic in humans