
    A Machine Learning Approach for Identifying Novel Cell Type–Specific Transcriptional Regulators of Myogenesis

    Transcriptional enhancers integrate the contributions of multiple classes of transcription factors (TFs) to orchestrate the myriad spatio-temporal gene expression programs that occur during development. A molecular understanding of enhancers with similar activities requires the identification of both their unique and their shared sequence features. To address this problem, we combined phylogenetic profiling with a DNA–based enhancer sequence classifier that analyzes the TF binding sites (TFBSs) governing the transcription of a co-expressed gene set. We first assembled a small number of enhancers that are active in Drosophila melanogaster muscle founder cells (FCs) and other mesodermal cell types. Using phylogenetic profiling, we increased the number of enhancers by incorporating orthologous but divergent sequences from other Drosophila species. Functional assays revealed that the diverged enhancer orthologs were active in patterns largely similar to those of their D. melanogaster counterparts, although there was extensive evolutionary shuffling of known TFBSs. We then built and trained a classifier using this enhancer set and identified additional related enhancers based on the presence or absence of known and putative TFBSs. Predicted FC enhancers were over-represented in proximity to known FC genes, and many of the TFBSs learned by the classifier were found to be critical for enhancer activity, including POU homeodomain, Myb, Ets, Forkhead, and T-box motifs. Empirical testing also revealed that the T-box TF encoded by org-1 is a previously uncharacterized regulator of muscle cell identity. Finally, we found extensive diversity in the composition of TFBSs within known FC enhancers, suggesting that motif combinatorics plays an essential role in the cellular specificity exhibited by such enhancers.
In summary, machine learning combined with evolutionary sequence analysis is useful for recognizing novel TFBSs and for facilitating the identification of cognate TFs that coordinate cell type–specific developmental gene expression patterns.

    Understanding the phylogeographic patterns of European hedgehogs, Erinaceus concolor and E. europaeus, using the MHC

    The genome of the European hedgehog, Erinaceus concolor and E. europaeus, shows a strong signal of cycles of restriction to glacial refugia and postglacial expansion. Patterns of expansion, however, differ between mitochondrial DNA (mtDNA) and a preliminary analysis of nuclear markers. In this study, we determine phylogeographic patterns in the hedgehog using two loci of the major histocompatibility complex (MHC), isolated for the first time in hedgehogs. These genes show long persistence times and high polymorphism in many species because of the action of balancing selection. Among 84 individuals screened for variation, only two DQA alleles were identified in each species, but 10 DQB alleles were found in E. concolor and six in E. europaeus. A strong effect of demography on patterns of DQB variability is observed, with only weak evidence of balancing selection. While data from mtDNA clearly subdivide both species into monophyletic subgroups, the MHC data delineate only E. concolor into distinct subgroups, supporting the preliminary findings of other nuclear markers. Together with differences in variability, this suggests that the refugia history and/or expansion patterns of E. concolor and E. europaeus differ.