COUGER: co-factors associated with uniquely-bound genomic regions
Most transcription factors (TFs) belong to protein families that share a common DNA binding domain and have very similar DNA binding preferences. However, many paralogous TFs (i.e. members of the same TF family) perform different regulatory functions and interact with different genomic regions in the cell. A potential mechanism for achieving this differential in vivo specificity is through interactions with protein co-factors. Computational tools for studying the genomic binding profiles of paralogous TFs and identifying their putative co-factors are currently lacking. Here, we present an interactive web implementation of COUGER, a classification-based framework for identifying protein co-factors that might provide specificity to paralogous TFs. COUGER takes as input two sets of genomic regions bound by paralogous TFs, and it identifies a small set of putative co-factors that best distinguish the two sets of sequences. To achieve this task, COUGER uses a classification approach, with features that reflect the DNA-binding specificities of the putative co-factors. The identified co-factors are presented in a user-friendly output page, together with information that allows the user to understand and to explore the contributions of individual co-factor features. COUGER can be run as a stand-alone tool or through a web interface: http://couger.oit.duke.edu
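To make the classification idea concrete, here is a minimal sketch under stated assumptions, not COUGER's actual implementation. Each region gets one feature per candidate co-factor: the best position weight matrix (PWM) log-odds score anywhere in the sequence. An L2-regularized logistic regression then separates regions bound by TF1 from regions bound by TF2, and co-factors are ranked by the magnitude of their learned weights. The toy PWMs, the names `CF_A`/`CF_B`, the helper functions, and the hyperparameters are all hypothetical.

```python
import math
from statistics import mean, pstdev


def one_hot_pwm(consensus, match=1.0, mismatch=-1.0):
    """Toy log-odds PWM rewarding exact matches to a consensus (illustrative only)."""
    return [{b: (match if b == c else mismatch) for b in "ACGT"} for c in consensus]


def max_pwm_score(seq, pwm):
    """Best PWM log-odds score over all windows of seq (forward strand only)."""
    w = len(pwm)
    return max(sum(pwm[j][seq[i + j]] for j in range(w))
               for i in range(len(seq) - w + 1))


def standardize(rows):
    """Z-score each feature column; constant columns become all zeros."""
    stats = [(mean(col), pstdev(col)) for col in zip(*rows)]
    return [[(x - m) / sd if sd > 0 else 0.0 for x, (m, sd) in zip(row, stats)]
            for row in rows]


def train_logistic(X, y, lr=0.5, l2=0.01, epochs=300):
    """Plain batch gradient descent for L2-regularized logistic regression."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, label in zip(X, y):
            z = b + sum(wi * xi for wi, xi in zip(w, row))
            err = 1.0 / (1.0 + math.exp(-z)) - label
            for k in range(d):
                grad_w[k] += err * row[k]
            grad_b += err
        w = [wi - lr * (g / n + l2 * wi) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def rank_cofactors(bound_by_tf1, bound_by_tf2, pwms):
    """Rank candidate co-factors by |weight| in a classifier separating the two region sets."""
    seqs = bound_by_tf1 + bound_by_tf2
    X = standardize([[max_pwm_score(s, pwm) for pwm in pwms.values()] for s in seqs])
    y = [1] * len(bound_by_tf1) + [0] * len(bound_by_tf2)
    w, _ = train_logistic(X, y)
    return sorted(zip(pwms, w), key=lambda kv: abs(kv[1]), reverse=True)


# Hypothetical co-factor motifs and toy "bound regions" for two paralogous TFs.
pwms = {"CF_A": one_hot_pwm("ACGT"), "CF_B": one_hot_pwm("TTTA")}
tf1_regions = ["GGACGTGG", "CACGTCCA", "TTACGTAA"]
tf2_regions = ["GGAGGTGG", "CAGCTCCA", "TTAGCTAA"]
ranking = rank_cofactors(tf1_regions, tf2_regions, pwms)
print(ranking)
```

In this toy data `CF_A` sites occur only in the TF1 regions, so it should receive the largest positive weight. A real analysis would also scan the reverse strand and use curated PWMs.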
Human-chimpanzee differences in a FZD8 enhancer alter cell-cycle dynamics in the developing neocortex.
The human neocortex differs from that of other great apes in several notable regards, including altered cell cycle, prolonged corticogenesis, and increased size [1-5]. Although these evolutionary changes most likely contributed to the origin of distinctively human cognitive faculties, their genetic basis remains almost entirely unknown. Highly conserved non-coding regions showing rapid sequence changes along the human lineage are candidate loci for the development and evolution of uniquely human traits. Several studies have identified human-accelerated enhancers [6-14], but none have linked an expression difference to a specific organismal trait. Here we report the discovery of a human-accelerated regulatory enhancer (HARE5) of FZD8, a receptor of the Wnt pathway implicated in brain development and size [15, 16]. Using transgenic mice, we demonstrate dramatic differences in human and chimpanzee HARE5 activity, with human HARE5 driving early and robust expression at the onset of corticogenesis. Similar to HARE5 activity, FZD8 is expressed in neural progenitors of the developing neocortex [17-19]. Chromosome conformation capture assays reveal that HARE5 physically and specifically contacts the core Fzd8 promoter in the mouse embryonic neocortex. To assess the phenotypic consequences of HARE5 activity, we generated transgenic mice in which Fzd8 expression is under control of orthologous enhancers (Pt-HARE5::Fzd8 and Hs-HARE5::Fzd8). In comparison to Pt-HARE5::Fzd8, Hs-HARE5::Fzd8 mice showed marked acceleration of neural progenitor cell cycle and increased brain size. Changes in HARE5 function unique to humans thus alter the cell-cycle dynamics of a critical population of stem cells during corticogenesis and may underlie some distinctive anatomical features of the human brain.
GRISOTTO: A greedy approach to improve combinatorial algorithms for motif discovery with prior knowledge
Background. Position-specific priors (PSPs) have been used with success to boost EM- and Gibbs sampler-based motif discovery algorithms. PSP information has been computed from different sources, including orthologous conservation, DNA duplex stability, and nucleosome positioning. Prior information has not yet been used in the context of combinatorial algorithms. Moreover, priors have been used only independently, and the gain of combining priors from different sources has not yet been studied. Results. We extend RISOTTO, a combinatorial algorithm for motif discovery, by post-processing its output with a greedy procedure that uses prior information. PSPs from different sources are combined into a scoring criterion that guides the greedy search procedure. The resulting method, called GRISOTTO, was evaluated on 156 yeast TF ChIP-chip sequence sets commonly used to benchmark prior-based motif discovery algorithms. Results show that GRISOTTO is at least as accurate as twelve other state-of-the-art approaches for the same task, even without combining priors. Furthermore, when combined priors are used, GRISOTTO is considerably more accurate than these approaches. We also show that PSPs improve GRISOTTO's ability to retrieve motifs from mouse ChIP-seq data, indicating that the proposed algorithm can be applied to data from a different technology and to a higher eukaryote. Conclusions. The conclusions of this work are twofold. First, post-processing the output of combinatorial algorithms by incorporating prior information leads to a very efficient and effective motif discovery method. Second, combining priors from different sources is even more beneficial than considering them separately.
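The idea of combining priors and letting them steer a greedy post-processing step can be sketched as follows. This is a simplified illustration, not GRISOTTO's actual scoring criterion. Several PSPs for a sequence are combined by averaging their normalized values (an assumed rule). A candidate motif from a combinatorial search is scored by summing, over sequences, the highest combined prior at any occurrence with at most one mismatch. The motif is then hill-climbed by single-base substitutions that strictly raise that score. All function names and the toy data are hypothetical.

```python
def normalize(psp):
    """Scale a PSP so it sums to 1 (uniform if it carries no mass)."""
    total = sum(psp)
    return [p / total for p in psp] if total > 0 else [1.0 / len(psp)] * len(psp)


def combine_priors(psps):
    """Average several normalized PSPs for one sequence (assumed combination rule)."""
    normed = [normalize(p) for p in psps]
    return [sum(vals) / len(normed) for vals in zip(*normed)]


def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def prior_score(motif, seqs, psps, max_mismatches=1):
    """Sum over sequences of the largest prior value at any approximate occurrence of motif."""
    w = len(motif)
    total = 0.0
    for seq, psp in zip(seqs, psps):
        best = 0.0
        for i in range(len(seq) - w + 1):
            if hamming(seq[i:i + w], motif) <= max_mismatches:
                best = max(best, psp[i])
        total += best
    return total


def greedy_refine(motif, seqs, psps, max_mismatches=1):
    """Hill-climb from a combinatorial candidate: accept any single-base substitution
    that strictly raises the prior score, until no substitution helps."""
    current = motif
    current_score = prior_score(current, seqs, psps, max_mismatches)
    improved = True
    while improved:
        improved = False
        for i in range(len(current)):
            for base in "ACGT":
                if base == current[i]:
                    continue
                candidate = current[:i] + base + current[i + 1:]
                score = prior_score(candidate, seqs, psps, max_mismatches)
                if score > current_score:
                    current, current_score, improved = candidate, score, True
    return current, current_score


# Toy example: a "conservation" prior peaks at the true sites; a flat "nucleosome" prior.
seqs = ["AAAATGACTAAA", "CCTGACTCCC"]
conservation = [[0.0] * 4 + [1.0] + [0.0] * 7, [0.0] * 2 + [1.0] + [0.0] * 7]
nucleosome = [[1.0] * 12, [1.0] * 10]
psps = [combine_priors([c, n]) for c, n in zip(conservation, nucleosome)]
motif, score = greedy_refine("TGACAA", seqs, psps)
print(motif, score)
```

Because every accepted move strictly increases the score and the motif space is finite, the loop always terminates, although only at a local optimum.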
Estimation of pairwise sequence similarity of mammalian enhancers with word neighbourhood counts
Motivation: The identity of cells and tissues is to a large degree governed by transcriptional regulation. A major part of this regulation is accomplished by the combinatorial binding of transcription factors at regulatory sequences, such as enhancers. Even though binding of transcription factors is sequence-specific, estimating the sequence similarity of two functionally similar enhancers is very difficult. However, a similarity measure for regulatory sequences is crucial to detect and understand functional similarities between two enhancers and will facilitate large-scale analyses like clustering, prediction and classification of genome-wide datasets.
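One way to read "word neighbourhood counts" is sketched below. This is an illustrative alignment-free measure in the spirit of k-mer statistics, not the statistic defined in this work. Each k-mer occurrence is credited to itself and, with a reduced weight, to every word one mismatch away. Two enhancers are then compared by the cosine similarity of these smoothed count vectors. The mismatch weight, `k`, and all names are assumptions, and reverse complements are ignored for brevity.

```python
import math
from collections import Counter


def neighbourhood(word, alphabet="ACGT"):
    """Yield the word itself, then every word at Hamming distance exactly 1."""
    yield word
    for i, c in enumerate(word):
        for b in alphabet:
            if b != c:
                yield word[:i] + b + word[i + 1:]


def neighbourhood_counts(seq, k, mismatch_weight=0.5):
    """Count k-mers, crediting each occurrence to itself (weight 1) and to
    its 1-mismatch neighbours (weight mismatch_weight)."""
    counts = Counter()
    for i in range(len(seq) - k + 1):
        for j, nb in enumerate(neighbourhood(seq[i:i + k])):
            counts[nb] += 1.0 if j == 0 else mismatch_weight
    return counts


def cosine(c1, c2):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * c2.get(w, 0.0) for w, v in c1.items())
    n1 = math.sqrt(sum(v * v for v in c1.values()))
    n2 = math.sqrt(sum(v * v for v in c2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0


def enhancer_similarity(a, b, k=4):
    return cosine(neighbourhood_counts(a, k), neighbourhood_counts(b, k))


print(enhancer_similarity("ACGTACGTAC", "ACGTACGAAC"))
print(enhancer_similarity("ACGTACGTAC", "TTTTTTTTTT"))
```

Crediting near-matches lets two sequences that share slightly different binding-site variants still score as similar, which exact k-mer counting would miss.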
The value of position-specific priors in motif discovery using MEME
Background. Position-specific priors have been shown to be a flexible and elegant way to extend the power of Gibbs sampler-based motif discovery algorithms. Many types of information, including sequence conservation, nucleosome positioning, and negative examples, can be converted into a prior over the location of motif sites, which then guides the sequence motif discovery algorithm. This approach has been shown to confer many of the benefits of conservation-based and discriminative motif discovery approaches on Gibbs sampler-based motif discovery methods, but it has not previously been studied with methods based on expectation maximization (EM). Results. We extend the popular EM-based MEME algorithm to utilize position-specific priors and demonstrate their effectiveness for discovering transcription factor (TF) motifs in yeast and mouse DNA sequences. Using a discriminative, conservation-based prior dramatically improves MEME's ability to discover motifs in 156 yeast TF ChIP-chip datasets, more than doubling the number of datasets in which it finds the correct motif. On these datasets, MEME using the prior has a higher success rate than eight other conservation-based motif discovery approaches. We also show that the same type of prior improves the accuracy of motifs discovered by MEME in mouse TF ChIP-seq data, and that these motifs tend to be of slightly higher quality than those found by a Gibbs sampling algorithm using the same prior. Conclusions. We conclude that position-specific priors can substantially increase the power of EM-based motif discovery algorithms such as MEME.
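Where a position-specific prior enters an EM iteration can be shown with a minimal sketch of a one-occurrence-per-sequence (OOPS) model; it is not MEME's code. In the E-step, the posterior that the site starts at position j is proportional to the prior at j times the motif-versus-background likelihood ratio of that window. With a uniform prior this reduces to the ordinary E-step. The M-step re-estimates the PWM from posterior-weighted counts. Function names, the pseudocount, and the toy PWM are assumptions.

```python
import math


def log_likelihood_ratio(seq, start, pwm, background):
    """log P(window | motif) - log P(window | background) for the site at start."""
    return sum(math.log(pwm[k][seq[start + k]]) - math.log(background[seq[start + k]])
               for k in range(len(pwm)))


def e_step_with_psp(seq, pwm, background, psp):
    """Posterior over the start of the single motif site in seq (OOPS-style).
    psp[j] is the (possibly unnormalized) prior that the site starts at j."""
    n = len(seq) - len(pwm) + 1
    logs = [log_likelihood_ratio(seq, j, pwm, background) + math.log(psp[j])
            if psp[j] > 0 else -math.inf for j in range(n)]
    top = max(logs)
    if top == -math.inf:
        raise ValueError("prior assigns zero mass to every start position")
    weights = [math.exp(v - top) for v in logs]  # subtract max for numerical stability
    total = sum(weights)
    return [x / total for x in weights]


def m_step(seqs, posteriors, width, pseudocount=0.1):
    """Re-estimate the PWM from posterior-weighted site counts."""
    counts = [{b: pseudocount for b in "ACGT"} for _ in range(width)]
    for seq, post in zip(seqs, posteriors):
        for j, z in enumerate(post):
            for k in range(width):
                counts[k][seq[j + k]] += z
    return [{b: v / sum(col.values()) for b, v in col.items()} for col in counts]


pwm = [{"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
       {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
       {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1}]
background = {b: 0.25 for b in "ACGT"}
seq = "AAAATGCAAAA"
uniform = e_step_with_psp(seq, pwm, background, [1.0] * len(seq))
masked = e_step_with_psp(seq, pwm, background, [1.0] * 4 + [0.0] + [1.0] * 6)
new_pwm = m_step([seq], [uniform], width=3)
```

Masking position 4 with a zero prior removes it from consideration entirely, which illustrates how conservation or negative-example information can redirect the search.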
Novel Data Fusion Method and Exploration of Multiple Information Sources for Transcription Factor Target Gene Prediction
Background. Revealing protein-DNA interactions is a key problem in understanding transcriptional regulation at the mechanistic level. Computational methods have an important role in predicting transcription factor target genes genome-wide. Fusing multiple data sources provides a natural way to improve transcription factor target gene predictions, because sequence specificities alone are not sufficient to accurately predict transcription factor binding sites. Methods. Here we develop a new data fusion method to combine multiple genome-level data sources and study the extent to which DNA duplex stability and nucleosome positioning information, either alone or in combination with other data sources, can improve the prediction of transcription factor target genes. Results. Tests on a carefully constructed set of verified binding sites in the mouse genome demonstrate that our new multiple data fusion method can reduce false positive rates, and that DNA duplex stability and nucleosome occupancy data can improve the accuracy of transcription factor target gene predictions, especially when combined with other genome-level data sources. Cross-validation and other randomization tests confirm the predictive performance of our method. Our results also show that nonredundant data sources provide the most efficient data fusion.
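For readers unfamiliar with data fusion, a generic baseline is sketched below. It is a naive-Bayes combination, not the fusion method developed in this work. Each source contributes a likelihood ratio for "this gene is a target", the log ratios are added to the prior log odds, and candidates are ranked by posterior probability. The source names, likelihood ratios, and prior are hypothetical.

```python
import math


def fuse_posterior(likelihood_ratios, prior_positive=0.01):
    """Naive-Bayes fusion: add per-source log likelihood ratios to the prior
    log odds. Valid only if the sources are conditionally independent."""
    log_odds = math.log(prior_positive / (1.0 - prior_positive))
    log_odds += sum(math.log(lr) for lr in likelihood_ratios.values())
    return 1.0 / (1.0 + math.exp(-log_odds))


def rank_targets(candidates, prior_positive=0.01):
    """Sort candidate genes by fused posterior probability of being a TF target."""
    scored = {gene: fuse_posterior(lrs, prior_positive) for gene, lrs in candidates.items()}
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)


candidates = {
    "gene1": {"motif": 5.0, "duplex_stability": 2.0, "nucleosome": 3.0},
    "gene2": {"motif": 5.0, "duplex_stability": 0.5, "nucleosome": 0.8},
}
ranking = rank_targets(candidates)
print(ranking)
```

Under this independence assumption, feeding in two redundant copies of the same evidence double-counts it and inflates the posterior, which gives one intuition for why nonredundant sources fuse most efficiently.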
Functional Characterization of Transcription Factor Motifs Using Cross-species Comparison across Large Evolutionary Distances
We address the problem of finding statistically significant associations between cis-regulatory motifs and functional gene sets, in order to understand the biological roles of transcription factors. We develop a computational framework for this task, whose features include a new statistical score for motif scanning, the use of different scores for predicting targets of different motifs, and new ways to deal with redundancies among significant motif–function associations. This framework is applied to the recently sequenced genome of the jewel wasp, Nasonia vitripennis, making use of the existing knowledge of motifs and gene annotations in another insect genome, that of the fruitfly. The framework uses cross-species comparison to improve the specificity of its predictions, and does so without relying upon non-coding sequence alignment. It is therefore well suited for comparative genomics across large evolutionary divergences, where existing alignment-based methods are not applicable. We also apply the framework to find motifs associated with socially regulated gene sets in the honeybee, Apis mellifera, using comparisons with Nasonia, a solitary species, to identify honeybee-specific associations.
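A motif–function association is commonly tested by asking whether a motif's predicted targets overlap a gene set more than chance allows. The sketch below uses the standard one-sided hypergeometric test for that overlap; it is not the new scanning score described here. It also shows one alignment-free way to use a second species: keep only targets whose ortholog is also predicted as a target there. The ortholog mapping and all names are hypothetical.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)


def motif_function_pvalue(targets, gene_set, universe):
    """One-sided enrichment p-value for overlap of predicted motif targets with a gene set."""
    targets = set(targets) & universe
    gene_set = set(gene_set) & universe
    overlap = len(targets & gene_set)
    return hypergeom_sf(overlap, len(universe), len(gene_set), len(targets))


def conserved_targets(targets_a, targets_b, ortholog_of):
    """Keep species-A targets whose ortholog is also a predicted target in species B
    (uses a gene-level ortholog map, not non-coding sequence alignment)."""
    return {g for g in targets_a if ortholog_of.get(g) in targets_b}


universe = {f"g{i}" for i in range(10)}
gene_set = {"g0", "g1", "g2"}
p_value = motif_function_pvalue({"g0", "g1", "g2"}, gene_set, universe)
print(p_value)
```

Filtering targets through the second species before testing trades some sensitivity for specificity, since only predictions supported in both genomes remain.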
T-cell identity and epigenetic memory
T-cell development endows cells with a flexible range of effector differentiation options, superimposed on a stable core of lineage-specific gene expression that is maintained while access to alternative hematopoietic lineages is permanently renounced. This combination of features could be explained by environmentally responsive transcription factor mobilization overlaying an epigenetically stabilized base gene expression state. For example, "poising" of promoters could offer preferential access to T-cell genes, while repressive histone modifications and DNA methylation of non-T regulatory genes could be responsible for keeping non-T developmental options closed. Here, we critically review the evidence for the actual deployment of epigenetic marking to support the stable aspects of T-cell identity. Much of epigenetic marking is dynamically maintained or subject to rapid modification by local action of transcription factors. Repressive histone marks are used in gene-specific ways that do not fit a simple, developmental lineage-exclusion hierarchy. We argue that epigenetic analysis may achieve its greatest impact for illuminating regulatory biology when it is used to locate cis-regulatory elements by catching them in the act of mediating regulatory change.
Punctuated evolution and transitional hybrid network in an ancestral cell cycle of fungi.
Although cell cycle control is an ancient, conserved, and essential process, some core animal and fungal cell cycle regulators share no more sequence identity than non-homologous proteins. Here, we show that evolution along the fungal lineage was punctuated by the early acquisition and entrainment of the SBF transcription factor through horizontal gene transfer. Cell cycle evolution in the fungal ancestor then proceeded through a hybrid network containing both SBF and its ancestral animal counterpart E2F, which is still maintained in many basal fungi. We hypothesize that a virally derived SBF may have initially hijacked cell cycle control by activating transcription via the cis-regulatory elements targeted by the ancestral cell cycle regulator E2F, much like extant viral oncogenes. Consistent with this hypothesis, we show that SBF can regulate promoters with E2F binding sites in budding yeast.