Predicting Combinatorial Binding of Transcription Factors to Regulatory Elements in the Human Genome by Association Rule Mining
Cis-acting transcriptional regulatory elements in mammalian genomes typically contain specific combinations of binding sites for various transcription factors. Although some cis-regulatory elements have been well studied, the combinations of transcription factors that regulate normal expression levels for the vast majority of the 20,000 genes in the human genome are unknown. We hypothesized that it should be possible to discover transcription factor combinations that regulate gene expression in concert by identifying over-represented combinations of sequence motifs that occur together in the genome. To detect combinations of transcription factor binding motifs, we developed a data mining approach based on association rules, which are typically used in market basket analysis. We scored each segment of the genome for the presence or absence of each of 83 transcription factor binding motifs, then used association rule mining algorithms to mine this dataset, thus identifying frequently occurring pairs of distinct motifs within a segment. Results: Support for most pairs of transcription factor binding motifs was highly correlated across different chromosomes, although pair significance varied. Known true-positive motif pairs showed higher association rule support, confidence, and significance than background. Our subsets of high-confidence, high-significance mined pairs of transcription factors showed enrichment for co-citation in PubMed abstracts relative to all pairs, and the predicted associations were often readily verifiable in the literature. 
Conclusion: Functional elements in the genome where transcription factors bind to regulate expression in a combinatorial manner are more likely to be predicted by identifying statistically and biologically significant combinations of transcription factor binding motifs than by simply scanning the genome for the occurrence of binding sites for a single transcription factor.
NIAAA Alcohol Training Grant; National Science Foundation; Cellular and Molecular Biolog
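The pair-mining step can be illustrated with a minimal Python sketch. It assumes each genome segment has already been reduced to the set of motif names detected in it, and it computes support, confidence, and lift for every co-occurring pair, plus a one-sided hypergeometric (Fisher-style) P-value as one possible significance measure. The function names, the motif labels, and the choice of significance test are illustrative assumptions, not the authors' implementation.

```python
from itertools import combinations
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N items, K marked, n drawn)."""
    upper = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def mine_motif_pairs(segments, min_support=0.0):
    """Score every unordered pair of distinct motifs that co-occur in some segment.

    segments: list of sets, each holding the motif names detected in one genome segment.
    Returns {(a, b): metrics} with a < b alphabetically; confidence is for the rule a -> b.
    """
    n_seg = len(segments)
    single = {}  # motif -> number of segments containing it
    pair = {}    # (a, b) -> number of segments containing both motifs
    for seg in segments:
        for m in seg:
            single[m] = single.get(m, 0) + 1
        for a, b in combinations(sorted(seg), 2):
            pair[(a, b)] = pair.get((a, b), 0) + 1

    results = {}
    for (a, b), n_ab in pair.items():
        support = n_ab / n_seg
        if support < min_support:
            continue
        results[(a, b)] = {
            "support": support,
            "confidence": n_ab / single[a],
            "lift": support / ((single[a] / n_seg) * (single[b] / n_seg)),
            # one-sided Fisher exact P-value: chance of >= n_ab joint segments
            # given both marginal counts, if the motifs occurred independently
            "p_value": hypergeom_sf(n_ab, n_seg, single[a], single[b]),
        }
    return results
```

For example, with the hypothetical segments `[{"SP1", "NFY"}, {"SP1", "NFY", "AP1"}, {"SP1"}, {"AP1"}]`, the pair `("NFY", "SP1")` gets a support of 0.5 and a confidence of 1.0 for the rule NFY → SP1, because every segment containing NFY also contains SP1.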
From data towards knowledge: Revealing the architecture of signaling systems by unifying knowledge mining and data mining of systematic perturbation data
Genetic and pharmacological perturbation experiments, such as deleting a gene
and monitoring gene expression responses, are powerful tools for studying
cellular signal transduction pathways. However, it remains a challenge to
automatically derive knowledge of a cellular signaling system at a conceptual
level from systematic perturbation-response data. In this study, we explored a
framework that unifies knowledge mining and data mining approaches towards the
goal. The framework consists of the following automated processes: 1) applying
an ontology-driven knowledge mining approach to identify functional modules
among the genes responding to a perturbation in order to reveal potential
signals affected by the perturbation; 2) applying a graph-based data mining
approach to search for perturbations that affect a common signal with respect
to a functional module; and 3) revealing the architecture of a signaling system
by organizing signaling units into a hierarchy based on their relationships.
Applying this framework to a compendium of yeast perturbation-response data, we
have successfully recovered many well-known signal transduction pathways; in
addition, our analysis has led to many hypotheses regarding the yeast signal
transduction system; finally, our analysis automatically organized perturbed
genes as a graph reflecting the architecture of the yeast signaling system.
Importantly, this framework transformed molecular findings from a gene level to
a conceptual level, which can readily be translated into computable knowledge
in the form of rules regarding the yeast signaling system, such as "if genes
involved in MAPK signaling are perturbed, genes involved in pheromone responses
will be differentially expressed."
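One way to picture step 3 is a containment hierarchy: if perturbing gene A disrupts every functional module that perturbing gene B disrupts, plus others, A is placed above B. The sketch below builds only the direct (transitively reduced) edges of that ordering. The input mapping, the strict-superset rule, and the gene and module names are assumptions for illustration; the paper's own graph-based procedure may differ.

```python
def build_hierarchy(effects):
    """Return {gene: set of direct children} for the containment order on affected modules.

    effects: dict mapping each perturbed gene to the set of functional modules it affects.
    Gene g sits above gene h when g's module set strictly contains h's.
    """
    genes = list(effects)
    # below[g] = every gene whose affected modules form a strict subset of g's
    below = {g: {h for h in genes if set(effects[g]) > set(effects[h])} for g in genes}
    children = {g: set() for g in genes}
    for g in genes:
        for h in below[g]:
            # keep the edge g -> h only if no intermediate gene k lies between them
            if not any(h in below[k] for k in below[g]):
                children[g].add(h)
    return children
```

With hypothetical inputs where `GENE_A` affects modules `{m1, m2, m3}`, `GENE_B` affects `{m1, m2}`, and `GENE_C` affects `{m1}`, the result links A → B → C but omits the redundant edge A → C, which keeps the hierarchy readable as a graph.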
A compendium of Caenorhabditis elegans regulatory transcription factors: a resource for mapping transcription regulatory networks
Background
Transcription regulatory networks are composed of interactions between transcription factors and their target genes. Whereas unicellular networks have been studied extensively, metazoan transcription regulatory networks remain largely unexplored. Caenorhabditis elegans provides a powerful model to study such metazoan networks because its genome is completely sequenced and many functional genomic tools are available. While C. elegans gene predictions have undergone continuous refinement, this is not true for the annotation of functional transcription factors. The comprehensive identification of transcription factors is essential for the systematic mapping of transcription regulatory networks because it enables the creation of physical transcription factor resources that can be used in assays to map interactions between transcription factors and their target genes.
Results
By computational searches and extensive manual curation, we have identified a compendium of 934 transcription factor genes (referred to as wTF2.0). We find that manual curation drastically reduces the number of both false positive and false negative transcription factor predictions. We discuss how transcription factor splice variants and dimer formation may affect the total number of functional transcription factors. In contrast to mouse transcription factor genes, we find that C. elegans transcription factor genes do not undergo significantly more splicing than other genes. This difference may contribute to differences in organism complexity. We identify candidate redundant worm transcription factor genes and orthologous worm and human transcription factor pairs. Finally, we discuss how wTF2.0 can be used together with physical transcription factor clone resources to facilitate the systematic mapping of C. elegans transcription regulatory networks.
Conclusion
wTF2.0 provides a starting point to decipher the transcription regulatory networks that control metazoan development and function.
Application of regulatory sequence analysis and metabolic network analysis to the interpretation of gene expression data
We present two complementary approaches for the interpretation of clusters of
co-regulated genes, such as those obtained from DNA chips and related methods.
Starting from a cluster of genes with similar expression profiles, two basic
questions can be asked:
1. Which mechanism is responsible for the coordinated transcriptional response
of the genes? This question is approached by extracting motifs that are shared
between the upstream sequences of these genes. The motifs extracted are putative
cis-acting regulatory elements.
2. What is the physiological significance, for the cell, of expressing these
genes together? One way to answer the question is to search for potential metabolic
pathways that could be catalyzed by the products of the genes. This can be
done by selecting the genes from the cluster that code for enzymes, and trying
to assemble the catalyzed reactions to form metabolic pathways.
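Question 1 rests on detecting words that occur more often in the upstream sequences of a cluster than background frequencies predict. The following is a minimal sketch, assuming a per-word background probability (defaulting to a uniform 0.25^k when a word is missing from `bg_freq`), a binomial model over all scanned positions, and a Bonferroni-style multiplication by the 4^k possible words to give an E-value. It is not the implementation behind the web tools cited in this entry.

```python
from math import exp, lgamma, log, log1p


def count_kmers(seqs, k):
    """Count every k-mer made only of A, C, G, T across all sequences (strand as given)."""
    counts = {}
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            word = s[i:i + k]
            if set(word) <= set("ACGT"):  # skip windows containing N or other symbols
                counts[word] = counts.get(word, 0) + 1
    return counts


def binom_sf(x, n, p):
    """P(X >= x) for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    if x <= 0:
        return 1.0
    if x > n or p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    logs = [lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
            + i * log(p) + (n - i) * log1p(-p) for i in range(x, n + 1)]
    top = max(logs)
    return exp(top) * sum(exp(t - top) for t in logs)


def overrepresented_kmers(seqs, k, bg_freq, alpha=0.01):
    """Return (word, count, e_value) for words with E-value below alpha, smallest first."""
    counts = count_kmers(seqs, k)
    n_pos = sum(max(len(s) - k + 1, 0) for s in seqs)  # number of positions scanned
    n_words = 4 ** k  # Bonferroni factor: number of possible DNA words of length k
    hits = []
    for word, obs in counts.items():
        p = bg_freq.get(word, 0.25 ** k)  # assumed uniform background if not provided
        e_value = binom_sf(obs, n_pos, p) * n_words
        if e_value < alpha:
            hits.append((word, obs, e_value))
    return sorted(hits, key=lambda h: h[2])
```

For instance, with the toy sequences `["TTCACGTGTT", "AACACGTGAA", "GGCACGTGGG"]` and `k = 6`, the only word reported is `CACGTG`, seen three times; every other 6-mer occurs once and does not survive the correction.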
We present tools to answer these two questions, and we illustrate their use with
selected examples in the yeast Saccharomyces cerevisiae. The tools are available
on the web (http://ucmb.ulb.ac.be/bioinformatics/rsa-tools/;
http://www.ebi.ac.uk/research/pfbp/; http://www.soi.city.ac.uk/~msch/).
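Question 2 can be sketched as grouping the reactions catalysed by the cluster's enzyme-coding genes into connected sub-networks that share compounds. The sketch assumes a gene-to-reaction table (genes absent from it are treated as non-enzymes) and reaction definitions given as substrate and product sets. It also lets the caller exclude highly connected "currency" metabolites such as ATP. All names are hypothetical, and this is a simplified stand-in for the pathway-assembly tools cited above.

```python
from collections import defaultdict, deque


def assemble_pathways(cluster_genes, gene_to_reactions, reactions, currency=frozenset()):
    """Group the reactions catalysed by cluster genes into connected sub-networks.

    gene_to_reactions: gene -> iterable of reaction ids (enzyme-coding genes only).
    reactions: reaction id -> (substrates, products), both sets of compound names.
    Two reactions are linked when they share a compound not listed in `currency`.
    """
    rxns = {r for g in cluster_genes for r in gene_to_reactions.get(g, ())}

    def compounds(r):
        subs, prods = reactions[r]
        return (set(subs) | set(prods)) - currency

    by_compound = defaultdict(set)  # compound -> reactions that consume or produce it
    for r in rxns:
        for c in compounds(r):
            by_compound[c].add(r)

    seen, groups = set(), []
    for start in sorted(rxns):
        if start in seen:
            continue
        seen.add(start)
        queue, group = deque([start]), []
        while queue:  # breadth-first search over shared compounds
            r = queue.popleft()
            group.append(r)
            for c in compounds(r):
                for nxt in by_compound[c]:
                    if nxt not in seen:
                        seen.add(nxt)
                        queue.append(nxt)
        groups.append(sorted(group))
    return groups
```

Excluding currency metabolites matters here: if ATP were kept, any two ATP-consuming reactions would be reported as part of the same candidate pathway.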
Computational identification of transcription factor binding sites by functional analysis of sets of genes sharing overrepresented upstream motifs
BACKGROUND: Transcriptional regulation is a key mechanism in the functioning
of the cell, and is mostly effected through transcription factors binding to
specific recognition motifs located upstream of the coding region of the
regulated gene. The computational identification of such motifs is made easier
by the fact that they often appear several times in the upstream region of the
regulated genes, so that the number of occurrences of relevant motifs is often
significantly larger than expected by pure chance. RESULTS: To exploit this
fact, we construct sets of genes characterized by the statistical
overrepresentation of a certain motif in their upstream regions. Then we study
the functional characterization of these sets by analyzing their annotation to
Gene Ontology terms. For the sets showing a statistically significant specific
functional characterization, we conjecture that the upstream motif
characterizing the set is a binding site for a transcription factor involved in
the regulation of the genes in the set. CONCLUSIONS: The method we propose is
able to identify many known binding sites in S. cerevisiae and new candidate
targets of regulation by known transcription factors. Its application to less
well studied organisms is likely to be valuable in the exploration of their
regulatory interaction network.
Comment: 19 pages, 1 figure. Published version with several improvements.
Supplementary material available from the author.
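The Gene Ontology step amounts to an over-representation test for each term within a motif-defined gene set. Below is a minimal sketch using the hypergeometric tail probability with a Bonferroni correction over the tested terms. The paper's exact statistic and correction may differ, and the term and gene identifiers used with it are placeholders.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N genes, K annotated, n drawn)."""
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1)) / comb(N, n)


def go_enrichment(gene_set, annotations, universe, alpha=0.05):
    """Return (term, overlap, p_value) for terms enriched in gene_set after Bonferroni correction.

    annotations: dict mapping a GO term to the set of genes annotated with it.
    universe: set of all genes considered (e.g. every gene whose upstream region was scored).
    """
    genes = set(gene_set) & universe
    N, n = len(universe), len(genes)
    terms = {t: set(g) & universe for t, g in annotations.items()}
    hits = []
    for term, annotated in terms.items():
        k = len(annotated & genes)
        if k == 0:
            continue
        p = hypergeom_sf(k, N, len(annotated), n)
        if p * len(terms) < alpha:  # Bonferroni over all tested terms
            hits.append((term, k, p))
    return sorted(hits, key=lambda h: h[2])
```

A term is reported only when the motif-defined set contains more annotated genes than its size and the term's genome-wide frequency would make likely by chance, which is the "statistically significant specific functional characterization" the abstract relies on.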
Eukaryotic transcriptional regulation : from data mining to transcriptional profiling
Survival of cells and organisms requires that each of thousands of genes be expressed at the correct time in development, in the correct tissue, and under the correct conditions. Transcription is the primary point of gene regulation. Genes are activated and repressed by transcription factors, which are proteins that become active through signaling, bind (sometimes cooperatively) to regulatory regions of DNA, and interact with other proteins such as chromatin remodelers. Yeast has nearly six thousand genes, several hundred of which encode transcription factors; in the human genome, around 2,000 of the 22,000 genes encode transcription factors. When and how these transcription factors are activated, and which subsets of genes they regulate, are active areas of research essential to understanding the transcriptional regulatory programs of organisms. We approached this problem in two divergent ways: first, an in silico study of human transcription factor combinations, and second, an experimental study of the transcriptional response of yeast mutants deficient in DNA repair. To better understand the combinatorial nature of transcription factor binding, we developed a data mining approach to assess whether transcription factors whose binding motifs were frequently proximal in the human genome were more likely to interact. We found many instances in the literature in which over-represented transcription factor pairs co-regulated the same gene, so we used co-citation to assess the utility of this method on a larger scale. We determined that over-represented pairs were more likely to be co-cited than would be expected by chance. 
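The notion of motifs that are "frequently proximal" can be made concrete by counting, for every pair of distinct motifs, how many pairs of hits fall within a fixed distance on the same chromosome. The sketch assumes motif hits are given as `(chromosome, position, motif)` tuples and that `window` is measured in base pairs; both the input format and the counting rule are assumptions, not the method described in the dissertation.

```python
from collections import Counter


def proximal_pairs(hits, window):
    """Count hit pairs of distinct motifs lying within `window` bp on the same chromosome.

    hits: list of (chromosome, position, motif) tuples.
    Returns a Counter keyed by alphabetically sorted (motif_a, motif_b) tuples.
    """
    hits = sorted(hits)  # orders by chromosome, then position
    counts = Counter()
    for i, (chrom, pos, motif) in enumerate(hits):
        j = i + 1
        # sweep forward only while the next hit is on the same chromosome and close enough
        while j < len(hits) and hits[j][0] == chrom and hits[j][1] - pos <= window:
            other = hits[j][2]
            if other != motif:
                counts[tuple(sorted((motif, other)))] += 1
            j += 1
    return counts
```

Sorting once and sweeping forward avoids comparing every hit with every other hit; the cost depends mainly on how many hits fall inside each window rather than on the genome-wide total.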
Because proper repair of DNA is an essential and highly conserved process in all eukaryotes, we next used cDNA microarrays to identify differentially expressed genes in eighteen yeast deletion strains sensitive to the DNA-alkylating agent methyl methanesulfonate (MMS); many of the deleted genes encode transcription factors or DNA-binding proteins. Combining these data with tools such as chromatin immunoprecipitation, Gene Ontology analysis, expression profile similarity, and motif analysis allowed us to propose a model for the roles of Iki3 and of YML081W, a poorly characterized gene, in DNA repair.
Institute for Cellular and Molecular Biolog
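Expression profile similarity is often scored with a correlation coefficient. The minimal sketch below assumes each deletion strain is represented by a list of log-ratios over the same genes in the same order, and ranks the other strains by Pearson correlation with a query strain. The strain names and data layout are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sxy / sqrt(sxx * syy)


def most_similar(profiles, query):
    """Rank all other strains by Pearson correlation with the query strain, highest first."""
    q = profiles[query]
    scores = [(name, pearson(q, prof)) for name, prof in profiles.items() if name != query]
    return sorted(scores, key=lambda s: s[1], reverse=True)
```

Strains whose deletions perturb the same pathway would be expected to sit near the top of such a ranking, which is one way a poorly characterized gene can be tentatively placed alongside better-studied ones.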