Predicting Combinatorial Binding of Transcription Factors to Regulatory Elements in the Human Genome by Association Rule Mining

Author: Iyer Vishwanath R.
Miranker Daniel P.
Morgan Xochitl C.
Ni Shulin
Publication date: 01/01/2007

Cis-acting transcriptional regulatory elements in mammalian genomes typically contain specific combinations of binding sites for various transcription factors. Although some cis-regulatory elements have been well studied, the combinations of transcription factors that regulate normal expression levels for the vast majority of the 20,000 genes in the human genome are unknown. We hypothesized that it should be possible to discover transcription factor combinations that regulate gene expression in concert by identifying over-represented combinations of sequence motifs that occur together in the genome. To detect combinations of transcription factor binding motifs, we developed a data mining approach based on association rules, which are typically used in market basket analysis. We scored each segment of the genome for the presence or absence of each of 83 transcription factor binding motifs, then used association rule mining algorithms to mine this dataset, thus identifying frequently occurring pairs of distinct motifs within a segment.
Results: Support for most pairs of transcription factor binding motifs was highly correlated across different chromosomes, although pair significance varied. Known true positive motif pairs showed higher association rule support, confidence, and significance than background. Our subsets of high-confidence, high-significance mined pairs of transcription factors showed enrichment for co-citation in PubMed abstracts relative to all pairs, and the predicted associations were often readily verifiable in the literature.
Conclusion: Functional elements in the genome where transcription factors bind to regulate expression in a combinatorial manner are more likely to be predicted by identifying statistically and biologically significant combinations of transcription factor binding motifs than by simply scanning the genome for the occurrence of binding sites for a single transcription factor.

NIAAA Alcohol Training Grant; National Science Foundation; Cellular and Molecular Biolog
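The support and confidence measures named in the abstract above can be illustrated with a minimal sketch. It assumes each genome segment has already been reduced to the set of motifs present in it. The function name `mine_motif_pairs`, the lift statistic, and the toy motif names are illustrative assumptions, not the authors' pipeline or data.

```python
from collections import Counter
from itertools import combinations


def mine_motif_pairs(segments, min_support=0.0):
    """Score every co-occurring motif pair with support, confidence and lift.

    segments: list of sets; each set holds the motif names present in one
    genome segment (a presence/absence encoding, as in market basket data).
    Returns {(a, b): (support, confidence of a -> b, lift)} with a < b.
    """
    n = len(segments)
    single = Counter()  # number of segments containing each motif
    pair = Counter()    # number of segments containing both motifs of a pair
    for motifs in segments:
        single.update(motifs)
        pair.update(combinations(sorted(motifs), 2))

    rules = {}
    for (a, b), count in pair.items():
        support = count / n
        if support < min_support:
            continue
        confidence = count / single[a]        # estimate of P(b | a)
        lift = confidence / (single[b] / n)   # > 1 means co-occurrence above chance
        rules[(a, b)] = (support, confidence, lift)
    return rules


# Toy segments with hypothetical motif content (not data from the paper).
segments = [
    {"SP1", "NFY"},
    {"SP1", "NFY", "E2F"},
    {"SP1"},
    {"E2F"},
]
rules = mine_motif_pairs(segments)
```

Counting every pair exhaustively is only practical because the sketch is limited to pairs; dedicated association rule miners prune the search space so that larger motif combinations remain tractable.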

Crossref

PubMed Central

Texas ScholarWorks

Recommended from our members

Eukaryotic transcriptional regulation : from data mining to transcriptional profiling

Author: Morgan Xochitl Chamorro
Publication date: 01/12/2008

Survival of cells and organisms requires that each of thousands of genes is expressed at the correct time in development, in the correct tissue, and under the correct conditions. Transcription is the primary point of gene regulation. Genes are activated and repressed by transcription factors, which are proteins that become active through signaling, bind, sometimes cooperatively, to regulatory regions of DNA, and interact with other proteins such as chromatin remodelers. Yeast has nearly six thousand genes, several hundred of which are transcription factors; transcription factors comprise around 2000 of the 22,000 genes in the human genome. When and how these transcription factors are activated, as well as which subsets of genes they regulate, is an active area of research essential to understanding the transcriptional regulatory programs of organisms. We approached this problem in two divergent ways: first, an in silico study of human transcription factor combinations, and second, an experimental study of the transcriptional response of yeast mutants deficient in DNA repair. First, in order to better understand the combinatorial nature of transcription factor binding, we developed a data mining approach to assess whether transcription factors whose binding motifs were frequently proximal in the human genome were more likely to interact. We found many instances in the literature in which over-represented transcription factor pairs co-regulated the same gene, so we used co-citation to assess the utility of this method on a larger scale. We determined that over-represented pairs were more likely to be co-cited than would be expected by chance.
Because proper repair of DNA is an essential and highly conserved process in all eukaryotes, we next used cDNA microarrays to measure differentially expressed genes in eighteen yeast deletion strains with sensitivity to the DNA cross-linking agent methyl methane sulfonate (MMS); many of these mutants were transcription factors or DNA-binding proteins. Combining these data with tools such as chromatin immunoprecipitation, gene ontology analysis, expression profile similarity, and motif analysis allowed us to propose a model for the roles of Iki3 and of YML081W, a poorly characterized gene, in DNA repair.

Institute for Cellular and Molecular Biolog
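The claim that over-represented pairs are co-cited more often than expected by chance can be made concrete with a one-sided hypergeometric test. This is a sketch only: the abstract does not state which test was used, so the hypergeometric model is an assumption here, and the function name and the counts in the usage line are hypothetical.

```python
from math import comb


def cocitation_enrichment_p(total_pairs, cocited_pairs, mined_pairs, mined_cocited):
    """One-sided hypergeometric p-value P(X >= mined_cocited).

    total_pairs:   number of transcription factor pairs considered overall
    cocited_pairs: how many of those pairs are co-cited in the literature
    mined_pairs:   size of the mined (over-represented) subset
    mined_cocited: how many mined pairs are co-cited
    """
    upper = min(cocited_pairs, mined_pairs)
    denominator = comb(total_pairs, mined_pairs)
    tail = sum(
        comb(cocited_pairs, k) * comb(total_pairs - cocited_pairs, mined_pairs - k)
        for k in range(mined_cocited, upper + 1)
    )
    return tail / denominator


# Hypothetical counts chosen only to show the call; not results from the thesis.
p_value = cocitation_enrichment_p(total_pairs=10, cocited_pairs=4,
                                  mined_pairs=3, mined_cocited=2)
```

A small p-value would indicate that the mined subset contains more co-cited pairs than a random subset of the same size would be expected to contain.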

Texas ScholarWorks

Modular combinatorial binding among human trans-acting factors reveals direct and indirect factor binding

Author: Gifford David K
Guo Yuchun
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 06/01/2017

Background: The combinatorial binding of trans-acting factors (TFs) to the DNA is critical to the spatial and temporal specificity of gene regulation. For certain regulatory regions, more than one regulatory module (set of TFs that bind together) are combined to achieve context-specific gene regulation. However, previous approaches are limited to either pairwise TF co-association analysis or assuming that only one module is used in each regulatory region.
Results: We present a new computational approach that models the modular organization of TF combinatorial binding. Our method learns compact and coherent regulatory modules from in vivo binding data using a topic model. We found that the binding of 115 TFs in K562 cells can be organized into 49 interpretable modules. Furthermore, we found that tens of thousands of regulatory regions use multiple modules, a structure that cannot be observed with previous hard-clustering-based methods. The modules discovered recapitulate many published protein-protein physical interactions, have consistent functional annotations of chromatin states, and uncover context-specific co-binding such as gene-proximal binding of NFY + FOS + SP and distal binding of NFY + FOS + USF. For certain TFs, the co-binding partners of direct binding (motif present) differ from those of indirect binding (motif absent); the distinct set of co-binding partners can predict whether the TF binds directly or indirectly with up to 95% accuracy. Joint analysis across two cell types reveals both cell-type-specific and shared regulatory modules.
Conclusions: Our results provide comprehensive cell-type-specific combinatorial binding maps and suggest a modular organization of combinatorial binding.
Keywords: Computational genomics; Transcription factor; Combinatorial binding; Direct and indirect binding; Topic model

National Institutes of Health (U.S.) (grant 1U01HG007037-01

DSpace@MIT

Springer - Publisher Connector

PubMed Central

Springer OAI

Associative Pattern Recognition for Biological Regulation Data

Author: Xiao Yiou
Publication venue: SURFACE at Syracuse University
Publication date: 22/12/2017

In the last decade, bioinformatics data has accumulated at an unprecedented rate, thanks to advances in sequencing technologies. Such rapid development poses both challenges and promising research topics. In this dissertation, we propose a series of associative pattern recognition algorithms for biological regulation studies. In particular, we emphasize efficiently recognizing associative patterns between genes, transcription factors, histone modifications and functional labels using heterogeneous data sources (numeric, sequences, time series data and textual labels). In protein-DNA associative pattern recognition, we introduce an efficient algorithm for affinity testing that searches for over-represented DNA sequences using a hash function and modulo addition calculation. This substantially improves the efficiency of next generation sequencing data analysis. In gene regulatory network inference, we propose a framework for refining weak networks based on transcription factor binding sites, thereby improving the precision of predicted edges by up to 52%. In histone modification code analysis, we propose an approach to genome-wide combinatorial pattern recognition for histone code to function associative pattern recognition, and achieved improvement by up to 38.1%. We also propose a novel shape-based modification pattern analysis approach, and use it to successfully predict sub-classes of genes in the flowering-time category. We also propose a combination-to-combination associative pattern recognition approach, which achieved better performance than multi-label classification and bidirectional associative memory methods. Our proposed approaches recognize associative patterns from different types of data efficiently, and provide a useful toolbox for biological regulation analysis. This dissertation presents a road map to associative pattern recognition at the genome-wide level.
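The abstract above mentions a hash function combined with modulo arithmetic for finding over-represented DNA sequences. The dissertation's actual algorithm is not described here, so the following is only a generic sketch of one common technique in that spirit: a rolling base-4 hash that updates each k-mer code from the previous one. The names `count_kmers` and `decode` are illustrative.

```python
from collections import Counter

BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def count_kmers(seq, k):
    """Count k-mers using a rolling base-4 hash.

    Each k-mer maps to an integer in [0, 4**k). Shifting in the next base and
    reducing modulo 4**k drops the oldest base, so every step costs O(1).
    Windows that contain a non-ACGT character are skipped.
    """
    counts = Counter()
    modulus = 4 ** k
    code = 0
    valid = 0  # consecutive valid bases ending at the current position
    for base in seq.upper():
        if base not in BASE_CODE:
            code, valid = 0, 0
            continue
        code = (code * 4 + BASE_CODE[base]) % modulus
        valid += 1
        if valid >= k:
            counts[code] += 1
    return counts


def decode(code, k):
    """Turn an integer k-mer code back into its DNA string."""
    letters = []
    for _ in range(k):
        letters.append("ACGT"[code % 4])
        code //= 4
    return "".join(reversed(letters))


# Toy sequence, used only to demonstrate the calls.
kmer_counts = {decode(c, 3): n for c, n in count_kmers("ACGTACGT", 3).items()}
```

Comparing such counts between bound and background sequences is one way to flag over-represented k-mers; the comparison statistic is left out of this sketch.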

Syracuse University Research Facility and Collaborative Environment

Simplified Method to Predict Mutual Interactions of Human Transcription Factors Based on Their Primary Structure

Author: A Ben Hur
A Ceol
A Ramani
A Remenyi
A van Dijk
A Varshavsky
B Aranda
B Breitkreutz
B Lemon
Boris Jankovic
C Camacho
C Chen
D Caffrey
D GuhaThakurta
E Wingender
F Browne
GJ McLachlan
H Almuallim
I Donaldson
I Guyon
J Bock
J Capra
J Espadaler
J Hoskins
J Shen
J Wang
JJ Chung
JM Vaquerizas
Joaquín Dopazo
L Matthews
L Yu
M Guharoy
M Guharoy
M Kato
M McDowall
N Banerjee
P Aloy
P Aloy
R Hoffmann
R Jansen
S Hannenhalli
S Kawashima
S Lee
S Lo
S Orchard
S Pitre
S Teichmann
Sebastian Schmeier
T Dandekar
T Lee
T Ravasi
U Ogmen
V Matys
VB Bajić
Vladimir B. Bajic
W Kim
W Valdar
X Chen
X Li
X Wu
X Yu
X Yu
Y Guo
Z Hu
Z Zhu
Publication venue: Public Library of Science
Publication date: 01/01/2011

Background: Physical interactions between transcription factors (TFs) are necessary for forming regulatory protein complexes and thus play a crucial role in gene regulation. Currently, knowledge about the mechanisms of these TF interactions is incomplete and the number of known TF interactions is limited. Computational prediction of such interactions can help identify potential new TF interactions as well as contribute to a better understanding of the complex machinery involved in gene regulation. Methodology: We propose here such a method for the prediction of TF interactions. The method uses only the primary sequence information of the interacting TFs, resulting in a much simpler prediction algorithm. Through an advanced feature selection process, we determined a subset of 97 model features that constitute the optimized model in the subset we considered. The model, based on quadratic discriminant analysis, achieves a prediction accuracy of 85.39% on a blind set of interactions. This result is achieved even though the negative data set was restricted to TFs from the same type of proteins, i.e. TFs that function in the same cellular compartment (nucleus) and in the same type of molecular process (transcription initiation). Such selection poses significant challenges for developing models with high specificity, but at the same time better reflects real-world problems. Conclusions: The performance of our predictor compares well to that of much more complex approaches for predicting TF and general protein-protein interactions, particularly when taking the reduced complexity of model utilisation into account.

CiteSeerX

Public Library of Science (PLOS)

Crossref

PubMed Central

Predicting eukaryotic transcriptional cooperativity by Bayesian network integration of genome-wide data

Author: Aguilar
Balaji
Banerjee
Bean
Beer
Bluthgen
Bonneau
Borneman
Bowers
Breitkreutz
Buchler
Chang
Chang
Chen
Das
Datta
Delcher
Demsar
Ercan
Friedman
Friedman
Friedman
Ge
Hannenhalli
Harbison
Hermsen
Ho
Hobert
Horak
Iyer
Jansen
Kanehisa
Knijnenburg
Krek
Kumar
Latchman
Latchman
Lee
Lee
Lee
Lee
Li
Li
Lotito
Lu
Luscombe
Mani
Mewes
Monteiro
Nagamine
Nariai
Nguyen
Parisi
Pilpel
Stark
Tan
Teixeira
Troyanskaya
Tsai
Tsong
Tsong
Wagner
Walther
Wang
Wang
Wang
Wang
Workman
Wu
Xiang-Sun Zhang
Yang
Yong Wang
Yu
Yu
Yu
Yu
Yu Xia
Zhu
Publication venue: Oxford University Press
Publication date: 01/10/2009

Transcriptional cooperativity among several transcription factors (TFs) is believed to be the main mechanism of complexity and precision in transcriptional regulatory programs. Here, we present a Bayesian network framework to reconstruct a high-confidence whole-genome map of transcriptional cooperativity in Saccharomyces cerevisiae by integrating a comprehensive list of 15 genomic features. We design a Bayesian network structure to capture the dominant correlations among features and TF cooperativity, and introduce a supervised learning framework with a well-constructed gold-standard dataset. This framework allows us to assess the predictive power of each genomic feature, validate the superior performance of our Bayesian network compared to alternative methods, and integrate genomic features for optimal TF cooperativity prediction. Data integration reveals 159 high-confidence predicted cooperative relationships among 105 TFs, most of which are subsequently validated by literature search. The existing and predicted transcriptional cooperativities can be grouped into three categories based on the combination patterns of the genomic features, providing further biological insights into the different types of TF cooperativity. Our methodology is the first supervised learning approach for predicting transcriptional cooperativity, compares favorably to alternative unsupervised methodologies, and can be applied to other genomic data integration tasks where high-quality gold-standard positive data are scarce.

Crossref

Boston University Institutional Repository (OpenBU)

PubMed Central

Hematopoietic gene promoters subjected to a group-combinatorial study of DNA samples: identification of a megakaryocytic selective DNA signature

Author: Azorsa
Bailey
Bartel
Beer
Brazma
Buisine
Cynthia St. Hilaire
Deveaux
Eidhammer
Fernandes
Friedman
Hazony
Hazony
Heinemeyer
Hoch
Holmes
In Orengo
Jun Lu
Kaluzhny
Katya Ravid
Lepage
Lesk
Lo
Lu
Merika
Mignotte
Orengo
Plaza
Ravanat
Ravanat
Ravid
Shivdasani
Thompson
Wang
Wasylyk
Xie
Yehonathan Hazony
Publication venue: Oxford University Press
Publication date: 26/08/2006

Identification of common sub-sequences for a group of functionally related DNA sequences can shed light on the role of such elements in cell-specific gene expression. In the megakaryocytic lineage, no single transcription factor has been described as lineage-specific, raising the possibility that a cluster of gene promoter sequences presents a unique signature. Here, the megakaryocytic gene promoter group, which consists of both human and mouse 5′ non-coding regions, served as a case study. A methodology for group-combinatorial search has been implemented as a customized software platform. It extracts the longest common sequences for a group of related DNA sequences and allows for single gaps of varying length, as well as double- and multiple-gap sequences. The results point to common DNA sequences in a group of genes that is selectively expressed in megakaryocytes, and which do not appear in a large group of control, random and specific sequences. This suggests a role for a combination of these sequences in cell-specific gene expression in the megakaryocytic lineage. The data also point to an intrinsic cross-species difference in the organization of 5′ non-coding sequences within mammalian genomes. This methodology may be used for the identification of regulatory sequences in other lineages.
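The core idea of extracting the longest sequences shared by a whole group can be sketched for the simplest case, contiguous matches with no gaps. The platform described above also handles single and multiple gaps, which this sketch does not attempt. The function name and the toy promoter fragments are hypothetical.

```python
def longest_common_substrings(seqs):
    """Return the set of longest contiguous substrings found in every sequence.

    Brute-force sketch: candidates are taken from the shortest sequence,
    longest first, and each is checked against all other sequences.
    """
    if not seqs:
        return set()
    shortest = min(seqs, key=len)
    others = [s for s in seqs if s is not shortest]
    for length in range(len(shortest), 0, -1):
        found = {
            shortest[i:i + length]
            for i in range(len(shortest) - length + 1)
            if all(shortest[i:i + length] in s for s in others)
        }
        if found:
            return found
    return set()


# Toy fragments invented for illustration; not sequences from the study.
group = ["TTGATAAGG", "CCGATAAC", "GATAATT"]
shared = longest_common_substrings(group)
```

Running the same search against a control group and keeping only substrings absent from the controls would mirror the selectivity check the abstract describes, though the brute-force scan here would need a suffix-based index to scale to real promoter sets.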

Crossref

Boston University Institutional Repository (OpenBU)

PubMed Central
