24 research outputs found

Beyond microarrays: Finding key transcription factors controlling signal transduction pathways

Author: A Kel
AE Kel
Alexander Kel
CD Schmid
D Viemann
E Birney
E Cheremushkin
Edgar Wingender
GG Loots
K Frech
M Krull
Nico Voss
O Kel-Margoulis
Olga Kel-Margoulis
R Hardison
R Yamashita
Ruy Jauregui
S Sinha
V Matys
WW Wasserman
X Chen
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: Massive gene expression changes in different cellular states measured by microarrays, in fact, reflect just an "echo" of real molecular processes in the cells. Transcription factors constitute a class of the regulatory molecules that typically require posttranscriptional modifications or ligand binding in order to exert their function. Therefore, such important functional changes of transcription factors are not directly visible in the microarray experiments. RESULTS: We developed a novel approach to find key transcription factors that may explain concerted expression changes of specific components of the signal transduction network. The approach aims at revealing evidence of positive feedback loops in the signal transduction circuits through activation of pathway-specific transcription factors. We demonstrate that promoters of genes encoding components of many known signal transduction pathways are enriched by binding sites of those transcription factors that are endpoints of the considered pathways. Application of the approach to the microarray gene expression data on TNF-alpha stimulated primary human endothelial cells helped to reveal novel key transcription factors potentially involved in the regulation of the signal transduction pathways of the cells. CONCLUSION: We developed a novel computational approach for revealing key transcription factors by knowledge-based analysis of gene expression data with the help of databases on gene regulatory networks (TRANSFAC(®) and TRANSPATH(®)). The corresponding software and databases are available at
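
The abstract above speaks of promoters being "enriched" in binding sites. One common way to quantify such over-representation is a hypergeometric tail probability. The following is a minimal sketch under that assumption, not the authors' published procedure; the function name and all numbers are hypothetical.

```python
from math import comb


def hypergeom_tail(population, successes, draws, observed):
    """Return P(X >= observed) for a hypergeometric variable X.

    population: total number of promoters considered
    successes:  promoters in the population with a predicted site
    draws:      promoters in the gene set of interest
    observed:   promoters in the gene set with a predicted site
    """
    upper = min(successes, draws)
    total = comb(population, draws)
    favourable = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(observed, upper + 1)
    )
    return favourable / total


# Hypothetical counts: 20000 promoters, 2000 with a site,
# 100 pathway genes, 25 of which carry the site.
p_value = hypergeom_tail(20000, 2000, 100, 25)
print(f"enrichment p-value: {p_value:.3g}")
```

A small p-value would suggest that the site occurs in the gene set more often than expected by chance; correction for testing many factors is left out of this sketch.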

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

TRANSFAC(®) and its module TRANSCompel(®): transcriptional gene regulation in eukaryotes

Author: Barre-Dirrie A.
Chekmenev D.
Fricke E.
Hornischer K.
Kel A. E.
Kel-Margoulis O. V.
Krull M.
Land S.
Lewicki-Potapov B.
Liebich I.
Matys V.
Reuter I.
Saxel H.
Stegmaier P.
Voss N.
Wingender E.
Publication venue: Oxford University Press
Publication date: 28/12/2005

The TRANSFAC(®) database on transcription factors, their binding sites, nucleotide distribution matrices and regulated genes, as well as the complementing database TRANSCompel(®) on composite elements, have been further enhanced on various levels. A new web interface with different search options and integrated versions of Match™ and Patch™ provides increased functionality for TRANSFAC(®). The list of databases which are linked to the common GENE table of TRANSFAC(®) and TRANSCompel(®) has been extended by: Ensembl, UniGene, EntrezGene, HumanPSD™ and TRANSPRO™. Standard gene names from HGNC, MGI and RGD are included for human, mouse and rat genes, respectively. With the help of InterProScan, Pfam, SMART and PROSITE domains are assigned automatically to the protein sequences of the transcription factors. TRANSCompel(®) now contains, in addition to the COMPEL table, a separate table for detailed information on the experimental EVIDENCE on which the composite elements are based. Finally, with respect to data growth in TRANSFAC(®), the gain of Drosophila transcription factor binding sites (by courtesy of the Drosophila DNase I footprint database) and of Arabidopsis factors (by courtesy of DATF, Database of Arabidopsis Transcription Factors) has to be stressed in particular. The public releases described here, TRANSFAC(®) 7.0 and TRANSCompel(®) 7.0, are accessible under

CiteSeerX

PubMed Central

Effective transcription factor binding site prediction using a combination of optimization, a genetic algorithm and discriminant analysis to capture distant interactions

Author: A Hoglund
AE Kel
AE Vinogradov
B Efron
B Jaruga
BJ Deroo
C Burge
CD Schmid
CR Calladine
D Cai
D GuhaThakurta
DM Graunke
E Fayard
Elena A Ananko
Elena V Ignatieva
FA Wright
GD Stormo
HP Ko
I Abnizova
I Ben-Gal
IA Udalova
Igor I Turnaev
J Duarte
J Hu
JV Ponomarenko
K Ellrott
K Morohashi
K Quandt
KJ Campbell
L Quintana-Murci
LC Platanias
LG Cowell
M Beato
M Blanchette
M Costantini
M Ganapathi
M Lohoff
M Stepanova
M-LT Lee
ML Bulyk
MP Ponomarenko
MQ Zhang
NA Kolchanov
NI Gershenzon
Nikolay A Kolchanov
NV Klimova
O Kel-Margoulis
OA Podkolodnaia
OD King
OG Berg
P Val
PV Benos
Q Zhou
R Castelo
R Kiyama
R Osada
R Pudimat
RV Davuluri
S Kamalakaran
Tatyana I Merkulova
TC Hodgman
TK Man
TM Chen
TV Busygina
VG Levitskii
VG Levitsky
Victor G Levitsky
VV Solovyev
W Huang
WH Shen
WW Wasserman
X Xie
Y Barash
Publication venue: BioMed Central
Publication date: 01/12/2007

Background: Reliable transcription factor binding site (TFBS) prediction methods are essential for computer annotation of large amounts of genome sequence data. However, current methods to predict TFBSs are hampered by the high false-positive rates that occur when only sequence conservation at the core binding sites is considered. Results: To improve this situation, we have quantified the performance of several Position Weight Matrix (PWM) algorithms, using exhaustive approaches to find their optimal length and position. We applied these approaches to biomedically important TFBSs involved in the regulation of cell growth and proliferation as well as in inflammatory, immune, and antiviral responses (NF-κB, ISGF3, IRF1, STAT1), obesity and lipid metabolism (PPAR, SREBP, HNF4), and regulation of steroidogenic (SF-1) and cell cycle (E2F) gene expression. We have also gained extra specificity using a method, entitled SiteGA, which takes into account structural interactions within TFBS core and flanking regions, using a genetic algorithm (GA) with a discriminant function of locally positioned dinucleotide (LPD) frequencies. To ensure a higher confidence in our approach, we applied resampling-jackknife and bootstrap tests for the comparison; it appears that optimized PWMs and SiteGA have similar recognition performance. Then we applied SiteGA and optimized PWMs (both separately and together) to sequences in the Eukaryotic Promoter Database (EPD). The resulting SiteGA recognition models can now be used to search sequences for binding sites using the web tool, SiteGA. Analysis of dependencies between close and distant LPDs revealed by SiteGA models has shown that the most significant correlations are between close LPDs and are generally located in the core (footprint) region. A greater number of less significant correlations are mainly between distant LPDs, which span both core and flanking regions. When SiteGA and optimized PWM models were applied together, false positives were substantially reduced, at least at higher stringencies. Conclusion: Based on this analysis, SiteGA adds substantial specificity even to optimized PWMs and may be considered for large-scale genome analysis. It adds to the range of techniques available for TFBS prediction, and EPD analysis has led to a list of genes which appear to be regulated by the above TFs.
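
For readers unfamiliar with the PWM scanning mentioned in the abstract above, here is a minimal sketch of a generic log-odds PWM scan. It assumes a uniform background, a pseudocount of 1 and a toy three-column count matrix; these choices and the function names are illustrative and are not taken from the paper, and SiteGA itself is not reproduced here.

```python
import math

BASES = "ACGT"


def build_log_odds(counts, background=0.25, pseudocount=1.0):
    """Turn per-position base counts into log2-odds scores.

    counts: list of dicts mapping a base to its count, one dict per
    motif position; missing bases are treated as zero counts.
    """
    pwm = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + 4 * pseudocount
        pwm.append({
            b: math.log2(((column.get(b, 0) + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm


def scan(sequence, pwm, threshold):
    """Return (start, score) for every window scoring at least threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in BASES for base in window):
            continue  # skip windows with ambiguous symbols such as N
        score = sum(col[base] for col, base in zip(pwm, window))
        if score >= threshold:
            hits.append((start, score))
    return hits


# Toy matrix (hypothetical) that strongly prefers the word ACG.
toy_counts = [{"A": 10}, {"C": 10}, {"G": 10}]
toy_pwm = build_log_odds(toy_counts)
hits = scan("TTACGTT", toy_pwm, threshold=0.0)
print(hits)
```

Raising the threshold trades sensitivity for fewer false positives, which is the stringency trade-off the abstract refers to.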

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Comparative analysis of cis-regulation following stroke and seizures in subspaces of conserved eigensystems

Crossref

Springer - Publisher Connector

PubMed Central

Assessing Computational Methods of Cis-Regulatory Module Prediction

Author: A Bruhat
A Siepel
A Sosinsky
A Visel
AB Rose
AG Clark
AL Halpern
AM Moses
B Prud'homme
B Shi
BK Peterson
BP Berman
BY Chan
Christina Leslie
CM Bergman
D Kolbe
D Papatsenko
DA Kleinjan
DC King
DE Schones
DM Jeziorska
DS Johnson
E Birney
E Davidson
E Emberly
E Segal
E Wingender
G Bejerano
GM Euskirchen
H Wang
H Weintraub
JB Warner
Jing Su
JL Kabat
JR Stone
JS Jakobsen
KH Surinya
KJ Won
L Li
LP Lim
M Bieda
M Blanchette
M Brudno
M Hasegawa
MC Frith
MD Schroeder
MD Wilson
MS Halfon
MZ Ludwig
N Bray
N Ghanem
N Gompel
N Pierstorff
ND Heintzman
O Hallikas
O Johansson
OV Kel-Margoulis
P Van Loo
PC FitzGerald
PJ Sabo
Q Zhou
R Godbout
RP Zinzen
S Aerts
S Batzoglou
S Karlin
S MacArthur
S Richards
S Sinha
Sarah A. Teichmann
SC Parker
SE Celniker
T Sandmann
T Strachan
T Waleev
Thomas A. Down
TL Bailey
TM Williams
V Ferretti
V Gotea
W Krivan
WW Wasserman
X He
XY Li
Publication venue: Public Library of Science
Publication date: 01/01/2010

Computational methods attempting to identify instances of cis-regulatory modules (CRMs) in the genome face a challenging problem of searching for potentially interacting transcription factor binding sites while knowledge of the specific interactions involved remains limited. Without a comprehensive comparison of their performance, the reliability and accuracy of these tools remain unclear. Faced with a large number of different tools that address this problem, we summarized and categorized them based on search strategy and input data requirements. Twelve representative methods were chosen and applied to predict CRMs from the Drosophila CRM database REDfly, and across the human ENCODE regions. Our results show that the optimal choice of method varies depending on species and composition of the sequences in question. When discriminating CRMs from non-coding regions, those methods considering evolutionary conservation have a stronger predictive power than methods designed to be run on a single genome. Different CRM representations and search strategies rely on different CRM properties, and different methods can complement one another. For example, some favour homotypical clusters of binding sites, while others perform best on short CRMs. Furthermore, most methods appear to be sensitive to the composition and structure of the genome to which they are applied. We analyze the principal features that distinguish the methods that performed well, identify weaknesses leading to poor performance, and provide a guide for users. We also propose key considerations for the development and evaluation of future CRM-prediction methods.

CiteSeerX

Crossref

Directory of Open Access Journals

PubMed Central

Probabilistic Inference of Transcription Factor Binding from Multiple Data Sources

Author: A Ambesi-Impiombato
A Bernard
A Beyer
A Sandelin
A Siepel
AFA Smit
Alistair G. Rust
B Ren
CE Lawrence
CL Warren
CP Robert
CT Harbison
D GuhaThakurta
D Husmeier
David Jones
DB Gordon
DJ Reiss
DJ Wilkinson
DT Holloway
E Blanco
E Segal
E Wingender
EH Davidson
G Chen
G Thijs
GD Stormo
GE Crawford
H Huang
H Lähdesmäki
H Steck
Harri Lähdesmäki
Ilya Shmulevich
IV Bajić
J Taylor
JD Hughes
JM Claverie
K Quandt
K Thomas
KD MacIsaac
KP Murphy
L Hertzberg
L Narlikar
L Zhang
M Eisenstein
M Kellis
M Levine
M Tompa
MA Beer
MC Frith
MF Berger
MJL de Hoon
ML Bulyk
N Friedman
N Rajewsky
ND Heintzman
O Hallikas
OV Kel-Margoulis
Q Zhou
R Siddharthan
R Staden
S Cawley
S Mukherjee
S Sinha
SB Montgomery
SJ Maerkl
SP Brooks
ST Jensen
T Chen
T Fawcett
T Reguly
TD Wu
TI Lee
TL Bailey
VD Marinescu
W Pan
WJ Kent
WP Lehrach
WW Wasserman
X Liu
X Xie
XS Liu
Y Barash
Y Qi
Y Tamada
Publication venue: Public Library of Science
Publication date: 01/03/2008

An important problem in molecular biology is to build a complete understanding of transcriptional regulatory processes in the cell. We have developed a flexible, probabilistic framework to predict TF binding from multiple data sources that differs from the standard hypothesis testing (scanning) methods in several ways. Our probabilistic modeling framework estimates the probability of binding and, thus, naturally reflects our degree of belief in binding. Probabilistic modeling also allows for easy and systematic integration of our binding predictions into other probabilistic modeling methods, such as expression-based gene network inference. The method answers the question of whether the whole analyzed promoter has a binding site, but can also be extended to estimate the binding probability at each nucleotide position. Further, we introduce an extension to model combinatorial regulation by several TFs. Most importantly, the proposed methods can make principled probabilistic inference from multiple evidence sources, such as, multiple statistical models (motifs) of the TFs, evolutionary conservation, regulatory potential, CpG islands, nucleosome positioning, DNase hypersensitive sites, ChIP-chip binding segments and other (prior) sequence-based biological knowledge. We developed both a likelihood and a Bayesian method, where the latter is implemented with a Markov chain Monte Carlo algorithm. Results on a carefully constructed test set from the mouse genome demonstrate that principled data fusion can significantly improve the performance of TF binding prediction methods. We also applied the probabilistic modeling framework to all promoters in the mouse genome and the results indicate a sparse connectivity between transcriptional regulators and their target promoters. To facilitate analysis of other sequences and additional data, we have developed an on-line web tool, ProbTF, which implements our probabilistic TF binding prediction method using multiple data sources. 
Test data set, a web tool, source codes and supplementary data are available at: http://www.probtf.org
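
To illustrate what "estimating the probability of binding" for a whole promoter can look like, the sketch below uses a deliberately simplified model: at most one site per sequence, a uniform background, and a fixed prior. These are assumptions made for illustration only; this is not the ProbTF implementation, and the function name, motif and prior are hypothetical.

```python
def binding_posterior(sequence, motif, prior=0.1, background=0.25):
    """Posterior probability that the sequence contains one motif site.

    motif: list of dicts mapping each of A, C, G, T to its probability
    at that motif position. The site position is assumed to be uniform
    over all possible starts, so the likelihood ratio against the pure
    background model is the mean of the per-window ratios.
    """
    width = len(motif)
    positions = len(sequence) - width + 1
    if positions <= 0:
        return prior  # sequence too short to carry any evidence
    ratio_sum = 0.0
    for start in range(positions):
        ratio = 1.0
        for column, base in zip(motif, sequence[start:start + width]):
            ratio *= column[base] / background
        ratio_sum += ratio
    likelihood_ratio = ratio_sum / positions
    # Bayes' rule with two hypotheses: bound versus background only.
    return prior * likelihood_ratio / (prior * likelihood_ratio + 1.0 - prior)


def column_for(base, high=0.97):
    """Hypothetical helper: a motif column strongly preferring one base."""
    low = (1.0 - high) / 3.0
    return {b: (high if b == base else low) for b in "ACGT"}


acg_motif = [column_for(b) for b in "ACG"]
print(binding_posterior("TTACGTT", acg_motif, prior=0.1))
```

Because the output is a probability rather than a thresholded score, further evidence sources could in principle be folded in by adjusting the prior per promoter or per position, which is the kind of data fusion the abstract describes.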

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

COMPEL: a database on composite regulatory elements providing combinatorial transcriptional regulation

Author: O. V. Kel-Margoulis
Publication venue: Oxford University Press (OUP)

Crossref

Automatic annotation of genomic regulatory sequences by searching for composite clusters

Author: E. Wingender
O. V. Kel-Margoulis
T. G. Ivanova

A new method was developed for revealing composite clusters of cis-elements in promoters of eukaryotic genes that are functionally related or coexpressed. A software system, “ClusterScan”, has been created that makes it possible: (i) to train the system on representative samples of promoters to reveal cis-elements that tend to cluster; (ii) to train the system on a number of samples of functionally related promoters to identify functionally coupled transcription factors; (iii) to search for these clusters in genomic sequences to identify and functionally characterize regulatory regions in the genome. A number of training samples of different functional and structural groups of promoters were analysed. A search for composite clusters in human chromosomes 21 and 22 reveals a number of interesting examples. Finally, a decision tree system was constructed to classify promoters of several functionally related gene groups. The decision tree system makes it possible to identify new promoters and computationally predict their possible function.

CiteSeerX

Computer-assisted identification of cell cycle-related genes: new targets for E2F transcription factors

Author: Bartley S. M.
Farnham P. J.
Kel A. E.
Kel-Margoulis O. V.
Wingender E.
Zhang M. Q.
Publication venue: Elsevier BV
Publication date: 01/05/2001

The processes that take place during development and differentiation are directed through coordinated regulation of expression of a large number of genes. One such gene regulatory network provides cell cycle control in eukaryotic organisms. In this work, we have studied the structural features of the 5' regulatory regions of cell cycle-related genes. We developed a new method for identifying composite substructures (modules) in regulatory regions of genes consisting of a binding site for a key transcription factor and additional contextual motifs: potential targets for other transcription factors that may synergistically regulate gene transcription. Applying this method to cell cycle-related promoters, we created a program for context-specific identification of binding sites for transcription factors of the E2F family, which are key regulators of the cell cycle. We found that E2F composite modules occur at a high frequency and in close proximity to the start of transcription in cell cycle-related promoters in comparison with other promoters. Using this information, we then searched for E2F sites in genomic sequences with the goal of identifying new genes which play important roles in controlling cell proliferation, differentiation and apoptosis. Using a chromatin immunoprecipitation assay, we then experimentally verified the binding of E2F in vivo to the promoters predicted by the computer-assisted methods. Our identification of new E2F target genes provides new insight into gene regulatory networks and provides a framework for continued analysis of the role of contextual promoter features in transcriptional regulation. The tools described are available at http://compel.bionet.nsc.ru/FunSite/SiteScan.htm

Cold Spring Harbor Laboratory Institutional Repository