28 research outputs found

PReMod: a database of genome-wide mammalian cis-regulatory module predictions

Author: Bergeron Dominique
Blanchette Mathieu
Coulombe Benoit
Ferretti Vincent
Poitras Christian
Robert François
Publication venue: Oxford University Press
Publication date: 05/12/2006

We describe PReMod, a new database of genome-wide cis-regulatory module (CRM) predictions for both the human and the mouse genomes. The prediction algorithm, described previously in Blanchette et al. (2006) Genome Res., 16, 656–668, exploits the fact that many known CRMs are made of clusters of phylogenetically conserved and repeated transcription factor (TF) binding sites. Unlike other existing databases, PReMod is not restricted to modules located proximal to genes; in fact, it mostly contains distal predicted CRMs (pCRMs). Through its web interface, PReMod allows users to (i) identify pCRMs around a gene of interest; (ii) identify pCRMs that have binding sites for a given TF (or a set of TFs); or (iii) download the entire dataset for local analyses. Queries can also be refined by filtering for specific chromosomal regions, for specific regions relative to genes or for the presence of CpG islands. The output includes information about the binding sites predicted within the selected pCRMs, and a graphical display of their distribution within the pCRMs. It also provides a visual depiction of the chromosomal context of the selected pCRMs in terms of neighboring pCRMs and genes, all of which are linked to the UCSC Genome Browser and the NCBI. PReMod:

Crossref

PubMed Central

A statistical fat-tail test of predicting regulatory regions in the Drosophila genome

Author: Li Yajing
Shu Jian-Jun
Publication venue: Elsevier BV
Publication date: 07/03/2014

A statistical study of cis-regulatory modules (CRMs) is presented based on the estimation of the similar-word set distribution. It is observed that CRMs tend to have a fat-tail distribution. A new statistical fat-tail test with two kurtosis-based fatness coefficients is proposed to distinguish CRMs from non-CRMs. Compared with the existing fluffy-tail test, the first fatness coefficient is designed to reduce computational time, making the new fat-tail test well suited to long sequences and large-database analysis in the post-genome era, while the second is designed to improve the separation accuracy between CRMs and non-CRMs. These two fatness coefficients may serve as valuable filtering indexes to predict CRMs experimentally.
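To make the idea of a kurtosis-based tail measure concrete, here is a minimal Python sketch. It is not the fatness coefficient defined in the paper, which the abstract does not specify. It only computes the ordinary sample excess kurtosis of k-mer counts in a sequence, and the function names `kmer_counts` and `excess_kurtosis`, the choice of exact k-mers instead of similar-word sets, and the default `k` are all illustrative assumptions.

```python
from collections import Counter
from itertools import product


def kmer_counts(seq, k=2):
    """Count overlapping k-mers over the ACGT alphabet.

    Returns one count per possible k-mer (zeros included), in
    lexicographic order. Exact k-mers stand in for the paper's
    "similar-word sets", which is an assumption of this sketch.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return [counts.get("".join(p), 0) for p in product("ACGT", repeat=k)]


def excess_kurtosis(values):
    """Population excess kurtosis: m4 / m2**2 - 3.

    Positive values indicate heavier tails than a normal distribution,
    negative values indicate lighter tails.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    if m2 == 0:
        raise ValueError("kurtosis is undefined for constant data")
    return m4 / (m2 * m2) - 3.0
```

A sequence dominated by a single repeated word, such as `"AAAA"`, concentrates all counts in one k-mer and gives a strongly positive excess kurtosis, whereas evenly spread counts give a negative value. Any threshold for calling a region a CRM would have to come from the paper itself.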

arXiv.org e-Print Archive

A statistical thin-tail test of predicting regulatory regions in the Drosophila genome

Author: Li Yajing
Shu Jian-Jun
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2013

Background: The identification of transcription factor binding sites (TFBSs) and cis-regulatory modules (CRMs) is a crucial step in studying gene expression, but computationally distinguishing CRMs from non-coding non-regulatory regions (NCNRs) remains a challenging problem because of the limited knowledge of the specific interactions involved. Methods: The statistical properties of CRMs are explored by estimating the similar-word set distribution with overrepresentation (Z-score). It is observed that CRMs tend to have a thin-tail Z-score distribution. A new statistical thin-tail test with two thinness coefficients is proposed to distinguish CRMs from NCNRs. Results: Compared with the existing fluffy-tail test, the first thinness coefficient is designed to reduce computational time, making the new thin-tail test well suited to long sequences and large-database analysis in the post-genome era, while the second is designed to improve the separation accuracy between CRMs and NCNRs. These two thinness coefficients may serve as valuable filtering indexes to predict CRMs experimentally. Conclusions: The thin-tail test provides an efficient and effective means of distinguishing CRMs from NCNRs based on the specific statistical properties of CRMs and can guide future experiments aimed at finding new CRMs in the post-genome era. Comment: arXiv admin note: substantial text overlap with arXiv:1402.533

arXiv.org e-Print Archive

Springer - Publisher Connector

Genome Biol.

Author: Brazma A.
Coulson R.
Manke T.
Palin K.
Sand O.
Ukkonen E.
van Helden J.
Vingron M.
Publication venue: Springer Science and Business Media LLC
Publication date: 30/01/2009

With genome analysis expanding from the study of genes to the study of gene regulation, 'regulatory genomics' utilizes sequence information, evolution and functional genomics measurements to unravel how regulatory information is encoded in the genome

MPG.PuRe

An Iterative Learning Algorithm for Deciphering Stegoscripts: a Grammatical Approach for Motif Discovery

Author: Wang Guandong
Zhang Weixiong
Publication venue: Washington University Open Scholarship
Publication date: 15/04/2005

Steganography, or information hiding, is to conceal the existence of messages so as to protect their confidentiality. We consider deciphering a stegoscript, a text with secret messages embedded within a covertext, and identifying the vocabularies used in the messages, with no knowledge of the vocabularies and grammar in which the script was written. Our research was motivated by the problem of identifying conserved non-coding functional elements (motifs) in regulatory regions of genome sequences, which we view as stegoscripts constructed by nature with a statistical model consisting of a dictionary and a grammar. We develop an iterative learning algorithm, WordSpy, to learn such a model from a stegoscript. The model can then be applied to identify the embedded secret messages, i.e., the functional motifs. Our algorithm can successfully recover the most possible text of the first ten chapters of a novel embedded in a stegoscript and identify the transcription factor binding motifs in the upstream regions of ∼800 yeast genes.

Washington University St. Louis: Open Scholarship

Fine-Tuning Enhancer Models to Predict Transcriptional Targets across Multiple Genomes

Author: A Ochoa-Espinosa
A Siepel
A Stark
AA Philippakis
AM Moses
AP Lifanov
B Adryan
BA Hassan
Bassem A. Hassan
BY Chan
CM Frith
D Karolchik
DC King
DM Schroeder
E Emberly
E Segal
EH Davidson
G Thijs
Guillaume Bourque
GZ Hertz
IE Boyle
J van Helden
Jacques van Helden
JE Ostrin
JM Stuart
LW Chang
M Blanchette
M Brudno
M Markstein
M Pritsker
M Rebeiz
M Tompa
MC Bergman
MS Halfon
N Rajewsky
NV Taverner
O Johansson
Olivier Sand
PB Berman
PI zur Lage
R Siddharthan
S Aerts
S Kurtz
S Sinha
SB Montgomery
SM Gallo
SR Eddy
Stein Aerts
T Zhang
TL Bailey
WJ Kent
WW Wasserman
Y Sun
Publication venue: Public Library of Science
Publication date: 01/01/2007

Networks of regulatory relations between transcription factors (TF) and their target genes (TG), implemented through TF binding sites (TFBS), are key features of biology. An idealized approach to solving such networks consists of starting from a consensus TFBS or a position weight matrix (PWM) to generate a high-accuracy list of candidate TGs for biological validation. Developing and evaluating such approaches remains a formidable challenge in regulatory bioinformatics. We perform a benchmark study on 34 Drosophila TFs to assess existing TFBS and cis-regulatory module (CRM) detection methods, with a strong focus on the use of multiple genomes. In particular, for CRM modelling we investigate the addition of orthologous sites to a known PWM to construct phyloPWMs, and we assess the added value of phylogenetic footprinting to predict contextual motifs around known TFBSs. For CRM prediction, we compare motif conservation with network-level conservation approaches across multiple genomes. Choosing the optimal training and scoring strategies strongly enhances the performance of TG prediction for more than half of the tested TFs. Finally, we analyse a 35th TF, namely Eyeless, and find a significant overlap between predicted TGs and candidate TGs identified by microarray expression studies. In summary, we identify several ways to optimize TF-specific TG predictions, some of which can be applied to all TFs, and others that can be applied only to particular TFs. The ability to model known TF-TG relations, together with the use of multiple genomes, results in a significant step forward in solving the architecture of gene regulatory networks.
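As a rough illustration of what scoring a sequence with a position weight matrix involves, the following Python sketch builds a log-odds PWM from per-position base counts and finds the best-scoring window on the forward strand. It is a generic textbook construction, not the scoring or training strategy benchmarked in the paper. The function names `log_odds_pwm` and `best_hit`, the pseudocount, and the uniform background frequency are all assumptions made for the example.

```python
import math


def log_odds_pwm(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into a log2-odds PWM.

    counts: list of dicts (one per motif position) mapping a base to
    its observed count. The pseudocount and the uniform background
    are illustrative choices, not values taken from the paper.
    """
    pwm = []
    for col in counts:
        total = sum(col.get(b, 0) for b in "ACGT") + 4 * pseudocount
        pwm.append({
            b: math.log2(((col.get(b, 0) + pseudocount) / total) / background)
            for b in "ACGT"
        })
    return pwm


def best_hit(seq, pwm):
    """Return (score, offset) of the best forward-strand window.

    Windows containing characters other than A, C, G, T are skipped.
    Returns None if no window can be scored.
    """
    seq = seq.upper()
    width = len(pwm)
    best = None
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if any(b not in "ACGT" for b in window):
            continue
        score = sum(pwm[j][b] for j, b in enumerate(window))
        if best is None or score > best[0]:
            best = (score, i)
    return best
```

Under these assumptions, one simple way to approximate the phyloPWM idea would be to add the bases of aligned orthologous sites to `counts` before calling `log_odds_pwm`. How the paper actually weights orthologous sites is not stated in the abstract.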

Lirias

Crossref

HAL AMU

Directory of Open Access Journals

PubMed Central

DI-fusion

Genome-wide analysis of chromatin features identifies histone modification sensitive and insensitive yeast transcription factors

Author: Cheng Chao
Gerstein Mark B
Shou Chong
Yip Kevin Y
Publication venue: BioMed Central
Publication date: 01/01/2011

We propose a method to predict yeast transcription factor targets by integrating histone modification profiles with transcription factor binding motif information. It shows improved predictive power compared to a binding motif-only method. We find that transcription factors cluster into histone-sensitive and -insensitive classes. The target genes of histone-sensitive transcription factors have stronger histone modification signals than those of histone-insensitive ones. The two classes also differ in tendency to interact with histone modifiers, degree of connectivity in protein-protein interaction networks, position in the transcriptional regulation hierarchy, and in a number of additional features, indicating possible differences in their transcriptional regulation mechanisms

Crossref

Springer - Publisher Connector

PubMed Central

MSACompro: protein multiple sequence alignment using predicted secondary structure, solvent accessibility, and residue-residue contacts

Author: A Krogh
AN Tegge
C Notredame
CB Do
DF Feng
DG Higgins
F Jeanmougin
F Wilcoxon
G Pollastri
GH Gonnet
GJ Barton
GP Raghava
HY Zhou
J Cheng
J Heringa
J Pei
J Söding
JD Thompson
Jianlin Cheng
K Katoh
M Brudno
M Larkin
NK Kim
NS Boutonnet
O Poirot
PHA Sneath
R Chenna
R Durbin
RC Edgar
RK Bradley
RS Amarendran
S Chikkagoudar
SE Brenner
SH Sze
T Kawabata
TL Bailey
U Roshan
V Walle
Xin Deng
YC Liu
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: Multiple Sequence Alignment (MSA) is a basic tool for bioinformatics research and analysis. It is used in almost all bioinformatics tasks, such as protein structure modeling, gene and protein function prediction, DNA motif recognition, and phylogenetic analysis. Therefore, improving the accuracy of multiple sequence alignment is important for advancing many bioinformatics fields. Results: We designed and developed a new method, MSACompro, to synergistically incorporate predicted secondary structure, relative solvent accessibility, and residue-residue contact information into the currently most accurate posterior probability-based MSA methods to improve the accuracy of multiple sequence alignments. The method differs from multiple sequence alignment methods (e.g. 3D-Coffee) that use the tertiary structure information of some sequences, since the structural information in our method is fully predicted from sequences. To the best of our knowledge, applying predicted relative solvent accessibility and contact maps to multiple sequence alignment is novel. Rigorous benchmarking of our method on the standard benchmarks (i.e. BAliBASE, SABmark and OXBENCH) clearly demonstrated that incorporating predicted protein structural information improves multiple sequence alignment accuracy over the leading multiple protein sequence alignment tools that do not use this information, such as MSAProbs, ProbCons, Probalign, T-coffee, MAFFT and MUSCLE. The performance of the method is comparable to that of the state-of-the-art method PROMALS, which uses structural features and additional homologous sequences, with slightly lower scores. Conclusion: MSACompro is an efficient and reliable multiple protein sequence alignment tool that can effectively incorporate predicted protein structural information into multiple sequence alignment. The software is available at http://sysbio.rnet.missouri.edu/multicom_toolbox/.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central