Search CORE

16 research outputs found

GimmeMotifs: a de novo motif prediction pipeline for ChIP-sequencing experiments

Author: Carlson
Clarke
Gert Jan C. Veenstra
Hu
Jin
Kouwenhoven
Park
Sandelin
Schneider
Simon J. van Heeringen
Tompa
Publication venue: Oxford University Press
Publication date: 01/01/2011

Summary: Accurate prediction of transcription factor binding motifs that are enriched in a collection of sequences remains a computational challenge. Here we report on GimmeMotifs, a pipeline that incorporates an ensemble of computational tools to predict motifs de novo from ChIP-sequencing (ChIP-seq) data. Similar redundant motifs are compared using the weighted information content (WIC) similarity score and clustered using an iterative procedure. A comprehensive output report is generated with several different evaluation metrics to compare and evaluate the results. Benchmarks show that the method performs well on human and mouse ChIP-seq datasets. GimmeMotifs consists of a suite of command-line scripts that can be easily implemented in a ChIP-seq analysis pipeline.

Crossref

PubMed Central

Radboud Repository

Transcription Factor-DNA Binding Via Machine Learning Ensembles

Author: DeLisi Charles
Fan Yue
Kon Mark
Publication date: 09/05/2018

We present ensemble methods in a machine learning (ML) framework combining predictions from five known motif/binding site exploration algorithms. For a given TF the ensemble starts with position weight matrices (PWMs) for the motif, collected from the component algorithms. Using dimension reduction, we identify significant PWM-based subspaces for analysis. Within each subspace a machine classifier is built for identifying the TF's gene (promoter) targets (Problem 1). These PWM-based subspaces form an ML-based sequence analysis tool. Problem 2 (finding binding motifs) is solved by agglomerating k-mer (string) feature PWM-based subspaces that stand out in identifying gene targets. We approach Problem 3 (binding sites) with a novel machine learning approach that uses promoter string features and ML importance scores in a classification algorithm locating binding sites across the genome. For target gene identification this method improves performance (measured by the F1 score) by about 10 percentage points over (a) the motif scanning method and (b) the coexpression-based association method. Top motif outperformed 5 component algorithms as well as two other common algorithms (BEST and DEME). For identifying individual binding sites on a benchmark cross species database (Tompa et al., 2005) we match the best performer without much human intervention. It also improved the performance on mammalian TFs. The ensemble can integrate orthogonal information from different weak learners (potentially using entirely different types of features) into a machine learner that can perform consistently better for more TFs. The TF gene target identification component (Problem 1 above) is useful in constructing a transcriptional regulatory network from known TF-target associations. The ensemble is easily extendable to include more tools as well as future PWM-based information. Comment: 33 pages

arXiv.org e-Print Archive

Boston University Institutional Repository (OpenBU)

Practical Strategies for Discovering Regulatory DNA Sequence Motifs

Author: Fraenkel Ernest
MacIsaac Kenzie D
Publication venue: Public Library of Science
Publication date: 01/04/2006

Crossref

Directory of Open Access Journals

PubMed Central

Explicit equilibrium modeling of transcription-factor binding and gene regulation

Author: Clarke Neil D
Granek Joshua A
Publication venue: BioMed Central
Publication date: 01/01/2005

We have developed a computational model that predicts the probability of transcription factor binding to any site in the genome. GOMER (generalizable occupancy model of expression regulation) calculates binding probabilities on the basis of position weight matrices, and incorporates the effects of cooperativity and competition by explicit calculation of coupled binding equilibria. GOMER can be used to test hypotheses regarding gene regulation that build upon this physically principled prediction of protein-DNA binding.
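
The idea of turning position weight matrix scores into binding probabilities can be illustrated with a minimal sketch. This is not GOMER's implementation: it assumes each PWM score can be read as the natural log of an association constant, treats sites as independent, and leaves out the cooperativity and competition terms that the abstract describes. The function names, the toy matrix, and the `tf_conc` parameter are all hypothetical.

```python
import math

def pwm_score(site, pwm):
    """Sum the per-position log-scale scores of `site` under `pwm`."""
    return sum(column[base] for column, base in zip(pwm, site))

def site_occupancy(score, tf_conc=1.0):
    """Equilibrium occupancy of one site.

    Assumption: `score` is interpreted as ln(K_a), so that
    occupancy = [TF] * K_a / (1 + [TF] * K_a).
    """
    k_a = math.exp(score)
    return tf_conc * k_a / (1.0 + tf_conc * k_a)

def region_bound_probability(seq, pwm, tf_conc=1.0):
    """Probability that at least one window in `seq` is bound.

    Assumption: windows bind independently (no cooperativity or competition).
    """
    width = len(pwm)
    p_unbound = 1.0
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        p_unbound *= 1.0 - site_occupancy(pwm_score(window, pwm), tf_conc)
    return 1.0 - p_unbound

# Toy two-column matrix (hypothetical values): "A" is the preferred base.
toy_pwm = [{"A": 0.0, "C": -2.0, "G": -2.0, "T": -2.0}] * 2
print(region_bound_probability("AACGTAA", toy_pwm))
```

Raising `tf_conc` or adding more high-scoring windows increases the returned probability, which is the qualitative behaviour an occupancy model is meant to capture.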

Springer - Publisher Connector

PubMed Central

ScholarBank@NUS

Characterization of genome-wide p53-binding sites upon stress response

Author: Al-Shahrour
Ashburner
Ashcroft
Avantaggiati
Aylon
Bensaad
Bensaad
Bode
Brooks
Budhram-Mahadeo
Cawley
Ceribelli
Clarke
Contente
Coutts
Das
Demonacos
Denissov
el-Deiry
el-Deiry
Erster
Espinosa
Euskirchen
Flores
Flores
Gohler
Gordon
Grossmann
Gu
Harbison
Hearnes
Hendrik G. Stunnenberg
Ho
Hoh
Hollstein
Homer
Horn
Huang
Hubbard
Inga
Jen
Jett
Ji
Kanehisa
Kaneshiro
Kel
Kent
Kern
Kim
Kitayner
Koutsodontis
Koutsodontis
Krieg
Kubbutat
Laptenko
Leonie Smeenk
Li
Liu
Liu
Lokshin
Marc A. van Driel
Marion Lohrum
Matys
Max Koeppel
Perez
Ren
Robert C. Akkers
Robinson
Sbisa
Schumm
Sergei Denissov
Shikama
Simon J. van Heeringen
Stefanie J. J. Bartels
Sullivan
Tanaka
Thornborrow
Veprintsev
Vogelstein
Wei
Yang
Yoon
Zeng
Zhao
Zheng
Publication venue: Oxford University Press
Publication date: 01/01/2008

The tumor suppressor p53 is a sequence-specific transcription factor, which regulates the expression of target genes involved in different stress responses. To understand p53's essential transcriptional functions, unbiased analysis of its DNA-binding repertoire is pivotal. In a genome-wide tiling ChIP-on-chip approach, we have identified and characterized 1546 binding sites of p53 upon Actinomycin D treatment. Among those binding sites were known as well as novel p53 target sites, which included regulatory regions of potentially novel transcripts. Using this collection of genome-wide binding sites, a new high-confidence algorithm was developed, p53scan, to identify the p53 consensus-binding motif. Strikingly, this motif was present in the majority of all bound sequences with 83% of all binding sites containing the motif. In the surrounding sequences of the binding sites, several motifs for potential regulatory cobinders were identified. Finally, we show that the majority of the genome-wide p53 target sites can also be bound by overexpressed p63 and p73 in vivo, suggesting that they can possibly play an important role at p53 binding sites. This emphasizes the possible interplay of p53 and its family members in the context of target gene binding. Our study greatly expands the known, experimentally validated p53 binding site repertoire and serves as a valuable knowledgebase for future research.

Crossref

PubMed Central

Radboud Repository

Molecular determinants of caste differentiation in the highly eusocial honeybee Apis mellifera

Author: Barchuk Angel R
Costa Luciano F
Cristino Alexandre S
Kucharski Robert
Maleszka Ryszard
Simões Zilá LP
Publication venue: BioMed Central
Publication date: 01/01/2007

Background: In honeybees, differential feeding of female larvae promotes the occurrence of two different phenotypes, a queen and a worker, from identical genotypes, through incremental alterations, which affect general growth, and character state alterations that result in the presence or absence of specific structures. Although previous studies revealed a link between incremental alterations and differential expression of physiometabolic genes, the molecular changes accompanying character state alterations remain unknown. Results: By using cDNA microarray analyses of >6,000 Apis mellifera ESTs, we found 240 differentially expressed genes (DEGs) between developing queens and workers. Many genes recorded as up-regulated in prospective workers appear to be unique to A. mellifera, suggesting that the workers' developmental pathway involves the participation of novel genes. Workers up-regulate more developmental genes than queens, whereas queens up-regulate a greater proportion of physiometabolic genes, including genes coding for metabolic enzymes and genes whose products are known to regulate the rate of mass-transforming processes and the general growth of the organism (e.g., tor). Many DEGs are likely to be involved in processes favoring the development of caste-biased structures, like brain, legs and ovaries, as well as genes that code for cytoskeleton constituents. Treatment of developing worker larvae with juvenile hormone (JH) revealed 52 JH responsive genes, specifically during the critical period of caste development. Using Gibbs sampling and Expectation Maximization algorithms, we discovered eight overrepresented cis-elements from four gene groups. Graph theory and complex networks concepts were adopted to attain powerful graphical representations of the interrelation between cis-elements and genes and objectively quantify the degree of relationship between these entities.
Conclusion: We suggest that clusters of functionally related DEGs are co-regulated during caste development in honeybees. This network of interactions is activated by nutrition-driven stimuli in early larval stages. Our data are consistent with the hypothesis that JH is a key component of the developmental determination of queen-like characters. Finally, we propose a conceptual model of caste differentiation in A. mellifera based on gene-regulatory networks.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

University of Canberra Research Repository

RCAAP - Repositório Científico de Acesso Aberto de Portugal

The Australian National University

Universidade de São Paulo

Transcription factor-DNA binding via machine learning ensembles

Author: Delisi Charles
Fan Yue
Kon Mark A.
Publication date: 27/05/2018

The network of interactions between transcription factors (TFs) and their regulatory gene targets governs many of the behaviors and responses of cells. Construction of a transcriptional regulatory network involves three interrelated problems, defined for any regulator: finding (1) its target genes, (2) its binding motif and (3) its DNA binding sites. Many tools have been developed in the last decade to solve these problems. However, performance of algorithms for these has not been consistent for all transcription factors. Because machine learning algorithms have shown advantages in integrating information of different types, we investigate a machine-based approach to integrating predictions from an ensemble of commonly used motif exploration algorithms. Published version

Boston University Institutional Repository (OpenBU)

The four hexamerin genes in the honey bee: structure, molecular evolution and function deduced from expression patterns in queens, workers and drones

Author: Bitondi Márcia MG
Cristino Alexandre S
Martins Juliana R
Nunes Francis MF
Simões Zilá LP
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Hexamerins are hemocyanin-derived proteins that have lost the ability to bind copper ions and transport oxygen; instead, they became storage proteins. The current study aimed to broaden our knowledge on the hexamerin genes found in the honey bee genome by exploring their structural characteristics, expression profiles, evolution, and functions in the life cycle of workers, drones and queens. Results: The hexamerin genes of the honey bee (hex 70a, hex 70b, hex 70c and hex 110) diverge considerably in structure, so that the overall amino acid identity shared among their deduced protein subunits varies from 30 to 42%. Bioinformatics search for motifs in the respective upstream control regions (UCRs) revealed six overrepresented motifs including a potential binding site for Ultraspiracle (Usp), a target of juvenile hormone (JH). The expression of these genes was induced by topical application of JH on worker larvae. The four genes are highly transcribed by the larval fat body, although with significant differences in transcript levels, but only hex 110 and hex 70a are re-induced in the adult fat body in a caste- and sex-specific fashion, workers showing the highest expression. Transcripts for hex 110, hex 70a and hex 70b were detected in developing ovaries and testes, and hex 110 was highly transcribed in the ovaries of egg-laying queens. A phylogenetic analysis revealed that HEX 110 is located at the most basal position among the holometabola hexamerins, and like HEX 70a and HEX 70c, it shares potential orthology relationship with hexamerins from other hymenopteran species. Conclusions: Striking differences were found in the structure and developmental expression of the four hexamerin genes in the honey bee. The presence of a potential binding site for Usp in the respective 5' UCRs, and the results of experiments on JH level manipulation in vivo support the hypothesis of regulation by JH. Transcript levels and patterns in the fat body and gonads suggest that, in addition to their primary role in supplying amino acids for metamorphosis, hexamerins serve as storage proteins for gonad development, egg production, and to support foraging activity. A phylogenetic analysis including the four deduced hexamerins and related proteins revealed a complex pattern of evolution, with independent radiation in insect orders.
Funding: Fundacao de Amparo a Pesquisa do Estado de Sao Paulo (FAPESP) [05/03926-5; 08/00541-3

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Springer - Publisher Connector

PubMed Central

RCAAP - Repositório Científico de Acesso Aberto de Portugal

Universidade de São Paulo

Transcription factor motif quality assessment requires systematic comparative analysis

Author: Kibet Cabel K
Machanick Philip
Publication date: 01/01/2016

Transcription factor (TF) binding site prediction remains a challenge in gene regulatory research due to degeneracy and potential variability in binding sites in the genome. Dozens of algorithms designed to learn binding models (motifs) have generated many motifs available in research papers with a subset making it to databases like JASPAR, UniPROBE and Transfac. The presence of many versions of motifs from the various databases for a single TF and the lack of a standardized assessment technique makes it difficult for biologists to make an appropriate choice of binding model and for algorithm developers to benchmark, test and improve on their models. In this study, we review and evaluate the approaches in use, highlight differences and demonstrate the difficulty of defining a standardized motif assessment approach. We review scoring functions, motif length, test data and the type of performance metrics used in prior studies as some of the factors that influence the outcome of a motif assessment. We show that the scoring functions and statistics used in motif assessment influence ranking of motifs in a TF-specific manner. We also show that TF binding specificity can vary by source of genomic binding data. We also demonstrate that information content of a motif is not in isolation a measure of motif quality but is influenced by TF binding behaviour. We conclude that there is a need for an easy-to-use tool that presents all available evidence for a comparative analysis.
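
For readers unfamiliar with the term, the information content of a motif is commonly computed as the summed relative entropy of each column against a background distribution. The sketch below uses that standard definition. It is not taken from the study, and the function name, the dict-per-column representation, and the uniform default background are assumptions made for illustration.

```python
import math

def information_content(pwm, background=None):
    """Total information content in bits of a motif.

    `pwm` is a list of columns, each a dict mapping A/C/G/T to a probability.
    Assumption: a uniform background unless one is supplied.
    """
    if background is None:
        background = {base: 0.25 for base in "ACGT"}
    total = 0.0
    for column in pwm:
        for base, p in column.items():
            if p > 0:  # treat 0 * log(0) as 0
                total += p * math.log2(p / background[base])
    return total

# A fully conserved 4-bp motif carries 2 bits per position.
conserved = [{"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0}] * 4
# A motif identical to the background carries no information.
flat = [{"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}] * 4

print(information_content(conserved))  # 8.0
print(information_content(flat))       # 0.0
```

A high value only says the columns are far from the background. It does not say whether the motif matches where the TF actually binds, which is consistent with the abstract's caution against using it alone as a quality measure.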

Rhodes Repository (SEALS)

Transcription factor motif quality assessment requires systematic comparative analysis [version 2; referees: 2 approved]

Author: A Jolma
A Kubosaki
A Mathelier
A Mathelier
A Medina-Rivera
A Quinlan
B Contreras-Moreira
B Foat
C Grant
C Harbison
C Kibet
D Johnson
D Newburger
D Quest
E Feingold
E Wilbanks
F Mordelet
F Zambelli
G Badis
G Sandve
G Sandve
H Rhee
H Touzet
I Kulakovskiy
J Granek
J Hu
J Keilwagen
J Wang
K Klepper
K Lower
K Takahashi
L Wang
M Annala
M Bengtsen
M Guertin
M Pachkov
M Pujato
M Slattery
M Thomas-Chollier
M Tompa
M Weirauch
M Weirauch
N Clarke
P Agius
P Kheradpour
P Machanick
R Siddharthan
S Heinz
S van Heeringen
S Zhong
T Bailey
T Bailey
T Bailey
T Bailey
T Lesluyes
T Schneider
V Jin
X Chen
X Chen
Y Orenstein
Y Orenstein
Y Orenstein
Y Zhang
Y Zhao
Y Zhao
Z Zhang
Publication venue: F1000 Research Ltd
Publication date: 01/01/2016

Crossref

Directory of Open Access Journals

South East Academic Libraries System (SEALS)

Rhodes Repository (SEALS)