Search CORE

8,274 research outputs found

Position dependencies in transcription factor binding sites

Author: Oakeley Edward J.
Tomovic Andrija
Publication venue
Publication date: 02/08/2017
Field of study

Motivation: Most of the available tools for transcription factor binding site prediction are based on methods which assume no sequence dependence between the binding site base positions. Our primary objective was to investigate the statistical basis for either a claim of dependence or independence, to determine whether such a claim is generally true, and to use the resulting data to develop improved scoring functions for binding-site prediction. Results: Using three statistical tests, we analyzed the number of binding sites showing dependent positions. We analyzed transcription factor-DNA crystal structures for evidence of position dependence. Our final conclusions were that some factors show evidence of dependencies whereas others do not. We observed that the conformational energy (Z-score) of the transcription factor-DNA complexes was lower (better) for sequences that showed dependency than for those that did not (P < 0.02). We suggest that where evidence exists for dependencies, these should be modeled to improve binding-site predictions. However, when no significant dependency is found, this correction should be omitted. This may be done by converting any existing scoring function which assumes independence into a form which includes a dependency correction. We present an example of such an algorithm and its implementation as a web tool. Availability: http://promoterplot.fmi.ch/cgi-bin/dep.html Contact: [email protected] Supplementary information: Supplementary data (1, 2, 3, 4, 5, 6, 7 and 8) are available at Bioinformatics onlin

RERO DOC Digital Library

A graph-based motif detection algorithm models complex nucleotide dependencies in transcription factor binding sites

Author: Achaz
Bailey
Barash
Benjamini
Benos
Bertone
Brauer
Brian T. Naughton
Bulyk
Douglas L. Brutlag
Durbin
Eugene Fratkin
Fratkin
Garten
Gojobori
Gribskov
Gu
Harbison
Hughes
King
Lapidot
Liu
Matys
Pavesi
Sandelin
Schug
Segal
Serafim Batzoglou
Stolovicki
Stone
Storey
Stormo
Story
Thomas
Wang
Watts
Xing
Zhou
Publication venue: Oxford University Press
Publication date: 13/11/2006
Field of study

Given a set of known binding sites for a specific transcription factor, it is possible to build a model of the transcription factor binding site, usually called a motif model, and use this model to search for other sites that bind the same transcription factor. Typically, this search is performed using a position-specific scoring matrix (PSSM), also known as a position weight matrix. In this paper we analyze a set of eukaryotic transcription factor binding sites and show that there is extensive clustering of similar k-mers in eukaryotic motifs, owing to both functional and evolutionary constraints. The apparent limitations of probabilistic models in representing complex nucleotide dependencies lead us to a graph-based representation of motifs. When deciding whether a candidate k-mer is part of a motif or not, we base our decision not on how well the k-mer conforms to a model of the motif as a whole, but how similar it is to specific, known k-mers in the motif. We elucidate the reasons why we expect graph-based methods to perform well on motif data. Our MotifScan algorithm shows greatly improved performance over the prevalent PSSM-based method for the detection of eukaryotic motifs

Crossref

PubMed Central

InMoDe: tools for learning and visualizing intra-motif dependencies of DNA binding sites

Author: Eggeling Ralf
Grau Jan
Grosse Ivo
Publication venue
Publication date: 15/02/2017
Field of study

Summary: Recent studies have shown that the traditional position weight matrix model is often insufficient for modeling transcription factor binding sites, as intra-motif dependencies play a significant role for an accurate description of binding motifs. Here, we present the Java application InMoDe, a collection of tools for learning, leveraging and visualizing such dependencies of putative higher order. The distinguishing feature of InMoDe is a robust model selection from a class of parsimonious models, taking into account dependencies only if justified by the data while choosing for simplicity otherwise. Availability and Implementation: InMoDe is implemented in Java and is available as command line application, as application with a graphical user-interface, and as an integration into Galaxy on the project website at http://www.jstacs.de/index.php/InMoDe.Peer reviewe

Helsingin yliopiston digitaalinen arkisto

JASPAR 2016: a major expansion and update of the open-access database of transcription factor binding profiles.

Author: Arenillas David J
Chen Chih-Yu
Denay Grégoire
Fornes Oriol
Lee Jessica
Lenhard Boris
Mathelier Anthony
Parcy François
Sandelin Albin
Shi Wenqiang
Shyr Casper
Tan Ge
Wasserman Wyeth W.
Worsley-Hunt Rebecca
Zhang Allen W
Publication venue: 'Oxford University Press (OUP)'
Publication date: 04/01/2016
Field of study

International audienceJASPAR (http://jaspar.genereg.net) is an open-access database storing curated, non-redundant transcription factor (TF) binding profiles representing transcription factor binding preferences as position frequency matrices for multiple species in six taxonomic groups. For this 2016 release, we expanded the JASPAR CORE collection with 494 new TF binding profiles (315 in vertebrates, 11 in nematodes, 3 in insects, 1 in fungi and 164 in plants) and updated 59 profiles (58 in vertebrates and 1 in fungi). The introduced profiles represent an 83% expansion and 10% update when compared to the previous release. We updated the structural annotation of the TF DNA binding domains (DBDs) following a published hierarchical structural classification. In addition, we introduced 130 transcription factor flexible models trained on ChIP-seq data for vertebrates, which capture dinucleotide dependencies within TF binding sites. This new JASPAR release is accompanied by a new web tool to infer JASPAR TF binding profiles recognized by a given TF protein sequence. Moreover, we provide the users with a Ruby module complementing the JASPAR API to ease programmatic access and use of the JASPAR collection of profiles. Finally, we provide the JASPAR2016 R/Bioconductor data package with the data of this release

Hal - Université Grenoble Alpes

JASPAR 2016: a major expansion and update of the open-access database of transcription factor binding profiles.

Author: Arenillas David J
Chen Chih-Yu
Denay Grégoire
Fornes Oriol
Lee Jessica
Lenhard Boris
Mathelier Anthony
Parcy François
Sandelin Albin
Shi Wenqiang
Shyr Casper
Tan Ge
Wasserman Wyeth W.
Worsley-Hunt Rebecca
Zhang Allen W
Publication venue: 'Oxford University Press (OUP)'
Publication date: 03/11/2015
Field of study

Hal - Université Grenoble Alpes

Copenhagen University Research Information System

PubMed Central

Spiral - Imperial College Digital Repository

HAL-CEA

ProdInra

JASPAR 2016: a major expansion and update of the open-access database of transcription factor binding profiles.

Author: Arenillas DJ
Chen CY
Denay G
Fornes O
Lee J
Lenhard B
Mathelier A
Parcy F
Sandelin A
Shi W
Shyr C
Tan G
Wasserman WW
Worsley-Hunt R
Zhang AW
Publication venue: 'Oxford University Press (OUP)'
Publication date: 22/10/2015
Field of study

JASPAR (http://jaspar.genereg.net) is an open-access database storing curated, non-redundant transcription factor (TF) binding profiles representing transcription factor binding preferences as position frequency matrices for multiple species in six taxonomic groups. For this 2016 release, we expanded the JASPAR CORE collection with 494 new TF binding profiles (315 in vertebrates, 11 in nematodes, 3 in insects, 1 in fungi and 164 in plants) and updated 59 profiles (58 in vertebrates and 1 in fungi). The introduced profiles represent an 83% expansion and 10% update when compared to the previous release. We updated the structural annotation of the TF DNA binding domains (DBDs) following a published hierarchical structural classification. In addition, we introduced 130 transcription factor flexible models trained on ChIP-seq data for vertebrates, which capture dinucleotide dependencies within TF binding sites. This new JASPAR release is accompanied by a new web tool to infer JASPAR TF binding profiles recognized by a given TF protein sequence. Moreover, we provide the users with a Ruby module complementing the JASPAR API to ease programmatic access and use of the JASPAR collection of profiles. Finally, we provide the JASPAR2016 R/Bioconductor data package with the data of this release

PubMed Central

Copenhagen University Research Information System

Spiral - Imperial College Digital Repository

A flexible integrative approach based on random forest improves prediction of transcription factor binding sites

Author: Abeel
Afflerbach
Angarica
Bailey
Bart Hooghe
Bauer
Benos
Breiman
Bulyk
Burden
Calladine
Camenisch
Chen
Cho
Cordell
Davis
Dickerson
Ehret
Ernst
Frans van Roy
Friedel
Fujii
Fulton
Gama-Castro
Gardiner
Gartenberg
Gershenzon
Goodsell
Gorin
Gowrisankar
Greenbaum
Gunewardena
Hall
Hendrickson
Hu
Juo
Kajimura
Kaplan
Karas
Kel
Kim
Lavery
Lewis
Liu
Liu
Liu
Long
Lu
Lu
Lu
Lunetta
Man
Marco
Marinescu
Martinez-Hackert
Matys
Medina-Rivera
Meysman
Michel
Mokry
Morozov
Narang
Naughton
O'Flanagan
Olson
Paillard
Pan
Parker
Parvin
Pieter De Bleser
Ponomarenko
Portales-Casamar
Powell
Pudimat
Ramsey
Rohs
Rohs
Rohs
Ruiz
Satchwell
Schneider
Shakked
Sharon
Shi
Spolar
Stefan Broos
Stormo
Svozil
Thayer
Tomovic
Toro-Roman
Travers
Tullius
Wunderlich
Zhang
Zhang
Zhu
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2012
Field of study

Transcription factor binding sites (TFBSs) are DNA sequences of 6-15 base pairs. Interaction of these TFBSs with transcription factors (TFs) is largely responsible for most spatiotemporal gene expression patterns. Here, we evaluate to what extent sequence-based prediction of TFBSs can be improved by taking into account the positional dependencies of nucleotides (NPDs) and the nucleotide sequence-dependent structure of DNA. We make use of the random forest algorithm to flexibly exploit both types of information. Results in this study show that both the structural method and the NPD method can be valuable for the prediction of TFBSs. Moreover, their predictive values seem to be complementary, even to the widely used position weight matrix (PWM) method. This led us to combine all three methods. Results obtained for five eukaryotic TFs with different DNA-binding domains show that our method improves classification accuracy for all five eukaryotic TFs compared with other approaches. Additionally, we contrast the results of seven smaller prokaryotic sets with high-quality data and show that with the use of high-quality data we can significantly improve prediction performance. Models developed in this study can be of great use for gaining insight into the mechanisms of TF binding

Crossref

Ghent University Academic Bibliography

PubMed Central

Bayesian Markov models consistently outperform PWMs at predicting motifs in nucleotide sequences.

Author: Siebert M.
Söding J.
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2016
Field of study

Position weight matrices (PWMs) are the standard model for DNA and RNA regulatory motifs. In PWMs nucleotide probabilities are independent of nucleotides at other positions. Models that account for dependencies need many parameters and are prone to overfitting. We have developed a Bayesian approach for motif discovery using Markov models in which conditional probabilities of order k - 1 act as priors for those of order k This Bayesian Markov model (BaMM) training automatically adapts model complexity to the amount of available data. We also derive an EM algorithm for de-novo discovery of enriched motifs. For transcription factor binding, BaMMs achieve significantly (P = 1/16) higher cross-validated partial AUC than PWMs in 97% of 446 ChIP-seq ENCODE datasets and improve performance by 36% on average. BaMMs also learn complex multipartite motifs, improving predictions of transcription start sites, polyadenylation sites, bacterial pause sites, and RNA binding sites by 26-101%. BaMMs never performed worse than PWMs. These robust improvements argue in favour of generally replacing PWMs by BaMMs

Open Access LMU

PubMed Central

MPG.PuRe

Higher Order PWM for Modeling Transcription Factor Binding Sites

Author: Srinivasan Dhivya
Publication venue: SJSU ScholarWorks
Publication date: 01/10/2013
Field of study

Traditional Position Weight Matrices (PWMs) that are used to model Transcription Factor Binding Sites (TFBS) assume independence among different positions in the binding site. In reality, this may not necessarily be the case. A better way to model TFBS is to consider the distribution of dinucleotides or trinucleotides instead of just mononucleotides, thus taking neighboring nucleotides into account. We can therefore, extend the single nucleotide PWM to a dinucleotide PWM or an even higher-order PWM to correctly estimate the dependencies among the nucleotides in a given sequence. The purpose of this project is to develop an algorithm to implement higher-order PWMs to detect the TFBS and other biological motifs in DNA, RNA, and proteins

SJSU ScholarWorks

New scoring schema for finding motifs in DNA Sequences

Author: Ahrabian Hayedeh
Goliaei Bahram
Nowzari-Dalini Abbas
Sadeghi Mehdei
Zare-Mirakabad Fatemeh
Publication venue: BioMed Central
Publication date: 01/01/2009
Field of study

Abstract Background Pattern discovery in DNA sequences is one of the most fundamental problems in molecular biology with important applications in finding regulatory signals and transcription factor binding sites. An important task in this problem is to search (or predict) known binding sites in a new DNA sequence. For this reason, all subsequences of the given DNA sequence are scored based on an scoring function and the prediction is done by selecting the best score. By assuming no dependency between binding site base positions, most of the available tools for known binding site prediction are designed. Recently Tomovic and Oakeley investigated the statistical basis for either a claim of dependence or independence, to determine whether such a claim is generally true, and they presented a scoring function for binding site prediction based on the dependency between binding site base positions. Our primary objective is to investigate the scoring functions which can be used in known binding site prediction based on the assumption of dependency or independency in binding site base positions. Results We propose a new scoring function based on the dependency between all positions in biding site base positions. This scoring function uses joint information content and mutual information as a measure of dependency between positions in transcription factor binding site. Our method for modeling dependencies is simply an extension of position independency methods. We evaluate our new scoring function on the real data sets extracted from JASPAR and TRANSFAC data bases, and compare the obtained results with two other well known scoring functions. Conclusion The results demonstrate that the new approach improves known binding site discovery and show that the joint information content and mutual information provide a better and more general criterion to investigate the relationships between positions in the TFBS. Our scoring function is formulated by simple mathematical calculations. By implementing our method on several biological data sets, it can be induced that this method performs better than methods that do not consider dependencies.</p

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central