A flexible integrative approach based on random forest improves prediction of transcription factor binding sites

Abeel; Afflerbach; Angarica; Bailey; Bart Hooghe; Bauer; Benos; Breiman; Bulyk; Burden; Calladine; Camenisch; Chen; Cho; Cordell; Davis; Dickerson; Ehret; Ernst; Frans van Roy; Friedel; Fujii; Fulton; Gama-Castro; Gardiner; Gartenberg; Gershenzon; Goodsell; Gorin; Gowrisankar; Greenbaum; Gunewardena; Hall; Hendrickson; Hu; Juo; Kajimura; Kaplan; Karas; Kel; Kim; Lavery; Lewis; Liu; Liu; Liu; Long; Lu; Lu; Lu; Lunetta; Man; Marco; Marinescu; Martinez-Hackert; Matys; Medina-Rivera; Meysman; Michel; Mokry; Morozov; Narang; Naughton; O'Flanagan; Olson; Paillard; Pan; Parker; Parvin; Pieter De Bleser; Ponomarenko; Portales-Casamar; Powell; Pudimat; Ramsey; Rohs; Rohs; Rohs; Ruiz; Satchwell; Schneider; Shakked; Sharon; Shi; Spolar; Stefan Broos; Stormo; Svozil; Thayer; Tomovic; Toro-Roman; Travers; Tullius; Wunderlich; Zhang; Zhang; Zhu

Publication date: 1 January 2012
Publisher: 'Oxford University Press (OUP)'

Abstract

Transcription factor binding sites (TFBSs) are DNA sequences of 6-15 base pairs. The interaction of these TFBSs with transcription factors (TFs) is largely responsible for spatiotemporal gene expression patterns. Here, we evaluate to what extent sequence-based prediction of TFBSs can be improved by taking into account the positional dependencies of nucleotides (NPDs) and the nucleotide sequence-dependent structure of DNA. We use the random forest algorithm to flexibly exploit both types of information. Our results show that both the structural method and the NPD method can be valuable for the prediction of TFBSs. Moreover, their predictive values seem to be complementary, even to the widely used position weight matrix (PWM) method. This led us to combine all three methods. Results obtained for five eukaryotic TFs with different DNA-binding domains show that the combined method improves classification accuracy for all five compared with other approaches. Additionally, we contrast these results with those for seven smaller prokaryotic sets with high-quality data and show that the use of high-quality data can significantly improve prediction performance. The models developed in this study can be of great use for gaining insight into the mechanisms of TF binding.
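
For readers unfamiliar with the PWM baseline named in the abstract, the following is a minimal sketch of log-odds PWM scoring in Python. It is not the authors' implementation: the function names, the pseudocount, the uniform background frequency of 0.25, and the toy binding sites are all assumptions made for illustration. A PWM scores each position independently, which is the limitation that the NPD and structural features described above are meant to address.

```python
import math
from collections import Counter

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds PWM from aligned, equal-length sites (illustrative helper)."""
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + pseudocount * len(BASES)
    pwm = []
    for pos in range(length):
        counts = Counter(site[pos] for site in sites)
        # Log2 ratio of the smoothed base frequency to the assumed background.
        pwm.append({
            base: math.log2(((counts[base] + pseudocount) / total) / background)
            for base in BASES
        })
    return pwm


def score(pwm, window):
    """Sum the per-position log-odds; positions are treated as independent."""
    return sum(column[base] for column, base in zip(pwm, window))


def best_hit(pwm, sequence):
    """Slide the PWM along the sequence and return (best score, start index)."""
    width = len(pwm)
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the PWM")
    return max(
        (score(pwm, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    )


# Hypothetical toy sites, not data from the study.
toy_sites = ["TATAAA", "TATAAT", "TATATA", "TACAAA"]
pwm = build_pwm(toy_sites)
top_score, top_pos = best_hit(pwm, "GGCGTATAAAGGC")
print(top_pos, round(top_score, 2))
```

In an integrative setup of the kind the abstract describes, a score like this could serve as one input feature to a random forest classifier alongside dependency and structural features; that step is omitted here because the abstract does not specify how the features are encoded.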

Available Versions

Crossref

info:doi/10.1093%2Fnar%2Fgks28...

Last time updated on 15/02/2019

Ghent University Academic Bibliography

oai:archive.ugent.be:3009502

Last time updated on 12/11/2016