4 research outputs found

msCentipede: Modeling Heterogeneity across Genomic Sites and Replicates Improves Accuracy in the Inference of Transcription Factor Binding

Author: Anil Raj (193015)
Heejung Shim (727802)
Jonathan K. Pritchard (113911)
Matthew Stephens (3815)
Yoav Gilad (2481)
Publication venue
Publication date: 25/09/2015
Field of study

<div><p>Understanding global gene regulation depends critically on accurate annotation of regulatory elements that are functional in a given cell type. CENTIPEDE, a powerful, probabilistic framework for identifying transcription factor binding sites from tissue-specific DNase I cleavage patterns and genomic sequence content, leverages the hypersensitivity of factor-bound chromatin and the information in the DNase I spatial cleavage profile characteristic of each DNA binding protein to accurately infer functional factor binding sites. However, the model for the spatial profile in this framework fails to account for the substantial variation in the DNase I cleavage profiles across different binding sites. Neither does it account for variation in the profiles at the same binding site across multiple replicate DNase I experiments, which are increasingly available. In this work, we introduce new methods, based on multi-scale models for inhomogeneous Poisson processes, to account for such variation in DNase I cleavage patterns both within and across binding sites. These models account for the spatial structure in the heterogeneity in DNase I cleavage patterns for each factor. Using DNase-seq measurements assayed in a lymphoblastoid cell line, we demonstrate the improved performance of this model for several transcription factors by comparing against the ChIP-seq peaks for those factors. Finally, we explore the effects of DNase I sequence bias on inference of factor binding using a simple extension to our framework that allows for a more flexible background model. The proposed model can also be easily applied to paired-end ATAC-seq and DNase-seq data. msCentipede, a Python implementation of our algorithm, is available at <a href="http://rajanil.github.io/msCentipede" target="_blank">http://rajanil.github.io/msCentipede</a>.</p></div>
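
The abstract refers to multi-scale models for inhomogeneous Poisson processes. As a rough illustration only, the sketch below shows one common way to build a multi-scale representation of a vector of cleavage counts: record the total, then at each scale record how each window's count splits into its left half. Under an inhomogeneous Poisson process the total is Poisson-distributed and each left-half count, given its parent count, is binomial. The function name `multiscale_decompose` and the example counts are hypothetical; this is not taken from the msCentipede code base.

```python
def multiscale_decompose(counts):
    """Decompose a count vector (length a power of two) into a total and,
    for each scale from coarsest to finest, a list of (left, parent) pairs.

    Hypothetical helper for illustration; not part of msCentipede.
    """
    n = len(counts)
    if n == 0 or n & (n - 1):
        raise ValueError("length of counts must be a power of two")

    levels = []
    current = list(counts)
    while len(current) > 1:
        # Pair up neighbours: the parent count is the sum of the pair,
        # and the left count is the first element of the pair.
        lefts = [current[i] for i in range(0, len(current), 2)]
        parents = [current[i] + current[i + 1] for i in range(0, len(current), 2)]
        levels.append(list(zip(lefts, parents)))
        current = parents

    levels.reverse()  # report the coarsest scale first
    return current[0], levels


if __name__ == "__main__":
    # Made-up cleavage counts over an 8-position window.
    total, levels = multiscale_decompose([3, 1, 0, 2, 4, 0, 1, 1])
    print(total)
    for scale, pairs in enumerate(levels):
        print(scale, pairs)
```

Each `(left, parent)` pair can then be given its own binomial (or overdispersed) model, which is what allows variation to be described separately at each spatial scale.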

University of Melbourne Institutional Repository

FigShare

Modeling factor-specific DNase I cleavage profile and sequence bias in DNase cleavage increases prediction accuracy.

Author: Anil Raj (193015)
Heejung Shim (727802)
Jonathan K. Pritchard (113911)
Matthew Stephens (3815)
Yoav Gilad (2481)

<p>A: Modeling the DNase I cleavage profile at bound sites increases the prediction accuracy of msCentipede across a broad range of transcription factors. Each point on the plot corresponds to a different transcription factor. B: We show the ROC curves for transcription factor EBF1 for three different models of increasing complexity. We observe a substantial increase in accuracy when incorporating a multi-scale model for the factor-specific cleavage profile; however, the increase in accuracy when modeling the background cleavage rate using naked DNA data is rather modest. This holds true for a broad range of factors as shown in <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0138030#pone.0138030.s004" target="_blank">S4 Fig</a>.</p>

FigShare

Accuracy of msCentipede, CENTIPEDE and PIQ across a range of transcription factors.

Author: Anil Raj (193015)
Heejung Shim (727802)
Jonathan K. Pritchard (113911)
Matthew Stephens (3815)
Yoav Gilad (2481)

<p>Each point corresponds to a different factor and accuracy is measured by area under the ROC curve. Blue points correspond to factors where msCentipede achieves higher accuracy than CENTIPEDE (top panels) or PIQ (bottom panels), and orange points correspond to factors where msCentipede performs worse. A: The algorithms are compared using data from a single replicate. B: The algorithms are compared using data from multiple library replicates.</p>
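
The caption measures accuracy by area under the ROC curve. As a minimal sketch of that metric, the area can be computed as the fraction of (bound, unbound) site pairs in which the bound site receives the higher score, with ties counted as one half. The function name `roc_auc` and the scores and labels below are made up for illustration; they are not data or code from the study.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: predicted binding scores (higher means more likely bound).
    labels: 1 for sites treated as bound (e.g. overlapping a ChIP-seq peak),
            0 for sites treated as unbound.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # bound site correctly ranked higher
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical scores for four motif instances.
    print(roc_auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))
```

The pairwise form is quadratic in the number of sites, so for genome-wide evaluations a rank-based implementation would be preferable; it is used here only because it makes the definition explicit.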

FigShare

Illustration that DNase I cleavage profiles exhibit excess variation compared with a multinomial model.

Author: Anil Raj (193015)
Heejung Shim (727802)
Jonathan K. Pritchard (113911)
Matthew Stephens (3815)
Yoav Gilad (2481)

<p>For a set of 1000 SP1 motif instances with high ChIP-seq signal, we computed, for a 100 bp window around each motif instance, the ratio of the number of DNase I cuts mapped to the left half of the window to the number of DNase I cuts mapped to the entire window. The histogram of these ‘observed ratios’ is shown in orange. Under a multinomial model the number of reads mapping to each half of the window should have a binomial distribution, and we used this fact to simulate ‘expected ratios’ (gray line); see <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0138030#pone.0138030.s009" target="_blank">S1 Methods</a> for more details. The observed ratios are clearly overdispersed compared with the expectation under a multinomial model.</p>
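
The comparison described in this caption can be sketched as follows: compute the left-half ratio for each window, then draw binomial left-half counts with the same per-window totals to obtain ratios expected under a multinomial model. This is an illustration under stated assumptions, not the procedure from S1 Methods: the function names, the fixed left-half probability `p_left`, and the example counts are all hypothetical.

```python
import random


def left_ratio(cuts):
    """Ratio of cuts in the left half of a window to cuts in the whole window."""
    total = sum(cuts)
    if total == 0:
        raise ValueError("window has no cuts")
    return sum(cuts[: len(cuts) // 2]) / total


def simulate_expected_ratios(totals, p_left=0.5, seed=0):
    """Simulate left-half ratios under a binomial model.

    totals: total cut count per window (each must be positive).
    p_left: assumed probability that a cut falls in the left half
            (an assumption of this sketch; in practice it could be
            estimated from counts pooled across windows).
    """
    rng = random.Random(seed)
    ratios = []
    for n in totals:
        if n <= 0:
            raise ValueError("each total must be positive")
        # One Bernoulli draw per cut gives a binomial left-half count.
        left = sum(rng.random() < p_left for _ in range(n))
        ratios.append(left / n)
    return ratios


if __name__ == "__main__":
    # Made-up per-position cut counts for three short windows.
    windows = [[2, 1, 0, 1], [0, 0, 3, 5], [4, 4, 1, 0]]
    observed = [left_ratio(w) for w in windows]
    expected = simulate_expected_ratios([sum(w) for w in windows])
    print(observed)
    print(expected)
```

If the observed ratios are spread more widely than the simulated ones, as the caption reports for SP1, the counts are overdispersed relative to the multinomial model.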

FigShare